Under certain soft-stopping conditions (ie: sticktable attached to proxy
and in-progress connections to the proxy that prevent haproxy from
exiting), manage_proxy() (p->task) will wake up every second to perform
a cleanup attempt on the proxy sticktable (to purge unused entries).
However, as reported by TimWolla in GH #2091, it was found that a
systematic call to pool_gc() could cause some CPU waste, mainly
because malloc_trim() (which is rather expensive) is being called
for each pool_gc() invocation.
As a result, such soft-stopping process could be spending a significant
amount of time in the malloc_trim->madvise() syscall for nothing.
Example "strace -c -f -p `pidof haproxy`" output (taken from
Tim's report):
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
46.77 1.840549 3941 467 1 epoll_wait
43.82 1.724708 13 128509 sched_yield
8.82 0.346968 11 29696 madvise
0.58 0.023011 24 951 clock_gettime
0.01 0.000257 10 25 7 recvfrom
0.00 0.000033 11 3 sendto
0.00 0.000021 21 1 rt_sigreturn
0.00 0.000021 21 1 timer_settime
------ ----------- ----------- --------- --------- ----------------
100.00 3.935568 24 159653 8 total
To prevent this, we now only call pool_gc() when some memory is
really expected to be reclaimed as a direct result of the previous
stick table cleanup.
This is pretty straightforward since stktable_trash_oldest() returns
the number of trashed sticky sessions.
This may be backported to every stable versions.
On soft-stop, we must properlu stop backends and not only proxies with at
least a listener. This is mandatory in order to stop the health checks. A
previous fix was provided to do so (ba29687bc1 "BUG/MEDIUM: proxy: properly
stop backends"). However, only stop_proxy() function was fixed. When HAproxy
is stopped, this function is no longer used. So the same kind of fix must be
done on do_soft_stop_now().
This patch partially fixes the issue #1874. It must be backported as far as
2.4.
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.
A suspended listener can be in two differents valid states:
- LI_PAUSED: the listener is effectively paused, it will unpause on
resume_listener()
- LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
state, so it was unbound to satisfy the suspend request, it will
correcly re-bind on resume_listener()
Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().
We're also adding li_suspend proxy variable to track the number of currently
suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.
Counter is increased on successful suspend in suspend_listener() and it is
decreased on successful resume in resume_listener()
--
Backport notes:
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
| /* PROXY_LOCK is require
| proxy_cond_resume(px);
By this:
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:
Replace this:
|@@ -253,6 +253,7 @@ struct listener {
|
| /* listener flags (16 bits) */
| #define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x0002 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
By this:
|@@ -222,6 +222,7 @@ struct li_per_thread {
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
| #define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x00000004 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
This is an alternative fix that tries to address the same issue as
d1ebee177 ("BUG/MINOR: listener: close tiny race between
resume_listener() and stopping") while allowing resume_listener() to be
more versatile.
Indeed, because of the previous fix, resume_listener() is not able to
rebind stopped listeners, and this breaks the original behavior that is
documented in the function description:
"If the listener was only in the assigned
state, it's totally rebound. This can happen if a pause() has completely
stopped it. If the resume fails, 0 is returned and an error might be
displayed."
With relax_listener(), we now make sure to check l->state under the
listener lock so we don't call resume_listener() when the conditions are not
met.
As such, concurrently stopped listeners may not be rebound using
relax_listener().
Note: the documented race can't happen since 1b927eb3c ("MEDIUM: proto: stop
protocols under thread isolation during soft stop"), but older versions are
concerned as 1b927eb3c was not marked for backports.
Moreover, the patch also prevents the race between protocol_pause_all() and
resuming from LIMITED or FULL states.
This commit depends on:
- "MINOR: listener: add relax_listener() function"
This should be backported with d1ebee177 up to 2.4
(d1ebee177 is marked to be backported for all stable versions but the current
patch does not apply for versions < 2.4)
Add listener lock hint (AKA lli) to (stop/resume/pause)_listener() functions.
All these functions implicitely take the listener lock when they are called:
It could be useful to be able to call them while already holding the lock, so
we're adding lli hint to make them take the lock only when it is missing.
This should only be backported if explicitly required by another commit
--
-> 2.4 and 2.5 common backport notes:
These 2 commits need to be backported first:
- 187396e34 "CLEANUP: listener: function comment typo in stop_listener()"
- a57786e87 "BUG/MINOR: listener: null pointer dereference suspected by
coverity"
-> 2.4 special backport notes:
In addition to the previously mentionned dependencies, the patch needs to be
slightly adapted to match the corresponding contextual lines:
Replace this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if (l->state <= LI_PAUSED)
| goto end;
By this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if ((global.mode & (MODE_DAEMON | MODE_MWORKER)) &&
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
Replace this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
| }
By this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
| if (!listener->bind_conf->frontend->grace)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
Replace this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!(p->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) && !p->li_ready) {
| /* might be just a backend */
By this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!p->disabled && !p->li_ready) {
| /* might be just a backend */
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer usage for the .hcto field in the stconn
structure. So let's remove it.
When bind_conf were created, some elements such as the analysers mask
ought to have moved there but that wasn't the case. Now that it's
getting clearer that bind_conf provides all binding parameters and
the listener is essentially a listener on an address, it's starting
to get really confusing to keep such parameters in the listener, so
let's move the mask to the bind_conf. We also take this opportunity
for pre-setting the mask to the frontend's upon initalization. Now
several loops have one less argument to take care of.
proxy http-only options implemented in http_ext were statically stored
within proxy struct.
We're making some changes so that http_ext are now stored in a dynamically
allocated structs.
http_ext related structs are only allocated when needed to save some space
whenever possible, and they are automatically freed upon proxy deletion.
Related PX_O_HTTP{7239,XFF,XOT) option flags were removed because we're now
considering an http_ext option as 'active' if it is allocated (ptr is not NULL)
A few checks (and BUG_ON) were added to make these changes safe because
it adds some (acceptable) complexity to the previous design.
Also, proxy.http was renamed to proxy.http_ext to make things more explicit.
Just like forwarded (7239) header and forwardfor header, move parsing,
logic and management of 'originalto' option into http_ext dedicated class.
We're only doing this to standardize proxy http options management.
Existing behavior remains untouched.
Just like forwarded (7239) header, move parsing, logic and management
of 'forwardfor' option into http_ext dedicated class.
We're only doing this to standardize proxy http options management.
Existing behavior remains untouched.
Introducing http_ext class for http extension related work that
doesn't fit into existing http classes.
HTTP extension "forwarded", introduced with 7239 RFC is now supported
by haproxy.
The option supports various modes from simple to complex usages involving
custom sample expressions.
Examples :
# Those servers want the ip address and protocol of the client request
# Resulting header would look like this:
# forwarded: proto=http;for=127.0.0.1
backend www_default
mode http
option forwarded
#equivalent to: option forwarded proto for
# Those servers want the requested host and hashed client ip address
# as well as client source port (you should use seed for xxh32 if ensuring
# ip privacy is a concern)
# Resulting header would look like this:
# forwarded: host="haproxy.org";for="_000000007F2F367E:60138"
backend www_host
mode http
option forwarded host for-expr src,xxh32,hex for_port
# Those servers want custom data in host, for and by parameters
# Resulting header would look like this:
# forwarded: host="host.com";by=_haproxy;for="[::1]:10"
backend www_custom
mode http
option forwarded host-expr str(host.com) by-expr str(_haproxy) for for_port-expr int(10)
# Those servers want random 'for' obfuscated identifiers for request
# tracing purposes while protecting sensitive IP information
# Resulting header would look like this:
# forwarded: for=_000000002B1F4D63
backend www_for_hide
mode http
option forwarded for-expr rand,hex
By default (no argument provided), forwarded option will try to mimic
x-forward-for common setups (source client ip address + source protocol)
The option is not available for frontends.
no option forwarded is supported.
More info about 7239 RFC here: https://www.rfc-editor.org/rfc/rfc7239.html
More info about the feature in doc/configuration.txt
This should address feature request GH #575
Depends on:
- "MINOR: http_htx: add http_append_header() to append value to header"
- "MINOR: sample: add ARGC_OPT"
- "MINOR: proxy: introduce http only options"
A few loops waiting for threads to synchronize such as thread_isolate()
rightfully filter the thread masks via the threads_enabled field that
contains the list of enabled threads. However, it doesn't use an atomic
load on it. Before 2.7, the equivalent variables were marked as volatile
and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and
the risk that the compiler keeps them in a register inside a loop is not
null at all. In practice when ha_thread_relax() calls sched_yield() or
an x86 PAUSE instruction, it could be verified that the variable is
always reloaded. If these are avoided (e.g. architecture providing
neither solution), it's visible in asm code that the variables are not
reloaded. In this case, if a thread exists just between the moment the
two values are read, the loop could spin forever.
This patch adds the required _HA_ATOMIC_LOAD() on the relevant
threads_enabled fields. It must be backported to 2.7.
In applets, we stop processing when a write error (CF_WRITE_ERROR) or a shutdown
for writes (CF_SHUTW) is detected. However, any write error leads to an
immediate shutdown for writes. Thus, it is enough to only test if CF_SHUTW is
set.
When the configuration contains such a line:
http-request redirect location /
a "struct logformat_node" object is created and it contains an "arg"
member which gets alloc'ed as well in which we copy the new location
(see add_to_logformat_list). This internal arg pointer was not freed in
the dedicated release_http_redir release function.
Likewise, the expression pointer was not released as well.
This patch can be backported to all stable branches. It should apply
as-is all the way to 2.2 but it won't on 2.0 because release_http_redir
did not exist yet.
Unlike fwdfor_hdr_name, orgto_hdr_name was not properly freed in
free_proxy().
This did not cause observable memory leaks because originalto proxy option is
only used for user configurable proxies, which are solely freed right before
process termination.
No backport needed unless some architectural changes causing regular proxies
to be freed and reused multiple times within single process lifetime are made.
In the past we've seen "show servers state" dump some internal bits for
the check states, that were causing regtests to fail. The relevant bits
have been added to the doc to fix the public API and make sure they do
not change by accident, but the output doesn't take care of masking the
undesired ones, causing regtests (and possibly user programs) to fail
when new bits are added. Let's add the mask for the only documented ones
(0x0F for check and 0x1F for agent respectively).
This could be backported wherever the server state is present, though
there's a tiny risk that some undocumented bits might have already
leaked to some user scripts, so it might be wise to wait a bit before
doing that or even not to backport too far.
Add HA_ANON_CLI to the srv->hostname when using 'show servers state'.
It can contain sensitive information like 'www....com'
No backport needed, except if anonymization mechanism is backported.
Replace HA_ANON_CLI by hash_ipanon to anonynmized address like
anonymizing address in the configuration file.
No backport needed, except if anonymization mechanism is backported.
Modify proxy.c in order to anonymize the following confidential data on
commands 'show servers state' and 'show servers conn':
- proxy name
- server name
- server address
Ed Hein reported in github issue #1856 some occasional watchdog panics
in 2.4.18 showing extreme contention on the proxy's lock while the libc
was in malloc()/free(). One cause of this problem is that we call free()
under the proxy's lock in proxy_capture_error(), which makes no sense
since if we can free the object under the lock after it's been detached,
we can also free it after releasing the lock (since it's not referenced
anymore).
This should be backported to all relevant versions, likely all
supported ones.
This patch is a prerequisite for #1626.
Adding PAUSED state to the list of available proxy states.
The flag is set when the proxy is paused at runtime (pause_listener()).
It is cleared when the proxy is resumed (resume_listener()).
It should be backported to 2.6, 2.5 and 2.4
A minor API change was performed in listener(.c/.h) to restore consistency
between stop_listener() and (resume/pause)_listener() functions.
LISTENER_LOCK was never locked prior to calling stop_listener():
lli variable hint is thus not useful anymore.
Added PROXY_LOCK locking in (resume/pause)_listener() functions
with related lpx variable hint (prerequisite for #1626).
It should be backported to 2.6, 2.5 and 2.4
There was a race involving hlua_proxy_* functions
and some proxy management functions.
pause_proxy() and resume_proxy() can be used directly from lua code,
but that could lead to some race as lua code didn't make sure PROXY_LOCK
was owned before calling the proxy functions.
This patch makes sure it won't happen again elsewhere in the code
by locking PROXY_LOCK directly in resume and pause proxy functions
so that it's not the caller's responsibility anymore.
(based on stop_proxy() behavior that was already safe prior to the patch)
This should be backported to stable series.
Note that the API will likely differ < 2.4
On frontend side, "h1-case-adjust-bogus-client" option is now supported in
TCP mode. It is important to be able to adjust the case of response headers
when a connection is routed to an HTTP backend. In this case, the client
connection is upgraded to H1.
On backend side, "h1-case-adjust-bogus-server" option is now also supported
in TCP mode to be able to perform HTTP health-checks with a case adjustment
of the request headers.
This patch should be backported as far as 2.0.
When a proxy is initialized with the settings of the default proxy, instead
of doing a raw copy of the default server settings, a custom copy is now
performed by calling srv_settings_copy(). This way, all settings will be
really duplicated. Without this deep copy, some pointers are shared between
several servers, leading to UAF, double-free or such bugs.
This patch relies on following commits:
* b32cb9b51 REORG: server: Export srv_settings_cpy() function
* 0b365e3cb MINOR: server: Constify source server to copy its settings
This patch should fix the issue #1804. It must be backported as far as 2.0.
protocol_stop_now() is called from do_soft_stop_now() running on any
thread that received the signal. The problem is that it will call some
listener handlers to close the FD, resulting in an fd_delete() being
called from the wrong group. That's not clean and we cannot even rely
on the thread mask to show up.
One interesting long-term approach could be to have kill queues for
FDs, and maybe we'll need them in the long run. However that doesn't
work well for listeners in this situation.
Let's simply isolate ourselves during this instant. We know we'll be
alone dealing with the close and that the FD will be instantly deleted
since not in use by any other thread. It's not the cleanest solution
but it should last long enough without causing trouble.
Let's rely on tg->threads_enabled there to detect running threads. We
should probably have a dedicated function for this in order to simplify
the code and avoid the risk of using the wrong group ID.
Make the transport parameters be standlone as much as possible as
it consists only in encoding/decoding data into/from buffers.
Reduce the size of xprt_quic.h. Unfortunalety, I think we will
have to continue to include <xprt_quic-t.h> to use the trace API
into this module.
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.
The alphabetical ordering of include files was preserved.
The new name mor eclearly indicates that a stream connector cannot make
any more progress because it needs room in the channel buffer, or that
it may be unblocked because the buffer now has more room available. The
testing function is sc_waiting_room(). This is mostly used by applets.
Note that the flags will change soon.
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
This applies the change so that the applet code stops using ci_putchk()
and friends everywhere possible, for the much saferapplet_put*() instead.
The change is mechanical but large. Two or three functions used to have no
appctx and a cs derived from the appctx instead, which was a reminiscence
of old times' stream_interface. These were simply changed to directly take
the appctx. No sensitive change was performed, and the old (more complex)
API is still usable when needed (e.g. the channel is already known).
The change touched roughly a hundred of locations, with no less than 124
lines removed.
It's worth noting that the stats applet, the oldest of the series, could
get a serious lifting, as it's still very channel-centric instead of
propagating the appctx along the chain. Given that this code doesn't
change often, there's no emergency to clean it up but it would look
better.
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
The command uses a pointer to the current proxy being dumped, one to the
current server being dumped, an optional ID of the only proxy to dump
(which is in fact used as a boolean), and a flag indicating if we're
doing a "show servers conn" or a "show servers state". Let's move all
this to a struct show_srv_ctx.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
The code still has room for improvement, such as in the "flags" field
where bits are hard-coded, but they weren't modified.
Commit e7f74623e ("MINOR: stats: don't output internal proxies (PR_CAP_INT)")
in 2.5 ensured we don't dump internal proxies on the stats page, but the
same is needed for "show backend", as since the addition of the HTTP client
it now appears there:
$ socat /tmp/sock1 - <<< "show backend"
# name
<HTTPCLIENT>
This needs to be backported to 2.5.
If the "close-spread-time" option is set to "infinite", active
connection closing during a soft-stop can be disabled. The 'connection:
close' header or the GOAWAY frame will not be added anymore to the
server's response and active connections will only be closed once the
clients disconnect. Idle connections will not be closed all at once when
the soft-stop starts anymore, and each idle connection will follow its
own timeout based on the multiple timeouts set in the configuration (as
is the case during regular execution).
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on 'MEDIUM: global:
Add a "close-spread-time" option to spread soft-stop on time window'.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
Almost all of our hash-based LB algorithms are implemented as special
cases of something that can now be achieved using sample expressions,
and some of them have adopted some options to adapt their behavior in
ways that could also be achieved using converters.
There are users who want to hash other parameters that are combined
into variables, and who set headers from these values and use
"balance hdr(name)" for this.
Instead of constantly implementing specific options and having users
hack around when they want a real hash, let's implement a native hash
mode that applies to a standard sample expression. This way, any
fetchable element (including variables) may be used to construct the
hash, even modified by any converter if desired.
Since the 2.5, it is possible to define TCP/HTTP ruleset in defaults
sections. However, rules defining a capture in defaults sections was not
properly handled because they was not shared with the proxies inheriting
from the defaults section. This led to crash when haproxy tried to store a
new capture.
So now, to fix the issue, when a new proxy is created, the list of captures
points to the list of its defaults section. It may be NULL or not. All new
caputres are prepended to this list. It is not a problem to share the same
defaults section between several proxies, because it is not altered and we
take care to not release it when corresponding proxies are freed but only
when defaults proxies are freed. To do so, defaults proxies are now
unreferenced at the end of free_proxy() function instead of the beginning.
This patch should fix the issue #1674. It must be backported to 2.5.
Remaining flags and associated functions are move in the conn-stream
scope. These flags are added on the endpoint and not the conn-stream
itself. This way it will be possible to get them from the mux or the
applet. The functions to get or set these flags are renamed accordingly with
the "cs_" prefix and updated to manipualte a conn-stream instead of a
stream-interface.
Flag to consider a stream as indepenent is now handled at the conn-stream
level. Thus SI_FL_INDEP_STR stream-int flag is replaced by CS_FL_INDEP_STR
conn-stream flags.
At many places, we now use the new CS functions to get a stream or a channel
from a conn-stream instead of using the stream-interface API. It is the
first step to reduce the scope of the stream-interfaces. The main change
here is about the applet I/O callback functions. Before the refactoring, the
stream-interface was the appctx owner. Thus, it was heavily used. Now, as
far as possible,the conn-stream is used. Of course, it remains many calls to
the stream-interface API.
The new 'close-spread-time' global option can be used to spread idle and
active HTTP connction closing after a SIGUSR1 signal is received. This
allows to limit bursts of reconnections when too many idle connections
are closed at once. Indeed, without this new mechanism, in case of
soft-stop, all the idle connections would be closed at once (after the
grace period is over), and all active HTTP connections would be closed
by appending a "Connection: close" header to the next response that goes
over it (or via a GOAWAY frame in case of HTTP2).
This patch adds the support of this new option for HTTP as well as HTTP2
connections. It works differently on active and idle connections.
On active connections, instead of sending systematically the GOAWAY
frame or adding the 'Connection: close' header like before once the
soft-stop has started, a random based on the remainder of the close
window is calculated, and depending on its result we could decide to
keep the connection alive. The random will be recalculated for any
subsequent request/response on this connection so the GOAWAY will still
end up being sent, but we might wait a few more round trips. This will
ensure that goaways are distributed along a longer time window than
before.
On idle connections, a random factor is used when determining the expire
field of the connection's task, which should naturally spread connection
closings on the time window (see h2c_update_timeout).
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on "BUG/MEDIUM:
mux-h2: make use of http-request and keep-alive timeouts" which
refactorized the timeout management of HTTP2 connections.
Log servers are a real mess because:
- entries are duplicated using memcpy() without their strings being
reallocated, which results in these ones not being freeable every
time.
- a new field, ring_name, was added in 2.2 by commit 99c453df9
("MEDIUM: ring: new section ring to declare custom ring buffers.")
but it's never initialized during copies, causing the same issue
- no attempt is made at freeing all that.
Of course, running "haproxy -c" under ASAN quickly notices that and
dumps a core.
This patch adds the missing strdup() and initialization where required,
adds a new free_logsrv() function to cleanly free() such a structure,
calls it from the proxy when iterating over logsrvs instead of silently
leaking their file names and ring names, and adds the same logsrv loop
to the proxy_free_defaults() function so that we don't leak defaults
sections on exit.
It looks a bit entangled, but it comes as a whole because all this stuff
is inter-dependent and was missing.
It's probably preferable not to backport this in the foreseable future
as it may reveal other jokes if some obscure parts continue to memcpy()
the logsrv struct.
The server_id_hdr_name is already processed as an ist in various locations lets
also just store it as such.
see 0643b0e7e ("MINOR: proxy: Make `header_unique_id` a `struct ist`") for a
very similar past commit.
The orgto_hdr_name is already processed as an ist in `http_process_request`,
lets also just store it as such.
see 0643b0e7e ("MINOR: proxy: Make `header_unique_id` a `struct ist`") for a
very similar past commit.
The fwdfor_hdr_name is already processed as an ist in `http_process_request`,
lets also just store it as such.
see 0643b0e7e ("MINOR: proxy: Make `header_unique_id` a `struct ist`") for a
very similar past commit.
The monitor_uri is already processed as an ist in `http_wait_for_request`, lets
also just store it as such.
see 0643b0e7e ("MINOR: proxy: Make `header_unique_id` a `struct ist`") for a
very similar past commit.
The unsafe conn-stream API (__cs_*) is now used when we are sure the good
endpoint or application is attached to the conn-stream. This avoids compiler
warnings about possible null derefs. It also simplify the code and clear up
any ambiguity about manipulated entities.
As reported by Coverity in issue #1568, a missing initialization of the
error message pointer in parse_new_proxy() may result in displaying garbage
or crashing in case of memory allocation error when trying to create a new
proxy on startup.
This should be backported to 2.4.
Since recent changes related to the conn-stream/stream-interface
refactoring, GCC reports potential null pointer dereferences when we get the
appctx, the stream or the stream-interface from the conn-strem. Of course,
depending on the time, these entities may be null. But at many places, we
know they are defined and it is safe to get them without any check. Thus, we
use ALREADY_CHECKED() macro to silent these warnings.
Note that the refactoring is unfinished, so it is not a real issue for now.
To be able to move the stream-interface from the stream to the conn-stream,
all access to the SI is done via the conn-stream. This patch is limited to
the proxy part.
Because appctx is now an endpoint of the conn-stream, there is no reason to
still have the stream-interface as appctx owner. Thus, the conn-stream is
now the appctx owner.
Thanks to previous changes, it is now possible to set an appctx as endpoint
for a conn-stream. This means the appctx is no longer linked to the
stream-interface but to the conn-stream. Thus, a pointer to the conn-stream
is explicitly stored in the stream-interface. The endpoint (connection or
appctx) can be retrieved via the conn-stream.
Create a new structure li_per_thread. This is uses as an array in the
listener structure, with an entry allocated per thread. The new function
li_init_per_thr is responsible of the allocation.
For now, li_per_thread contains fields only useful for QUIC listeners.
As such, it is only allocated for QUIC listeners.
Avoid closing idle connections if a soft stop is in progress.
By default, idle connections will be closed during a soft stop. In some
environments, a client talking to the proxy may have prepared some idle
connections in order to send requests later. If there is no proper retry
on write errors, this can result in errors while haproxy is reloading.
Even though a proper implementation should retry on connection/write
errors, this option was introduced to support back compat with haproxy <
v2.4. Indeed before v2.4, we were waiting for a last request to be able
to add a "connection: close" header and advice the client to close the
connection.
In a real life example, this behavior was seen in AWS using the ALB in
front of a haproxy. The end result was ALB sending 502 during haproxy
reloads.
This patch was tested on haproxy v2.4, with a regular reload on the
process, and a constant trend of requests coming in. Before the patch,
we see regular 502 returned to the client; when activating the option,
the 502 disappear.
This patch should help fixing github issue #1506.
In order to unblock some v2.3 to v2.4 migraton, this patch should be
backported up to v2.4 branch.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
[wt: minor edits to the doc to mention other options to care about]
Signed-off-by: Willy Tarreau <w@1wt.eu>
It is now possible to have TCP/HTTP rules and ACLs defined in defaults
sections. So we must try to release corresponding lists when a default proxy
is destroyed.
No backport needed.
TCP and HTTP rules can now be defined in defaults sections, but only those
with a name. Because these rules may use conditions based on ACLs, ACLs can
also be defined in defaults sections.
However there are some limitations:
* A defaults section defining TCP/HTTP rules cannot be used by a defaults
section
* A defaults section defining TCP/HTTP rules cannot be used bu a listen
section
* A defaults sections defining TCP/HTTP rules cannot be used by frontends
and backends at the same time
* A defaults sections defining 'tcp-request connection' or 'tcp-request
session' rules cannot be used by backends
* A defaults sections defining 'tcp-response content' rules cannot be used
by frontends
The TCP request/response inspect-delay of a proxy is now inherited from the
defaults section it uses. For now, these rules are only parsed. No evaluation is
performed.
The PR_FL_READY flags must now be set on a proxy at the end of the
configuration validity check to notify it is fully configured and may be
safely used.
For now there is no real usage of this flag. But it will be usefull for
referenced default proxies to finish their configuration only once.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
A proxy may now references the defaults section it is used. To do so, a
pointer on the default proxy was added in the proxy structure. And a
refcount must be used to track proxies using a default proxy. A default
proxy is destroyed iff its refcount is equal to zero and when it drops to
zero.
All this stuff must be performed during init/deinit staged for now. All
unreferenced default proxies are removed after the configuration parsing.
This patch is mandatory to support TCP/HTTP rules in defaults sections.
This change is required to support TCP/HTTP rules in defaults sections. The
'disabled' bitfield in the proxy structure, used to know if a proxy is
disabled or stopped, is replaced a generic bitfield named 'flags'.
PR_DISABLED and PR_STOPPED flags are renamed to PR_FL_DISABLED and
PR_FL_STOPPED respectively. In addition, everywhere there is a test to know
if a proxy is disabled or stopped, there is now a bitwise AND operation on
PR_FL_DISABLED and/or PR_FL_STOPPED flags.
.disabled field in the proxy structure is documented to be a bitfield. So
use it as a bitfield. This change was introduced to the 2.5, by commit
8e765b86f ("MINOR: proxy: disabled takes a stopping and a disabled state").
No backport is needed except if the above commit is backported.
The last 3 fields were 3 list heads that are per-thread, and which are:
- the pool's LRU head
- the buffer_wq
- the streams list head
Moving them into thread_ctx completes the removal of dynamic elements
from the struct thread_info. Now all these dynamic elements are packed
together at a single place for a thread.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
In ticket #1348 some users expressed some concerns regarding the removal
of the "grace" directive from the proxies. Their use case very closely
mimmicks the original intent of the grace keyword, which is, let haproxy
accept traffic for some time when stopping, while indicating an external
LB that it's stopping.
This is implemented here by starting a task whose expiration triggers
the soft-stop for real. The global "stopping" variable is immediately
set however. For example, this below will be sufficient to instantly
notify an external check on port 9999 that the service is going down,
while other services remain active for 10s:
global
grace 10s
frontend ext-check
bind :9999
monitor-uri /ext-check
monitor fail if { stopping }
This option can be used to define a specific log format that will be
used in case of error, timeout, connection failure on a frontend... It
will be used for any log line concerned by the log-separate-errors
option. It will also replace the format of specific error messages
decribed in section 8.2.6.
If no "error-log-format" is defined, the legacy error messages are still
emitted and the other error logs keep using the regular log-format.
This option will be replaced by a "error-log-format" that enables to use
a dedicated log-format for connection error messages instead of the
regular log-format (in which most of the fields would be invalid in such
a case).
The "log-error-via-logformat" mechanism will then be replaced by a test
on the presence of such an error log format or not. If a format is
defined, it is used for connection error messages, otherwise the legacy
error log format is used.
Patch 211c967 ("MINOR: httpclient: add the server to the proxy") broke
the reg-tests that do a "show servers state".
Indeed the servers of the proxies flagged with PR_CAP_INT are dumped in
the output of this CLI command.
This patch fixes the issue par ignoring the PR_CA_INT proxies in the
dump.
In a future patch, it will be possible to remove at runtime every
servers, both static and dynamic. This requires to extend the server
refcount for all instances.
First, refcount manipulation functions have been renamed to better
express the API usage.
* srv_refcount_use -> srv_take
The refcount is always initialize to 1 on the server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.
* free_server -> srv_drop
This decrements the refcount and if null, the server is freed, so code
calling it must not use the server reference after it. As a bonus, this
function now returns the next server instance. This is useful when
calling on the server loop without having to save the next pointer
before each invocation.
In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming have been eliminated as well.
As a convenience, return the next server instance from servers list on
free_server.
This is particularily useful when using this function on the servers
list without having to save of the next pointer before calling it.
This patch splits the disabled state of a proxy into a PR_DISABLED and a
PR_STOPPED state.
The first one is set when the proxy is disabled in the configuration
file, and the second one is set upon a stop_proxy().
Rename the 'dontloglegacyconnerr' option to 'log-error-via-logformat'
which is much more self-explanatory and readable.
Note: only legacy keywords don't use hyphens, it is recommended to
separate words with them in new keywords.
In case of connection failure, a dedicated error message is output,
following the format described in section "Error log format" of the
documentation. These messages cannot be configured through a log-format
option.
This patch adds a new option, "dontloglegacyconnerr", that disables
those error logs when set, and "replaces" them by a regular log line
that follows the configured log-format (thanks to a call to sess_log in
session_kill_embryonic).
The new fc_conn_err sample fetch allows to add the legacy error log
information into a regular log format.
This new option is unset by default so the logging logic will remain the
same until this new option is used.
This patch renames the proxy capability "LUA" to "INT" so it could be
used for any internal proxy.
Every proxy that are not user defined should use this flag.
This option had always been broken in HTX, which means that the first
breakage appeared in 1.9, that it was broken by default in 2.0 and that
no workaround existed starting with 2.1. The way this option works is
praticularly unfit to the rest of the configuration and to the internal
architecture. It had some uses when it was introduced 14 years ago but
nowadays it's possible to do much better and more reliable using a
set of "http-request set-dst" and "http-request set-uri" rules, which
additionally are compatible with DNS resolution (via do-resolve) and
are not exclusive to normal load balancing. The "option-http_proxy"
example config file was updated to reflect this.
The option is still parsed so that an error message gives hints about
what to look for.
A queue is specific to a server or a proxy, so we don't need to place
this distinction inside all pendconns, it can be in the queue itself.
This commit adds the relevant fields "px" and "sv" into the struct
queue, and initializes them accordingly.
Till now whenever a server or proxy's queue was touched, this server
or proxy's lock was taken. Not only this requires distinct code paths,
but it also causes unnecessary contention with other uses of these locks.
This patch adds a lock inside the "queue" structure that will be used
the same way by the server and the proxy queuing code. The server used
to use a spinlock and the proxy an rwlock, though the queue only used
it for locked writes. This new version uses a spinlock since we don't
need the read lock part here. Tests have not shown any benefit nor cost
in using this one versus the rwlock so we could change later if needed.
The lower contention on the locks increases the performance from 362k
to 374k req/s on 16 threads with 20 servers and leastconn. The gain
with roundrobin even increases by 9%.
This is tagged medium because the lock is changed, but no other part of
the code touches the queues, with nor without locking, so this should
remain invisible.
This reverts commit fcb8bf8650.
The recent changes since 5304669e1 MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
Till now whenever a server or proxy's queue was touched, this server
or proxy's lock was taken. Not only this requires distinct code paths,
but it also causes unnecessary contention with other uses of these locks.
This patch adds a lock inside the "queue" structure that will be used
the same way by the server and the proxy queuing code. The server used
to use a spinlock and the proxy an rwlock, though the queue only used
it for locked writes. This new version uses a spinlock since we don't
need the read lock part here. Tests have not shown any benefit nor cost
in using this one versus the rwlock so we could change later if needed.
The lower contention on the locks increases the performance from 491k
to 507k req/s on 16 threads with 20 servers and leastconn. The gain
with roundrobin even increases by 6%.
The performance profile changes from this:
13.03% haproxy [.] fwlc_srv_reposition
8.08% haproxy [.] fwlc_get_next_server
3.62% haproxy [.] process_srv_queue
1.78% haproxy [.] pendconn_dequeue
1.74% haproxy [.] pendconn_add
to this:
11.95% haproxy [.] fwlc_srv_reposition
7.57% haproxy [.] fwlc_get_next_server
3.51% haproxy [.] process_srv_queue
1.74% haproxy [.] pendconn_dequeue
1.70% haproxy [.] pendconn_add
At this point the differences are mostly measurement noise.
This is tagged medium because the lock is changed, but no other part of
the code touches the queues, with nor without locking, so this should
remain invisible.
Lots of places iterating over nbproc or comparing with nbproc could be
simplified. Further, "bind-process" and "process" parsing that was
already limited to process 1 or "all" or "odd" resulted in a bind_proc
field that was either 0 or 1 during the init phase and later always 1.
All the checks for compatibilities were removed since it's not possible
anymore to run a frontend and a backend on different processes or to
have peers and stick-tables bound on different ones. This is the largest
part of this patch.
The bind_proc field was removed from both the proxy and the receiver
structs.
Since the "process" and "bind-process" directives are still parsed,
configs making use of correct values allowing process 1 will continue
to work.
Commit ab0a5192a ("MEDIUM: config: mark "grace" as deprecated") marked
the "grace" keyword as deprecated in 2.3, tentative removal for 2.4
with a hard deadline in 2.5, so let's remove it and return an error now.
This old and outdated feature was incompatible with soft-stop, reload
and socket transfers, and keeping it forced ugly hacks in the lower
layers of the protocol stack.