Level-7 retries are only possible with a restricted number of HTTP
return codes. While it is usually not safe to retry on 401 and 403, I
came up with an authentication backend which was not synchronizing
authentication of users. While not perfect, being allowed to also retry
on those return codes is really helpful and acts as a hotfix until we
can fix the backend.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
in the context of a progressive backend migration, we want to be able to
activate SSL on outgoing connections to the server at runtime without
reloading.
This patch adds a `set server ssl` command; in order to allow that:
- add `srv_use_ssl` to `show servers state` command for compatibility,
also update associated parsing
- when using default-server ssl setting, and `no-ssl` on server line,
init SSL ctx without activating it
- when triggering ssl API, de/activate SSL connections as requested
- clean ongoing connections as it is done for addr/port changes, without
checking prior server state
example config:
backend be_foo
default-server ssl
server srv0 127.0.0.1:6011 weight 1 no-ssl
show servers state:
5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - -1
where srv0 can switch to ssl later during the runtime:
set server be_foo/srv0 ssl on
5 be_foo 1 srv0 127.0.0.1 2 0 1 1 15 1 0 4 0 0 0 0 - 6011 - 1
Also update existing tests and create a new one.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Define a per-thread counters allocated with the greatest size of any
stat module counters. This variable is named trash_counters.
When using a proxy without allocated counters, return the trash counters
from EXTRA_COUNTERS_GET instead of a dangling pointer to prevent
segfault.
This is useful for all the proxies used internally and not
belonging to the global proxy list. As these objects does not appears on
the stat report, it does not matter to use the dummy counters.
For this fix to be functional, the extra counters are explicitly
initialized to NULL on proxy/server/listener init functions.
Most notably, the crash has already been detected with the following
vtc:
- reg-tests/lua/txn_get_priv.vtc
- reg-tests/peers/tls_basic_sync.vtc
- reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc
There is probably other parts that may be impacted (SPOE for example).
This bug was introduced in the current release and do not need to be
backported. The faulty commits are
"MINOR: ssl: count client hello for stats" and
"MINOR: ssl: add counters for ssl sessions".
This is an anticipation of finer grained locking for the queues. For now
all lock places take a write lock so that there is no difference at all
with previous code.
The proxy stopping mechanism was changed with commit 322b9b94e ("MEDIUM:
proxy: make stop_proxy() now use stop_listener()") so that it's now
entirely driven by the listeners. One thing was forgotten though, which
is that pure backends will not stop anymore since they don't have any
listener, and that it's necessary to stop them in order to stop the
health checks.
No backport is needed.
As discussed here during 2.1-dev, "mode health" is totally obsolete:
https://www.mail-archive.com/haproxy@formilux.org/msg35204.html
It's fundamentally incompatible with usage of SSL, doesn't support
source filtering, and imposes the presence of file descriptors with
hard-coded syscalls directly in the generic accept path.
It's very unlikely that anyone has used it in the last 10 years for
anything beyond testing. In the worst case if anyone would depend
on it, replacing it with "http-request return status 200" and "mode
http" would certainly do the trick.
The keyword is still detected as special by the config parser to help
users update their configurations appropriately.
One difficulty in soft-stopping is to make sure not to forget unlisted
listeners. By first doing a pass using protocol_stop_now() we catch the
vast majority of them. The few remaining ones are the ones belonging to
a proxy having a grace period. For these ones, the proxy will arm its
stop_time timer and emit a log message.
Since neither UDP listeners nor peers use the grace period, we can already
get rid of the special cases there since we know they will have been stopped
by the protocols.
There are multiple ways a proxy may switch to the disabled state,
but now it's essentially once it loses its last listener. Instead
of keeping duplicate code around and reporting the state change
before actually seeing it, we now report it at the moment it's
performed (from the last listener leaving) which allows to remove
the message from all other places.
We have to count unstoppable jobs which correspond to worker sockpairs, in
order to know when to count. However the way it's currently done is quite
awkward because these are counted when stopping making the stop mechanism
non-idempotent. This is definitely something we want to fix before stopping
by protocol or our listeners count will quickly go wrong. Now they are
counted when the listeners are created.
Till now, we used to call pause_proxy()/resume_proxy() to enable/disable
processing on a proxy, which is used during soft reloads. But since we want
to drive this process from the listeners themselves, we have to instead
proceed the other way around so that when we enable/disable a listener,
it checks if it changed anything for the proxy and notifies about updates
at this level.
The detection is made using li_ready=0 for pause(), and li_paused=0
for resume(). Note that we must not include any test for li_bound because
this state is seen by processes which share the listener with another one
and which must not act on it since the other process will do it. As such
the socket behind the FD will automatically be paused and resume without
its local state changing, but this is the limit of a multi-process system
with shared listeners.
The two functions don't need to be distinguished anymore since they have
all the necessary info to act as needed on their listeners. Let's just
pass via stop_proxy() and make it check for each listener which one to
close or not.
Its sole remaining purpose was to display "proxy foo started", which
has little benefit and pollutes output for those with plenty of proxies.
Let's remove it now.
The VTCs were updated to reflect this, because many of them had explicit
counts of dropped lines to match this message.
This is tagged as MEDIUM because some users may be surprized by the
loss of this quite old message.
The remaining proxy states were only used to distinguish an enabled
proxy from a disabled one. Due to the initialization order, both
PR_STNEW and PR_STREADY were equivalent after startup, and they
would only differ from PR_STSTOPPED when the proxy is disabled or
shutdown (which is effectively another way to disable it).
Now we just have a "disabled" field which allows to distinguish them.
It's becoming obvious that start_proxies() is only used to print a
greeting message now, that we'd rather get rid of. Probably that
zombify_proxy() and stop_proxy() should be merged once their
differences move to the right place.
This state was used to mention that a proxy was in PAUSED state, as opposed
to the READY state. This was causing some trouble because if a listener
failed to resume (e.g. because its port was temporarily in use during the
resume), it was not possible to retry the operation later. Now by checking
the number of READY or PAUSED listeners instead, we can accurately know if
something went bad and try to fix it again later. The case of the temporary
port conflict during resume now works well:
$ socat readline /tmp/sock1
prompt
> disable frontend testme3
> disable frontend testme3
All sockets are already disabled.
> enable frontend testme3
Failed to resume frontend, check logs for precise cause (port conflict?).
> enable frontend testme3
> enable frontend testme3
All sockets are already enabled.
This state is only set when a pause() fails but isn't even set when a
resume() fails. And we cannot recover from this state. Instead, let's
just count remaining ready listeners to decide to emit an error or not.
It's more accurate and will better support new attempts if needed.
Since v1.4 or so, it's almost not possible anymore to set this state. The
only exception is by using the CLI to change a frontend's maxconn setting
below its current usage. This case makes no sense, and for other cases it
doesn't make sense either because "full" is a vague concept when only
certain listeners are full and not all. Let's just remove this unused
state and make it clear that it's not reported. The "ready" or "open"
states will continue to be reported without being misleading as they
will be opposed to "stop".
This thing was needed for an optimization used in soft_stop() which
doesn't exist anymore, so let's remove it as it's cryptic and hinders
the listeners cleanup.
The loop doesn't match anymore since the non-started listeners are in
LI_INIT and even if it had ever worked the benefit of closing zombies
at this point looks void at best.
The zombie state is not used anymore by the listeners, because in the
last two cases where it was tested it couldn't match as it was covered
by the test on the process mask. Instead now the FD is either in the
LISTEN state or the INIT state. This also avoids forcing the listener
to be single-dimensional because actually belonging to another process
isn't totally exclusive with the other states, which explains some of
the difficulties requiring to check the proc_mask and the fd sometimes.
So let's get rid of it now not to be tempted to reuse it.
The doc on the listeners state was updated.
Old processes didn't die if a log foward section is declared and
a soft stop is requested.
This patch fix this issue and should be backpored in banches including
the log forward feature.
The code dealing with zombie proxies in soft_stop() is bogus, it uses
close() instead of fd_delete(), leaving a live entry in the fdtab with
a dangling pointer to a free memory location. The FD might be reassigned
for an outgoing connection for the time it takes the proxy to completely
stop, or could be dumped on the CLI's "show fd" command. In addition,
the listener's FD was not even reset, leaving doubts about whether or
not it will happen again in deinit().
And in deinit(), the loop in charge of closing zombie FDs is particularly
unsafe because it closes the fd then calls unbind_listener() then
delete_listener() hoping none of them will touch it again. Since it
requires some mental efforts to figure what's done there, let's correctly
reset the fd here as well and close it using fd_delete() to eliminate any
remaining doubts.
It's uncertain whether this should be backported. Zombie proxies are rare
and the situations capable of triggering such issues are not trivial to
setup. However it's easy to imagine how things could go wrong if backported
too far. Better wait for any matching report if at all (this code has been
there since 1.8 without anobody noticing).
It's the receiver's FD that's inherited from the parent process, not
the listener's so the flag must move to the receiver so that appropriate
actions can be taken.
The listening socket is represented by its file descriptor, which is
generic to all receivers and not just listeners, so it must move to
the rx struct.
It's worth noting that in order to extend receivers and listeners to
other protocols such as QUIC, we'll need other handles than file
descriptors here, and that either a union or a cast to uintptr_t
will have to be used. This was not done yet and the field was
preserved under the name "fd" to avoid adding confusion.
Ever since the protocols were added in 1.3.13, listeners used to be
started twice:
- once by start_proxies(), which iteratees over all proxies then all
listeners ;
- once by protocol_bind_all() which iterates over all protocols then
all listeners ;
It's a real mess because error reporting is not even consistent, and
more importantly now that some protocols do not appear in regular
proxies (peers, logs), there is no way to retry their binding should
it fail on the last step.
What this patch does is to make sure that listeners are exclusively
started by protocols. The failure to start a listener now causes the
emission of an error indicating the proxy's name (as it used to be
the case per proxy), and retryable failures are silently ignored
during all but last attempts.
The start_proxies() function was kept solely for setting the proxy's
state to READY and emitting the "Proxy started" message and log that
some have likely got used to seeking in their logs.
Previous commit b24bc0d ("MINOR: tcp: Support TCP keepalive parameters
customization") broke non-Linux builds as TCP_KEEP{CNT,IDLE,INTVL} are
not necessarily defined elsewhere.
This patch adds the required #ifdefs to condition the visibility of the
keywords, and adds a mention in the doc about their dependency on Linux.
It is now possible to customize TCP keepalive parameters.
These correspond to the socket options TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL
and are valid for the defaults, listen, frontend and backend sections.
This patch fixes GitHub issue #670.
Commit 69f591e3b ("MINOR: cli/proxy: add a new "show servers conn" command")
added the ability to dump the idle connections state for a server, but we
must not do this if idle connections were not allocated, which happens if
the server is configured with pool-max-conn 0.
This is 2.2, no backport is needed.
This command reuses the existing "show servers state" to also dump the
state of active and idle connections. The main use is to serve as a
debugging tool to troubleshot connection reuse issues.
Actually the cleanup in commit 6ff8143f7 ("BUG/MINOR: proxy: fix
dump_server_state()'s misuse of the trash") allowed to spot that the
trash is never reset when dumping a servers state. I couldn't manage
to make it dump garbage even with large setups but didn't find either
where it's cleared between successive calls while other handlers do
explicitly invoke chunk_reset(), so it seems to happen a bit by luck.
Let's use chunk_printf() here for each turn, it makes things clearer.
This could be backported along with previous patch, especially if any
user reports occasional garbage appearing in the show servers output.
dump_server_state() claims to dump into a buffer but instead it writes
into a buffer then dumps the trash into the channel, so it only supports
being called with buf=&trash and doesn't need this buffer. There doesn't
seem to be any current impact of this mistake since the function is called
from one location only.
A backport may be performed if it helps fixing other bugs but it will not
fix an existing bug by itself.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.
There is nothing left in common/ now so this directory must not be used
anymore.
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.
In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.
It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
The files remained mostly unchanged since they were OK. However, half of
the users didn't need to include them, and about as many actually needed
to have it and used to find functions like srv_currently_usable() through
a long chain that broke when moving the file.
This one is particularly difficult to split because it provides all the
functions used to manipulate a proxy state and to retrieve names or IDs
for error reporting, and as such, it was included in 73 files (down to
68 after cleanup). It would deserve a small cleanup though the cut points
are not obvious at the moment given the number of structs involved in
the struct proxy itself.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
It was moved without any change, however many callers didn't need it at
all. This was a consequence of the split of proto_http.c into several
parts that resulted in many locations to still reference it.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
Just some minor reordering, and the usual cleanup of call places for
those which didn't need it. We don't include the whole tools.h into
stats-t anymore but just tools-t.h.
The type file was slightly tidied. The cli-specific APPCTX_CLI_ST1_* flag
definitions were moved to cli.h. The type file was adjusted to include
buf-t.h and not the huge buf.h. A few call places were fixed because they
did not need this include.
The cfg_peers external declaration was moved to the main file instead
of the type one. A few types were still missing from the proto, causing
warnings in the functions prototypes (proxy, stick_table).