event_srv_chk_io() function is renamed srv_chk_io_cb() to be consistant with
the I/O callback function of connections. In addition, this function is
exported. It will be required to use the conn-stream's subscriptions.
The connection is already closed by the health-check itself. Thus there is
now reason to duplicate this part in the .wake callback function. It is
enough to wake the health-check and wait.
The buffer wait list is used to deal with buffer allocation failure. But at
the end of health-check, it must be reinitialized. There is no reason to
reason to get a buffer between two health-check runs. And in fact, the
associated flags, CHK_ST_IN_ALLOC and CHK_ST_OUT_ALLOC, are already cleared
at the end of a health-check.
This patch must be backported as far as 2.2. On the 2.2, MT_LIST_ADDED and
MT_LIST_DEL must be used instead of LIST_INLIST and LIST_DEL_INIT.
Only CS_EP_ERROR flag is now removed from the endpoint when a reset is
performed. When a new the endpoint is allocated, flags are preserved. It is
the caller responsibility to remove other flags, depending on its need.
Concretly, during a connection retry or a L7 retry, we must preserve
flags. In tcpcheck and the CLI, we reset flags.
This patch is 2.6-specific. No backport needed.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc) both are different, and for portability we'd need to test for
both or we never know if we risk to confuse some status codes with
plain errors.
There were few entries, the most annoying ones are the switch/case
because they require to only add the entry when it differs, but the
other ones are really trivial.
A conn-stream is never detached from an endpoint or an application alone,
except on a reset. Thus, to avoid any error, these functions are now
private. And cs_destroy() function is added to destroy a conn-stream. This
function is called when a stream is released, on the front and back
conn-streams, and when a health-check is finished.
These functions don't close the connection but only perform shutdown for
reads and writes at the mux level. It is a bit ambiguous. Thus,
cs_conn_close() is renamed cs_conn_shut() and cs_conn_drain_and_close() is
renamed cs_conn_drain_and_shut(). These both functions rely on
cs_conn_shutw() and cs_conn_shutr().
It is a partial revert of 54e85cbfc ("MAJOR: check: Use a persistent
conn-stream for health-checks"). But with the CS refactoring, the result is
cleaner now. A CS is allocated when a new health-check run is started. The
same CS is then used throughout the run. If there are several connections,
the endpoint is just reset. At the end of the run, the CS is released. It
means, in the tcp-check part, the CS is always defined.
Some conn-stream functions are only used when there is a connection. Thus,
they was renamed with "cs_conn_" prefix. In addition, we expect to have a
connection, so a BUG_ON is added to be sure the functions are never called
in another context.
All old flags CS_FL_* are now moved in the endpoint scope and renamed
CS_EP_* accordingly. It is a systematic replacement. There is no true change
except for the health-check and the endpoint reset. Here it is a bit special
because the same conn-stream is reused. Thus, we must handle endpoint
allocation errors. To do so, cs_reset_endp() has been adapted.
Thanks to this last change, it will now be possible to simplify the
multiplexer and probably the applets too. A review must also be performed to
remove some flags in the channel or the stream-interface. The HTX will
probably be simplified too. Finally, there is now some place in the
conn-stream to move info from the stream-interface.
The conn-stream endpoint is now shared between the conn-stream and the
applet or the multiplexer. If the mux or the applet is created first, it is
responsible to also create the endpoint and share it with the conn-stream.
If the conn-stream is created first, it is the opposite.
When the endpoint is only owned by an applet or a mux, it is called an
orphan endpoint (there is no conn-stream). When it is only owned by a
conn-stream, it is called a detached endpoint (there is no mux/applet).
The last entity that owns an endpoint is responsible to release it. When a
mux or an applet is detached from a conn-stream, the conn-stream
relinquishes the endpoint to recreate a new one. This way, the endpoint
state is never lost for the mux or the applet.
It is a transient commit to prepare next changes. Now, when a conn-stream is
created from an applet or a multiplexer, an endpoint is always provided. In
addition, the API to create a conn-stream was specialized to have one
function per type.
The next step will be to share the endpoint structure.
It is a transient commit to prepare next changes. It is possible to pass a
pre-allocated endpoint to create a new conn-stream. If it is NULL, a new
endpoint is created, otherwise the existing one is used. There no more
change at the conn-stream level.
In the applets, all conn-stream are created with no pre-allocated
endpoint. But for multiplexers, an endpoint is systematically created before
creating the conn-stream.
Certain functions cannot be called on an FD-less conn because they are
normally called as part of the protocol-specific setup/teardown sequence.
Better place a few BUG_ON() to make sure none of them is called in other
situations. If any of them would trigger in ambiguous conditions, it would
always be possible to replace it with an error.
For server checks, SSL and PROXY is automatically inherited from the
server settings if no specific check port is specified. Change this
behavior for dynamic servers : explicit "check-ssl"/"check-send-proxy"
are required for them.
Without this change, it is impossible to add a dynamic server with
SSL/PROXY settings and checks without, if the check port is not
explicit. This is because "no-check-ssl"/"no-check-send-proxy" keywords
are not available for dynamic servers.
This change respects the principle that dynamic servers on the CLI
should not reuse the same shortcuts used during the config file parsing.
Mostly because we expect this feature to be manipulated by automated
tools, contrary to the config file which should aim to be the shortest
possible for human readability.
Update the documentation of the "check" keyword to reflect this change.
The unsafe conn-stream API (__cs_*) is now used when we are sure the good
endpoint or application is attached to the conn-stream. This avoids compiler
warnings about possible null derefs. It also simplify the code and clear up
any ambiguity about manipulated entities.
In the same way a stream has always valid conn-streams, when a health-checks
is created, a conn-stream is now created and the health-check is attached on
it, as an app. This simplify a bit the connect part when a health-check is
running.
In the same way the conn-stream has a pointer to the stream endpoint , this
patch adds a pointer to the application entity in the conn-stream
structure. For now, it is a stream or a health-check. It is mandatory to
merge the stream-interface with the conn-stream.
The conn-stream will progressively replace the stream-interface. Thus, a
stream will have to allocate the backend conn-stream during its
creation. This means it will be possible to have a conn-stream with no
connection. To prepare this change, we test the conn-stream's connection
when we retrieve it.
These functions are declared as external functions in check.h and
as inline functions in check.c. Let's move them as static inline in
check.h. This appeared in 2.4 with the following commits:
4858fb2e1 ("MEDIUM: check: align agentaddr and agentport behaviour")
1c921cd74 ("BUG/MINOR: check: consitent way to set agentaddr")
While harmless (it only triggers build warnings with some gcc 4.x),
it should probably be backported where the paches above are present
to keep the code consistent.
When cleaning up the code to remove most explicit task masks in commit
beeabf531 ("MINOR: task: provide 3 task_new_* wrappers to simplify the
API"), a mistake was done with the external checks where the call does
task_new_on(1) instead of task_new_on(0) due to the confusion with the
previous mask 1.
No backport is needed as that's only 2.5-dev.
This change is required to support TCP/HTTP rules in defaults sections. The
'disabled' bitfield in the proxy structure, used to know if a proxy is
disabled or stopped, is replaced a generic bitfield named 'flags'.
PR_DISABLED and PR_STOPPED flags are renamed to PR_FL_DISABLED and
PR_FL_STOPPED respectively. In addition, everywhere there is a test to know
if a proxy is disabled or stopped, there is now a bitwise AND operation on
PR_FL_DISABLED and/or PR_FL_STOPPED flags.
The last 3 fields were 3 list heads that are per-thread, and which are:
- the pool's LRU head
- the buffer_wq
- the streams list head
Moving them into thread_ctx completes the removal of dynamic elements
from the struct thread_info. Now all these dynamic elements are packed
together at a single place for a thread.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
'slowstart' can be used without check on a server, with the CLI handlers
'enable/disable server'. Move the code to initialize and start the
slowstart task outside of check.c.
This change will also be reused to enable slowstart for dynamic servers.
In a future patch, it will be possible to remove at runtime every
servers, both static and dynamic. This requires to extend the server
refcount for all instances.
First, refcount manipulation functions have been renamed to better
express the API usage.
* srv_refcount_use -> srv_take
The refcount is always initialize to 1 on the server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.
* free_server -> srv_drop
This decrements the refcount and if null, the server is freed, so code
calling it must not use the server reference after it. As a bonus, this
function now returns the next server instance. This is useful when
calling on the server loop without having to save the next pointer
before each invocation.
In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming have been eliminated as well.
There is currently a leak on agent-check for dynamic servers. When
deleted, the check rules and vars are not liberated. This leak grows
each time a dynamic server with agent-check is deleted.
Replace the manual purge code by a free_check invocation which
centralizes all the details on check cleaning.
There is no leak for health check because in this case the proxy is the
owner of the check vars and rules.
This should not be backported, unless dynamic server checks are
backported.
If an error occured during a dynamic server creation, free_check is used
to liberate a possible agent-check. However, this does not free
associated vars and rules associated as this is done on another function
named deinit_srv_agent_check.
To simplify the check free and avoid a leak, move free vars/rules in
free_check. This is valid because deinit_srv_agent_check also uses
free_check.
This operation is done only for an agent-check because for a health
check, the proxy instance is the owner of check vars/rules.
This should not be backported, unless dynamic server checks are
backported.
Do not reset check flags when setting CHK_ST_PURGE.
Currently, this change has no impact. However, it is semantically wrong
to clear important flags such as CHK_ST_AGENT on purge.
Furthermore, this change will become mandatoy for a future fix to
properly free agent checks on dynamic servers removal. For this, it will
be needed to differentiate health/agent-check on purge via CHK_ST_AGENT
to properly free agent checks.
This must not be backported unless dynamic servers checks are
backported.
Test if server is not null before using free_server in the check purge
operation. Currently, the null server scenario should not occured as
purge is used with refcounted dynamic servers. However, this might not
be always the case if purge is use in the future in other cases; thus
the test is useful for extensibility.
No need to backport, unless dynamic server checks are backported.
This has been reported through a coverity report in github issue #1343.
This commit is the counterpart for agent check of
"MEDIUM: server: implement check for dynamic servers".
The "agent-check" keyword is enabled for dynamic servers. The agent
check must manually be activated via "enable agent" CLI. This can
enable the dynamic server if the agent response is "ready" without an
explicit "enable server" CLI.
Implement check support for dynamic servers. The "check" keyword is now
enabled for dynamic servers. If used, the server check is initialized
and the check task started in the "add server" CLI handler. The check is
explicitely disabled and must be manually activated via "enable health"
CLI handler.
The dynamic server refcount is incremented if a check is configured. On
"delete server" handler, the check is purged, which decrements the
refcount.
Implement a collection of keywords deemed safe and useful to dynamic
servers. The list of the supported keywords is :
- addr
- check-proto
- check-send-proxy
- check-via-socks4
- rise
- fall
- fastinter
- downinter
- port
- agent-addr
- agent-inter
- agent-port
- agent-send
Implement a mechanism to free a started check on runtime for dynamic
servers. A new function check_purge is created for this. The check task
will be marked for deletion and scheduled to properly close connection
elements and free the task/tasklet/buf_wait elements.
This function will be useful to delete a dynamic server wich checks.
global maxsock is used to estimate a number of fd to reserve for
internal use, such as checks. It is incremented at startup with the info
from the config file.
Disable this incrementation in checks functions at runtime. First, it
currently serves no purpose to increment it after startup. Worse, it may
lead to out-of-bound accesse on the fdtab.
This will be useful to initiate checks for dynamic servers.
Remove static qualifier on init_srv_check, init_srv_agent_check and
start_check_task. These functions will be called in server.c for dynamic
servers with checks.
Allocate default tcp ruleset for every backend without explicit rules
defined, even if no server in the backend use check. This change is
required to implement checks for dynamic servers.
This allocation is done on check_config_validity. It must absolutely be
called before check_proxy_tcpcheck (called via post proxy check) which
allocate the implicit tcp connect rule.
A config like the below fails to validate because of a bogus test:
backend b1
tcp-check connect port 1234
option tcp-check
server s1 1.2.3.4 check
[ALERT] (18887) : config : config: proxy 'b1': server 's1' has neither
service port nor check port, and a tcp_check rule
'connect' with no port information.
A || instead of a && only validates the connect rule when both the
address *and* the port are set. A work around is to set the rule like
this:
tcp-check connect addr 0:1234 port 1234
This needs to be backported as far as 2.2 (2.0 is OK).
In srv_parse_agent_check the error code is not returned in case
something goes wrong. The value 0 is always return.
Additionally, there's a small cleanup of unreachable returns that in
most checks are not present either and removed in two places they were
present. This makes the code consistent across the different checks.
Set "config :" as a prefix for the user messages context before starting
the configuration parsing. All following stderr output will be prefixed
by it.
As a consequence, remove extraneous prefix "config" already specified in
various ha_alert/warning/notice calls.
On observe mode, if a server is marked as DOWN, the server's health-check is
rescheduled using the fastinter timeout if the new expiration date is newer
that the current one. But this must only be performed if the fastinter
timeout is defined.
Internally, tick_is_lt() function only checks the date and does not perform any
verification on the provided args. Thus, we must take care of it. However, it is
possible to disable the server health-check by setting its task expiration date
to TICK_ETERNITY.
This patch must be backported as far as 2.2. It is related to
A connection may be synchronously established. In the tcpcheck context, it
may be a problem if several connections come one after another. In this
case, there is no event to close the very first connection before starting
the next one. The checks is thus blocked and timed out, a L7 timeout error
is reported.
To fix the bug, when a tcpcheck is started, we immediately evaluate its
state. Most of time, nothing is performed and we must wait. But it is thus
possible to handle the result of a successfull connection.
This patch should fix the issue #1234. It must be backported as far as 2.2.
The current "ADD" vs "ADDQ" is confusing because when thinking in terms
of appending at the end of a list, "ADD" naturally comes to mind, but
here it does the opposite, it inserts. Several times already it's been
incorrectly used where ADDQ was expected, the latest of which was a
fortunate accident explained in 6fa922562 ("CLEANUP: stream: explain
why we queue the stream at the head of the server list").
Let's use more explicit (but slightly longer) names now:
LIST_ADD -> LIST_INSERT
LIST_ADDQ -> LIST_APPEND
LIST_ADDED -> LIST_INLIST
LIST_DEL -> LIST_DELETE
The same is true for MT_LISTs, including their "TRY" variant.
LIST_DEL_INIT keeps its short name to encourage to use it instead of the
lazier LIST_DELETE which is often less safe.
The change is large (~674 non-comment entries) but is mechanical enough
to remain safe. No permutation was performed, so any out-of-tree code
can easily map older names to new ones.
The list doc was updated.
The dummy frontend used to create the session of the tcp-checks is
initialized without identifier. However, it is required because this id may
be used without any guard, for instance in log-format string via "%f" or
when fe_name sample fetch is called. Thus, an unset id may lead to crashes.
This patch must be backported as far as 2.2.
Add the trace support for the checks. Only tcp-check based health-checks are
supported, including the agent-check.
In traces, the first argument is always a check object. So it is easy to get
all info related to the check. The tcp-check ruleset, the conn-stream and
the connection, the server state...