Instead of the generic MUX_, we now use MUX_CTL_ prefix for all mux_ctl_type
value. This will avoid any ambiguities with other enums, especially with a
new one that will be added to get information on mux streams.
In previous log backend implementation, we created a pseudo log target
for each declared log server, and we made the log target's address point
to the actual server address to save some time and prevent unecessary
copies.
But this was done without knowing that when FQDN is involved (more broadly
when dns/resolution is involved), the "port" part of server addr should
not be relied upon, and we should explicitly use ->svc_port for that
purpose.
With that in mind and thanks to the previous commit, some changes were
required: we allocate a dedicated addr within the log target when target
is in DGRAM mode. The addr is first initialized with known values and it
is then updated automatically by _srv_set_inetaddr() during runtime.
(the change is atomic so readers don't need to worry about it)
addr from server "log target" (INET/DGRAM mode) is made of the combination
of server's address (lacking the port part) and server's svc_port.
Maintain proper px->lbprm.tot_weight for log backends. server's weight is
considered as 1 as long as the server is usable.
This will allow the stats page to correctly display the proxy status since
the check currently relies on proxy's lbprm.tot_weight variable.
server_rules declared using "use-server" keyword within a proxy are not
supported inside a log backend (with "mode log" set), so we report a
warning to the user and reset the setting.
Building without threads emits two warnings because the proxy pointer
is no longer used (only serves for the lock) since 2.9 commit 9a74a6cb1
("MAJOR: log: introduce log backends"). No backport is needed.
We start implementing some postparsing compatibility checks for log
backends.
Here we report a warning if user tries to use tcp-{request,response} rules
with log backend, and we properly ignore such rules when inherited from
defaults section.
"log-balance" directive was recently introduced to configure the
balancing algorithm to use when in a log backend. However, it is
confusing and it causes issues when used in default section.
In this patch, we take another approach: first we remove the
"log-balance" directive, and instead we rely on existing "balance"
directive to configure log load balancing in log backend.
Some algorithms such as roundrobin can be used as-is in a log backend,
and for log-only algorithms, they are implemented as "log-$name" inside
the "backend" directive.
The documentation was updated accordingly.
This bug was introduced with 969e212 ("MINOR: log: add dup_logsrv() helper
function")
When duplicating an existing log entry, we must take care to inherit from
its original ->ref if it is set, because not doing so would make 28ac0999
("MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries")
ineffective given that global log directives will lose their original
reference when duplicated resursively (at least twice), which is what
happens when global log directives are first inherited to defaults which
are then inherited to a regular proxy at the end of the chain.
This can be easily reproduced using the following configuration:
|global
| log stdout format raw local0
|
|defaults
| log global
|
|frontend test
| log global
| ...
Logs from "test" proxy will be duplicated because test incorrectly
inherited from global "log" directives twice, which 28ac0999 would
normally detect and prevent.
No backport needed unless 969e212 gets backported.
This bug was introduced with 29b76ca ("BUG/MEDIUM: server/log: "mode log"
after server keyword causes crash ")
Indeed, we cannot safely rely on addr_proto being set when str2sa_range()
returns in parse_server() (even if SRV_PARSE_PARSE_ADDR is set), because
proto lookup might be bypassed when FQDN addresses are involved.
Unfortunately, the above patch wrongly assumed that proto would always
be set when SRV_PARSE_PARSE_ADDR was passed to parse_server() (so when
str2sa_range() was called), resulting in invalid postparsing checks being
performed, which could as well lead to crashes with log backends
("mode log" set) because some postparsing init was skipped as a result of
proto not being set and this wasn't expected later in the init code.
To fix this, we now make use of the previous patch to perform server's
address compatibility checks on hints that are always set when
str2sa_range() succesfully returns.
For log backend, we're also adding a complementary test to check if the
address family is of expected type, else we report an error, plus we're
moving the postinit logic in log api since _srv_check_proxy_mode() is
only meant to check proxy mode compatibility and we were abusing it.
This patch depends on:
- "MINOR: tools: make str2sa_range() directly return type hints"
No backport required unless 29b76ca gets backported.
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")
However, as stated into the commit message, there is a trick:
"we can fail to return a protocol in case the caller
accepts an fqdn for use later. This is what servers do and in this
case it is valid to return no protocol"
In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.
While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.
Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
tune.rcvbuf.client and tune.rcvbuf.server are not suitable for shared
dgram sockets because they're per connection so their units are not the
same. However, QUIC's listener and log servers are not connected and
take per-thread or per-process traffic where a socket log buffer might
be too small, causing undesirable packet losses and retransmits in the
case of QUIC. This essentially manifests in listener mode with new
connections taking a lot of time to set up under heavy traffic due to
the small queues causing delays. Let's add a few new settings allowing
to set these shared socket sizes on the frontend and backend side (which
reminds that these are per-front/back and not per client/server hence
not per connection).
hash lb algorithm can be configured with the "log-balance hash <cnv_list>"
directive. With this algorithm, the user specifies a converter list with
<cnv_list>.
The produced log message will be passed as-is to the provided converter
list, and the resulting hash will be used to select the log server that
will receive the log message.
In this patch we add basic support for the random algorithm:
random algorithm picks a random server using the result of the
statistical_prng() function as if it was a hash key to then compute the
related server ID.
There is no support for the <draw> parameter (which is implemented for
tcp/http load-balancing), because we don't have the required metrics to
evaluate server's load in log backends for the moment. Plus it would add
more complexity to the __do_send_log_backend() function so we'll keep it
this way for now but this might be needed in the future.
sticky algorithm always tries to send log messages to the first server in
the farm. The server will stay in front during queue and dequeue
operations (no other server can steal its place), unless it becomes
unavailable, in which case it will be replaced by another server from
the tree.
Using "mode log" in a backend section turns the proxy in a log backend
which can be used to log-balance logs between multiple log targets
(udp or tcp servers)
log backends can be used as regular log targets using the log directive
with "backend@be_name" prefix, like so:
| log backend@mybackend local0
A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.
Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.
While at it, some outdated comments were updated as well.
No change in behavior should be expected.
__do_send_log() now takes an extra target parameter to pass an explicit
log target instead of getting it from logger->target.
This will allow __do_send_log() to be called multiple times within a
logger entry containing multiple log targets.
Since a5b325f92 ("MINOR: protocol: add a real family for existing FDs"),
we don't rely anymore on AF_UNSPEC for buffer rings in do_send_log.
But we kept it as a parsing hint to differentiate between implicit and
named rings during ring buffer postparsing.
However it is still a bit confusing and forces us to systematically rely
on target->addr, even for named buffer rings where it doesn't make much
sense anymore.
Now that target->addr was made a pointer in a recent commit, we can
choose not to initialize it when not needed (i.e.: named rings) and use
this as a hint to distinguish implicit rings during init since they rely
on the addr struct to temporarily store the ring's address until the ring
is actually created during postparsing step.
log targets were immediately embedded in logger struct (previously
named logsrv) and could not be used outside of this context.
In this patch, we're introducing log_target type with the associated
helper functions so that it becomes possible to declare and use log
targets outside of loggers scope.
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.
But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.
Indeed, a 'log' directive does no longer reference the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.
So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.
So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).
This is internal rewording, so no functional change should be expected
on user-side.
The log server lock is pretty visible in perf top when using log samples
because it's taken for each server in turn while trying to validate and
update the log server's index. Let's change this for a CAS, since we have
the index and the range at hand now. This allow us to remove the logsrv
lock.
The test on 4 servers now shows a 3.7 times improvement thanks to much
lower contention. Without log sampling a test producing 4.4M logs/s
delivers 4.4M logs/s at 21 CPUs used, everything spent in the kernel.
After enabling 4 samples (1:4, 2:4, 3:4 and 4:4), the throughput would
previously drop to 1.13M log/s with 37 CPUs used and 75% spent in
process_send_log(). Now with this change, 4.25M logs/s are emitted,
using 26 CPUs and 22% in process_send_log(). That's a 3.7x throughput
improvement for a 30% global CPU usage reduction, but in practice it
mostly shows that the performance drop caused by having samples is much
less noticeable (each of the 4 servers has its index updated for each
log).
Note that in order to even avoid incrementing an index for each log srv
that is consulted, it would be more convenient to have a single index
per frontend and apply the modulus on each log server in turn to see if
the range has to be updated. It would then only perform one write per
range switch. However the place where this is done doesn't have access
to a frontend, so some changes would need to be performed for this, and
it would require to update the current range independently in each
logsrv, which is not necessarily easier since we don't know yet if we
can commit it.
By using a single long long to store both the current range and the
next index, we'll make it possible to perform atomic operations instead
of locking. Let's only regroup them for now under a new "curr_rg_idx".
The upper word is the range, the lower is the index.
The variable curr_rg in process_send_log() is misleading because it is
not related to the integer curr_rg that's used to calculate it, instead
it's a pointer to the current smp_log_range from smp_rgs[], so let's call
it "smp_rg" as a singular for this "smp_rgs" and put an end to this
confusion.
This index is useless because it only serves to know when the global
index reached the end, while the global one already knows it. Let's
just drop it and perform the test on the global range.
It was verified with the following config that the first server continues
to take 1/10 of the traffic, the 2nd one 2/10, the 3rd one 3/10 and the
4th one 4/10:
log 127.0.0.1:10001 sample 1:10 local0
log 127.0.0.1:10002 sample 2,5:10 local0
log 127.0.0.1:10003 sample 3,7,9:10 local0
log 127.0.0.1:10004 sample 4,6,8,10:10 local0
The test of the log range is not very clear, in part due to the
reuse of the "curr_idx" name that happens at two levels. The call
to in_smp_log_range() applies to the smp_info's index to which 1 is
added: it verifies that the next index is still within the current
range.
Let's just have a local variable "next_index" in process_send_log()
that gets assigned the next index (current+1) and compare it to the
current range's boundaries. This makes the test much clearer. We can
then simply remove in_smp_log_range() that's no longer needed.
Since both sink_write and fd_write_frag_line take the maxlen parameter
as argument, there is no added value for the trim before passing the
msg parameter to those functions.
We previously had postparsing logic but only for logsrv sinks, but now we
need to make this operation on logsrv directly instead of sinks to prepare
for additional postparsing logic that is not sink-specific.
To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs()
to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents.
Then, we split postresolve_logsrv_list() so that the sink-only logic stays
in sink.c (sink_resolve_logsrv_buffer() function), and the "generic"
target part stays in log.c as resolve_logsrv().
Error messages formatting was preserved as far as possible but some slight
variations are to be expected.
As for the functional aspect, no change should be expected.
When leaving cfg_parse_log_forward() on error paths, errmsg which is local
to the function could still point to valid data, and it's our
responsibility to free it.
Instead of freeing it everywhere it is invoved, we free it prior to
leaving the function.
This should be backported as far as 2.4.
Multiple error paths were badly handled in cfg_parse_log_forward():
some errors were raised without interrupting the function execution,
resulting in undefined behavior.
Instead of fixing issues separately, let's fix the whole function at once.
This should be backported as far as 2.4.
"missing name for ip-forward section" is generated instead of "missing
name name for log-forward section" in cfg_parse_log_forward().
This may be backported up to 2.4.
In e709e1e ("MEDIUM: logs: buffer targets now rely on new sink_write")
we started using the sink API instead of using the ring_write function
directly.
But as indicated in the commit message, the maxlen parameter of the log
directive now only applies to the message part and not the complete
payload. I don't know what the original intent was (maybe minimizing code
changes) but it seems wrong, because the doc doesn't mention this special
case, and the result is that the ring->buffer output can differ from all
other log output types, making it very confusing.
One last issue with this is that log messages can end up being dropped at
runtime, only for the buffer target, and even if logsrv->maxlen is
correctly set (including default: 1024) because depending on the generated
header size the payload can grow bigger than the accepted sink size (sink
maxlen is not mandatory) and we have no simple way to detect this at
configuration time.
First, we partially revert e709e1e:
TARGET_BUFFER still leverages the proper sink API, but thanks to
"MINOR: sink: pass explicit maxlen parameter to sink_write()" we now
explicitly pass the logsrv->maxlen to the sink_write function in order
to stop writing as soon as either sink->maxlen or logsrv->maxlen is
reached.
This restores pre-e709e1e behavior with the added benefit from using the
high-level API, which includes automatically announcing dropped message
events.
Then, we also need to take the ending '\n' into account: it is not
explicitly set when generating the logline for TARGET_BUFFER, but it will
be forcefully added by the sink_forward_io_handler function from the tcp
handler applet when log messages from the buffer are forwarded to tcp
endpoints.
In current form, because the '\n' is added later in the chain, maxlen is
not being considered anymore, so the final log message could exceed maxlen
by 1 byte, which could make receiving servers unhappy in logging context.
To prevent this, we sacrifice 1 byte from the logsrv->maxlen to ensure
that the final message will never exceed log->maxlen, even if the '\n'
char is automatically appended later by the forwarding applet.
Thanks to this change TCP (over RING/BUFFER) target now behaves like
FD and UDP targets.
This commit depends on:
- "MINOR: sink: pass explicit maxlen parameter to sink_write()"
It may be backported as far as 2.2
[For 2.2 and 2.4 the patch does not apply automatically, the sink_write()
call must be updated by hand]
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.
But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.
sink_write() now takes an optional maxlen parameter:
if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
is smaller than ring->maxlen, else only ring->maxlen will be considered.
[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
A regression was introduced with 5464885 ("MEDIUM: log/sink: re-work
and merge of build message API.").
For UDP targets, a final '\n' is systematically inserted, but with the
rework of the build message API, it is inserted after the maxlen
limitation has been enforced, so this can lead to the final message
becoming maxlen+1. For strict syslog servers that only accept up to
maxlen characters, this could be a problem.
To fix the regression, we take the final '\n' into account prior to
building the message, like it was done before the rework of the API.
This should be backported up to 2.4.
A check was missing in parse_logsrv() to make sure that malloc-dependent
variable is checked for non-NULL before using it.
If malloc fails, the function raises an error and stops, like it's already
done at a few other places within the function.
This partially fixes GH #2130.
It should be backported to every stable versions.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
Instead of operating on {sec, usec} now we convert both operands to
ns then subtract them and convert to ms. This is a first step towards
dropping timeval from these timestamps.
Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at
all and could be removed.
In the syslog applet, when there is no output data, nothing is performed and
the applet leaves by requesting more data. But it is an issue because a
client abort is only handled if it reported with the last bytes of the
message. If the abort occurs after the message was handled, it is ignored.
The session remains opened and inactive until the client timeout is being
triggered. It no such timeout is configured, given that the default maxconn
is 10, all slots can be quickly busy and make the applet unresponsive.
To fix the issue, the best is to always try to read a message when the I/O
handle is called. This way, the abort can be handled. And if there is no
data, we leave as usual.
This patch should fix the issue #2112. It must be backported as far as 2.4.
When the log applet is executed while a shut is pending, the remaining
output data must always be consumed. Otherwise, this can prevent the stream
to exit, leading to a spinning loop on the applet.
It is 2.8-specific. No backport needed.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream.
Here, the refactoring only reports errors by setting SE_FL_ERROR flag.