Since 04276f3d ("MEDIUM: server: split the address and the port into two
different fields") we should not use srv->addr to store server's port
and rely on srv->svc_port instead.
For sink servers, we correctly set >svc_port upon server creation but
we didn't use it when initializing address for the connection.
As a result, FQDN resolution will not work properly with sink servers.
Hopefully, this used to work by accident because sink servers were
resolved using the PA_O_RESOLVE flag in str2sa_range(), which made the
srv->addr contain the port in addition to the address.
But this will fail to work when FQDN resolution is postponed because only
->svc_port will contain the proper server port upon resolution.
For instance, FQDN resolution with servers from log backends (which are
resolved as regular servers, that is, without the PA_O_RESOLVE) will fail
to work because of this.
This may be backported as far as 2.2 even though the bug didn't have
noticeable effects for versions below 2.9
[In 2.2, sink_forward_session_init() didn't exist it should be applied in
sink_forward_session_create()]
"log-bufsize" may now be used for a log server (in a log backend) to
configure the bufsize of implicit ring associated to the server (which
defaults to BUFSIZE).
This helper function can be used to create a new sink from an existing
server struct (and thus existing proxy as well), in order to spare some
resources when possible.
implicit rings were automatically forced to the parent logger format, but
this was done upon ring creation.
This is quite restrictive because we might want to choose the desired
format right before generating the log header (ie: when producing the
log message), depending on the logger (log directive) that is
responsible for the log message, and with current logic this is not
possible. (To this day, we still have dedicated implicit ring per log
directive, but this might change)
In ring_write(), we check if the sink->fmt is specified:
- defined: we use it since it is the most precise format
(ie: for named rings)
- undefined: then we fallback to the format from the logger
With this change, implicit rings' format is now set to UNSPEC upon
creation. This is safe because the log header building function
automatically enforces the "raw" format when UNSPEC is set. And since
logger->format also defaults to "raw", no change of default behavior
should be expected.
Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.
While at it, some outdated comments were updated as well.
No change in behavior should be expected.
Since a5b325f92 ("MINOR: protocol: add a real family for existing FDs"),
we don't rely anymore on AF_UNSPEC for buffer rings in do_send_log.
But we kept it as a parsing hint to differentiate between implicit and
named rings during ring buffer postparsing.
However it is still a bit confusing and forces us to systematically rely
on target->addr, even for named buffer rings where it doesn't make much
sense anymore.
Now that target->addr was made a pointer in a recent commit, we can
choose not to initialize it when not needed (i.e.: named rings) and use
this as a hint to distinguish implicit rings during init since they rely
on the addr struct to temporarily store the ring's address until the ring
is actually created during postparsing step.
log targets were immediately embedded in logger struct (previously
named logsrv) and could not be used outside of this context.
In this patch, we're introducing log_target type with the associated
helper functions so that it becomes possible to declare and use log
targets outside of loggers scope.
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.
But things have become more complex, since today 'log' directive can point
to ring targets (implicit, or named) for example.
Indeed, a 'log' directive does no longer reference the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.
So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because they would be mixed
with logsrv.
So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).
This is internal rewording, so no functional change should be expected
on user-side.
now forward_px only serves as a hint to know if a proxy was created
specifically for the sink, in which case the sink is responsible for it.
Everywhere forward_px was used in appctx context: get the parent proxy from
the sft->srv instead.
This permits to finally get rid of the double link dependency between sink
and proxy.
Removing unnecessary dependency on proxy->parent pointer in
sink appctx functions by directly using the sink sft from the
applet->svcctx to get back to sink related structs.
Thanks to this, proxy used for a ringbuf does not have to be exclusive
to a single sink anymore.
It's useless to check if sink has been created with BUF type after
calling sink_new_buf() since the goal of the function is to create
a new sink of BUF type.
Fixing some typos that have been overlooked during the recent log/sink
API improvements. Using this patch to make sink_new_from_logsrv() static
since it is not used outside of sink.c
The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
To further clean the code and remove duplication, some sink postparsing
and sink->sft finalization is now performed in a dedicated function
named sink_finalize().
In this patch we move sink freeing logic outside of sink_deinit() function
in order to create the sink_free() helper function that could be used
on error paths for example.
We previously had postparsing logic but only for logsrv sinks, but now we
need to make this operation on logsrv directly instead of sinks to prepare
for additional postparsing logic that is not sink-specific.
To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs()
to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents.
Then, we split postresolve_logsrv_list() so that the sink-only logic stays
in sink.c (sink_resolve_logsrv_buffer() function), and the "generic"
target part stays in log.c as resolve_logsrv().
Error messages formatting was preserved as far as possible but some slight
variations are to be expected.
As for the functional aspect, no change should be expected.
maxlen now defaults ~0 (instead of BUFSIZE) to make sure no implicit
truncation will be performed when the option is not specified, since the
doc doesn't mention any default value for maxlen. As such, if the payload
is too big, it will be dropped (this is the default expected behavior).
Consider the following example:
|log ring@test-ring len 2000 local0
|
|ring test-ring
| maxlen 1000
This would result in emitted logs being silently truncated to 1000 because
test-ring maxlen is smaller than the log directive maxlen.
In this patch we're adding an extra check in post_sink_resolve() to detect
this kind of confusing setups and warn the user about the implicit
truncation when DIAG mode is on.
This commit depends on:
- "MINOR: sink: simplify post_sink_resolve function"
To prevent logs from being silently (and unexpectly droppped) at runtime,
we check that the maxlen parameter from the log directives are
strictly inferior to the targeted ring size.
|global
| tune.bufsize 16384
| log tcp@127.0.0.1:514 len 32768
| log myring@127.0.0.1:514 len 32768
|ring myring
| # no explicit size
On such configs, a diag warning will be reported.
This commit depends on:
- "MINOR: sink: simplify post_sink_resolve function"
- "MINOR: ring: add a function to compute max ring payload"
When user specifies a maxlen parameter that is greater than the size of
a given ring section, a warning is emitted to inform that the max length
exceeds size, and then the maxlen is forced to size.
The logic is good, but imprecise, because it doesn't take into account
the slight overhead from storing payloads into the ring.
In practise, we cannot store a single message which is exactly the same
length than size. Doing so will result in the message being dropped at
runtime.
Thanks to the ring_max_payload() function introduced in "MINOR: ring: add
a function to compute max ring payload", we can now deduce the maximum
value for the maxlen parameter before it could result in messages being
dropped.
When maxlen value is set to an improper value, the warning will be emitted
and maxlen will be forced to the maximum "single" payload len that could
fit in the ring buffer, preventing messages from being dropped
unexpectedly.
This commit depends on:
- "MINOR: ring: add a function to compute max ring payload"
This may be backported as far as 2.2
When errors are encountered in sink_new_from_logsrv() function,
incompetely allocated ressources are freed to prevent memory leaks.
For instance: logsrv implicit server is manually cleaned up on error prior
to returning from the function.
However, since 198e92a8e5 ("MINOR: server: add a global list of all known
servers") every server created using new_server() is registered to the
global list, but unfortunately the manual srv cleanup in
sink_new_from_logsrv() doesn't remove the srv from the global list, so the
freed server will still be referenced there, which can result in invalid
reads later.
Moreover, server API has evolved since, and now the srv_drop() function is
available for that purpose, so let's use it, but make sure that srv is
freed before the proxy because on older versions srv_drop() expects the
srv to be linked to a valid proxy pointer.
This must be backported up to 2.4.
[For 2.4 version, free_server() must be used instead of srv_drop()]
Multiple error paths (memory,IO related) in cfg_post_parse_ring() were
not implemented correcly and could result in memory leak or undefined
behavior.
Fixing them all at once.
This can be backported in 2.4
sft freeing attempt made in a575421 ("BUG/MINOR: sink: missing sft free in
sink_deinit()") is incomplete, because sink->sft is meant to be used as a
list and not a single sft entry.
Because of that, the previous fix only frees the first sft entry, which
fixes memory leaks for single-server forwarders (this is the case for
implicit rings), but could still result in memory leaks when multiple
servers are configured in a explicit ring sections.
What this patch does: instead of directly freeing sink->sft, it iterates
over every list members to free them.
It must be backported up to 2.4 with a575421.
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.
But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.
sink_write() now takes an optional maxlen parameter:
if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
is smaller than ring->maxlen, else only ring->maxlen will be considered.
[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
When maxlen parameter exceeds sink size, a warning is generated and maxlen
is enforced to sink size. But the err_code is incorrectly set to ERR_ALERT
Indeed, being a "warning", ERR_WARN should be used here.
This may be backported as far as 2.2
When a ring section defines its size using the "size" directive with a
smaller size than the default one or smaller than the previous one,
a warning is generated to inform the user that the new size will be
ignored.
However the err_code is returned as FATAL, so this cause haproxy to
incorrectly abort the init sequence.
Changing the err_code to ERR_WARN so that this warning doesn't refrain
from successfully starting the process.
This should be backported as far as 2.4
Adding missing free for sft (string_forward_target) in sink_deinit(),
which resulted in minor leak for each declared ring target at deinit().
(either explicit and implicit rings are affected)
This may be backported up to 2.4.
forward proxy server list created from sink_new_from_logsrv() is invalid
Indeed, srv->next is literally assigned to itself. This did not cause
issues during syslog handling because the sft was properly set, but it
will cause the free_proxy(sink->forward_px) at deinit to go wild since
free_proxy() will try to iterate through the proxy srv list to free
ressources, but because of the improper list initialization, double-free
and infinite-loop will occur.
This bug was revealed by 9b1d15f53a ("BUG/MINOR: sink: free forward_px on deinit()")
It must be backported as far as 2.4.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
When a ring section is configured, a new sink is created and forward_px
proxy may be allocated and assigned to the sink.
Such sink-related proxies are added to the sink_proxies_list and thus
don't belong to the main proxy list which is cleaned up in
haproxy deinit() function.
We don't have to manually clean up sink_proxies_list in the main deinit()
func:
sink API already provides the sink_deinit() function so we just add the
missing free_proxy(sink->forward_px) there.
This could be backported up to 2.4.
[in 2.4, commit b0281a49 ("MINOR: proxy: check if p is NULL in free_proxy()")
must be backported first]
The state of the opposite SC is already tested to wait the connection is
established before sending messages. So, there is no reason to test it again
before looping on the ring buffer.
Since d9c7188 ("MEDIUM: ring: make the offset relative to the head/tail instead
of absolute"), ring offset calculation has changed: we don't rely on ring->ofs
absolute offset anymore.
But with the above patch, relative offset is not properly calculated in
sink_forward_oc_io_handler() and sink_forward_io_handler().
The issue here is the same as 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head") since dns and sink_forward share the same
ring logic:
When the ring is becoming full, ring_write() will try to regain some space to
insert new data by calling b_del() on older messages. Here b_del() moves
buffer's head under the hood, and since ring->ofs cannot be used to "correct"
the relative offset, both sink_forward_oc_io_handler() and
sink_forward_io_handler() start to get invalid offset.
At this point, we will suffer from ring data corruption resulting in unexpected
behavior or process crashes.
This can be easily demonstrated with the following test:
|log-forward syslog
| dgram-bind 127.0.0.1:5114
| log ring@logbuffer local0
|
|ring logbuffer
| format rfc5424
| size 16384
| server logserver 127.0.0.1:5114
Haproxy will forward incoming logs on udp@127.0.0.1:5114 to
tcp@127.0.0.1:5114
Then use the following tcp server:
nc -l -p 5114
With the following udp log sender:
|while [ 1 ]
|do
| logger --udp --server 127.0.0.1 -P 5114 -p user.warn "Test 7"
|done
Once the ring buffer is full (it takes less that a second to fill the 16k
buffer) haproxy starts to misbehave and the log forwarding stops.
We apply the same fix as in 737d10f ("BUG/MEDIUM: dns: ensure ring offset is
properly reajusted to head").
Please note the ~0 case that is handled slightly differently in this patch:
this is required to properly start reading from a non-empty ring. This case
will be fixed in dns related code in the following patch.
This does not need to be backported as d9c7188 was not marked for backports.
The ring's offset currently contains a perpetually growing custor which
is the number of bytes written from the start. It's used by readers to
know where to (re)start reading from. It was made absolute because both
the head and the tail can change during writes and we needed a fixed
position to know where the reader was attached. But this is complicated,
error-prone, and limits the ability to reduce the lock's coverage. In
fact what is needed is to know where the reader is currently waiting, if
at all. And this location is exactly where it stored its count, so the
absolute position in the buffer (the seek offset from the first storage
byte) does represent exactly this, as it doesn't move (we don't realign
the buffer), and is stable regardless of how head/tail changes with writes.
This patch modifies this so that the application code now uses this
representation instead. The most noticeable change is the initialization,
where we've kept ~0 as a marker to go to the end, and it's now set to
the tail offset instead of trying to resolve the current write offset
against the current ring's position.
The offset was also used at the end of the consuming loop, to detect
if a new write had happened between the lock being released and taken
again, so as to wake the consumer(s) up again. For this we used to
take a copy of the ring->ofs before unlocking and comparing with the
new value read in the next lock. Since it's not possible to write past
the current reader's location, there's no risk of complete rollover, so
it's sufficient to check if the tail has changed.
Note that the change also has an impact on the haring consumer which
needs to adapt as well. But that's good in fact because it will rely
on one less variable, and will use offsets relative to the buffer's
head, and the change remains backward-compatible.
It was done by hand by callers when a shutdown for read or write was
performed. It is now always handled by the functions performing the
shutdown. This way the callers don't take care of it. This will avoid some
bugs.
An endpoint should now set SE_FL_EXP_NO_DATA flag if it does not expect any
data from the opposite endpoint. This way, the stream will be able to
disable any read timeout on the opposite endpoint. Applets should use
applet_expect_no_data() and applet_expect_data() functions to set or clear
the flag. For now, only dns and sink forwarder applets are concerned.
Read and write timeouts (.rto and .wto) are now replaced by an unique
timeout, call .ioto. Since the recent refactoring on channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.
From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplify the timers manipulations.
Read and write timeouts concerns the I/O. Thus, it is logical to move it into
the stconn. At the end, the stream is responsible to detect the timeouts. So
it is logcial to have these values in the stconn and not in the SE
descriptor. But it may change depending on the recfactoring.
So, now:
* scf->rto is used instead of req->rto
* scf->wto is used instead of res->wto
* scb->rto is used instead of res->rto
* scb->wto is used instead of req->wto