Since traces were adapted to support being declared in the global section
in 2.7 with commit c11f1cdf4 ("MINOR: trace: split the CLI "trace" parser
in CLI vs statement"), the method used to return the error message was
unreliable. For example an invalid sink name in the global section would
produce:
[ALERT] (26685) : config : parsing [test-trace.cfg:51] : 'trace': No such sink
[ALERT] (26685) : config : parsing [test-trace.cfg:51] : (null)
[ALERT] (26685) : config : Error(s) found in configuration file : test-trace.cfg
[ALERT] (26685) : config : Fatal errors found in configuration.
The reason is that the trace is emitted manually using ha_error() in
cfg_parse_trace() and -1 is returned without setting the message, and
the caller also prints the empty message. That's quite awkward given
that the API originally comes from the CLI which does support dynamic
strings and that config keywords do as well.
This commit modifies both cli_parse_trace() and cfg_parse_trace() to
return a dynamically allocated message instead, and adapts the central
function trace_parse_statement() to do the same, replacing a few direct
assignments with strdup() or memprintf(). This way the alert is no
longer emitted by the parser function, it just passes the message to
the caller.
A few of the static messages switching to memprintf() also took this
opportunity to report the faulty word:
[ALERT] (26772) : config : parsing [test-trace.cfg:51] : No such trace sink 'stduot'
[ALERT] (26772) : config : Error(s) found in configuration file : test-trace.cfg
[ALERT] (26772) : config : Fatal errors found in configuration.
This may be backported to 2.8 and 2.7.
Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.
While at it, some outdated comments were updated as well.
No change in behavior should be expected.
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.
But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.
sink_write() now takes an optional maxlen parameter:
if maxlen is > 0, then sink_write will stop writing at maxlen if maxlen
is smaller than ring->maxlen, else only ring->maxlen will be considered.
[for haproxy <= 2.7, patch must be applied by hand: that is:
__sink_write() and sink_write() should be patched to take maxlen into
account and function calls to sink_write() should use 0 as second argument
to keep original behavior]
By default, passing a NULL cb to the trace functions will result in the
source's default one to be used. For some cases we won't want to use any
callback at all, not event the default one. Let's define a trace_no_cb()
function for this, that does absolutely nothing.
Sometimes it would be necessary to prepare some messages, pre-process some
blocks or maybe duplicate some contents before they vanish for the purpose
of tracing them. However we don't want to do that for everything that is
submitted to the traces, it's important to do it only for what will really
be traced.
The __trace() function has all the knowledge for this, to the point of even
checking the lockon pointers. This commit splits the function in two, one
with the trace decision logic, and the other one for the trace production.
The first one is now usable through wrappers such as _trace_enabled() and
TRACE_ENABLED() which will indicate whether traces are going to be produced
for the current source, level, event mask, parameters and tracking.
There are ifdefs at several places to only define TRC_ARGS_QCON when
QUIC is defined, but nothing prevents this code from building without.
Let's just remove those ifdefs, the single "if" they avoid is not worth
the extra maintenance burden.
The exact same commands as those from the CLI may be pre-loaded at boot
time by passing them one per line after the "trace" keyword in the global
section; i.e. just copy-pasting all commands directly there will do the
job. Note that if a ring is mentioned, it needs to be declared before the
global section. Another option is to append another global section after
"ring".
For now the keyword is marked as experimental to discourage its broad
adoption by default. "expose-experimental-directives" needs to be placed
in the global section to expose it.
In order to be able to reuse the "trace" statements elsewhere (e.g.
global section), we'll first need to split its parser. It turns out
that the whole thing is self-contained inside a single function that
emits a single message on warning/error or nothing on success. That's
quite easy to split in two parts, the one that does the job and produces
the status message and the one that sends it to the CLI. That's what
this patch does.
Email alerts are based on health-checks but with no server. Thus, in
__trace() function, responsible to write a trace message, we must be
prepared to have no server and thus no proxy.
This patch must be backported as far as 2.4.
There's a rare race condition possible when trying to retrieve session from
a back connection's owner, that was fixed in 2.4 and described in commit
3aab17bd5 ("BUG/MAJOR: connection: reset conn->owner when detaching from
session list").
It also affects the trace code which does the same, so the same fix is
needed, i.e. check from conn->session_list that the connection is still
enlisted. It's visible when sending a few tens to hundreds of parallel
requests to an h2 backend and enabling traces in parallel.
This should be backported as far as 2.2 which is the oldest version
supporting traces.
This one is referenced in initcalls by its pointer, it makes no sense
to declare it inline. At best it causes function duplication, at worst
it doesn't build on older compilers.
Prepare trace support for quic_conn instances as argument. This will be
used by the xprt-quic layer in replacement of the connection.
This commit is part of the rearchitecture of xprt-quic layers and the
separation between xprt and connection instances.
There were 102 CLI commands whose help were zig-zagging all along the dump
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
With commit a1f12746b ("MINOR: traces: add a new level "error" below
the "user" level") a new trace level was inserted, resulting in
shifting all exiting ones by one. But the levels reported in the
__trace() function were not updated accordingly, resulting in the
TRACE_LEVEL_DEVELOPER not to be properly reported anymore. This
patch fixes it by extending the number of levels to 6.
No backport is needed.
Sometimes it would be nice to be able to only trace abnormal events such
as protocol errors. Let's add a new "error" level below the "user" level
for this. This will allow to add TRACE_ERROR() at various error points
and only see them.
This patch merges build message code between sink and log
and introduce a new API based on struct ist array to
prepare message header with zero copy, targeting the
log forwarding feature.
Log format 'iso' and 'timed' are now avalaible on logs line.
A new log format 'priority' is also added.
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().
This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.
Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.
A good approach would consist in splitting that file in 3 parts:
- one for error output ("errors" ?).
- one for log_format processing
- and one for actual logging.
Almost no change except moving the cli_kw struct definition after the
defines. Almost all users had both types&proto included, which is not
surprizing since this code is old and it used to be the norm a decade
ago. These places were cleaned.
The sink files could be moved with almost no change at since they
didn't rely on anything fancy. ssize_t required sys/types.h and
thread.h was needed for the locks.
The pretty confusing "buffer.h" was in fact not the place to look for
the definition of "struct buffer" but the one responsible for dynamic
buffer allocation. As such it defines the struct buffer_wait and the
few functions to allocate a buffer or wait for one.
This patch moves it renaming it to dynbuf.h. The type definition was
moved to its own file since it's included in a number of other structs.
Doing this cleanup revealed that a significant number of files used to
rely on this one to inherit struct buffer through it but didn't need
anything from this file at all.
Half of the users of this include only need the type definitions and
not the manipulation macros nor the inline functions. Moves the various
types into mini-clist-t.h makes the files cleaner. The other one had all
its includes grouped at the top. A few files continued to reference it
without using it and were cleaned.
In addition it was about time that we'd rename that file, it's not
"mini" anymore and contains a bit more than just circular lists.
Fortunately that file wasn't made dependent upon haproxy since it was
integrated, better isolate it before it's too late. Its dependency on
api.h was the result of the change from config.h, which in turn wasn't
correct. It was changed back to stddef.h for size_t and sys/types.h for
ssize_t. The recently added reference to MAX() was changed as it was
placed only to avoid a zero length in the non-free-standing version and
was causing a build warning in the hpack encoder.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This patch extends the sink_write prototype and code to
handle the rfc5424 and rfc3164 header.
It uses header building tools from log.c. Doing this some
functions/vars have been externalized.
facility and minlevel have been removed from the struct sink
and passed to args at sink_write because they depends of the log
and not of the sink (they remained unused by rest of the code
until now).
Since commit 88ebd40 ("MINOR: trace: add allocation of buffer-sized
trace buffers") we have a trace buffer allocated at boot time. But
there was a copy-paste error there making the test verify that the
trash was allocated instead of the trace buffer. The result is that
depending on the link order either the test will succeed or fail,
preventing haproxy from starting at all.
No backport is needed.
The detail level initially based on syslog levels is not used, while
something related is missing, trace verbosity, to indicate whether or
not we want to call the decoding callback and what level of decoding
we want (raw captures etc). Let's change the field to "verbosity" for
this. A verbosity of zero means that the decoding callback is not
called, and all other levels are handled by this callback and are
source-specific. The source is now prompted to list the levels that
are proposed to the user. When the source doesn't define anything,
"quiet" and "default" are available.
It's very convenient to know the trace level in the output, at least to
grep/grep -v on it. The main usage is to filter/filter out the developer
traces at level DEVEL. For this we now add the numeric level (0 to 4) just
after the source name in the brackets. The output now looks like this :
[00|h2|4|mux_h2.c:3174] h2_send(): entering : h2c=0x27d75a0 st=2
-, -, | ------------, --------, -------------------> message
| | | | '-> function name
| | | '-> source file location
| | '-> trace level (4=dev)
| '-> trace source
'-> thread number
Working on adding traces to mux-h2 revealed that the function names are
manually copied a lot in developer traces. The reason is that they are
not preprocessor macros and as such cannot be concatenated. Let's
slightly adjust the trace() function call to take a function name just
after the file:line argument. This argument is only added for the
TRACE_DEVEL and 3 new TRACE_ENTER, TRACE_LEAVE, and TRACE_POINT macros
and left NULL for others. This way the function name is only reported
for traces aimed at the developers. The pretty-print callback was also
extended to benefit from this. This will also significantly shrink the
data segment as the "entering" and "leaving" strings will now be merged.
One technical point worth mentioning is that the function name is *not*
passed as an ist to the inline function because it's not considered as
a builtin constant by the compiler, and would lead to strlen() being
run on it from all call places before calling the inline function. Thus
instead we pass the const char * (that the compiler knows where to find)
and it's the __trace() function that converts it to an ist for internal
consumption and for the pretty-print callback. Doing this avoids losing
5-10% peak performance.
The "payload" trace level was ambigous because its initial purpose was
to be able to dump received data. But it doesn't make sense to force to
report data transfers just to be able to report state changes. For
example, all snd_buf()/rcv_buf() operations coming from the application
layer should be tagged at this level. So here we move this payload level
above the state transitions and rename it to avoid the ambiguity making
one think it's only about request/response payload. Now it clearly is
about any data transfer and is thus just below the developer level. The
help messages on the CLI and the doc were slightly reworded to help
remove this ambiguity.
Traces were missing the thread number and the source name, which was
really annoying. Now the thread number is emitted on two digits inside
the square brackets, followed by the source name then the line location,
each delimited with a vertical bar, such as below :
[00|h2|mux_h2.c:2651] Notifying stream about SID change : h2c=0x7f3284581ae0 st=3 h2s=0x7f3284297f00 id=523 st=4
[00|h2|mux_h2.c:2708] receiving H2 HEADERS frame : h2c=0x7f3284581ae0 st=3 dsi=525 (st=0)
[02|h2|mux_h2.c:2194] Received H2 request : h2c=0x7f328d3d1ae0 st=2 : [525] H2 REQ: GET / HTTP/2.0
[02|h2|mux_h2.c:2561] Expecting H2 frame header : h2c=0x7f328d3d1ae0 st=2
It becomes apparent that most traces will use a single trace pretty
print callback, so let's allow the trace source to declare a default
one so that it can be omitted from trace calls, and will be used if
no other one is specified.
Now we try to find frontend, listener, backend, server, connection,
session, stream, from the presented argument of type connection,
stream or session. Various combinations and bounces allow to
retrieve most of them almost all the time. The extraction is
performed early so that we'll be able to apply filters later.
The lock-on is set if it was not there while the trace is running and
a valid pointer is available. If it was already set and doesn't match,
no trace is produced.
When no criterion is provided, it carefully enumerates all available
ones, including those provided by the source itself. Otherwise it sets
the new criterion and resets the lockon pointer.
When we stop or pause a trace (either on a matching event or by hand),
we must also stop the lock-on feature so that we don't follow any
further activity on this pointer even if it is recycled. For now this
is not exploited.
The trace() call will support an optional decoding callback and 4
arguments that this function is supposed to know how to use to provide
extra information. The output remains unchanged when the function is
NULL. Otherwise, the message is pre-filled into the thread-local
trace_buf, and the function is called with all arguments so that it
completes the buffer in a readable form depending on the expected
level of detail.
This new "level" argument will allow the trace sources to label the
traces for different purposes, and filter out some of them if they
are not relevant to the current target. Right now we have 5 different
levels:
- USER : the least verbose one, only a few functional information
- PAYLOAD: like user but also displays some payload-related information
- PROTO: focuses on the protocol's framing
- STATE: also indicate state internal transitions or non-transitions
- DEVELOPER: adds extra info about branches taken in the code (break
points, return points)
We now pass an extra argument "where" to the trace() call, which
is supposed to be an ist made of the concatenation of the filename
and the line number. We only keep the last 10 chars from this string
since the end of file names is most often easy to recognize. This
gives developers useful information at very low cost.
For now it remains quite basic. It performs a few state checks, calls
the source's sink if defined, and performs the transitions between
RUNNING, STOPPED and WAITING when the configured events match.
The new "show trace" CLI command lists available trace sources and
indicates their status, their sink, and number of dropped packets.
When "show trace <source>" is used, the list of known events is
also listed with their status per action (report/start/stop/pause).
The "level" keyword allows to indicate the expected level of verbosity
in the traces, among "user" (least verbose, just synthetic info) to
"developer" (very detailed, including function entry/leaving). It's only
displayed and set but not used yet.
For now it lists the sources if one is not provided, and checks
for the source's existence. It lists the events if not provided,
checks for their existence if provided, and adjusts reported
events/start/stop/pause events, and performs state transitions.
It lists sinks and adjusts them as well. Filters, lock, and
level are not implemented yet.