In 12d08cf912 ("BUG/MEDIUM: log: don't ignore disabled node's options"),
while trying to restore historical node option inheritance behavior, I
broke the '+bin' logformat node option recently introduced in b7c3d8c87c
("MINOR: log: add +bin logformat node option").
Indeed, because of 12d08cf912, LOG_OPT_BIN is not set anymore on
individual nodes even if it was set globally, making the feature unusable.
('+bin' is also used for binary cbor encoding)
What I should have done instead is include LOG_OPT_BIN in the options
inherited from global ones. This is what's being done in this commit.
Misleading comment was adjusted.
It must be backported in 3.0 with 12d08cf912.
In previous commit we implemented log-profile parsing logic. Now let's
actually make use of available log-profile information from logger struct
to decide whether we need to rebuild the logline under process_send_log()
according to log profile settings. Nothing is done if the logger didn't
specify a log-profile.
Now that we have log-profile parsing done, let's prepare for runtime
log-profile handling by adding the necessary string buffer required to
re-build log strings using sess_build_logline() on the fly without
altering regular loglines content.
Indeed, since a different log-profile may (or may not) be specified for
each logger, we must keep the original string and only rebuild a custom
one when required for the current logger (according to the selected log-
profile).
This patch implements prerequisite log-profile struct and parser logic.
It has no effect during runtime for now.
Logformat expressions provided in log-profile "steps" are postchecked
during postparsing for each proxy "log" directive that makes use of a
given profile. (this allows to ensure that the logformat expressions
used in the profile are compatible with proxy using them)
Extract sd metadata assignment in __send_log() to make an inline helper
function out of it in order to be able to use it from other functions if
needed.
Prerequisite work for log-profiles, we need to know under which proxy
context the logger is being used. When the info is not available, (ie:
global section or log-forward section, <px> is set to NULL)
Since resolve_logger() always resolves logger target (even when error
occurs), we must take care of freeing parsing hints because free_logger()
won't try to do it if target RESOLVED flag is set on the target.
This isn't considered as a bug because resolve_logger(), being a
postparsing check, will make haproxy immediately exit upon fatal error
in haproxy.c, but it's better to ensure that everything will be properly
freed if we decide to perform a clean exit upon postparsing checks error
in the future.
It is no longer relevant to say that <logger> is used for implicit
settings. In fact the function resolves <logger>, but currently
mainly focuses on loggers's target. However we could extend the
function to perform additional work on the logger itself in the future.
let's adjust the comment to prevent any confusion.
'%OG' logformat alias may be used to report the log origin (when/where)
that triggered log generation using sess_build_logline().
Possible values are:
- "sess_error": log was generated during session error handling
- "sess_killed": log was generated during session abortion (killed
embryonic session)
- "txn_accept": log was generated right after frontend conn was accepted
- "txn_request": log was generated after client request was received
- "txn_connect": log was generated after backend connection establishment
- "txn_response": log was generated during server response handling
- "txn_close": log was generated at the final txn step, before closing
- "unspec": unknown or not specified
Documentation was updated.
This is another prerequisite work in preparation for log-profiles: in this
patch we make process_send_log() aware of the log origin, primarily aiming
for sess and txn logging steps such as error, accept, connect, close, as
well as relevant sess and stream pointers.
Move the embryonic session logging logic down to sess_log() in preparation
for log-profiles because then log preferences will be set per logger and
not per proxy. Indeed, as each logger may come with its own log-profile
that possibly overrides proxy logformat preferences, the check will need
to be performed at a central place by lower sending functions.
To ensure the change doesn't break existing behavior, a dedicated
sess_log_embryonic() wrapper was added and is exclusively used by
session_kill_embryonic() to indicate that a special logging logic must
be performed under sess_log().
Also, thanks to this change, log-format-sd will now be taken into account
for legacy embryonic session logging.
This commit looks messy, but all it does is reorganize send_log() helpers
by dependency order to remove the need of forward-declaring some of them.
Also, since they're all internal helpers, let's explicitly mark them as
static to prevent any misuse.
Since 7a21c3a4ef ("MAJOR: log: implement proper postparsing for logformat
expressions"), logformat expressions stored in a default section are not
postchecked anymore. This is because the REGISTER_POST_PROXY_CHECK() only
evaluates regular proxies. Because of this, proxy options which are
automatically enabled on the proxy depending on the logformat expression
features in use are not set on the default proxy, which means such options
are not passed to the regular proxies that inherit from it (proxies that
and will actually be running the logformat expression during runtime).
Because of that, a logformat expression stored inside a default section
and executed by a regular proxy may not behave properly. Also, since
03ca16f38b ("OPTIM: log: resolve logformat options during postparsing"),
it's even worse because logformat node options postresoving is also
skipped, which may also alter logformat expression encoding feature.
To fix the issue, let's add a special case for default proxies in
parse_logformat_string() and lf_expr_postcheck() so that default proxies
are postchecked on the fly during parsing time in a "relaxed" way as we
cannot assume that the features involved in the logformat expression won't
be compatible with the proxy actually running it since we may have
different types of proxies inheriting from the same default section.
This bug was discovered while trying to address GH #2597.
It should be backported to 3.0 with 7a21c3a4ef and 03ca16f38b.
logformat_node was referenced as "node" in the error message reported
to the user, but in fact it is referred to as "item" in user
documentation. Using "item" in the error message to better comply with
the doc.
Error message was introduced with 7a21c3a4ef ("MAJOR: log: implement
proper postparsing for logformat expressions")
Warning message to indicate that the "http-send-name-header" option is
ignored for backend in "mode log" was referenced using its internal
struct wording instead of public name (as seen in the documentation).
Let's fix that.
It may be backported with c7783fb ("MINOR: log/backend: prevent
"http-send-name-header" use with LOG mode") in 2.9.
In 2.9 we started to introduce an ambiguity in the documentation by
referring to historical log-format variables ('%var') as log-format
tags in 739c4e5b1e ("MINOR: sample: accept_date / request_date return
%Ts / %tr timestamp values") and 454c372b60 ("DOC: configuration: add
sample fetches for timing events").
In fact, we've had this confusion between log-format tag and log-format
var for more than 10 years now, but in 2.9 it was the first time the
confusion was exposed in the documentation.
Indeed, both 'log-format variable' and 'log-format tag' actually refer
to the same feature (that is: '%B' and friends that can be used for
direct access to some log-oriented predefined fetches instead of using
%[expr] with generic sample expressions).
This feature was first implemented in 723b73ad75 ("MINOR: config: Parse
the string of the log-format config keyword") and later documented in
4894040fa ("DOC: log-format documentation"). At that time, it was clear
that we used to name it 'log-format variable'.
But later the same year, 'log-format tag' naming started to appear in
some commit messages (while still referring to the same feature), for
instance with ffc3fcd6d ("MEDIUM: log: report SSL ciphers and version
in logs using logformat %sslc/%sslv").
Unfortunately in 2.9 when we added (and documented) new log-format
variables we officially started drifting to the misleading 'log-format
tag' naming (perhaps because it was the most recent naming found for
this feature in git log history, or because the confusion has always
been there)
Even worse, in 3.0 this confusion led us to rename all 'var' occurrences
to 'tag' in log-format related code to unify the code with the doc.
Hopefully William quickly noticed that we made a mistake there, but
instead of reverting to historical naming (log-format variable), it was
decided that we must use a different name that is less confusing than
'tags' or 'variables' (tags and variables are keywords that are already
used to designate other features in the code and that are not very
explicit under log-format context today).
Now we refer to '%B' and friends as a logformat alias, which is
essentially a handy way to print some log oriented information in the
log string instead of leveraging '%[expr]' with generic sample expressions
made of fetches and converters. Of course, there are some subtelties, such
as a few log-format aliases that still don't have sample fetch equivalent
for historical reasons, and some aliases that may be a little faster than
their generic sample expression equivalents because most aliases are
pretty much hardcoded in the log building function. But in general
logformat aliases should be simply considered as an alternative to using
expressions (with '%[expr']')
Also, under log-format context, when we want to refer to either an alias
('%alias') or an expression ('%[expr]'), we should use the generic term
'logformat item', which in fact designates a single item within the
logformat string provided by the user. Indeed, a logformat item (whether
is is an alias or an expression) always starts with '%' and may accept
optional flags / arguments
Both the code and the documentation were updated in that sense, hopefully
this will clarify things and prevent future confusions.
In c614fd3b9 ("MINOR: log: add +cbor encoding option"), I wrongly used
strnlen() without noticing that the function is not portable (requires
_POSIX_C_SOURCE >= 2008) and that it was the first occurrence in the
entire project. In fact it is not a hard requirement since it's a pretty
simple function. Thus to restore build compatibility with minimal/older
build systems, let's actually get rid of it and use an equivalent portable
code where needed (we cannot simply rely on strlen() because the string
might not be NULL terminated, we must take upstream len into account).
No backport needed (unless c614fd3b9 gets backported)
When a log directive is defined in the global section, each time we use
"log global" in a proxy section, the global log directives are duplicated
for the current proxy. This works by creating a new proxy logger struct
and duplicating every members for each global one.
However, smp_rgs logger member is a special pointer member that is
allocated when "range" is used on a log directive. Currently, we simply
copy the array pointer (from the global one), instead of creating our own
copy. Because of that, range log sampling may not work properly in some
situations prior to 3f1284560 ("MINOR: log: remove the unused curr_idx in
struct smp_log_range") when used in global log directives, for instance:
global
log 127.0.0.1:5114 format raw sample 1-2,3:4 local0 info # should receive 75% of all proxy logs
log 127.0.0.1:5115 format raw sample 4:4 local0 info # should receive 25% of all proxy logs
listen proxy1
log global
listen proxy2
log global
May not work as expected, because curr_idx was stored within smp_rgs array
member prior to 3f1284560, and due to this bug, it happens to be shared
between every log directive inherited from a "global" one. The result is
that curr_idx counter will not behave properly because the index will be
increased globally instead of per-log directive, and it could even suffer
from concurrent thread accesses under load since we don't own the global
log directive's lock when manipulating it.
Another issue that was revealed because of this bug is that the smp_rgs
array allocated during config parsing is never freed in free_logger(),
resulting in small memory leak during clean exit.
To fix these issues all at once, let's properly duplicate smp_rgs logger
struct member in dup_logger() like we already do for other special members
so that every log directive have its own sms_rgs copy, and then
systematically free it in free_logger().
While this bug affects all stable versions (including 2.4), it's probably
best to not backport this beyond 2.6 because of 211ea252d
("BUG/MINOR: logs: fix logsrv leaks on clean exit") prerequisite that
first appears in 2.6.
[ada: for versions prior to 2.9, 969e212
("MINOR: log: add dup_logsrv() helper function") and 76acde91
("BUG/MINOR: log: keep the ref in dup_logger()") must be backported
first.
Note: Some ctx adjustments should be performed because 'logger' struct
used to be named 'logsrv' in the past and 2.9 introduced logger target
struct member. Thus it's probably easier to manually apply 76acde91 and
the current bugfix by hand directly on top of 969e212.
]
If add_sample_to_logformat_list() fails to allocate new logformat_node,
then we directly jump to error_free label to cleanup the node using
free_logformat_node() before returning an error.
However if the node failed to allocate, then the sample expression that
was allocated just before (not yet assigned) isn't released
(free_logformat_node() is a no-op when NULL is provided). Thus if expr
wasn't assigned to the node during early failure, then it must be manually
released.
This bug was introduced by 2462e5bcc ("BUG/MINOR: log: fix potential
lf->name memory leak") which wasn't marked for backports. It only
affects 3.0.
As reported by Tristan in GH #2561, syslog messages sent over rings are
malformed since commit 01aa0a05 ("MEDIUM: ring: change the ring reader
to use the new vector-based API now").
Indeed, take a look at the following log message produced prior to
01aa0a05:
181 <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 113700 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/21/30/51 404 369 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Starting with 01aa0a05, here's the equivalent log message:
<134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 112729 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/66/39/105 404 369 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"-fwr
-> Message is missing octet counting header, and garbage bytes are found
at the end of the payload.
This bug is caused by a small mistake in syslog_applet_append_event():
when the function was refactored to use vector API instead of buffer
API, we used 'trash.area' as starting pointer to write the event instead
of 'trash.area + trash.data', causing existing octet counting prefix
(already written in trash) to be overwritten and trash.data to be
wrongly incremented.
No backport needed (01aa0a05 was introduced during 3.0 development)
In lf_buildctx_prepare(), we perform costly bitwise operations for every
nodes to resolve node options and check for incompatibilities with global
options.
In fact, all this logic may safely be performed during postparsing. This
is what we're doing in this commit. Doing so saves us from unnecessary
runtime checks and could help speedup sess_build_logline().
Since checks are not as costly as before (due to them being performed
during postparsing and not on log building path anymore), an complementary
check for OPT_HTTP vs OPT_ENCODE incompatibity was added:
encoding is ignored if HTTP option is set, unless HTTP option wasn't
set globally and encoding was set globally, which means encoding
takes the precedence
Thanks to this patch, lf_buildctx_prepare() now only takes care of
assigning proper typecast and options settings depending if it's used
from global or per-node context, and prepares CBOR-specific structure
members when CBOR encode option is set.
In sess_build_logline(), we have multiple fieds such as '%t' that build
a fixed-length string out of a date struct and then print it using
lf_rawtext(). In fact, printing it using lf_rawtext() is only mandatory
to deal with encoding options, but when no encoding is used we can output
the result to tmplog directly. Since most dates generate between 25 and 30
chars, doing so spares us from writing them twice and could help make
sess_build_logline() a bit faster when no encoding is used. (to match with
pre-encoding patch series performance).
Now that lf_buildctx isn't pushed on the stack anymore, let's take this
opportunity to store a small buffer of 256 bytes within it, and then use
this buffer as general purpose buffer to build fixed-length strings that
are then printed using lf_{raw}text() function. By doing so we stop
relying on temporary stack buffers.
Following previous commit's logic, let's move lf_buildctx ctx away from
sess_build_logline() to stop abusing from the stack to push large
structure each time sess_build_logline() is called. Also, don't memset
the structure for each invokation, but only reset members explicitly when
required.
For that we now declare one static lf_buildctx per thread (using
THREAD_LOCAL) and make sess_build_logline() refer to it using a pointer.
'empty' buffer used in sess_build_logline() inside a loop, and since it
is only being read from and not modified, until recently it ended up being
cached most of the time and didn't cause overhead due to systematic push
on the stack.
However, due recent encoding work and new added variables on the stack,
we're starting to reach a stack limit and declaring 'empty' buffer within
the loop seems to cause non-negligible CPU overhead.
Since the variable isn't modified during log generation, let's declare
'empty' buffer as a global variable outside from sess_build_logline()
to prevent pushing it on the stack for each node evaluation.
Christian reported in GH #2556 that since 3.0-dev double spaces may be
found in log messages on some cases where it was not the case before.
As we were able to easily reproduce, a quick bisect led us to c6a7138
("MINOR: log: simplify last_isspace in sess_build_logline()"). While
it is true that all switch cases set the last_isspace variable to 0,
there was a subtelty for some fields such as '%hr', '%hrl', '%hs' or
'%hsl' and I overlooked it. Indeed, for '%hr', last_isspace was only set
to 0 if data was emitted, else the assignment didn't occur.
But with c6a7138, last_isspace is always set to 0 as long as the current
node type is not a separator. Because of that, if no data is emitted for
the current node value, and a space was already emitted prior to the
current node, then an extra space could be emitted after the node,
resulting in two spaces being emitted.
Note that while c6a7138 introduces a slight behavior regression regarding
last_isspace logic with the specific fields mentionned above, this
behavior could already be triggered with a failing or empty logformat
node sample expression. Consider this logformat expression:
log-format "%{-M}o | %[str()] |"
str() will not print anything, and since we disabled mandatory option with
'-M', nothing gets printed for the node sample expression. As a result, we
have the following output:
"| |"
Instead of (when mandatory option is enabled):
"| - |"
Thus in order to stick to the historical behavior, systematically set
last_isspace to 0 for EXPR nodes, and only set last_isspace to 0 when
data was written for TAG nodes. This way, '%hr', '%hrl', '%hs' or
'%hsl' should behave as before.
No backport needed.
Instead of chaining 2 switchcases and performing encoding checks for all
nodes let's actually split the logic in 2: first handle simple node types
(text/separator), and then handle dynamic node types (tag, expr). Encoding
options are only evaluated for dynamic node types.
Also, last_isspace is always set to 0 after next_fmt label, since next_fmt
label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR.
Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now
unused, let's get rid of it.
No functional change should be expected.
(Use diff -w to check patch changes since reindentation makes the patch
look heavy, but in fact it remains fairly small)
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.
Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.
Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
In 3f2e8d0ed ("MEDIUM: log: lf_* build helpers now take a ctx argument")
I made a mistake, because starting with this commit it is no longer
possible from a node to disable global logformat options.
The result is that when an option is set globally, it cannot be disabled
anymore.
For instance, it is not possible to do this anymore:
log-format "%{+X}o %{-X}Ts"
The original intent was to prevent encoding options from being
disabled once enabled globally, because when encoding is enabled globally
we start the object enumeration right away (ie: in CBOR and JSON we
announce dynamic map, and for each node we announce the key..), thus it
doesn't make sense to mix encoding types there, unless encoding is only
used per-node, in which case only the value gets encoded, thus it remains
possible to print a value in JSON/CBOR-compatible format while the next
one shouldn't be printed as-is.
Thus, to restore the original behavior, slightly change the logic in
lf_buildctx_prepare() so that only global encoding options take the
precedence over node's options (instead of all options).
No backport needed.
The BUG_ON() statement that was added in 9bdea51 ("MINOR: log/cbor:
_lf_cbor_encode_byte() explicitly requires non-NULL ctx") isn't
sufficient as Coverity still thinks the lf_buildctx itself may be NULL
as shown in GH #2554. In fact the original reports complains about the
lf_buildctx itself and I didn't understand it properly, let's add another
check in the BUG_ON() to ensure both cbor_ctx and cbor_ctx->ctx are not
NULL since it is not expected if used properly.
In 98b44e8 ("BUG/MINOR: log: fix global lf_expr node options behavior"),
I properly restored global node options behavior for when encoding is
not used, however the fix is not optimal when encoding is involved:
Indeed, encoding logic in sess_build_logline() relies on global node
options to know if encoding must be handled expression-wide or
individually. However, because of the above fix, if an expression is
made of 1 or multiple nodes that all set an encoding option manually
(without '%o'), we consider that the option was set globally, but
that's probably not what the user intended. Instead we should only
evaluate global options from '%o', so that it remains possible to
skip global encoding when needed.
No backport needed.
When CBOR encoding was added in c614fd3b9 ("MINOR: log: add +cbor encoding
option"), in LOGCHAR(), we forgot to check that we don't assign the NULL
value to tmplog (as we assume that tmplog cannot be NULL at the end of
sess_build_logline())
No backport needed.
In sess_build_logline(), contrary to what's stated in the comment
"only consider global ctx for key encoding", we check for
LOG_OPT_ENCODE flag on the current ctx options instead of global
ones. Because of this, we could end up doing the wrong thing if the
previous node had encoding enabled but it isn't set globally for
instance.
To fix the issue, let's simply check the presence of the flag on
g_options before entering the "key encoding" block.
This bug was introduced with 3f7c8387 ("MINOR: log: add +json encoding
option"), no backport needed.
LF_NODE_WITH_OPT(node) returns true if the node's option may be set and
thus should be considered. Logic is based on logformat node's type:
for now only TAG and FMT nodes can be configured.
In 507223d5 ("MINOR: log: global lf_expr node options"), a mistake was
made because it was assumed that only the last occurence of %o
(LOG_FMT_GLOBAL) should be kept as global node options.
However, although not documented, it is possible to have multiple %o
within a single logformat expression to change the global settings on the
fly.
For instance, consider this example:
log-format "%{+X}o test1=%ms %{-X}o test2=%ms %{+X}o test3=%ms"
Prior to 3f2e8d0ed ("MEDIUM: log: lf_* build helpers now take a ctx
argument"), this would output something like this:
test1=18B test2=395 test3=18B
This is because global options is properly updated as the lf_expr string
is parsed. But now due to 507223d5 and 3f2e8d0ed, only the last %o
occurence is considered. With the above example, this gives:
test1=18B test2=18B test3=18B
To restore historical behavior, let's partially revert 507223d5: to
compute global node options, we now start with all options enabled and
then for each configurable node in lf_expr_postcheck(), we keep options
common to the current node and previous nodes using AND masking, this way
we really end up with options common to all nodes.
No backport needed.
As shown in GH #2550, Coverity is tempted to think that NULL-dereference
can occur in _lf_cbor_encode_byte() due to user-ctx being dereferenced
from cbor_ctx, while coverity thinks that cbor_ctx may be NULL.
In practise this cannot happen, because _lf_cbor_encode_byte() is
only leveraged through a function pointer that is set in conjunction with
the function pointer ctx (which ain't NULL). All this logic is done inside
lf_buildctx_prepare() when LOG_OPT_ENCODE_CBOR is set.
Since coverity doesn't seem to understand the logic properly, then it
might as well confuse humans, so let's make it clear in
_lf_cbor_encode_byte() that we expect non-NULL ctx by adding a BUG_ON()
Rename e_byte_fct to e_fct_byte and e_fct_byte_ctx to e_fct_ctx, and
adjust some comments to make it clear that e_fct_ctx is here to provide
additional user-ctx to the custom cbor encode function pointers.
For now, only e_fct_byte function may be provided, but we could imagine
having e_fct_int{16,32,64}() one day to speed up the encoding when we
know we can encode multiple bytes at a time, but for now it's not worth
the hassle.
CBOR in hex format as implemented in previous commit is convenient because
the produced output is portable and can easily be embedded in regular
syslog payloads.
However, one of the goal of CBOR implementation is to be able to produce
"Concise Binary" object representation. Here is an excerpt from cbor.io
website:
"Some applications also benefit from CBOR itself being encoded in
binary. This saves bulk and allows faster processing."
Currently we don't offer that with '+cbor', quite the opposite actually
since a text string encoded with '+cbor' option will be larger than a
text string encoded with '+json' or without encoding at all, because for
each CBOR binary byte, 2 characters will be emitted.
Hopefully, the sink/log API allows for binary data to be passed as
parameter, this is because all relevant functions in the chain don't rely
on the terminating NULL byte and take a string pointer + string length as
parameter. We can actually rely on this property to support the '+bin'
option when combined with '+cbor' to produce RAW binary CBOR output.
Be careful though, as this is only intended for use with set-var-fmt or to
send binary data to capable UDP/ring endpoints.
Example:
log-format "%{+cbor,+bin}o %(test)[bin(00AABB)]"
Will produce:
bf64746573745f4300aabbffff
(output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)
With cbor.me pretty printer, it gives us:
BF # map(*)
64 # text(4)
74657374 # "test"
5F # bytes(*)
43 # bytes(3)
00AABB # "\u0000\xAA\xBB"
FF # primitive(*)
FF # primitive(*)
In this patch, we make use of the CBOR (RFC8949) encode helper functions
from the previous commit to implement '+cbor' encoding option for log-
formats. The logic behind it is pretty similar to '+json' encoding option,
except that the produced output is a CBOR payload written in HEX format so
that it remains compatible to use this with regular syslog endpoints.
Example:
log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]"
Will produce:
BF6B6E616D65645F6669656C64626F6BFF
Detailed view (from cbor.me):
BF # map(*)
6B # text(11)
6E616D65645F6669656C64 # "named_field"
62 # text(2)
6F6B # "ok"
FF # primitive(*)
If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to CBOR specification.
Example:
log-format "test cbor bool: %{+cbor}[bool(true)]"
Will produce:
test cbor bool: F5
In this patch, we add the "+json" log format option that can be set
globally or per log format node.
What it does, it that it sets the LOG_OPT_ENCODE_JSON flag for the
current context which is provided to all lf_* log building function.
This way, all lf_* are now aware of this option and try to comply with
JSON specification when the option is set.
If the option is set globally, then sess_build_logline() will produce a
map-like object with key=val pairs for named logformat nodes.
(logformat nodes that don't have a name are simply ignored).
Example:
log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]"
Will produce:
{"named_field": "ok"}
If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to JSON specification.
Example:
log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }"
Will produce:
{"manual_key": true}
When the option is set, +E option will be ignored, and partial numerical
values (ie: because of logasap) will be encoded as-is.
Support '+bin' option argument on logformat nodes to try to preserve
binary output type with binary sample expressions.
For this, we rely on the log/sink API which is capable of conveying binary
data since all related functions don't search for a terminating NULL byte
in provided log payload as they take a string pointer and a string length
as argument.
Example:
log-format "%{+bin}o %[bin(00AABB)]"
Will produce:
00aabb
(output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)
This should be used carefully, because many syslog endpoints don't expect
binary data (especially NULL bytes). This is mainly intended for use with
set-var-fmt actions or with ring/udp log endpoints that know how to deal
with such binary payloads.
Also, this option is only supported globally (for use with '%o'), it will
not have any effect when set on an individual node. (it makes no sense to
have binary data in the middle of log payload that was started without
binary data option)
Providing no_escape_map as <map> argument to _lf_encode_bytes() function
will make the function skip escaping since the map is empty.
This is for convenience, as it might be useful to call lf_encode_chunk()
to encoding binary data without escaping it.
In sess_build_logline(), for sample expression nodes, instead of directly
calling sample_fetch_as_type(... SMP_T_STR), let's first process the
sample using sample_process(), and then proceed with the conversion to
str if required.
Doing so will allow us to implement type casting and preserving logic.
Add internal lf_buildctx struct that is only used inside
sess_build_logline() scope and is passed to lf_* log building helpers
to expose current building context. For now, node options and the in_text
counter are stored in the ctx struct. Thanks to this change, lf_* building
functions don't depend on a logformat_node struct pointer, and may be used
in a standalone manner as long as a build context is provided.
Also, global options are now handled explictly in sess_build_logline() to
make sure that global options are always considered even if they were not
duplicated on every nodes.
No functional change should be expected.
lf_encode_string() and lf_encode_chunk() function are pretty similar. The
only difference is the stopping behavior, encode_chunk stops at a given
position while encode_string stops when encountering '\0'. Moreover,
both functions leverage tools.c encode helpers, but because of the
LOG_OPT_ESC option, they reimplement those helpers with added logic.
Instead of having to deal with code duplication which makes both functions
harder to maintain, let's define a _lf_encode_bytes() helper function
which satisfies lf_encode_string() and lf_encode_chunk() needs while
keeping the function as simple as possible.
_lf_encode_bytes() itself is made of multiple static inline helper
functions, in the attempt to keep checks outside of core loop for
better performance.
There is no need to expose such functions since they are only involved in
the log building process that occurs inside sess_build_logline().
Making functions static and removing their public prototype to ease code
maintenance.
Rename LOGQUOTE_{START,END} macros to more generic LOG_VARTEXT_{START,END}
in order to prepare for new encoding types that rely on specific treatment
for variable-length texts. No functional change should be expected.