Commit Graph

555 Commits

Author SHA1 Message Date
Aurelien DARRAGON
e84c8dee1a BUILD: log: get rid of non-portable strnlen() func
In c614fd3b9 ("MINOR: log: add +cbor encoding option"), I wrongly used
strnlen() without noticing that the function is not portable (requires
_POSIX_C_SOURCE >= 2008) and that it was the first occurrence in the
entire project. In fact it is not a hard requirement since it's a pretty
simple function. Thus to restore build compatibility with minimal/older
build systems, let's actually get rid of it and use an equivalent portable
code where needed (we cannot simply rely on strlen() because the string
might not be NULL terminated, we must take upstream len into account).

No backport needed (unless c614fd3b9 gets backported)
2024-05-17 15:24:53 +02:00
Aurelien DARRAGON
32f0cd3242 BUG/MINOR: log: smp_rgs array issues with inherited global log directives
When a log directive is defined in the global section, each time we use
"log global" in a proxy section, the global log directives are duplicated
for the current proxy. This works by creating a new proxy logger struct
and duplicating every members for each global one.

However, smp_rgs logger member is a special pointer member that is
allocated when "range" is used on a log directive. Currently, we simply
copy the array pointer (from the global one), instead of creating our own
copy. Because of that, range log sampling may not work properly in some
situations prior to 3f1284560 ("MINOR: log: remove the unused curr_idx in
struct smp_log_range") when used in global log directives, for instance:

  global
    log 127.0.0.1:5114 format raw sample 1-2,3:4 local0 info # should receive 75% of all proxy logs
    log 127.0.0.1:5115 format raw sample 4:4 local0 info     # should receive 25% of all proxy logs

  listen proxy1
    log global

  listen proxy2
    log global

May not work as expected, because curr_idx was stored within smp_rgs array
member prior to 3f1284560, and due to this bug, it happens to be shared
between every log directive inherited from a "global" one. The result is
that curr_idx counter will not behave properly because the index will be
increased globally instead of per-log directive, and it could even suffer
from concurrent thread accesses under load since we don't own the global
log directive's lock when manipulating it.

Another issue that was revealed because of this bug is that the smp_rgs
array allocated during config parsing is never freed in free_logger(),
resulting in small memory leak during clean exit.

To fix these issues all at once, let's properly duplicate smp_rgs logger
struct member in dup_logger() like we already do for other special members
so that every log directive have its own sms_rgs copy, and then
systematically free it in free_logger().

While this bug affects all stable versions (including 2.4), it's probably
best to not backport this beyond 2.6 because of 211ea252d
("BUG/MINOR: logs: fix logsrv leaks on clean exit") prerequisite that
first appears in 2.6.

[ada: for versions prior to 2.9, 969e212
 ("MINOR: log: add dup_logsrv() helper function") and 76acde91
 ("BUG/MINOR: log: keep the ref in dup_logger()") must be backported
 first.
 Note: Some ctx adjustments should be performed because 'logger' struct
 used to be named 'logsrv' in the past and 2.9 introduced logger target
 struct member. Thus it's probably easier to manually apply 76acde91 and
 the current bugfix by hand directly on top of 969e212.
]
2024-05-14 12:00:23 +02:00
Aurelien DARRAGON
9d4a44e713 BUG/MINOR: log: fix leak in add_sample_to_logformat_list() error path
If add_sample_to_logformat_list() fails to allocate new logformat_node,
then we directly jump to error_free label to cleanup the node using
free_logformat_node() before returning an error.

However if the node failed to allocate, then the sample expression that
was allocated just before (not yet assigned) isn't released
(free_logformat_node() is a no-op when NULL is provided). Thus if expr
wasn't assigned to the node during early failure, then it must be manually
released.

This bug was introduced by 2462e5bcc ("BUG/MINOR: log: fix potential
lf->name memory leak") which wasn't marked for backports. It only
affects 3.0.
2024-05-13 16:44:27 +02:00
Aurelien DARRAGON
fbbc2925d4 BUG/MEDIUM: log/ring: broken syslog octet counting
As reported by Tristan in GH #2561, syslog messages sent over rings are
malformed since commit 01aa0a05 ("MEDIUM: ring: change the ring reader
to use the new vector-based API now").

Indeed, take a look at the following log message produced prior to
01aa0a05:

  181 <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 113700 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/21/30/51 404 369 - - ---- 1/1/0/0/0 0/0   "GET / HTTP/1.1"

Starting with 01aa0a05, here's the equivalent log message:

  <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 112729 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/66/39/105 404 369 - - ---- 1/1/0/0/0 0/0   "GET / HTTP/1.1"-fwr

-> Message is missing octet counting header, and garbage bytes are found
at the end of the payload.

This bug is caused by a small mistake in syslog_applet_append_event():
when the function was refactored to use vector API instead of buffer
API, we used 'trash.area' as starting pointer to write the event instead
of 'trash.area + trash.data', causing existing octet counting prefix
(already written in trash) to be overwritten and trash.data to be
wrongly incremented.

No backport needed (01aa0a05 was introduced during 3.0 development)
2024-05-07 19:23:01 +02:00
Aurelien DARRAGON
03ca16f38b OPTIM: log: resolve logformat options during postparsing
In lf_buildctx_prepare(), we perform costly bitwise operations for every
nodes to resolve node options and check for incompatibilities with global
options.

In fact, all this logic may safely be performed during postparsing. This
is what we're doing in this commit. Doing so saves us from unnecessary
runtime checks and could help speedup sess_build_logline().

Since checks are not as costly as before (due to them being performed
during postparsing and not on log building path anymore), an complementary
check for OPT_HTTP vs OPT_ENCODE incompatibity was added:

  encoding is ignored if HTTP option is set, unless HTTP option wasn't
  set globally and encoding was set globally, which means encoding
  takes the precedence

Thanks to this patch, lf_buildctx_prepare() now only takes care of
assigning proper typecast and options settings depending if it's used
from global or per-node context, and prepares CBOR-specific structure
members when CBOR encode option is set.
2024-05-06 11:13:46 +02:00
Aurelien DARRAGON
d26a160133 OPTIM: log: speedup date printing in sess_build_logline() when no encoding is used
In sess_build_logline(), we have multiple fieds such as '%t' that build
a fixed-length string out of a date struct and then print it using
lf_rawtext(). In fact, printing it using lf_rawtext() is only mandatory
to deal with encoding options, but when no encoding is used we can output
the result to tmplog directly. Since most dates generate between 25 and 30
chars, doing so spares us from writing them twice and could help make
sess_build_logline() a bit faster when no encoding is used. (to match with
pre-encoding patch series performance).
2024-05-04 10:13:05 +02:00
Aurelien DARRAGON
bf3b4001ce OPTIM: log: use lf_buildctx's buffer instead of temporary stack buffers
Now that lf_buildctx isn't pushed on the stack anymore, let's take this
opportunity to store a small buffer of 256 bytes within it, and then use
this buffer as general purpose buffer to build fixed-length strings that
are then printed using lf_{raw}text() function. By doing so we stop
relying on temporary stack buffers.
2024-05-04 10:13:05 +02:00
Aurelien DARRAGON
ccc4341258 OPTIM: log: use thread local lf_buildctx to stop pushing it on the stack
Following previous commit's logic, let's move lf_buildctx ctx away from
sess_build_logline() to stop abusing from the stack to push large
structure each time sess_build_logline() is called. Also, don't memset
the structure for each invokation, but only reset members explicitly when
required.

For that we now declare one static lf_buildctx per thread (using
THREAD_LOCAL) and make sess_build_logline() refer to it using a pointer.
2024-05-04 10:13:05 +02:00
Aurelien DARRAGON
728b5aa835 OPTIM: log: declare empty buffer as global variable
'empty' buffer used in sess_build_logline() inside a loop, and since it
is only being read from and not modified, until recently it ended up being
cached most of the time and didn't cause overhead due to systematic push
on the stack.

However, due recent encoding work and new added variables on the stack,
we're starting to reach a stack limit and declaring 'empty' buffer within
the loop seems to cause non-negligible CPU overhead.

Since the variable isn't modified during log generation, let's declare
'empty' buffer as a global variable outside from sess_build_logline()
to prevent pushing it on the stack for each node evaluation.
2024-05-04 10:13:05 +02:00
Aurelien DARRAGON
cc2e94a948 BUG/MINOR: log: prevent double spaces emission in sess_build_logline()
Christian reported in GH #2556 that since 3.0-dev double spaces may be
found in log messages on some cases where it was not the case before.

As we were able to easily reproduce, a quick bisect led us to c6a7138
("MINOR: log: simplify last_isspace in sess_build_logline()"). While
it is true that all switch cases set the last_isspace variable to 0,
there was a subtelty for some fields such as '%hr', '%hrl', '%hs' or
'%hsl' and I overlooked it. Indeed, for '%hr', last_isspace was only set
to 0 if data was emitted, else the assignment didn't occur.

But with c6a7138, last_isspace is always set to 0 as long as the current
node type is not a separator. Because of that, if no data is emitted for
the current node value, and a space was already emitted prior to the
current node, then an extra space could be emitted after the node,
resulting in two spaces being emitted.

Note that while c6a7138 introduces a slight behavior regression regarding
last_isspace logic with the specific fields mentionned above, this
behavior could already be triggered with a failing or empty logformat
node sample expression. Consider this logformat expression:

  log-format "%{-M}o | %[str()] |"

str() will not print anything, and since we disabled mandatory option with
'-M', nothing gets printed for the node sample expression. As a result, we
have the following output:

  "|  |"

Instead of (when mandatory option is enabled):

  "| - |"

Thus in order to stick to the historical behavior, systematically set
last_isspace to 0 for EXPR nodes, and only set last_isspace to 0 when
data was written for TAG nodes. This way, '%hr', '%hrl', '%hs' or
'%hsl' should behave as before.

No backport needed.
2024-05-03 16:48:21 +02:00
Aurelien DARRAGON
48e0efb00b MEDIUM: log: optimizing tmp->type handling in sess_build_logline()
Instead of chaining 2 switchcases and performing encoding checks for all
nodes let's actually split the logic in 2: first handle simple node types
(text/separator), and then handle dynamic node types (tag, expr). Encoding
options are only evaluated for dynamic node types.

Also, last_isspace is always set to 0 after next_fmt label, since next_fmt
label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR.

Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now
unused, let's get rid of it.

No functional change should be expected.

(Use diff -w to check patch changes since reindentation makes the patch
look heavy, but in fact it remains fairly small)
2024-05-03 16:48:21 +02:00
Amaury Denoyelle
634cc2a5d8 MINOR: counters: move last_change into counters struct
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.

Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.

Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
2024-05-02 10:55:25 +02:00
Aurelien DARRAGON
12d08cf912 BUG/MEDIUM: log: don't ignore disabled node's options
In 3f2e8d0ed ("MEDIUM: log: lf_* build helpers now take a ctx argument")
I made a mistake, because starting with this commit it is no longer
possible from a node to disable global logformat options.
The result is that when an option is set globally, it cannot be disabled
anymore.

For instance, it is not possible to do this anymore:
  log-format "%{+X}o %{-X}Ts"

The original intent was to prevent encoding options from being
disabled once enabled globally, because when encoding is enabled globally
we start the object enumeration right away (ie: in CBOR and JSON we
announce dynamic map, and for each node we announce the key..), thus it
doesn't make sense to mix encoding types there, unless encoding is only
used per-node, in which case only the value gets encoded, thus it remains
possible to print a value in JSON/CBOR-compatible format while the next
one shouldn't be printed as-is.

Thus, to restore the original behavior, slightly change the logic in
lf_buildctx_prepare() so that only global encoding options take the
precedence over node's options (instead of all options).

No backport needed.
2024-04-30 18:45:07 +02:00
Aurelien DARRAGON
41d7e82e0f MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx (again)
The BUG_ON() statement that was added in 9bdea51 ("MINOR: log/cbor:
_lf_cbor_encode_byte() explicitly requires non-NULL ctx") isn't
sufficient as Coverity still thinks the lf_buildctx itself may be NULL
as shown in GH #2554. In fact the original reports complains about the
lf_buildctx itself and I didn't understand it properly, let's add another
check in the BUG_ON() to ensure both cbor_ctx and cbor_ctx->ctx are not
NULL since it is not expected if used properly.
2024-04-30 10:10:35 +02:00
Aurelien DARRAGON
9931a62c3f BUG/MINOR: log: fix global lf_expr node options behavior (2nd try)
In 98b44e8 ("BUG/MINOR: log: fix global lf_expr node options behavior"),
I properly restored global node options behavior for when encoding is
not used, however the fix is not optimal when encoding is involved:

Indeed, encoding logic in sess_build_logline() relies on global node
options to know if encoding must be handled expression-wide or
individually. However, because of the above fix, if an expression is
made of 1 or multiple nodes that all set an encoding option manually
(without '%o'), we consider that the option was set globally, but
that's probably not what the user intended. Instead we should only
evaluate global options from '%o', so that it remains possible to
skip global encoding when needed.

No backport needed.
2024-04-30 10:10:35 +02:00
Aurelien DARRAGON
97240d01b3 BUG/MINOR: log/encode: fix potential NULL-dereference in LOGCHAR()
When CBOR encoding was added in c614fd3b9 ("MINOR: log: add +cbor encoding
option"), in LOGCHAR(), we forgot to check that we don't assign the NULL
value to tmplog (as we assume that tmplog cannot be NULL at the end of
sess_build_logline())

No backport needed.
2024-04-30 10:10:35 +02:00
Aurelien DARRAGON
949ac95aa6 BUG/MINOR: log/encode: consider global options for key encoding
In sess_build_logline(), contrary to what's stated in the comment
"only consider global ctx for key encoding", we check for
LOG_OPT_ENCODE flag on the current ctx options instead of global
ones. Because of this, we could end up doing the wrong thing if the
previous node had encoding enabled but it isn't set globally for
instance.

To fix the issue, let's simply check the presence of the flag on
g_options before entering the "key encoding" block.

This bug was introduced with 3f7c8387 ("MINOR: log: add +json encoding
option"), no backport needed.
2024-04-30 10:10:35 +02:00
Aurelien DARRAGON
9bdce67585 CLEANUP: log: add a macro to know if a lf_node is configurable
LF_NODE_WITH_OPT(node) returns true if the node's option may be set and
thus should be considered. Logic is based on logformat node's type:
for now only TAG and FMT nodes can be configured.
2024-04-29 14:47:37 +02:00
Aurelien DARRAGON
98b44e8edb BUG/MINOR: log: fix global lf_expr node options behavior
In 507223d5 ("MINOR: log: global lf_expr node options"), a mistake was
made because it was assumed that only the last occurence of %o
(LOG_FMT_GLOBAL) should be kept as global node options.

However, although not documented, it is possible to have multiple %o
within a single logformat expression to change the global settings on the
fly.

For instance, consider this example:

  log-format "%{+X}o test1=%ms %{-X}o test2=%ms %{+X}o test3=%ms"

Prior to 3f2e8d0ed ("MEDIUM: log: lf_* build helpers now take a ctx
argument"), this would output something like this:

  test1=18B test2=395 test3=18B

This is because global options is properly updated as the lf_expr string
is parsed. But now due to 507223d5 and 3f2e8d0ed, only the last %o
occurence is considered. With the above example, this gives:

  test1=18B test2=18B test3=18B

To restore historical behavior, let's partially revert 507223d5: to
compute global node options, we now start with all options enabled and
then for each configurable node in lf_expr_postcheck(), we keep options
common to the current node and previous nodes using AND masking, this way
we really end up with options common to all nodes.

No backport needed.
2024-04-29 14:47:37 +02:00
Aurelien DARRAGON
9bdea51d7e MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx
As shown in GH #2550, Coverity is tempted to think that NULL-dereference
can occur in _lf_cbor_encode_byte() due to user-ctx being dereferenced
from cbor_ctx, while coverity thinks that cbor_ctx may be NULL.

In practise this cannot happen, because _lf_cbor_encode_byte() is
only leveraged through a function pointer that is set in conjunction with
the function pointer ctx (which ain't NULL). All this logic is done inside
lf_buildctx_prepare() when LOG_OPT_ENCODE_CBOR is set.

Since coverity doesn't seem to understand the logic properly, then it
might as well confuse humans, so let's make it clear in
_lf_cbor_encode_byte() that we expect non-NULL ctx by adding a BUG_ON()
2024-04-29 14:47:37 +02:00
Aurelien DARRAGON
0e2aea8224 CLEANUP: tools/cbor: rename cbor_encode_ctx struct members
Rename e_byte_fct to e_fct_byte and e_fct_byte_ctx to e_fct_ctx, and
adjust some comments to make it clear that e_fct_ctx is here to provide
additional user-ctx to the custom cbor encode function pointers.

For now, only e_fct_byte function may be provided, but we could imagine
having e_fct_int{16,32,64}() one day to speed up the encoding when we
know we can encode multiple bytes at a time, but for now it's not worth
the hassle.
2024-04-29 14:47:37 +02:00
Aurelien DARRAGON
c33b857df9 MINOR: log: support true cbor binary encoding
CBOR in hex format as implemented in previous commit is convenient because
the produced output is portable and can easily be embedded in regular
syslog payloads.

However, one of the goal of CBOR implementation is to be able to produce
"Concise Binary" object representation. Here is an excerpt from cbor.io
website:

  "Some applications also benefit from CBOR itself being encoded in
   binary. This saves bulk and allows faster processing."

Currently we don't offer that with '+cbor', quite the opposite actually
since a text string encoded with '+cbor' option will be larger than a
text string encoded with '+json' or without encoding at all, because for
each CBOR binary byte, 2 characters will be emitted.

Hopefully, the sink/log API allows for binary data to be passed as
parameter, this is because all relevant functions in the chain don't rely
on the terminating NULL byte and take a string pointer + string length as
parameter. We can actually rely on this property to support the '+bin'
option when combined with '+cbor' to produce RAW binary CBOR output.
Be careful though, as this is only intended for use with set-var-fmt or to
send binary data to capable UDP/ring endpoints.

Example:
  log-format "%{+cbor,+bin}o %(test)[bin(00AABB)]"

Will produce:
  bf64746573745f4300aabbffff

(output was piped to `hexdump  -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)

With cbor.me pretty printer, it gives us:
  BF              # map(*)
     64           # text(4)
        74657374  # "test"
     5F           # bytes(*)
        43        # bytes(3)
           00AABB # "\u0000\xAA\xBB"
        FF        # primitive(*)
     FF           # primitive(*)
2024-04-26 18:39:32 +02:00
Aurelien DARRAGON
c614fd3b9f MINOR: log: add +cbor encoding option
In this patch, we make use of the CBOR (RFC8949) encode helper functions
from the previous commit to implement '+cbor' encoding option for log-
formats. The logic behind it is pretty similar to '+json' encoding option,
except that the produced output is a CBOR payload written in HEX format so
that it remains compatible to use this with regular syslog endpoints.

Example:
  log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]"

Will produce:
  BF6B6E616D65645F6669656C64626F6BFF

  Detailed view (from cbor.me):
    BF                           # map(*)
       6B                        # text(11)
          6E616D65645F6669656C64 # "named_field"
       62                        # text(2)
          6F6B                   # "ok"
       FF                        # primitive(*)

If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to CBOR specification.

Example:
  log-format "test cbor bool: %{+cbor}[bool(true)]"

Will produce:
  test cbor bool: F5
2024-04-26 18:39:32 +02:00
Aurelien DARRAGON
3f7c8387c0 MINOR: log: add +json encoding option
In this patch, we add the "+json" log format option that can be set
globally or per log format node.

What it does, it that it sets the LOG_OPT_ENCODE_JSON flag for the
current context which is provided to all lf_* log building function.

This way, all lf_* are now aware of this option and try to comply with
JSON specification when the option is set.

If the option is set globally, then sess_build_logline() will produce a
map-like object with key=val pairs for named logformat nodes.
(logformat nodes that don't have a name are simply ignored).

Example:
  log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]"

Will produce:
  {"named_field": "ok"}

If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to JSON specification.

Example:
  log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }"

Will produce:
  {"manual_key": true}

When the option is set, +E option will be ignored, and partial numerical
values (ie: because of logasap) will be encoded as-is.
2024-04-26 18:39:32 +02:00
Aurelien DARRAGON
b7c3d8c87c MINOR: log: add +bin logformat node option
Support '+bin' option argument on logformat nodes to try to preserve
binary output type with binary sample expressions.

For this, we rely on the log/sink API which is capable of conveying binary
data since all related functions don't search for a terminating NULL byte
in provided log payload as they take a string pointer and a string length
as argument.

Example:
  log-format "%{+bin}o %[bin(00AABB)]"

Will produce:
  00aabb

(output was piped to `hexdump  -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)

This should be used carefully, because many syslog endpoints don't expect
binary data (especially NULL bytes). This is mainly intended for use with
set-var-fmt actions or with ring/udp log endpoints that know how to deal
with such binary payloads.

Also, this option is only supported globally (for use with '%o'), it will
not have any effect when set on an individual node. (it makes no sense to
have binary data in the middle of log payload that was started without
binary data option)
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
162e311a0e MINOR: log: add no_escape_map to bypass escape with _lf_encode_bytes()
Providing no_escape_map as <map> argument to _lf_encode_bytes() function
will make the function skip escaping since the map is empty.

This is for convenience, as it might be useful to call lf_encode_chunk()
to encoding binary data without escaping it.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
fb8b47fed8 MINOR: log: postpone conversion for sample expressions in sess_build_logline()
In sess_build_logline(), for sample expression nodes, instead of directly
calling sample_fetch_as_type(... SMP_T_STR), let's first process the
sample using sample_process(), and then proceed with the conversion to
str if required.

Doing so will allow us to implement type casting and preserving logic.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
84963fb743 MINOR: log: expose node typecast in lf_buildctx struct
Store node->typecast setting inside lf_buildctx struct so that encoding
functions may benefit from it.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
3f2e8d0ed2 MEDIUM: log: lf_* build helpers now take a ctx argument
Add internal lf_buildctx struct that is only used inside
sess_build_logline() scope and is passed to lf_* log building helpers
to expose current building context. For now, node options and the in_text
counter are stored in the ctx struct. Thanks to this change, lf_* building
functions don't depend on a logformat_node struct pointer, and may be used
in a standalone manner as long as a build context is provided.

Also, global options are now handled explictly in sess_build_logline() to
make sure that global options are always considered even if they were not
duplicated on every nodes.

No functional change should be expected.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
f7cb384f1a MINOR: log: merge lf_encode_string() and lf_encode_chunk() logic
lf_encode_string() and lf_encode_chunk() function are pretty similar. The
only difference is the stopping behavior, encode_chunk stops at a given
position while encode_string stops when encountering '\0'. Moreover,
both functions leverage tools.c encode helpers, but because of the
LOG_OPT_ESC option, they reimplement those helpers with added logic.

Instead of having to deal with code duplication which makes both functions
harder to maintain, let's define a _lf_encode_bytes() helper function
which satisfies lf_encode_string() and lf_encode_chunk() needs while
keeping the function as simple as possible.

_lf_encode_bytes() itself is made of multiple static inline helper
functions, in the attempt to keep checks outside of core loop for
better performance.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
a1583ec7c7 MINOR: log: make all lf_* sess build helper static
There is no need to expose such functions since they are only involved in
the log building process that occurs inside sess_build_logline().

Making functions static and removing their public prototype to ease code
maintenance.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
3b9096bd36 MINOR: log: use LOG_VARTEXT_{START,END} to enclose text strings
Rename LOGQUOTE_{START,END} macros to more generic LOG_VARTEXT_{START,END}
in order to prepare for new encoding types that rely on specific treatment
for variable-length texts. No functional change should be expected.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
278d6c3379 MINOR: log: explicitly handle %ts and %tsc as text strings
Build fixed-length strings for %ts and %tsc to be able to print them
using lf_rawtext_len(), this way it will be easier to encode them
when new encoding options will be added.

No functional change should be expected.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
2e4cc517bf MEDIUM: log: use lf_rawtext for lf_ip() and lf_port() hex strings
Same as the previous commit, but for ip and port oriented values when
+X option is provided.

No functional change should be expected.

Because of this patch, we add a little overhead because we first generate
the text into a temporary variable and then use lf_rawtext() to print it.
Thus we have a double-copy, and this could have some performance
implications that were not yet evaluated. Due to the small number of bytes
that can end up being copied twice, we could be lucky and have no visible
performance impact, but if we happen to see a significant impact, it could
be useful to add a passthrough mechanism (to keep historical behavior)
when no encoding is involved.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
3a3bdf1c76 MEDIUM: log: write raw strings using lf_rawtext()
Make use of the previous commit to print strings that should not be
modified.

For instance, when +X option is provided, we have to print numerical
values in ASCII HEX form. For that, we used snprintf() to output the
result to the log output buffer directly, but now we build the string in
a temporary buffer of fixed-size and then print it using lf_rawtext()
which will take care of encoding options.

Because of this patch, we add a little overhead because we first generate
the text into a temporary variable and then use lf_rawtext() to print it.
Thus we have a double-copy, and this could have some performance
implications that were not yet evaluated. Due to the small number of bytes
that can end up being copied twice, we could be lucky and have no visible
performance impact, but if we happen to see a significant impact, it could
be useful to add a passthrough mechanism (to keep historical behavior)
when no encoding is involved.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
0d1e99c086 MEDIUM: log: pass date strings to lf_rawtext()
Don't directly call functions that take date as argument and output the
string representation to the log output buffer under sess_build_logline(),
and instead build the strings in temporary buffers of fixed size
(hopefully such functions, such as date2str_log() and gmt2str_log()
procuce strings of known size), and then print the result using
lf_rawtext() helper function. This way, we will be able to encode them
automatically as regular string/text when new encoding methods are added.

Because of this patch, we add a little overhead because we first generate
the text into a temporary variable and then use lf_rawtext() to print it.
Thus we have a double-copy, and this could have some performance
implications that were not yet evaluated. Due to the small number of bytes
that can end up being copied twice (< 30), we could be lucky and have no
visible performance impact, but if we happen to see a significant impact,
it could be useful to add a passthrough mechanism (to keep historical
behavior) when no encoding is involved.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
fcb7e4beaa MINOR: log: add lf_rawtext{_len}() functions
similar to lf_text_{len}, except that quoting and mandatory options are
ignored. Use this to print the input string without any modification (
except for encoding logic).
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
1fa2da18cd MINOR: log: add lf_int() wrapper to print integers
Wrap ltoa(), lltoa(), ultoa() and utoa_pad() functions that are used by
sess_build_logline() to print numerical values by implementing a dedicated
helper named lf_int() that takes <dft_hld> as argument to know how to
write the integer by default (when no encoding is specified).

LF_INT_UTOA_PAD_4 is used to emulate utoa_pad(x, 4) since it's found only
once under sess_build_logline(), thus there is no need to pass an extra
parameter to lf_int() function.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
d3c92a3a83 MINOR: log: skip custom logformat_node name if empty
Reminder:

Since 3.0-dev4, we can optionally give a name to logformat nodes:

  log-format "%(custom_name1)B %(custom_name2)[str(value)]"

But we may also optionally set the expected node type by appending
':type' after the name, type being either sint,str or bool, like this:

  log-format "%(string_as_int:sint)[str(14)]"

However, it is currently not possible to provide a type without providing
a name that is a least 1 char long. But it could be useful to provide a
type without setting a name, like this, for typecasting purposes only:

  log-format "%(:sint)[bool(true)]"

Thus in order to allow this usage, don't set node->name if node name is
not at least 1 character long. By doing so, node->name will remain NULL
and will not be considered, but the typecast setting will.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
c584600083 CLEANUP: log: simplify complex values usages in sess_build_logline()
make sess_build_logline() switch case more readable by performing some
simplifications: complex values are first extracted in a temporary
variable so that it's easier to refer to them and at a single place.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
507223d527 MINOR: log: global lf_expr node options
Add options to lf_expr->nodes to store global options (those that are
common to all node) for easier access.

No functional change should be expected.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
7ff4f09e23 MINOR: log: store lf_expr nodes inside substruct
Add another struct level inside lf_expr struct to allow new information
to be stored alongside lf_expr nodes.
2024-04-26 18:39:31 +02:00
Aurelien DARRAGON
f8e1357a05 CLEANUP: log: remove unused checks for encode_{chunk,string}
Thanks to 8226e92eb ("BUG/MINOR: tools/log: invalid
encode_{chunk,string} usage"), we only need to check for NULL return
value from encode_{chunk,string}() and escape_string() to know if the
call failed.
2024-04-26 18:39:31 +02:00
Ilya Shipitsin
ab7f05daba CLEANUP: assorted typo fixes in the code and comments
This is 41st iteration of typo fixes
2024-04-17 11:14:44 +02:00
Aurelien DARRAGON
9420cfc0db CLEANUP: log: lf_text_len() returns a pointer not an integer
In c83684519 ("MEDIUM: log: add the ability to include samples in logs")
we checked the return value of lf_text_len() as an integer instead of
comparing the pointer with NULL explicitly. Since this may be confusing,
let's test the return value against NULL.

[ada: for backports, the patch needs to be applied manually because of
 c6a713842 ("MINOR: log: simplify last_isspace in sess_build_logline()")]
2024-04-09 17:35:53 +02:00
Aurelien DARRAGON
28548f812f BUG/MINOR: log: invalid snprintf() usage in sess_build_logline()
According to snprintf() man page:

       The  functions snprintf() and vsnprintf() do not write more than
       size bytes (including the terminating null byte ('\0')).  If the
       output was truncated due to this limit, then the return value is
       the number of  characters  (excluding  the  terminating null byte)
       which would have been written to the final string if enough space
       had been available.  Thus, a return value of size or more means
       that the output was truncated.

However, in sess_build_logline(), each time we need to check the return
value of snprintf(), here is how we proceed:

       iret = snprintf(tmplog, max, ...);
       if (iret < 0 || iret > max)
           // error
       // success
       tmplog += iret;

Here is the issue: if snprintf() lacks 1 byte space to write the
terminating NULL byte, it will return max. Which means in this case
that we fail to know that snprintf() truncated the output in reality,
and we still add iret to tmplog pointer. Considering sess_build_logline()
should NOT write more than <maxsize> bytes (including the terminating NULL
byte) as per the function description, in this case the function would
write <maxsize>+1 byte (to write the terminating NULL byte upon return),
which may lead to invalid write if <dst> was meant to hold <maxsize> bytes
at maximum.

Hopefully, this bug wasn't triggered so far because sess_build_logline()
is called with logline as <dst> argument and <global.max_syslog_len> as
<maxsize> argument, logline being initialized with 1 extra byte upon
startup.

But we better fix this to comply with the function description and prevent
any side-effect since some sess_build_logline() helpers may assume that
'tmplog-dst < maxsize' is always true. Also sess_build_logline() users
probably don't expect NULL-byte to be accounted for in the produced
logline length.

This should be backported to all stable versions.

[ada: for backports, the patch needs to be applied manually because of
 c6a713842 ("MINOR: log: simplify last_isspace in sess_build_logline()")]
2024-04-09 17:35:53 +02:00
Aurelien DARRAGON
8226e92eb0 BUG/MINOR: tools/log: invalid encode_{chunk,string} usage
encode_{chunk,string}() is often found to be used this way:

  ret = encode_{chunk,string}(start, stop...)
  if (ret == NULL || *ret != '\0') {
	//error
  }
  //success

Indeed, encode_{chunk,string} will always try to add terminating NULL byte
to the output string, unless no space is available for even 1 byte.
However, it means that for the caller to be able to spot an error, then it
must provide a buffer (here: start) which is already initialized.

But this is wrong: not only this is very tricky to use, but since those
functions don't return NULL on failure, then if the output buffer was not
properly initialized prior to calling the function, the caller will
perform invalid reads when checking for failure this way. Moreover, even
if the buffer is initialized, we cannot reliably tell if the function
actually failed this way because if the buffer was previously initialized
with NULL byte, then the caller might think that the call actually
succeeded (since the function didn't return NULL and didn't update the
buffer).

Also, sess_build_logline() relies lf_encode_{chunk,string}() functions
which are in fact wrappers for encode_{chunk,string}() functions and thus
exhibit the same error handling mechanism. It turns out that
sess_build_logline() makes unsafe use of those functions because it uses
the error-checking logic mentionned above while buffer (tmplog) is not
guaranteed to be initialized when entering the function. This may
ultimately cause malfunctions or invalid reads if the output buffer is
lacking space.

To fix the issue once and for all and prevent similar bugs from being
introduced, we make it so encode_{string, chunk} and escape_string()
(based on encode_string()) now explicitly return NULL on failure
(when the function failed to write at least the ending NULL byte)

lf_encode_{string,chunk}() helpers had to be patched as well due to code
duplication.

This should be backported to all stable versions.

[ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to
 backport ae1e14d65 ("CLEANUP: tools: removing escape_chunk() function")
 first, considering it's not very relevant to maintain a dead function]
2024-04-09 17:35:45 +02:00
Aurelien DARRAGON
b15f6dfae8 BUG/MINOR: log: fix lf_text_len() truncate inconsistency
In c5bff8e550 ("BUG/MINOR: log: improper behavior when escaping log data")
we fixed lf_text_len() behavior with +E (escape) option.

However we introduced an inconsistency if output buffer is too small to
hold the whole output and truncation occurs: indeed without +E option up
to <size> bytes (including NULL byte) will be used whereas with +E option
only <size-1> bytes will be used. Fixing the function and related comment
so that the function behaves the same in regards to truncation whether +E
option is used or not.

This should be backported to all stable versions.
2024-04-09 17:30:13 +02:00
Aurelien DARRAGON
e751eebfc6 MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing
Currently, the way proxy-oriented logformat directives are handled is way
too complicated. Indeed, "log-format", "log-format-error", "log-format-sd"
and "unique-id-format" all rely on preparsing hints stored inside
proxy->conf member struct. Those preparsing hints include the original
string that should be compiled once the proxy parameters are known plus
the config file and line number where the string was found to generate
precise error messages in case of failure during the compiling process
that happens within check_config_validity().

Now that lf_expr API permits to compile a lf_expr struct that was
previously prepared (with original string and config hints), let's
leverage lf_expr_compile() from check_config_validity() and instead
of relying on individual proxy->conf hints for each logformat expression,
store string and config hints in the lf_expr struct directly and use
lf_expr helpers funcs to handle them when relevant (ie: original
logformat string freeing is now done at a central place inside
lf_expr_deinit(), which allows for some simplifications)

Doing so allows us to greatly simplify the preparsing logic for those 4
proxy directives, and to finally save some space in the proxy struct.

Also, since httpclient proxy has its "logformat" automatically compiled
in check_config_validity(), we now use the file hint from the logformat
expression struct to set an explicit name that will be reported in case
of error ("parsing [httpclient:0] : ...") and remove the extraneous check
in httpclient_precheck() (logformat was parsed twice previously..)
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
2b79457bc0 MEDIUM: log: add compiling logic to logformat expressions
split parse_logformat_string() into two functions:

parse_logformat_string() sticks to the same behavior, but now becomes an
helper for lf_expr_compile() which uses explicit arguments so that it
becomes possible to use lf_expr_compile() without a proxy, but also
compile an expression which was previously prepared for compiling (set
string and config hints within the logformat expression to avoid manually
storing string and config context if the compiling step happens later).

lf_expr_dup() may be used to duplicate an expression before it is
compiled, lf_expr_xfer() now makes sure that the input logformat is
already compiled.

This is some prerequisite works for log-profiles implementation, no
functional change should be expected.
2024-04-04 19:10:01 +02:00