With the new ability to log to a terminal, it's convenient to be able
to use "log stdout" in a config file, except that it now results in
setting the terminal to non-blocking mode, breaking every utility
relying on stdin afterwards. Since the only reason for logging to a
terminal is to debug, do not set the FD to non-blocking mode when it's
a terminal.
This fix must be backported to 1.9.
Commit f8188c6 ("MEDIUM: threads/logs: Make logs thread-safe") made logs
thread-local but it also made the copy of the startup-logs thread-local,
meaning that when threads are configured, upon startup the list of startup
logs appears to be empty. Let's just remove the THEAD_LOCAL directive
there, as the check for the startup period is already present.
This fix should be backported to 1.8.
PiBa-NL reported an issue affecting logs when stdout is enabled at the
same time as an IP address. It does not affect FD and UNIX, but does
still affect multiple FDs. What happens is that the condition to detect
that the initialization was not made relies on the FD being -1, and in
this case the FD points to the *unique* FD used for AF_INET sockets, so
the configured socket used for outgoing logs over UDP gets overwritten
by the last configured FD. This is not appropriate, so instead we rely
on the sin_port part of the IPv4-mapped address to store the
initialization state for each FD.
This part deserves being significantly revamped, as IPv6 is still not
possible due to the way the FDs are managed, and inherited FDs are a
bit hackish.
Note that this patch relies on "MINOR: tools: preset the port of
fd-based "sockets" to zero" in order to operate properly.
No backport is needed.
In sess_build_logline(), don't attempt to call sample_fetch_as_type()
if we don't have a stream.
It used never to happen in the past before commit 09bb27c ("MEDIUM: log:
make sess_build_logline() support being called with no stream"). But now
it can happen when sess_log() is called from the lower layers (i.e. the
H2 mux got garbage when it was expecting a preface frame), and it reveals
that some sample fetch functions and some converter fnuctions which rely
on the stream don't test for its existence. For the sample fetch functions,
a durable solution is easy and would consist in adapting sample_process()
to verify the SMP_USE_* bits when the stream is not set. But for the
converters we don't have this option as they don't declare whether or not
they use a stream (which possibly is the real issue).
Thus for now let's disable sample_fetch_as_type() if a stream does not
exist, until it can be more accurately refined later.
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
Most calls to hap_register_post_check(), hap_register_post_deinit(),
hap_register_per_thread_init(), hap_register_per_thread_deinit() can
be done using initcalls and will not require a constructor anymore.
Let's create a set of simplified macros for this, called respectively
REGISTER_POST_CHECK, REGISTER_POST_DEINIT, REGISTER_PER_THREAD_INIT,
and REGISTER_PER_THREAD_DEINIT.
Some files were not modified because they wouldn't benefit from this
or because they conditionally register (e.g. the pollers).
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
Remaining calls to si_cant_put() were all for lack of room and were
turned to si_rx_room_blk(). A few places where SI_FL_RXBLK_ROOM was
cleared by hand were converted to si_rx_room_rdy().
The now unused si_cant_put() function was removed.
When configuring the logs with a FD and using the master worker, the FD
was closed upon a reload because it was configured with CLOEXEC. It
leads to using the wrong FD for the logs and to close them. Which is
unfortunate since the master rely on the FD left opened during a reload.
The fix is to stop doing a CLOEXEC when the FD is inherited.
No backport needed.
This format is pretty similar to the previous "short" format except
that it also removes the severity level. Thus only the raw message is
sent. This is suitable for use in containers, where only the raw
information is expected and where the severity is supposed to come
from the file descriptor used.
This format is meant to be used with local file descriptors. It emits
messages only prefixed with a level, removing all the process name,
system name, date and so on. It is similar to the printk() format used
on Linux. It's suitable to be sent to a local logger compatible with
systemd's output format.
Note that the facility is still required but not used, hence it is
suggested to use "daemon" to remind that it's a local logger.
Example :
log stdout format short daemon # send everything to stdout
log stderr format short daemon notice # send important events to stderr
In certain situations it would be desirable to log to an existing file
descriptor, the most common case being a pipe between containers or
processes. The main issue with pipes is that using write() on them will
randomly truncate messages. But there is a trick. By using writev(), we
can atomically deliver or drop a message, which perfectly fits the
purpose. The only caveat is that large messages (4096 bytes on modern
operating systems) may be interleaved with messages from other processes
if using nbproc for example. In practice such messages are rare and most
of the time when users need such type of logging, the load is low enough
for a single process to be running so this is not really a problem.
This logging method thus uses unbuffered writev() calls and is uses more
CPU than if it used its own buffer with large writes at once, though this
is not a problem for moderate loads.
Logging to a file descriptor attached to a file also works with the side
effect that the process is significantly slowed down during disk accesses
and that it's not possible to rotate the file without restarting the
process. For this reason this option is not offered as a configuration
option, since it would confuse most users, but one could decide to
redirect haproxy's output to a file during debugging sessions. Two aliases
"stdout" and "stderr" are provided, but keep in mind that these are closed
by default in daemon mode.
When logging to a pipe or socket at a high enough rate, some logs will be
dropped and the number of dropped messages is reported in "show info".
It's easy to detect when logs on some paths are lost as sendmsg() will
return EAGAIN. This is particularly true when sending to /dev/log, which
often doesn't support a big logging capacity. Let's keep track of these
and report the total number of dropped messages in "show info".
The error messages used to say something along "socket logger 2 failed"
or "sendmsg logger 2 failed" which are confusing. Let's rephrase this
"sendmsg() failed for logger 2".
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
At many places in muxes we'll have to add tests to check if the
connection is front or back before deciding to log. Instead let's
centralize this test in sess_log() to simply do nothing when sess=NULL.
There are 3 tables in proto_http which are used exclusively by logs :
hdr_encode_map[], url_encode_map[] and http_encode_map[]. They indicate
what characters are safe to be emitted in logs depending on the part of
the message where they are placed. Let's move this to log.c, as well as
its initialization. It's worth noting that the rfc5424 map was already
initialized there.
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.
This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example the once thought h1-specific h1_char_classes array is in fact
dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.
Certain string definitions pre-dated the introduction of indirect
strings (ist) so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table based lookup (typ. 14 ns vs 16
before).
The new function sess_log() only needs a session to emit a log. It will
ignore the parts that depend on the stream. It is usable to emit a log
to report early errors in muxes. These ones will typically mention
"<BADREQ>" for the request and 0 for the HTTP status code.
Till now it was impossible to emit logs from the lower layers only because
a stream was mandatory. From now on it will at least be possible to emit a
log to report a bad request or some timings for example. When the stream
is null, sess_build_logline() will use default values and will extract the
timing information from the session just like stream_new() does, so the
resulting log line is perfectly valid.
The termination state will indicate a proxy error during the request phase
since it is the only realistic use for such a call with no stream.
When s==NULL we don't have any assigned request counter. Ideally we
should proceed exactly like when a stream is initialized and assign
a unique value. For now we only place it into a local variable.
We'll soon support s==NULL so let's use an intermediary variable for the
logs structure. For now it only points to s->logs but will support a local
variable as an alternative later.
The current build_logline() can only be used with valid streams, which
means it is not suitable for use from muxes. We start by moving it into
another more generic function which takes the session as an argument,
to avoid complexifying all the internal API for jsut a few use cases.
This new function is not supposed to be called directly from outside so
we'll be able to instrument it to support several calling conventions.
For now the behaviour and conditions remain unchanged.
The current name is misleading as it implies a queue size, but the value
instead indicates a position in the queue.
The value is only the queue size at the exact moment the element is enqueued.
Soon we will gain the ability to insert anywhere into the queue, upon which
clarity of the name is more important.
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.
Most of the changes were made with spatch using this coccinelle script :
@rule_d1@
typedef chunk;
struct chunk chunk;
@@
- chunk.str
+ chunk.area
@rule_d2@
typedef chunk;
struct chunk chunk;
@@
- chunk.len
+ chunk.data
@rule_i1@
typedef chunk;
struct chunk *chunk;
@@
- chunk->str
+ chunk->area
@rule_i2@
typedef chunk;
struct chunk *chunk;
@@
- chunk->len
+ chunk->data
Some minor updates to 3 http functions had to be performed to take size_t
ints instead of ints in order to match the unsigned length here.
With "log global" line, the global list of loggers are copied into the proxy's
struct. The list coming from the default section is also copied when a frontend
or a backend section is parsed. So it is possible to have duplicate entries in
the proxy's list. For instance, with this following config, all messages will be
logged twice:
global
log 127.0.0.1 local0 debug
daemon
defaults
mode http
log global
option httplog
frontend front-http
log global
bind *:8888
default_backend back-http
backend back-http
server www 127.0.0.1:8000
Now, the function parse_logsrv should be used to parse a "log" line. This
function will update the list of loggers passed in argument. It can release all
log servers when "no log" line was parsed (by the caller) or it can parse "log
global" or "log <address> ... " lines. It takes care of checking the caller
context (global or not) to prohibit "log global" usage in the global section.
This is a recurring pain when using certain unix domain sockets or when
sending to temporarily unroutable addresses, if the process remains in
the foreground, the console is full of error which it's impossible to
do anything about. It's even worse when the process is remote, or when
run from a serial console which will slow the whole process down. Let's
send them only once now to warn about a possible config issue, and not
pollute the system nor slow everything down.
A log socket (UDP or UNIX) is opened by the master during its startup, when the
first log message is sent. So, to prevent FD leaks, we must ensure we correctly
close it during a reload. By setting FD_CLOEXEC bit on it, we are sure it will
be automatically closed it during a reload.
This patch must be backported in 1.8.
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
All the references to connections in the data path from streams and
stream_interfaces were changed to use conn_streams. Most functions named
"something_conn" were renamed to "something_cs" for this. Sometimes the
connection still is what matters (eg during a connection establishment)
and were not always renamed. The change is significant and minimal at the
same time, and was quite thoroughly tested now. As of this patch, all
accesses to the connection from upper layers go through the pass-through
mux.
There was a flaw in the way the threads was created. the main one was just used
to create all the others and just wait to exit. Now, it is used to run a poll
loop. So we only create nbthread-1 threads.
This also fixes a bug about the compression filter when there is only 1 thread
(nbthread == 1 or no threads support). The bug was in the way thread-local
resources was initialized. per-thread init/deinit callbacks were never called
for the main process. So, with nthread set to 1, some buffers remained
uninitialized.
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
log buffers and static variables used in log functions are now thread-local. So
there is no need to lock anything to log messages. Moreover, per-thread
init/deinit functions are now used to initialize these buffers.