The detail level, initially based on syslog levels, is not used, while
something related is missing: trace verbosity, to indicate whether or
not we want to call the decoding callback and what level of decoding
we want (raw captures etc). Let's repurpose this field as "verbosity".
A verbosity of zero means that the decoding callback is not called;
all other levels are handled by this callback and are source-specific.
The source is now prompted to list the levels that are proposed to the
user. When the source doesn't define anything, "quiet" and "default"
are available.
It's very convenient to know the trace level in the output, at least to
grep/grep -v on it. The main use case is to filter in or out the
developer traces at level DEVEL. For this we now add the numeric level
(0 to 4) just after the source name in the brackets. The output now
looks like this :
  [00|h2|4|mux_h2.c:3174] h2_send(): entering : h2c=0x27d75a0 st=2
   -- -- - -------------  ---------- -------------------> message
   |  |  |       |        '-> function name
   |  |  |       '-> source file location
   |  |  '-> trace level (4=dev)
   |  '-> trace source
   '-> thread number
Working on adding traces to mux-h2 revealed that the function names are
manually copied a lot in developer traces. The reason is that they are
not preprocessor macros and as such cannot be concatenated. Let's
slightly adjust the trace() function call to take a function name just
after the file:line argument. This argument is only added for the
TRACE_DEVEL and 3 new TRACE_ENTER, TRACE_LEAVE, and TRACE_POINT macros
and left NULL for others. This way the function name is only reported
for traces aimed at the developers. The pretty-print callback was also
extended to benefit from this. This will also significantly shrink the
data segment as the "entering" and "leaving" strings will now be merged.
One technical point worth mentioning is that the function name is *not*
passed as an ist to the inline function because it's not considered a
builtin constant by the compiler, and this would lead to strlen() being
run on it at every call site before calling the inline function. Thus
instead we pass the const char * (which the compiler knows where to
find) and it's the __trace() function that converts it to an ist for
internal consumption and for the pretty-print callback. Doing this
avoids losing 5-10% peak performance.
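A minimal sketch of the resulting arrangement, with a simplified
stand-in for the ist type:

  #include <string.h>

  struct ist { const char *ptr; size_t len; };

  void do_trace(const struct ist func);    /* internal consumer */

  /* takes a const char * so that no strlen() is emitted at call sites;
   * the length is computed only once, here */
  void __trace(const char *func)
  {
      struct ist ist_func;

      ist_func.ptr = func;
      ist_func.len = func ? strlen(func) : 0;
      do_trace(ist_func);
  }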
The "payload" trace level was ambigous because its initial purpose was
to be able to dump received data. But it doesn't make sense to force to
report data transfers just to be able to report state changes. For
example, all snd_buf()/rcv_buf() operations coming from the application
layer should be tagged at this level. So here we move this payload level
above the state transitions and rename it to avoid the ambiguity making
one think it's only about request/response payload. Now it clearly is
about any data transfer and is thus just below the developer level. The
help messages on the CLI and the doc were slightly reworded to help
remove this ambiguity.
Save the authority TLV in a PROXYv2 header from the client connection,
if present, and make it available as fc_pp_authority.
The fetch can be used, for example, to set the SNI for a backend TLS
connection.
Traces were missing the thread number and the source name, which was
really annoying. Now the thread number is emitted on two digits inside
the square brackets, followed by the source name then the line location,
each delimited with a vertical bar, such as below :
[00|h2|mux_h2.c:2651] Notifying stream about SID change : h2c=0x7f3284581ae0 st=3 h2s=0x7f3284297f00 id=523 st=4
[00|h2|mux_h2.c:2708] receiving H2 HEADERS frame : h2c=0x7f3284581ae0 st=3 dsi=525 (st=0)
[02|h2|mux_h2.c:2194] Received H2 request : h2c=0x7f328d3d1ae0 st=2 : [525] H2 REQ: GET / HTTP/2.0
[02|h2|mux_h2.c:2561] Expecting H2 frame header : h2c=0x7f328d3d1ae0 st=2
It becomes apparent that most traces will use a single trace pretty
print callback, so let's allow the trace source to declare a default
one so that it can be omitted from trace calls, and will be used if
no other one is specified.
The principle is that when emitting a message, if some dropped events
were logged, we first attempt to report this counter before going
further. This is done under an exclusive lock while all logs are
produced under a shared lock. This ensures that the dropped line is
accurately reported and doesn't accidentally arrive after a later
event.
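A sketch of this locking scheme, with pthread rwlocks standing in for
the real lock (names are illustrative):

  #include <pthread.h>
  #include <stdio.h>

  static pthread_rwlock_t ring_lock = PTHREAD_RWLOCK_INITIALIZER;
  static unsigned int dropped;

  void emit_event(const char *msg)
  {
      if (dropped) {
          /* exclusive lock: the dropped line cannot be overtaken
           * by an event logged under the shared lock below */
          pthread_rwlock_wrlock(&ring_lock);
          if (dropped) {
              printf("%u events dropped\n", dropped);
              dropped = 0;
          }
          pthread_rwlock_unlock(&ring_lock);
      }
      /* shared lock: many writers may log concurrently */
      pthread_rwlock_rdlock(&ring_lock);
      printf("%s\n", msg);
      pthread_rwlock_unlock(&ring_lock);
  }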
The new "show events" CLI keyword lists supported event sinks. When
passed a buffer-type sink it completely dumps it.
Testing showed no drops at all during attachment, even at 8 million
events/s. An attachment limit is still missing though.
This now provides sink_new_buf() which allocates a ring buffer. One such
ring ("buf0") of 1 MB is created already, and may be used by sink_write().
The sink's creation should probably be moved somewhere else later.
The three functions (attach, IO handler, and release) are meant to be
called by any CLI command which needs to dump the contents of a ring
buffer. We do not implement anything generic to dump any ring buffer on
the CLI since it's meant to be used by other functionalities above.
However these functions deal with locking and everything so it's trivial
to embed them in other code.
This function tries to write to the ring buffer, possibly removing enough
old messages to make room for the new one. It takes two arrays of fragments
on input to ease the insertion of prefixes by the caller. It atomically
writes the message, possibly truncating it if desired, and returns the
operation's status.
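Assuming a prototype along these lines (the exact signature is an
assumption here, derived from the description above), a caller can
insert its prefix without any extra copy:

  ssize_t ring_write(struct ring *ring, size_t maxlen,
                     const struct ist pfx[], size_t npfx,
                     const struct ist msg[], size_t nmsg);

  /* hypothetical usage: prefix and message passed as fragments */
  struct ist pfx[1] = { IST("h2: ") };
  struct ist msg[2] = { IST("stream "), IST("created") };

  if (ring_write(ring, ~0, pfx, 1, msg, 2) <= 0) {
      /* message dropped (e.g. larger than the whole ring) */
  }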
Our circular buffers are well suited for being used as ring buffers for
not-so-structured data. The mechanism here consists in making room in a
buffer before inserting a new record which is prefixed by its size, and
looking up the next record based on the previous one's offset and size.
We can have up to 255 consumers watching for data (dump in progress,
tail), which guarantees that entries are not recycled while they're
being dumped.
The complete representation is described in the header file. For now only
ring_new(), ring_resize() and ring_free() are created.
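A simplified sketch of the record walk (the real code stores sizes as
varints and tracks readers; only the offset+size hop is shown):

  #include <stdint.h>
  #include <string.h>

  struct rec_hdr { uint16_t size; };      /* real code uses a varint */

  /* returns the offset of the record following the one at <ofs> */
  static size_t next_rec(const char *area, size_t ofs)
  {
      struct rec_hdr h;

      memcpy(&h, area + ofs, sizeof(h));
      return ofs + sizeof(h) + h.size;
  }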
Let's not mess with fd-specific code, locking nor message formatting
here, and use the new generic function instead. This substantially
simplifies the sink_write() code and makes it more agnostic to the
output representation and storage.
Currently both logs and event sinks may use a file descriptor to
atomically emit some output contents. The two may use the same FD though
nothing is done to make sure they use the same lock. Also there is quite
some redundancy between the two. Better make a specific function to send
a fragmented message to a file descriptor which will take care of the
locking via the fd's lock. The function is also able to truncate a
message and to enforce addition of a trailing LF when building the
output message.
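A sketch of the core idea (the real function also takes the fd's lock
and can truncate to a maximum length; the name and prototype below are
illustrative):

  #include <string.h>
  #include <sys/uio.h>
  #include <unistd.h>

  ssize_t fd_write_frags(int fd, const struct iovec *pfx, int npfx,
                         const struct iovec *msg, int nmsg)
  {
      struct iovec iov[16];
      int tot = 0;

      if (npfx + nmsg + 1 > 16)
          return -1;                      /* keep the sketch simple */

      memcpy(iov + tot, pfx, npfx * sizeof(*pfx)); tot += npfx;
      memcpy(iov + tot, msg, nmsg * sizeof(*msg)); tot += nmsg;
      iov[tot].iov_base = "\n";           /* enforced trailing LF */
      iov[tot++].iov_len = 1;

      /* one writev() keeps the whole line atomic for other writers */
      return writev(fd, iov, tot);
  }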
It will sometimes be useful, when encoding varints, to know the output
size in advance. Two versions are provided: an inline one using a
switch/case construct, which will be trivial for use with constants
(and will be very fast albeit huge), and a function iterating on the
number, which is 5 times smaller, for use with variables.
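For illustration, the iterative size computation with a generic
7-bits-per-byte scheme (haproxy's actual varint layout may differ, but
the principle is the same):

  #include <stddef.h>
  #include <stdint.h>

  size_t varint_bytes(uint64_t v)
  {
      size_t len = 1;

      /* each extra byte carries 7 more bits of payload */
      while (v >= 0x80) {
          v >>= 7;
          len++;
      }
      return len;
  }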
The converter can be useful to look up a server queue from a dynamic value.
It takes an input value of type string, either a server name or
<backend>/<server> format and returns the number of queued sessions
on that server. Can be used in places where we want to look up
queued sessions from a dynamic name, like a cookie value (e.g.
req.cook(SRVID),srv_queue) and then make a decision to break
persistence or direct a request elsewhere.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
The url32 sample fetch does not take the path part of the URL into
account. This is because in smp_fetch_url32() we erroneously modify
path.len and path.ptr before testing their value and building the
path based part of the hash.
This fixes issue #235
This must be backported as far as 1.9, when HTX was introduced.
The delete_listener() function takes the listener's lock before taking
the proto_lock, which is contrary to what other functions do, possibly
causing an AB/BA deadlock. In practice the only two places where both
are taken are protocol_enable_all() and delete_listener(), the former
being used during startup and the latter during stop. During reload
floods, it is thus technically possible for a thread to be
initializing the listeners while another one is stopping. While this
is too hard to trigger on 2.0 and above due to the synchronization of
all threads during startup, it's reasonably easy to do in 1.9 by having
hundreds of listeners, starting 64 threads and flooding them with reloads
like this :
$ while usleep 50000; do killall -USR2 haproxy; done
Usually in less than a minute, all threads will be deadlocked. The fix
consists in always taking the proto_lock before the listener lock. It
seems to be the only place where these two locks were reversed. This
fix needs to be backported to 2.0, 1.9, and 1.8.
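For reference, a minimal sketch of the corrected ordering, using
pthread mutexes as stand-ins for haproxy's locks:

  #include <pthread.h>

  pthread_mutex_t proto_lock = PTHREAD_MUTEX_INITIALIZER;
  struct listener { pthread_mutex_t lock; /* ... */ };

  void delete_listener(struct listener *l)
  {
      /* proto_lock first, listener lock second, matching the order
       * used by protocol_enable_all(), so no AB/BA cycle can form */
      pthread_mutex_lock(&proto_lock);
      pthread_mutex_lock(&l->lock);
      /* ... unlink the listener from its protocol's list ... */
      pthread_mutex_unlock(&l->lock);
      pthread_mutex_unlock(&proto_lock);
  }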
If haproxy is built with profiling enabled with -pg, it is possible to
see the master quit during a reload while it's re-executing itself with
error code 155 (128+27, i.e. signal 27 "Profile timer expired"). This happens
if the SIGPROF signal is delivered during the execve() call while the
handler was already unregistered. The issue itself is not directly inside
haproxy but it's easy to address. This patch disables this signal before
calling execvp() during a master reload. A simple test for this consists
in running this little script with haproxy started in master-worker mode :
$ while usleep 50000; do killall -USR2 haproxy; done
This fix should be backported to all versions using the master-worker
model.
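A sketch of the idea (whether the actual patch ignores or blocks the
signal, the point is to neutralize SIGPROF across the execve() window):

  #include <signal.h>
  #include <unistd.h>

  static void reexec(char *const argv[])
  {
      /* an ignored disposition survives execve(), so a late profiling
       * tick can no longer kill the process before the new image has
       * set up its own profiling */
      signal(SIGPROF, SIG_IGN);
      execvp(argv[0], argv);
  }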
If a reception ends with the HTX buffer full and everything is completed
except appending the HTX EOM block, we end up detecting an error because
the H1
parser did not switch to H1_MSG_DONE yet while all conditions for an end of
stream and end of buffer are met. This can be detected by retrieving 31532
or 31533 chunk-encoded bytes over H1 and seeing haproxy log "SD--" at the
end of a successful transfer.
Ideally the EOM part should be totally independent of the H1 message state
since the block was really parsed and finished. So we should switch to a
last state requiring to send only EOM. However this needs a few risky
changes. This patch aims for simplicity and backport safety, thus it only
adds a flag to the H1 stream indicating that an EOM is still needed, and
excludes this condition from the ones used to detect end of processing. A
cleaner approach needs to be studied, either by adding a state before DONE
or by setting DONE once the various blocks are parsed and before trying to
send EOM.
This fix must be backported to 2.0. The issue does not seem to affect
1.9, though it is not yet known why; probably it is related to the
different encoding of trailers, which always leaves a bit of room to
let EOM be stored.
The H1 message parser calls the various message block parsers with an
offset indicating where in the buffer to start from, and only consumes
the data at the end of the parsing. The headers and trailers parsers
have a condition detecting if a headers or trailers block is too large
to fit into the buffer. This is detected by an incomplete block while
the buffer is full. Unfortunately it doesn't take into account the fact
that the block may be parsed after other blocks that are still present
in the buffer, resulting in aborting some transfers early as reported
in issue #231. This typically happens if a trailers block is incomplete
at the end of a buffer full of data, which typically happens with data
sizes multiple of the buffer size minus less than the trailers block
size. It also happens when the CRLF that follows the 0-sized chunk of
any transfer-encoded contents is itself on such a boundary, since this
CRLF is technically part of the trailers block. This can be reproduced
by asking a server to retrieve exactly 31532 or 31533 bytes of static
data using chunked encoding with curl, which reports:
transfer closed with outstanding read data remaining
This issue was revealed in 2.0 and does not affect 1.9 because in 1.9
the trailers block was processed at once as part of the data block
processing, and would simply give up and wait for the rest of the data
to arrive.
It's interesting to note that the headers block parsing is also affected
by this issue but in practice it has a much more limited impact since a
headers block is normally only parsed at the beginning of a buffer. The
only case where it seems to matter is when dealing with a response buffer
full of 100-continue header blocks followed by a regular header block,
which will then be rejected for the same reason.
This fix must be backported to 2.0 and partially to 1.9 (the headers
block part).
Now we try to find the frontend, listener, backend, server, connection,
session, and stream from the presented argument of type connection,
stream or session. Various combinations and bounces allow most of them
to be retrieved almost all the time. The extraction is performed early
so that we'll be able to apply filters later.
The lock-on is set if it was not there while the trace is running and
a valid pointer is available. If it was already set and doesn't match,
no trace is produced.
When no criterion is provided, it carefully enumerates all available
ones, including those provided by the source itself. Otherwise it sets
the new criterion and resets the lockon pointer.
When we stop or pause a trace (either on a matching event or by hand),
we must also stop the lock-on feature so that we don't follow any
further activity on this pointer even if it is recycled. For now this
is not exploited.
The trace() call will support an optional decoding callback and 4
arguments that this function is supposed to know how to use to provide
extra information. The output remains unchanged when the function is
NULL. Otherwise, the message is pre-filled into the thread-local
trace_buf, and the function is called with all arguments so that it
completes the buffer in a readable form depending on the expected
level of detail.
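As an illustration, the callback's shape is roughly the following (a
simplified sketch; the real prototype carries more context such as the
level and the source):

  struct buffer;

  /* <out> is pre-filled with the common parts of the line; a1..a4 are
   * the opaque arguments only the trace source knows how to decode */
  typedef void (*trace_decode_cb)(struct buffer *out,
                                  const void *a1, const void *a2,
                                  const void *a3, const void *a4);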
This new "level" argument will allow the trace sources to label the
traces for different purposes, and filter out some of them if they
are not relevant to the current target. Right now we have 5 different
levels:
- USER : the least verbose one, only a few functional information
- PAYLOAD: like user but also displays some payload-related information
- PROTO: focuses on the protocol's framing
- STATE: also indicate state internal transitions or non-transitions
- DEVELOPER: adds extra info about branches taken in the code (break
points, return points)
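Assuming identifiers derived from the list above (the actual names are
not confirmed here), this maps to an ordered enum:

  enum trace_level {
      TRACE_LEVEL_USER = 0,     /* filtering at a given level keeps
                                 * everything below it */
      TRACE_LEVEL_PAYLOAD,
      TRACE_LEVEL_PROTO,
      TRACE_LEVEL_STATE,
      TRACE_LEVEL_DEVELOPER,
  };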
We now pass an extra argument "where" to the trace() call, which
is supposed to be an ist made of the concatenation of the filename
and the line number. We only keep the last 10 chars from this string
since the end of file names is most often easy to recognize. This
gives developers useful information at very low cost.
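Such a constant string can be built at zero runtime cost with the
classic two-level stringification idiom (the macro names below are
illustrative):

  #define TRC_STR(x)  #x
  #define TRC_XSTR(x) TRC_STR(x)          /* expands x, then stringifies */
  #define TRC_LOC     __FILE__ ":" TRC_XSTR(__LINE__)

  /* e.g. ist(TRC_LOC) yields "mux_h2.c:2651" as a compile-time literal */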
For now it remains quite basic. It performs a few state checks, calls
the source's sink if defined, and performs the transitions between
RUNNING, STOPPED and WAITING when the configured events match.
The new "show trace" CLI command lists available trace sources and
indicates their status, their sink, and number of dropped packets.
When "show trace <source>" is used, the list of known events is
also listed with their status per action (report/start/stop/pause).
The "level" keyword allows to indicate the expected level of verbosity
in the traces, among "user" (least verbose, just synthetic info) to
"developer" (very detailed, including function entry/leaving). It's only
displayed and set but not used yet.
For now it lists the sources if one is not provided, and checks
for the source's existence. It lists the events if not provided,
checks for their existence if provided, adjusts the report/start/
stop/pause event lists, and performs state transitions. It lists
sinks and adjusts them as well. Filters, lock, and level are not
implemented yet.
The principle of this subsystem will be to support taking live traces
at various places in the code with conditional triggers, filters, and
ability to lock on some elements. The traces will support typed events
and will be sent into sinks made of ring buffers, file descriptors or
remote servers.
This is the most basic type of sink. It pre-registers "stdout" and
"stderr", and is able to use writev() on them. The writev() operation
is locked to avoid mixing outputs. It's likely that the registration
should move somewhere else to take into account whether stdout and
stderr are still open or have been closed.
The principle will be to be able to dispatch events to various destinations
called "sinks". This is already done in part in logs where log servers can
be either a UDP socket or a file descriptor. This will be needed with the
new trace subsystem where we may also want to add ring buffers. And it
turns out that all such destinations make sense for all producers. Logs
may need to be
sent to a TCP server via a ring buffer, or consulted from the CLI. Trace
events may need to be sent to stdout/stderr as well as to remote log servers.
This patch creates a new structure "sink" aiming at addressing these similar
needs. The goal is to merge together what is common to all of them, such as
the output format, the dropped events count, etc, and also keep separately
the target identification (network address, file descriptor). Provisions
were made to have a "waiter" on the sink. For a TCP log server it will be
the task to wake up after writing to the log buffer. For a ring buffer, it
could be the list of watchers on the CLI running a "tail" operation and
waiting for new events. A lock was also placed in the struct since many
operations will require some locking, including the FD ones. The output
formats cover those in use by logs, plus two extra ones prepending the
ISO time in front of the message (convenient for stdio/buffers).
For now only the generic infrastructure is present, no type-specific
output is implemented. There's the sink_write() function which prepares
and formats a message to be sent, trying hard to avoid copies and only
using pointer manipulation, where the type-specific code just has to be
added. Dropped messages are already counted (for now 100% drop). The
message is put into an iovec array as it will be trivial to use with
file descriptors and sockets.
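A rough sketch of what such a structure may look like (field names are
illustrative, not haproxy's exact ones):

  #include <sys/socket.h>

  struct sink {
      const char *name;         /* registered name ("stdout", ...) */
      const char *desc;         /* human-readable description */
      int fmt;                  /* output format (raw, ISO-timed, ...) */
      unsigned int dropped;     /* count of lost events */
      void *waiter;             /* whom to wake on new events (task,
                                 * CLI "tail" watchers, ...) */
      /* a lock also belongs here (haproxy uses its own lock type) */
      union {                   /* type-specific target */
          int fd;                            /* fd-based sinks */
          struct sockaddr_storage addr;      /* network sinks */
      } ctx;
  };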
The function call tracing code is quite old and was never ported to
support threads. It's not even certain whether it still works well, but
at least its presence creates confusion for future work, so let's rename
it to calltrace.c and add a comment about its lack of thread safety.
In h1_rcv_buf(), wake the h1c tasklet as long as we're not done reading the
request/response, and the h1c is not already subscribed for receiving. Now
that we no longer subscribe in h1_recv() if we managed to read data, we
rely on h1_rcv_buf() calling us again, but h1_process_input() may have
returned 0 if we only received part of the request, so we have to wake
the tasklet to be sure to get more data again.
When we dump a thread's state (show thread, panic) we don't know if
anything is happening in Lua, which can be problematic especially when
calling external functions. With this patch, the thread dump code can
now detect if we're running in a global Lua task (hlua_process_task),
or in a TCP or HTTP Lua service (task_run_applet and applet.fct ==
hlua_applet_tcp_fct or hlua_applet_http_fct), or a fetch/converter
from an analyser (s->hlua != NULL). In such situations, it's able to
append a formatted Lua backtrace of the Lua execution path with
function names, file names and line numbers.
Note that a shorter alternative could be to call "luaL_where(hlua->T,0)"
which only prints the current location, but it's not necessarily sufficient
for complex code.
These functions need to be visible from outside the Lua code (namely
from the debugging code), and it is convenient to export them so that
we can improve the thread dump output :
void hlua_applet_tcp_fct(struct appctx *ctx);
void hlua_applet_http_fct(struct appctx *ctx);
struct task *hlua_process_task(struct task *task, void *context, unsigned short state);
Of course they are only available when USE_LUA is defined.
This is somewhat related to indent_msg() except that this one places a
known prefix at the beginning of each line, allows the EOL character to
be replaced, and can avoid inserting a prefix on the first line if not
desired. It works with a normal output buffer/chunk so it doesn't need
to allocate anything nor to modify the input string. It is suitable for
use in multi-line backtraces.
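An stdio-based sketch of the described behaviour (the real helper works
on an output chunk; the name here is illustrative):

  #include <stdio.h>

  static void dump_prefixed(FILE *out, const char *pfx,
                            const char *src, char eol, int pfx_first)
  {
      int start_of_line = pfx_first;

      for (; *src; src++) {
          if (start_of_line)
              fputs(pfx, out);
          start_of_line = (*src == '\n');
          /* substitute the EOL character, keep everything else */
          fputc(start_of_line ? eol : *src, out);
      }
  }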
In mux_pt_attach(), don't unconditionally call unsubscribe, and only do
so if we were subscribed. The idea was that at this point we would
always be subscribed, as mux_pt_attach() would only be called after at
least one request, after which the mux_pt would have subscribed, but
this is wrong.
We can also be called if for some reason the connection failed before the
xprt was created. And with no xprt, attempting to call unsubscribe will
probably lead to a crash.
This should be backported to 2.0.
The stats applet waits to have a full body before processing POST
requests. But because it does not produce anything while waiting for
the end of the request, the applet may be blocked: the client side is
blocked because the stats applet does not consume anything, and the
applet is waiting because the whole body has not been received.
Registering the analyzer AN_REQ_HTTP_BODY when a POST request is sent
to the stats applet solves the issue.
This patch must be backported to 2.0.
This bug was introduced by the commit bfab2ddd ("MINOR: hlua: Add a flag
on the lua txn to know in which context it can be used"). The wrong test
was performed, so the timeout was always set on the response channel.
This may lead to an infinite loop.
This patch must be backported everywhere the commit bfab2ddd is. For now, at
least to 2.0, 1.9 and 1.8.
In the REORG of commit 1a18b5414 ("REORG: connection: centralize the
conn_set_{tos,mark,quickack} functions") a bug was introduced by
calling conn_set_tos instead of conn_set_mark.
This was reported in issue #212
This should be backported to 1.9 and 2.0.
When we upgrade the mux from TCP to H2/HTX, don't use cs_destroy() to free
the conn_stream, use cs_free() instead. Using cs_destroy() would call the
mux detach method, and at that point of time the mux would be the H2 mux,
which knows nothing about that conn_stream, so bad things would happen.
This should eventually make upgrades from TCP to H2/HTX work, and fix
github issue #196.
This should be backported to 2.0.
In stream_end_backend(), if we're upgrading from TCP to H1/HTX, as we don't
destroy the stream, we have to add the SF_HTX flag on the stream, or bad
things will happen.
This was broken when attempting to fix github issue #196.
This should be backported to 2.0.