Since patch "7185b789", the authority TLV in a PROXYv2 header from a
client connection is stored. Authority TLV sends in PROXYv2 should be
taken into account to allow chaining PROXYv2 without droping it.
Now by prefixing a log server with "ring@<name>" it's possible to send
the logs to a ring buffer. One nice thing is that it allows multiple
sessions to consult the logs in real time in parallel over the CLI, and
without requiring file system access. At the moment, ring0 is created as
a default sink for tracing purposes and is available. No option is
provided to create new rings though this is trivial to add to the global
section.
Instead of detecting an AF_UNSPEC address family for a log server and
to deduce a file descriptor, let's create a target type field and
explicitly mention that the socket is of type FD.
When logging to a file descriptor, we'd rather use the unified
fd_write_frag_line() which uses the FD's lock than perform the
writev() ourselves and use a per-server lock, because if several
loggers point to the same output (e.g. stdout) they are still
not locked and their logs may interleave. The function above
instead relies on the fd's lock so this is safer and will even
protect against concurrent accesses from other areas (e.g traces).
The function also deals with the FD's non-blocking mode so we do
not have to keep specific code for this anymore in the logs.
Logs and sinks were resorting to dirty hacks to initialize an FD to
non-blocking mode. Now we have a bit for this in the fd tab so we can
do it on the fly on first use of the file descriptor. Previously it was
set per log server by writing value 1 to the port, or during a sink
initialization regardless of the usage of the fd.
The purpose is to be able to remember that initialization was already
done for a file descriptor. This will allow to get rid of some dirty
hacks performed in the logs or fd sinks where the init state of the
fd has to be guessed.
The "cache" entry was still present in the fdtab struct and it was
reported in "show sess". Removing it broke the cache-line alignment
on 64-bit machines which is important for threads, so it was fixed
by adding an attribute(aligned()) when threads are in use. Doing it
only in this case allows 32-bit thread-less platforms to see the
struct fit into 32 bytes.
Ilya reported in issue #242 that h2c_handle_priority() was having
unreachable code... Obviously, I missed the braces around the "if",
leaving an unconditional return.
No backport is needed.
Now it is possible for a reader to subscribe and wait for new events
sent to a ring buffer. When new events are written to a ring buffer,
the applets that are subscribed are woken up to display new events.
For now we only support this with the CLI applet called by "show events"
since the I/O handler is indeed a CLI I/O handler. But it's not
complicated to add other mechanisms to consume events and forward them
to external log servers for example. The wait mode is enabled by adding
"-w" after "show events <sink>". An extra "-n" was added to directly
seek to new events only.
Some CLI parsers are currently abusing the CLI context types such as
pointers to stuff longs into them by lack of room. But the context is
80 bytes while cli is only 48, thus there's some room left. This patch
adds a list element and two size_t usable as various offsets. The list
element is initialized.
There are two problems with the way we count watchers on a ring:
- the test for >=255 was accidently kept to 1 used during QA
- if the producer deletes all data up to the reader's position
and the reader is called, cannot write, and is called again,
it will have a zero offset, causing it to initialize itself
multiple times and each time adding a new refcount.
Let's change this and instead use ~0 as the initial offset as it's
not possible to have it as a valid one. We now store it into the
CLI context's size_t o0 instead of casting it to a void*.
No backport needed.
user-level traces are more readable when visually aligned. This is easily
done by writing "rcvd" instead of "received" to align with "sent" :
$ socat - /tmp/sock1 <<< "show events buf0"
[00|h2|0|mux_h2.c:2465] rcvd H2 request : [1] H2 REQ: GET /?s=10k HTTP/2.0
[00|h2|0|mux_h2.c:4563] sent H2 response : [1] H2 RES: HTTP/1.1 200
h2c->dsi is only for demuxing, and needed while decoding a new request.
But if we already have a valid stream ID (e.g. response or outgoing
request), we should use it instead. This avoids seeing [0] in front of
the responses at user level.
This is as documented in "trace h2 verbosity", level "minimal" only
features flags and doesn't perform any decoding at all, "simple" does,
just like "clean" which is the default for end uesrs.
The "clean" output will be suitable for user and proto-level output
where the internal stuff (state, pointers, etc) is not desired but
just the basic protocol elements.
We need this all the time in traces, let's have it now. For the sake
of compact outputs, the strings are all 3-chars long. The "show fd"
output was improved to make use of this.
All functions of the h2 data path were updated to receive one or multiple
TRACE() calls, at least one pair of TRACE_ENTER()/TRACE_LEAVE(), and those
manipulating protocol elements have been improved to report frame types,
special state transitions or detected errors. Even with careful tests, no
performance impact was measured when traces are disabled.
They are not completely exploited yet, the callback function tries to
dump a lot about them, but still doesn't provide buffer dumps, nor does
it indicate the stream or connection error codes.
The first argument is always set to the connection when known. The second
argument is set to the h2s when known, sometimes a 3rd argument is set to
a buffer, generally the rxbuf or htx, and occasionally the 4th argument
points to an integer (number of bytes read/sent, error code).
Retrieving a 10kB object produces roughly 240 lines when at developer
level, 35 lines at data level, 27 at state level, and 10 at proto level
and 2 at user level.
For now the headers are not dumped, but the start line are emitted in
each direction at user level.
The patch is marked medium because it touches lots of places, though
it takes care not to change the execution path.
The new function h2_trace() is called when relevant by the trace subsystem
in order to provide extra information about the trace being produced. It
can for example display the connection pointer, the stream pointer, etc.
It is declared in the trace source as the default callback as we expect
it to be versatile enough to enrich most traces.
In addition, for requests and responses, if we have a buffer and we can
decode it as an HTX buffer, we can extract user-friendly information from
the start line.
For now the traces are not used. Supported events are categorized by
where the activity comes from (h2c, h2s, stream, etc), a direction
(send/recv/wake), and a list of possibilities for each of them (frame
types, errors, shut, ...). This results in ~50 different events that
try to cover a lot of possibilities when it's needed to filter on
something specific. Special events like protocol error are handled.
A few aggregate events like "rx_frame" or "tx_frame" are planed to
cover all frame types at once by being placed all the time with any
of the other ones.
We also state that the first argument is always the connection. This way
the trace subsystem will be able to safely retrieve some useful info, and
we'll still be able to get the h2c from there (conn->ctx) in a pretty
print function. The second argument will always be an h2s, and in order
to propose it for tracking, we add its description. We also define 4
verbosity levels, which seems more than enough.
The detail level initially based on syslog levels is not used, while
something related is missing, trace verbosity, to indicate whether or
not we want to call the decoding callback and what level of decoding
we want (raw captures etc). Let's change the field to "verbosity" for
this. A verbosity of zero means that the decoding callback is not
called, and all other levels are handled by this callback and are
source-specific. The source is now prompted to list the levels that
are proposed to the user. When the source doesn't define anything,
"quiet" and "default" are available.
It's very convenient to know the trace level in the output, at least to
grep/grep -v on it. The main usage is to filter/filter out the developer
traces at level DEVEL. For this we now add the numeric level (0 to 4) just
after the source name in the brackets. The output now looks like this :
[00|h2|4|mux_h2.c:3174] h2_send(): entering : h2c=0x27d75a0 st=2
-, -, | ------------, --------, -------------------> message
| | | | '-> function name
| | | '-> source file location
| | '-> trace level (4=dev)
| '-> trace source
'-> thread number
Working on adding traces to mux-h2 revealed that the function names are
manually copied a lot in developer traces. The reason is that they are
not preprocessor macros and as such cannot be concatenated. Let's
slightly adjust the trace() function call to take a function name just
after the file:line argument. This argument is only added for the
TRACE_DEVEL and 3 new TRACE_ENTER, TRACE_LEAVE, and TRACE_POINT macros
and left NULL for others. This way the function name is only reported
for traces aimed at the developers. The pretty-print callback was also
extended to benefit from this. This will also significantly shrink the
data segment as the "entering" and "leaving" strings will now be merged.
One technical point worth mentioning is that the function name is *not*
passed as an ist to the inline function because it's not considered as
a builtin constant by the compiler, and would lead to strlen() being
run on it from all call places before calling the inline function. Thus
instead we pass the const char * (that the compiler knows where to find)
and it's the __trace() function that converts it to an ist for internal
consumption and for the pretty-print callback. Doing this avoids losing
5-10% peak performance.
The "payload" trace level was ambigous because its initial purpose was
to be able to dump received data. But it doesn't make sense to force to
report data transfers just to be able to report state changes. For
example, all snd_buf()/rcv_buf() operations coming from the application
layer should be tagged at this level. So here we move this payload level
above the state transitions and rename it to avoid the ambiguity making
one think it's only about request/response payload. Now it clearly is
about any data transfer and is thus just below the developer level. The
help messages on the CLI and the doc were slightly reworded to help
remove this ambiguity.
In prompts on the CLI we now commonly need to propose a keyword name
and a description and it doesn't make sense to define a new struct for
each such pairs. Let's simply have a generic "name_desc" for this.
Save the authority TLV in a PROXYv2 header from the client connection,
if present, and make it available as fc_pp_authority.
The fetch can be used, for example, to set the SNI for a backend TLS
connection.
Previously the callback was almost mandatory so it made sense to have it
before the message. Now that it can default to the one declared in the
trace source, most TRACE() calls contain series of empty args and callbacks,
which make them suitable for being at the end and being totally omitted.
This patch thus reverses the TRACE arguments so that the message appears
first, then the mask, then arg1..arg4, then the callback. In practice
we'll mostly see 1 arg, or 2 args and nothing else, and it will not be
needed anymore to pass long series of commas in the middle of the
arguments. However if a source is enforced, the empty commas will still
be needed for all omitted arguments.
Traces were missing the thread number and the source name, which was
really annoying. Now the thread number is emitted on two digits inside
the square brackets, followed by the source name then the line location,
each delimited with a vertical bar, such as below :
[00|h2|mux_h2.c:2651] Notifying stream about SID change : h2c=0x7f3284581ae0 st=3 h2s=0x7f3284297f00 id=523 st=4
[00|h2|mux_h2.c:2708] receiving H2 HEADERS frame : h2c=0x7f3284581ae0 st=3 dsi=525 (st=0)
[02|h2|mux_h2.c:2194] Received H2 request : h2c=0x7f328d3d1ae0 st=2 : [525] H2 REQ: GET / HTTP/2.0
[02|h2|mux_h2.c:2561] Expecting H2 frame header : h2c=0x7f328d3d1ae0 st=2
It becomes apparent that most traces will use a single trace pretty
print callback, so let's allow the trace source to declare a default
one so that it can be omitted from trace calls, and will be used if
no other one is specified.
The principle is that when emitting a message, if some dropped events
were logged, we first attempt to report this counter before going
further. This is done under an exclusive lock while all logs are
produced under a shared lock. This ensures that the dropped line is
accurately reported and doesn't accidently arrive after a later
event.
The new "show events" CLI keyword lists supported event sinks. When
passed a buffer-type sink it completely dumps it.
no drops at all during attachment even at 8 millon evts/s.
still missing the attachment limit though.
This now provides sink_new_buf() which allocates a ring buffer. One such
ring ("buf0") of 1 MB is created already, and may be used by sink_write().
The sink's creation should probably be moved somewhere else later.
The three functions (attach, IO handler, and release) are meant to be
called by any CLI command which requires to dump the contents of a ring
buffer. We do not implement anything generic to dump any ring buffer on
the CLI since it's meant to be used by other functionalities above.
However these functions deal with locking and everything so it's trivial
to embed them in other code.
This function tries to write to the ring buffer, possibly removing enough
old messages to make room for the new one. It takes two arrays of fragments
on input to ease the insertion of prefixes by the caller. It atomically
writes the message, possibly truncating it if desired, and returns the
operation's status.
Our circular buffers are well suited for being used as ring buffers for
not-so-structured data. The machanism here consists in making room in a
buffer before inserting a new record which is prefixed by its size, and
looking up next record based on the previous one's offset and size. We
can have up to 255 consumers watching for data (dump in progress, tail)
which guarantee that entrees are not recycled while they're being dumped.
The complete representation is described in the header file. For now only
ring_new(), ring_resize() and ring_free() are created.
Let's not mess up with fd-specific code, locking nor message formating
here, and use the new generic function instead. This substantially
simplifies the sink_write() code and makes it more agnostic to the
output representation and storage.
Currently both logs and event sinks may use a file descriptor to
atomically emit some output contents. The two may use the same FD though
nothing is done to make sure they use the same lock. Also there is quite
some redundancy between the two. Better make a specific function to send
a fragmented message to a file descriptor which will take care of the
locking via the fd's lock. The function is also able to truncate a
message and to enforce addition of a trailing LF when building the
output message.
The new functions are :
__b_put_varint() : inserts a varint when it's known that it fits
b_put_varint() : tries to insert a varint at the tail
b_get_varint() : tries to get a varint from the head
b_peek_varint() : tries to peek a varint at a specific offset
Wrapping is supported so that they are expected to be safe to use to
manipulate varints with buffers anywhere.
It will sometimes be useful to encode varints to know the output size in
advance. Two versions are provided, one inline using a switch/case construct
which will be trivial for use with constants (and will be very fast albeit
huge) and one function iterating on the number which is 5 times smaller,
for use with variables.
The converter can be useful to look up a server queue from a dynamic value.
It takes an input value of type string, either a server name or
<backend>/<server> format and returns the number of queued sessions
on that server. Can be used in places where we want to look up
queued sessions from a dynamic name, like a cookie value (e.g.
req.cook(SRVID),srv_queue) and then make a decision to break
persistence or direct a request elsewhere.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
The url32 sample fetch does not take the path part of the URL into
account. This is because in smp_fetch_url32() we erroneously modify
path.len and path.ptr before testing their value and building the
path based part of the hash.
This fixes issue #235
This must be backported as far as 1.9, when HTX was introduced.