It is not rare to see configurations with a large number of "tcp-request
content" or "http-request" rules for instance. A large number of rules
combined with cpu-demanding actions (e.g.: actions that work on content)
may create thread contention as all the rules from a given ruleset are
evaluated under the same polling loop if the evaluation is not interrupted
Thus, in this patch we add extra logic around "tcp-request content",
"tcp-response content", "http-request" and "http-response" rulesets, so
that when a certain number of rules are evaluated under the single polling
loop, we force the evaluating function to yield. As such, the rule which
was about to be evaluated is saved, and the function starts evaluating
rules from the save pointer when it returns (in the next polling loop).
We use task_wakeup(task, TASK_WOKEN_MSG) to explicitly wake the task so
that no time is wasted and the processing is resumed ASAP. TASK_WOKEN_MSG
is mandatory here because process_stream() expects TASK_WOKEN_MSG for
explicit analyzers re-evaluation.
rules_bcount stream's attribute was added to count how manu rules were
evaluated since last interruption (yield). Also, SF_RULE_FYIELD flag
was added to know that the s->current_rule was assigned due to forced
yield and not regular yield.
By default haproxy will enforce a yield every 50 rules, this behavior
can be configured using the "tune.max-rules-at-once" global keyword.
There is a limitation though: for now, if the ACT_OPT_FINAL flag is set
on act_opts, we consider it is not safe to yield (as it is already the
case for automatic yield). In this case instead of yielding an taking
the risk of not being called back, we skip the yield and hope it will
not create contention. This is something we should ideally try to
improve in order to yield in all conditions.
In this patch, events for the stream location are reported. These events are
first reported on the corresponding stream-connector. So front events on scf
and back event on scb. Then all events are both merged in the stream. But
only 4 events are saved on the stream.
Several internal events are for now grouped with the type
"tevt_type_intercepted". More events will be added to have a better
resolution. But at least the place to report these events are identified.
For now, when a event is reported on a SC, it is also reported on the stream
and vice versa.
To fix thread-safety issues when a stream must be shut, three new task
states were added. These states are generic (UEVT1, UEVT2 and UEVT3), the
task callback function is responsible to know what to do with them. However,
it is not really scalable.
The best is to use an atomic field in the stream structure itself to deal
with these dedicated events. There is already the "pending_events" field
that save wake up reasons (TASK_WOKEN_*) to not loose them if
process_stream() is interrupted before it had a chance to handle them.
So the idea is to introduce a new field to handle streams dedicated events
and merged them with the task's wake up reasons used by the stream. This
means a mapping must be performed between some task wake up reasons and
streams events. Note that not all task wake up reasons will be mapped.
In this patch, the "new_events" field is introduced. It is an atomic
bit-field. Streams events (STRM_EVT_*) are also introduced to map the task
wake up reasons used by process_stream(). Only TASK_WOKEN_TIMER and
TASK_WOKEN_MSG are mapped, in addition to TASK_F_UEVT* flags. In
process_stream(), "pending_events" field is now filled with new stream
events and the mapping of the wake up reasons.
When http-buffer-request option is set on a proxy, the processing will be
paused to wait the full request payload or a full buffer. So it is an entity
that block the processing, just like a rule or a filter that yields. So now,
it is reported as a waiting entity if an error or a timeout occurred.
To do so, an stream entity type is added for this option. There is no
pointer. And "waiting_entity" sample fetch returns the option name.
When a rule or a filter yields because it waits for something to be able to
continue its processing, this entity is saved in the stream. If an error or
a timeout occurred, info on this entity may be retrieved via the
"waiting_entity" sample fetch, for instance to dump it in the logs. This
info may be useful to found root cause of some bugs because it is a way to
know the processing was temporarily stopped. This may explain timeouts for
instance.
The sample fetch is not documented yet.
It is very similar to the last evaluated rule. When a filter returns an
error that interrupts the processing, it is saved in the stream, in the
last_entity field, with the type 2. The pointer on filter config is
saved. This pointer never changes during runtime and is part of the proxy's
structure. It is an element of the filter_configs list in the proxy
structure.
"last_entity" sample fetch was update accordingly. The filter identifier is
returned, if defined. Otherwise the save pointer.
The last evaluated rule is now saved in a generic structure, named
last_entity, with a type to identify it. The idea is to be able to store
other kind of entity that may interrupt a specific processing.
The type of the last evaluated rule is set to 1. It will be replace later by
an enum to be more explicit. In addition, the pointer to the rule itself is
saved instead of its location.
The sample fetch "last_entity" was added to retrieve the information about
it. In this case, it is the rule localtion, the config file containing the
rule followed by the line where the rule is defined, separated by a
colon. This sample fetch is not documented yet.
Process_stream() is a complex function and a few times some lopos were
either witnessed or suspected. Each time this happens it's extremely
difficult to figure why because it involves combinations of analysers,
filters, errors etc.
Let's at least maintain a set of 4 counters per stream that report the
number of times we've been through each of the 4 most important blocks
(stconn changes, request analysers, response analysers, and propagation
of changes down). These ones are stored in the stream and reported in
"show sess all", just like they will be reported in panic dumps.
Thanks to the previous patch, it is now possible to add an action to
dynamically change the maxumum number of connection retires for a stream.
"set-retries" action may now be used to do so, from a "tcp-request content"
or a "http-request" rule. This action accepts an expression or an integer
between 0 and 100. The integer value is checked during the configuration
parsing and leads to an error if it is not in the expected range. However,
for the expression, the value is retrieve at runtime. So, invalid value are
just ignored.
Too high value is forbidden to avoid any trouble. 100 retries seems already
be an amazingly hight value. In addition, the option is only available on
backend or listen sections.
Because the max retries is limited to 100 at most, it can be stored as a
unsigned short. This save some space in the stream structure.
Instead of directly relying on the backend parameter to limit the number of
connection retries, we now use a per-stream value. This value is by default
inherited from the backend value when it is set. So for now, there is no
change except the stream value is used instead of the backend value. But
thanks to this change, it will be possible to dynamically change this value.
Rename 'enum log_orig' to 'enum log_orig_id', since this enum specifically
contains the log origin ids.
Add 'struct log_orig' which wraps 'enum log_orig' with optional flags
(no flags defined for now).
Add log_orig() helper func that takes id and flags as parameter and
returns log_orig struct initialized with input arguments.
Update functions taking log origin as parameter so they explicitly take
log orig id or log orig wrapper as argument depending on the level of
context expected by the function.
A pointer to a parent stream was added in the stream structure. For now,
this pointer is never set, but the idea is to have an access to a stream
environment from another one from the moment there is a parent/child
relationship betwee these streams.
Concretely, for now, there is nothing to formalize this relationship.
This is another prerequisite work in preparation for log-profiles: in this
patch we make process_send_log() aware of the log origin, primarily aiming
for sess and txn logging steps such as error, accept, connect, close, as
well as relevant sess and stream pointers.
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag on the stream so that the stream knows it
is allowed to bypass the queue checks. For now this is not used.
While trying to reproduce another crash case involving lua filters
reported by @bgrooot on GH #2467, we found out that mixing filters loaded
from different contexts ('lua-load' vs 'lua-load-per-thread') for the same
stream isn't supported and may even cause the process to crash.
Historically, mixing lua-load and lua-load-per-threads for a stream wasn't
supported, but this changed thanks to 0913386 ("BUG/MEDIUM: hlua: streams
don't support mixing lua-load with lua-load-per-thread").
However, the above fix didn't consider lua filters's use-case properly:
unlike lua fetches, actions or even services, lua filters don't simply
use the stream hlua context as a "temporary" hlua running context to
process some hlua code. For fetches, actions.. hlua executions are
processed sequentially, so we simply reuse the hlua context from the
previous action/fetch to run the next one (this allows to bypass memory
allocations and initialization, thus it increases performance), unless
we need to run on a different hlua state-id, in which case we perform a
reset of the hlua context.
But this cannot work with filters: indeed, once registered, a filter will
last for the whole stream duration. It means that the filter will rely
on the stream hlua context from ->attach() to ->detach(). And here is the
catch, if for the same stream we register 2 lua filters from different
contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue,
because the hlua stream will be re-created each time we switch between
runtime contexts, which means each time we switch between the filters (may
happen for each stream processing step), and since lua filters rely on the
stream hlua to carry context between filtering steps, this context will be
lost upon a switch. Given that lua filters code was not designed with that
in mind, it would confuse the code and cause unexpected behaviors ranging
from lua errors to crashing process.
So here we take another approach: instead of re-creating the stream hlua
context each time we switch between "global" and "per-thread" runtime
context, let's have both of them inside the stream directly as initially
suggested by Christopher back then when talked about the original issue.
For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get()
helper functions which return the proper hlua context for a given stream
and state_id combination.
As for debugging infos reported after ha_panic(), we check for both hlua
runtime contexts to check if one of them was active when the panic occured
(only 1 runtime ctx per stream may be active at a given time).
This should be backported to all stable versions with 0913386
("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread")
This commit depends on:
- "DEBUG: lua: precisely identify if stream is stuck inside lua or not"
[for versions < 2.9 the ha_thread_dump_one() part should be skipped]
- "MINOR: hlua: use accessors for stream hlua ctx"
For 2.4, the filters API didn't exist. However it may be a good idea to
backport it anyway because ->set_priv()/->get_priv() from tcp/http lua
applets may also be affected by this bug, plus it will ease code
maintenance. Of course, filters-related parts should be skipped in this
case.
set-bc-{mark,tos} actions are pretty similar to set-fc-{mark,tos} to set
mark/tos on packets sent from haproxy to server: set-bc-{mark,tos} actions
act on the whole backend/srv connection: from connect() to connection
teardown, thus they may only be used before the connection to the server
is instantiated, meaning that they are only relevant for request-oriented
rules such as tcp-request or http-request rules. For now their use is
limited to content request rules, because tos and mark informations are
stored directly within the stream, thus it is required that the stream
already exists.
stream flags are used in combination with dedicated stream struct members
variables to pass 'tos' and 'mark' informations so that they are correctly
considered during stream connection assignment logic (prior to connecting
to actually connecting to the server)
'tos' and 'mark' fd sockopts are taken into account in conn hash
parameters for connection reuse mechanism.
The documentation was updated accordingly.
Implements the customized payload pattern for the master CLI.
The pattern is stored in the stream in char pcli_payload_pat[8].
The principle is basically the same as the CLI one, it looks for '<<'
then stores what's between '<<' and '\n', and look for it to exit the
payload mode.
This provides more consistency between the master and the worker. When
"prompt timed" is passed on the master, the timed mode is toggled. When
enabled, for a master it will show the master process' uptime, and for
a worker it will show this worker's uptime. Example:
master> prompt timed
[0:00:00:50] master> show proc
#<PID> <type> <reloads> <uptime> <version>
11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21
# workers
11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21
# old workers
11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21
# programs
[0:00:00:58] master> @!11955
[0:00:01:03] 11955> @!11942
[0:00:02:17] 11942> @
[0:00:01:10] master>
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
The number of stick-counter entries usable by track-sc rules is currently
set at build time. There is no good value for this since the vast majority
of users don't need any, most need only a few and rare users need more.
Adding more counters for everyone increases memory and CPU usages for no
reason.
This patch moves the per-session and per-stream arrays to a pool of a size
defined at boot time. This way it becomes possible to set the number of
entries at boot time via a new global setting "tune.stick-counters" that
sets the limit for the whole process. When not set, the MAX_SESS_STR_CTR
value still applies, or 3 if not set, as before.
It is also possible to lower the value to 0 to save a bit of memory if
not used at all.
Note that a few low-level sample-fetch functions had to be protected due
to the ability to use sample-fetches in the global section to set some
variables.
Upon a reload with the master CLI, the FD of the master CLI session is
received by the internal socketpair listener.
This session is used to display the status of the reload and then will
close.
The new function is strm_et_show_flags(). Only the error type is
handled at the moment, as a bit more complex logic is needed to
mix the values and enums present in some fields.
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.
Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.
The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
Some recent traces started to show confusing stream pointers ending with
0xe. The reason was that the stream's obj_type was almost unused in the
past and was stuffed in a hole in the structure. But now it's present in
all "show sess all" outputs and having to mentally match this value against
another one that's 0x17e lower is painful. The solution here is to move the
obj_type at the top, like in almost every other structure, but without
breaking the efficient layout.
This patch moves a few fields around and manages to both plug some holes
(16 bytes saved, 976 to 960) and avoid channels needlessly crossing cache
boundaries (res was spread over 3 lines vs 2 now).
Nothing else was changed. It would be desirable to backport this to 2.6
since it's where dumps are currently being processed the most.
Function arguments and local variables called "cs" were renamed to "sc"
to avoid future confusion. The HTTP analyser and the backend functions
were all updated after being reviewed. Function stream_update_both_cs()
was renamed to stream_update_both_sc()
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
This flag is no longer needed now that it must always match the presence
of a destination address on the backend conn_stream. Worse, before previous
patch, if it were to be accidently removed while the address is present, it
could result in a leak of that address since alloc_dst_address() would first
be called to flush it.
Its usage has a long history where addresses were stored in an area shared
with the connection, but as this is no longer the case, there's no reason
for putting this burden onto application-level code that should not focus
on setting obscure flags.
The only place where that made a small difference is in the dequeuing code
in case of queue redistribution, because previously the code would first
clear the flag, and only later when trying to deal with the queue, would
release the address. It's not even certain whether there would exist a
code path going to connect_server() without calling pendconn_dequeue()
first (e.g. retries on queue timeout maybe?).
Now the pendconn_dequeue() code will rely on SF_ASSIGNED to decide to
clear and release the address, since that flag is always set while in
a server's queue, and its clearance implies that we don't want to keep
the address. At least it remains consistent and there's no more risk of
leaking it.
These flags only concerns the connection part. In addition, it is required
for a next commit, to avoid circular deps. Thus CS_SHR_* and CS_SHW_* were
renamed with the "CO_" prefix.
The stream-interface state (SI_ST_*) is now in the conn-stream. It is a
mechanical replacement for now. Nothing special. SI_ST_* and SI_SB_* were
renamed accordingly. Utils functions to manipulate these infos were moved
under the conn-stream scope.
But it could be good to keep in mind that this part should be
reworked. Indeed, at the CS level, we only need to know if it is ready to
receive or to send. The state of conn-stream from INI to EST is only used on
the server side. The client CS is immediately set to EST. Thus current
SI_ST_* states should probably be moved to the stream to reflect the server
connection state during the establishment stage.
Only the server side is concerned by the stream-interface error type. It is
useless to have an err_type field on the client side. So, it is now move to
the stream. SI_ET_* are renames STRM_ET_* and moved in stream-t.h header
file.
The previous connection state on the client side was only used for debugging
purpose to report client close. But this may be handled when the client
stream-interface is switched from SI_ST_DIS to SI_ST_CLO.
So, there only remains the previous connection state on the server side that
is used by the stream, in process_stream(), to be able to set the correct
termination flags. Thus, instead of keeping this info in the
stream-interface for only one side, the info is now stored in the stream
itself.
Flag to get the source ip/port with getsockname is now handled at the stream
level. Thus SI_FL_SRC_ADDR stream-int flag is replaced by SF_SRC_ADDR stream
flag.
The expiration date in the stream-interface was only used on the server side
to set the connect, queue or turn-around timeout. It was checked on the
frontend stream-interface, but never used concretely. So it was removed and
replaced by a connect expiration date in the stream itself. Thus, SI_FL_EXP
flag in stream-interfaces is replaced by a stream flag, SF_CONN_EXP.
The conn_retries counter was set to the max value and decremented at each
connection retry. Thus the counter reflected the number of retries left and
not the real number of retries. All calculations of redispatch or reporting
of number of retries experienced were made using subtracts from the
configured retries, which was complicated and didn't bring any benefit.
Now, this counter is set to 0 and incremented at each retry. We know we've
reached the maximum allowed connection retries by comparing it to the
configured value. In all other cases, we directly use the counter.
This patch should address the feature request #1608.
The conn_retries counter may be moved into the stream structure. It only
concerns the connection establishment. The frontend stream-interface does not
use it. So it is a logical change.
When a tcp-{request,response} content or http-request/http-response
rule delivers a final verdict (deny, accept, redirect etc), the last
evaluated one will now be recorded in the stream. The purpose is to
permit to log the last one that performed a final action. For now
the log is not produced.
Thanks to all previous changes, it is now possible to move the
stream-interface into the conn-stream. To do so, some SI functions are
removed and their conn-stream counterparts are added. In addition, the
conn-stream is now responsible to create and release the
stream-interface. While the stream-interfaces were inlined in the stream
structure, there is now a pointer in the conn-stream. stream-interfaces are
now dynamically allocated. Thus a dedicated pool is added. It is a temporary
change because, at the end, the stream-interface structure will most
probably disappear.
frontend and backend conn-streams are now directly accesible from the
stream. This way, and with some other changes, it will be possible to remove
the stream-interfaces from the stream structure.
Allow to set the master CLI in expert or experimental mode. No command
within the master are unlocked yet, but it gives the ability to send
expert or experimental commands to the workers.
echo "@1; experimental-mode on; del server be1/s2" | socat /var/run/haproxy.master -
echo "experimental-mode on; @1 del server be1/s2" | socat /var/run/haproxy.master -
We have an anti-looping protection in process_stream() that detects bugs
that used to affect a few filters like compression in the past which
sometimes forgot to handle a read0 or a particular error, leaving a
thread looping at 100% CPU forever. When such a condition is detected,
an alert it emitted and the process is killed so that it can be replaced
by a sane one:
[ALERT] (19061) : A bogus STREAM [0x274abe0] is spinning at 2057156
calls per second and refuses to die, aborting now! Please
report this error to developers [strm=0x274abe0,3 src=unix
fe=MASTER be=MASTER dst=<MCLI> txn=(nil),0 txn.req=-,0
txn.rsp=-,0 rqf=c02000 rqa=10000 rpf=88000021 rpa=8000000
sif=EST,40008 sib=DIS,84018 af=(nil),0 csf=0x274ab90,8600
ab=0x272fd40,1 csb=(nil),0
cof=0x25d5d80,1300:PASS(0x274aaf0)/RAW((nil))/unix_stream(9)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={}]
call trace(11):
| 0x4dbaab [c7 04 25 01 00 00 00 00]: stream_dump_and_crash+0x17b/0x1b4
| 0x4df31f [e9 bd c8 ff ff 49 83 7c]: process_stream+0x382f/0x53a3
(...)
One problem with this detection is that it used to only count the call
rate because we weren't sure how to make it more accurate, but the
threshold was high enough to prevent accidental false positives.
There is actually one case that manages to trigger it, which is when
sending huge amounts of requests pipelined on the master CLI. Some
short requests such as "show version" are sufficient to be handled
extremely fast and to cause a wake up of an analyser to parse the
next request, then an applet to handle it, back and forth. But this
condition is not an error, since some data are being forwarded by
the stream, and it's easy to detect it.
This patch modifies the detection so that update_freq_ctr() only
applies to calls made without CF_READ_PARTIAL nor CF_WRITE_PARTIAL
set on any of the channels, which really indicates that nothing is
happening at all.
This is greatly sufficient and extremely effective, as the call above
is still caught (shutr being ignored by an analyser) while a loop on
the master CLI now has no effect. The "call_rate" field in the detailed
"show sess" output will now be much lower, except for bogus streams,
which may help spot them. This field is only there for developers
anyway so it's pretty fine to slightly adjust its meaning.
This patch could be backported to stable versions in case of reports
of such an issue, but as that's unlikely, it's not really needed.
Define a new stream flag SF_WEBSOCKET and a new cs flag CS_FL_WEBSOCKET.
The conn-stream flag is first set by h1/h2 muxes if the request is a
valid websocket upgrade. The flag is then converted to SF_WEBSOCKET on
the stream creation.
This will be useful to properly manage websocket streams in
connect_server().
For now, these addresses are never set. But the idea is to be able to set, at
least first, the client source and destination addresses at the stream level
without updating the session or connection ones.
Of course, because these addresses are carried by the strream-interface, it
would be possible to set server source and destination addresses at this level
too.
Functions to fill these addresses have been added: si_get_src() and
si_get_dst(). If not already set, these functions relies on underlying
layers to fill stream-interface addresses. On the frontend side, the session
addresses are used if set, otherwise the client connection ones are used. On
the backend side, the server connection addresses are used.
And just like for sessions and conncetions, si_src() and si_dst() may be used to
get source and destination addresses or the stream-interface. And, if not set,
same mechanism as above is used.