Based on Willy's idea (from 3.0-dev6 announcement message): in this patch
we try to reduce the max latency that can be caused by running lua scripts
with default settings.
Indeed, by default, hlua engine is allowed to process up to 10k
instructions per batch. While this value was found to be the optimal one
for a single thread, it turns out that keeping a thread busy for 10k lua
instructions could increase thread contention. This is especially true
when the script is loaded with 'lua-load', because in that case the
current thread owns the main lua lock and prevents other threads from
making any progress if they're also waiting on the main lock.
Thanks to Thierry Fournier's work, we know that performance-wise we can
reach optimal performance by sticking between 500 and 10k instructions
per batch. Given that, when the script is loaded using 'lua-load', if no
"tune.lua.forced-yield" was set by the user, we automatically divide the
default value (10k) by the number of threads haproxy can use to reduce
thread contention (given that all threads could compete for the main lua
lock). However, we make sure not to return a value below 500, because
Thierry's work showed that this would come with a significant performance
loss.
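To make the arithmetic concrete, the rule above can be sketched as follows. This is only an illustration: the helper name, its parameters and the convention that 0 means "not set by the user" are assumptions made for this sketch, not haproxy's actual code.

```c
#define SK_FORCED_YIELD_DEFAULT 10000u  /* historical default: 10k instructions */
#define SK_FORCED_YIELD_MIN       500u  /* floor below which performance drops */

/* Hypothetical helper returning the number of lua instructions per batch.
 * <user_value>   : value of "tune.lua.forced-yield", 0 if not set (assumption)
 * <shared_state> : non-zero when the script was loaded with 'lua-load'
 * <nbthread>     : number of threads haproxy can use
 */
static unsigned int effective_forced_yield(unsigned int user_value,
                                           int shared_state,
                                           unsigned int nbthread)
{
    unsigned int val;

    if (user_value)
        return user_value;               /* an explicit setting always wins */
    if (!shared_state || nbthread <= 1)
        return SK_FORCED_YIELD_DEFAULT;  /* nobody to compete with for the lock */

    val = SK_FORCED_YIELD_DEFAULT / nbthread;
    return (val < SK_FORCED_YIELD_MIN) ? SK_FORCED_YIELD_MIN : val;
}
```

With 4 threads this gives 2500 instructions per batch; with 64 threads the division yields 156, which is raised to the 500 floor.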
The historical behavior may still be enforced by setting
"tune.lua.forced-yield" to 10000 in the global config section.
While trying to reproduce another crash case involving lua filters
reported by @bgrooot on GH #2467, we found out that mixing filters loaded
from different contexts ('lua-load' vs 'lua-load-per-thread') for the same
stream isn't supported and may even cause the process to crash.
Historically, mixing lua-load and lua-load-per-thread for a stream wasn't
supported, but this changed thanks to 0913386 ("BUG/MEDIUM: hlua: streams
don't support mixing lua-load with lua-load-per-thread").
However, the above fix didn't consider the lua filters' use case properly:
unlike lua fetches, actions or even services, lua filters don't simply
use the stream hlua context as a "temporary" hlua running context to
process some hlua code. For fetches, actions, etc., hlua executions are
processed sequentially, so we simply reuse the hlua context from the
previous action/fetch to run the next one (this avoids memory
allocations and initialization, and thus increases performance), unless
we need to run on a different hlua state-id, in which case we perform a
reset of the hlua context.
But this cannot work with filters: indeed, once registered, a filter will
last for the whole stream duration. It means that the filter will rely
on the stream hlua context from ->attach() to ->detach(). And here is the
catch, if for the same stream we register 2 lua filters from different
contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue,
because the hlua stream will be re-created each time we switch between
runtime contexts, which means each time we switch between the filters (may
happen for each stream processing step), and since lua filters rely on the
stream hlua to carry context between filtering steps, this context will be
lost upon a switch. Given that lua filters code was not designed with that
in mind, it would confuse the code and cause unexpected behaviors ranging
from lua errors to crashing process.
So here we take another approach: instead of re-creating the stream hlua
context each time we switch between "global" and "per-thread" runtime
context, let's have both of them inside the stream directly as initially
suggested by Christopher back when we discussed the original issue.
For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get()
helper functions which return the proper hlua context for a given stream
and state_id combination.
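The idea of keeping both runtime contexts inside the stream can be sketched as below. The types and helper names are simplified stand-ins invented for this illustration (they are not the real hlua_stream_ctx_prepare()/hlua_stream_ctx_get() prototypes), and the sketch assumes that state id 0 designates the shared 'lua-load' state while any other id designates a per-thread state.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for an hlua context (not the real struct hlua). */
struct sk_hlua {
    int state_id;  /* assumption: 0 = shared state, >0 = per-thread state */
};

/* Simplified stand-in for a stream carrying one context per runtime. */
struct sk_stream {
    struct sk_hlua *hlua[2];  /* [0]: global context, [1]: per-thread context */
};

/* Map a state id to the slot holding the matching context. */
static int sk_slot(int state_id)
{
    return state_id == 0 ? 0 : 1;
}

/* Return the context matching <state_id>, or NULL if never prepared. */
static struct sk_hlua *sk_stream_ctx_get(struct sk_stream *s, int state_id)
{
    return s->hlua[sk_slot(state_id)];
}

/* Return the context matching <state_id>, creating it on first use.
 * The other slot is left untouched, so whatever a filter stored there
 * survives a switch between the two runtime contexts.
 */
static struct sk_hlua *sk_stream_ctx_prepare(struct sk_stream *s, int state_id)
{
    int slot = sk_slot(state_id);

    if (!s->hlua[slot]) {
        s->hlua[slot] = calloc(1, sizeof(*s->hlua[slot]));
        if (!s->hlua[slot])
            return NULL;
        s->hlua[slot]->state_id = state_id;
    }
    return s->hlua[slot];
}
```

Because nothing is ever re-created on a switch, two filters loaded from different contexts each keep their own context for the whole stream lifetime.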
As for debugging infos reported after ha_panic(), we check both hlua
runtime contexts to see if one of them was active when the panic occurred
(only 1 runtime ctx per stream may be active at a given time).
This should be backported to all stable versions with 0913386
("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread").
This commit depends on:
- "DEBUG: lua: precisely identify if stream is stuck inside lua or not"
[for versions < 2.9 the ha_thread_dump_one() part should be skipped]
- "MINOR: hlua: use accessors for stream hlua ctx"
For 2.4, the filters API didn't exist. However it may be a good idea to
backport it anyway because ->set_priv()/->get_priv() from tcp/http lua
applets may also be affected by this bug, plus it will ease code
maintenance. Of course, filters-related parts should be skipped in this
case.
Change hlua_stream_ctx_prepare() prototype so that it now returns the
proper hlua ctx on success instead of returning a boolean.
Add hlua_stream_ctx_get() to retrieve hlua ctx out of a given stream.
This way we may easily change the storage mechanism for hlua stream in
the future without extensive code changes.
No backport needed unless a commit depends on it.
When ha_panic() is called by the watchdog, we try to guess from
ha_task_dump() and ha_thread_dump_one() if the thread was stuck while
executing lua from the stream context. However we consider this is the
case by simply checking if the stream hlua context was set, but this is
not very precise because if the hlua context is set, then it simply means
that at least one lua instruction was executed at the stream level, not
that the thread was actually executing lua when the panic occurred.
This is especially true with filters: one could simply register a lua
filter that does nothing, but this will still end up initializing the
stream hlua context for each stream. If the thread ends up being stuck
during the stream handling, then debug dumping functions will report
that the stream was stuck while handling lua, which is not necessarily
true, and could in fact confuse us even more.
So here we take another approach: we add the BUSY flag to the hlua context.
This flag is set by hlua_ctx_resume() around the lua_resume() call, so
we can precisely tell if the thread was handling lua when it was
interrupted, and we rely on this flag in debug functions to check if the
thread was actually stuck inside lua or not while processing the stream.
No backport needed unless a commit depends on it.
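A minimal sketch of the BUSY flag idea follows. Everything here is a stand-in: the struct, the flag value and the function-pointer replacement for lua_resume() are assumptions made so that the example is self-contained.

```c
#include <stddef.h>

#define SK_HLUA_BUSY 0x40u  /* flag value chosen for this sketch */

/* Simplified stand-in for an hlua context (not the real struct hlua). */
struct sk_hlua {
    unsigned int flags;
    int (*run)(struct sk_hlua *lua);  /* stand-in for lua_resume() */
};

/* What a debug dump would check instead of "is the hlua context set?". */
static int sk_stuck_in_lua(const struct sk_hlua *lua)
{
    return lua != NULL && (lua->flags & SK_HLUA_BUSY) != 0;
}

/* The flag is raised only for the duration of the lua execution. */
static int sk_ctx_resume(struct sk_hlua *lua)
{
    int ret;

    lua->flags |= SK_HLUA_BUSY;   /* the thread is now inside lua */
    ret = lua->run(lua);
    lua->flags &= ~SK_HLUA_BUSY;  /* back in haproxy code */
    return ret;
}

/* Sample payload: records what the dump helper would see mid-execution. */
static int sk_seen_busy;

static int sk_sample_run(struct sk_hlua *lua)
{
    sk_seen_busy = sk_stuck_in_lua(lua);
    return 0;
}
```

A context that exists but is idle is no longer reported as "stuck in lua": only an interruption that happens between the two flag updates is.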
hlua_filter_delete() calls hlua_unref() on the stream hlua stack, but
we should own the lock prior to manipulating the stack.
This should be backported up to 2.6.
This is a complementary patch to 8670db7 ("BUG/MAJOR: hlua: improper lock
usage with hlua_ctx_resume()") for hlua_filter_new().
Indeed, the HLUA_E_ERRMSG case still relies on the lua stack but didn't
take the lock to do so.
This should be backported up to 2.6.
Trying to register the same lua filter from global and per-thread context
(using 'lua-load' + 'lua-load-per-thread') causes a segmentation fault in
hlua_post_init().
This is due to a simple copy paste error as we try to print the function
name in the error message (like we do when loading the same lua function
from different contexts) instead of the filter name.
This should be backported up to 2.6.
Instead of reporting lua errors using ha_alert(), let's use SEND_ERR()
helper which will also try to generate a log message according to lua
log settings.
hlua_event_subscribe() is meant to be called from a protected lua env
during init and/or runtime. As such, only hlua_event_sub() makes use
of it: when an error happens hlua_event_sub() will already raise a Lua
exception. Thus it's not relevant to use ha_alert() there as it could
generate log pollution (error is relevant from Lua script point of view,
not from haproxy one).
This could be backported in 2.8.
hlua_ctx_resume() itself can safely be used as-is in a multithreading
context because it takes care of taking the lua lock.
However, when hlua_ctx_resume() returns, the lock is released and it is
thus the caller's responsibility to ensure it owns the lock prior to
performing additional manipulations on the Lua stack. Unfortunately, since
early haproxy lua implementation, we used to do it wrong:
The most common hlua_ctx_resume() pattern we can find in the code (because
it was duplicated over and over again over time) is the following:
|ret = hlua_ctx_resume();
|switch (ret) {
|    case HLUA_E_OK:
|        break;
|    case HLUA_E_ERRMSG:
|        break;
|    [...]
|}
Problem is: for some of the switch cases, we still perform lua stack
manipulations. This is the case for the HLUA_E_ERRMSG for instance where
we often use lua_tostring() to retrieve last lua error message on the top
of the stack, or sometimes for the HLUA_E_OK case, when we need to perform
some lua cleanup logic once the resume ended. But all of this is done
WITHOUT the lua lock, so this means that the main lua stack could be
accessed simultaneously by concurrent threads when a script was loaded
using 'lua-load'.
While it is not critical for switch-cases dedicated to error handling
(those are not supposed to happen very often), it can be very problematic
for stack manipulations occurring in the HLUA_E_OK case, under heavy load
for instance. In this case, main lua stack corruptions will eventually
happen. This is especially true inside hlua_filter_new(), where this bug
was known to cause lua stack corruptions under load, leading to lua errors
and even crashing the process as reported by @bgrooot in GH #2467.
The fix is relatively simple: once hlua_ctx_resume() returns, we should
consider that ANY lua stack access should be lua-lock protected. If the
related lua calls may raise lua errors, then the (RE)SET_SAFE_LJMP
combination should be used as usual (it locks the lua stack and
catches lua exceptions at the same time), else hlua_{lock,unlock} may be
used if no exceptions are expected.
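The corrected pattern for the error case can be sketched with a mock context. None of the names below belong to haproxy or Lua: the lock is a plain integer and the "stack" is a single string, purely to show where the lock must be taken once the resume has returned.

```c
#include <stddef.h>
#include <stdio.h>

enum { SK_E_OK, SK_E_ERRMSG };  /* stand-ins for HLUA_E_OK / HLUA_E_ERRMSG */

/* Minimal stand-in for an hlua context (not the real struct hlua). */
struct sk_hlua {
    int locked;       /* 1 while the (mock) lua lock is held */
    const char *top;  /* mock "string at the top of the lua stack" */
};

static void sk_lock(struct sk_hlua *l)   { l->locked = 1; }
static void sk_unlock(struct sk_hlua *l) { l->locked = 0; }

/* Mock stack read: returns NULL when called without the lock so that
 * misuse is visible (a real unlocked read would silently race instead).
 */
static const char *sk_tostring(const struct sk_hlua *l)
{
    return l->locked ? l->top : NULL;
}

/* Handle the status returned by a resume. The resume already released the
 * lock, so any stack access below must take it again.
 */
static int sk_after_resume(struct sk_hlua *l, int ret, char *msg, size_t size)
{
    const char *err;

    if (ret != SK_E_ERRMSG)
        return 0;

    sk_lock(l);
    err = sk_tostring(l);
    snprintf(msg, size, "%s", err ? err : "unknown error");
    sk_unlock(l);  /* msg is a private copy, safe to use unlocked */
    return 1;
}
```

The message is copied while the lock is held, so the caller never dereferences a pointer into the shared stack after unlocking.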
This patch should fix GH #2467.
It should be backported to all stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply]
When we want to perform some unsafe lua stack manipulations from an
unprotected lua environment, we use the SET_SAFE_LJMP()/RESET_SAFE_LJMP()
combination to lock the lua stack and catch potential lua exceptions that
may occur between the two.
Hence, we regularly find this pattern (duplicated over and over):
|if (!SET_SAFE_LJMP(hlua)) {
|    const char *error;
|
|    if (lua_type(hlua->T, -1) == LUA_TSTRING)
|        error = hlua_tostring_safe(hlua->T, -1);
|    else
|        error = "critical error";
|    SEND_ERR(NULL, "*: %s.\n", error);
|}
This is wrong because when SET_SAFE_LJMP() returns false (meaning that an
exception was caught), then the lua lock was released already, thus the
caller is not expected to perform lua stack manipulations (because the
main lua stack may be shared between multiple threads). In the pattern
above we only want to retrieve the lua exception message which may be
found at the top of the stack, to do so we now explicitly take the lua
lock before accessing the lua stack. Note that hlua_lock() doesn't catch
lua exceptions so only safe lua functions are expected to be used there
(lua functions that may NOT raise exceptions).
It should be backported to all stable versions.
[ada: some ctx adj will be required for older versions as event_hdl
doesn't exist prior to 2.8 and filters were implemented in 2.5, thus
some chunks won't apply, but other fixes should stay relevant]
In hlua_filter_new(), after each hlua resume, we systematically try to
empty the stack by calling lua_settop(). However we're doing this without
locking the lua context, so it is unsafe in multithreading context if the
script is loaded using 'lua-load'. To fix the issue, we protect the call
with hlua_{lock,unlock}() helpers.
This should be backported up to 2.6.
In hlua_filter_callback(), some lua stack work is performed under
SET_SAFE_LJMP() guard which also takes care of locking the hlua context
when needed. However, a lua_gettop() call is performed out of the guard,
thus it is unsafe in multithreading context if the script is loaded using
'lua-load' because in this case the main lua stack is shared between
threads and each access to a lua stack must be performed under the lock,
thus we move lua_gettop() call under the lock.
It should be backported up to 2.6.
hlua_filter_new() handles memory allocation errors by jumping to the
"end:" cleanup label in case of errors. Such errors may happen when the
system is heavily loaded for instance.
In hlua_filter_new(), we try to allocate two hlua contexts in a row before
checking if one of them failed (in which case we jump to the cleanup part
of the function), and only then we initialize them both.
If a memory allocation failure happens for only one out of the two
flt_ctx->hlua[] contexts pair, we still jump to the cleanup part.
It means that the hlua context that was successfully allocated and wasn't
initialized yet will be passed to hlua_ctx_destroy(), resulting in invalid
reads in the cleanup function, which may ultimately cause the process to
crash.
To fix the issue: we make sure flt_ctx hlua contexts are initialized right
after they are allocated, that is before any error handling condition that
may force the cleanup.
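The "initialize right after allocating" rule can be illustrated with the following sketch. The struct and helpers are hypothetical stand-ins, and the <fail> parameter simply simulates an allocation failure for one of the two contexts.

```c
#include <stdlib.h>

/* Minimal stand-in for an hlua context (not the real struct hlua). */
struct sk_hlua {
    void *T;  /* resource owned by the context, released on destroy */
};

/* Allocate a context and initialize it immediately, before anything can
 * jump to the cleanup path. <fail> simulates an allocation failure.
 */
static struct sk_hlua *sk_ctx_new(int fail)
{
    struct sk_hlua *h = fail ? NULL : malloc(sizeof(*h));

    if (h)
        h->T = NULL;  /* never leave the context uninitialized */
    return h;
}

/* Cleanup: reads h->T, which is only safe on an initialized context. */
static void sk_ctx_destroy(struct sk_hlua *h)
{
    if (!h)
        return;
    free(h->T);
    free(h);
}

/* Allocate the pair of contexts; on any failure, release what was obtained. */
static int sk_pair_new(struct sk_hlua *pair[2], int fail_second)
{
    pair[0] = sk_ctx_new(0);
    pair[1] = sk_ctx_new(fail_second);
    if (pair[0] && pair[1])
        return 1;

    sk_ctx_destroy(pair[0]);
    sk_ctx_destroy(pair[1]);
    pair[0] = pair[1] = NULL;
    return 0;
}
```

If the initialization were deferred until after the failure check, sk_ctx_destroy() would read an indeterminate T pointer for the context that was successfully allocated.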
This bug was discovered when trying to reproduce GH #2467 with haproxy
started with "-dMfail" argument.
It should be backported up to 2.6.
As per lua documentation, lua_tostring() may raise a memory error.
However, we're often using it to fetch the error message at the top of
the stack (ie: after a failing lua call) from unprotected environments.
In practice, lua_tostring() rarely fails, but if it does, it could crash
the process, so we'd better not risk it.
So here, we add hlua_tostring_safe() function, which works exactly as
lua_tostring(), but the function cannot LJMP as it will catch
lua_tostring() exceptions to return NULL instead.
Everywhere lua_tostring() was used to retrieve error string from such
unprotected contexts, we now rely on hlua_tostring_safe().
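The principle of a "safe" wrapper that turns a raised error into a NULL return can be sketched with plain setjmp()/longjmp(). This is a mock: it does not use the Lua API, it is not how hlua_tostring_safe() is necessarily implemented, and all names are invented for the illustration.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

static jmp_buf *sk_catch;  /* where a raised error lands, NULL if unprotected */

/* Stand-in for a call that may raise: jumps to the catch point on failure,
 * and aborts the process when no catch point is armed.
 */
static const char *sk_raising_tostring(const char *value, int fail)
{
    if (fail) {
        if (!sk_catch)
            abort();           /* unprotected error: the process dies */
        longjmp(*sk_catch, 1);
    }
    return value;
}

/* Safe variant: arms a catch point around the call and returns NULL if
 * the call raised, instead of letting the error propagate.
 */
static const char *sk_tostring_safe(const char *value, int fail)
{
    jmp_buf env;
    const char *str;

    sk_catch = &env;
    if (setjmp(env)) {
        sk_catch = NULL;
        return NULL;           /* the error was caught */
    }
    str = sk_raising_tostring(value, fail);
    sk_catch = NULL;
    return str;
}
```

Callers then only need a NULL check, which fits error paths that just want a message to report.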
This should be backported to all stable versions.
[ada: ctx adj will be required, for versions prior to 2.8 event_hdl
API didn't exist so some chunks won't apply, and prior to 2.5 filters
API didn't exist either, so again, some chunks should be ignored]
Lua documentation says that lua_tostring() returns a pointer that remains
valid as long as the object is not removed from the stack.
However, there are some places where we use the returned string AFTER the
corresponding object is removed from the stack. In practice this doesn't
seem to cause visible bugs (probably because the pointer remains valid
waiting for a GC cycle), but let's fix that to comply with the
documentation and avoid undefined behavior.
It should be backported to all stable versions.
Add core.silent (-1) value to be able to disable logging via
TXN:set_loglevel() call. Otherwise, there is no way to do so and it may be
handy. This special value cannot be used with TXN:log() function.
This patch may be backported if necessary.
When the log level is changed in lua, by calling TXN:set_loglevel function,
it must be incremented by one because it is decremented in strm_log()
function.
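The off-by-one can be pictured with the sketch below. It relies on an assumption that is not spelled out in this message: that a stored value of 0 means "no override", which is why real levels are stored shifted by one. The struct and function names are invented for the illustration.

```c
/* Minimal stand-in for the stream's log settings (not the real struct). */
struct sk_logs {
    int level;  /* assumption: 0 = no override, otherwise syslog level + 1 */
};

/* Setter side, as TXN:set_loglevel() is described to do: store level + 1. */
static void sk_set_loglevel(struct sk_logs *logs, int level)
{
    logs->level = level + 1;
}

/* Consumer side, as strm_log() is described to do: undo the offset, or
 * fall back to <dflt> when no override was set.
 */
static int sk_effective_level(const struct sk_logs *logs, int dflt)
{
    return logs->level ? logs->level - 1 : dflt;
}
```

Without the increment on the setter side, every level set from lua would be logged one notch off.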
This patch must be backported to all stable versions.
If some data are received for a lua socket while the lua script responsible
for consuming these data is not ready to do so, for instance because it is
sleeping, the applet is woken up in a loop because it never states that it
will not consume these data yet.
To fix the issue, in the applet I/O handler, when there are outgoing data, we
always pretend the applet will not consume them. It is the responsibility of
the lua script to reactivate receives by calling the Socket.receive() function.
This patch must be backported to every stable version. For 2.4 and older,
si_want_get()/si_cant_get() must be used instead of
applet_will_consume()/applet_wont_consume().
It is possible to create a lua socket without performing any connect. In
this case, the lua socket is released by the garbage collector.
However, the garbage collector does not release the applet, it wakes it
up. Since commit 751b59c40b ("BUG/MEDIUM: hlua: Initialize appctx used by a
lua socket on connect only"), the applet initialization is performed on
connect. So, here, it is possible to wake an uninitialized applet. It is an
unexpected case for the applet's I/O handler, leading to a segfault because
some resources are not initialized (the stream's target in this case).
So, now, in the lua socket GC function, we take care to immediately release
uninitialized applets. At worst, the release itself is delayed. But it is
safe because we are sure the applet's I/O handler will never be executed.
In addition, we take care to increment the GC counter when the lua socket is
created. This way, uninitialized lua sockets are released more quickly.
This patch should fix the issue #2451. It must be backported as far as 2.6.
Both of these flags are set after releasing the applet, in
appctx_shut(). Concretely, it means the applet is shut down for reads and
writes. Once they are set, the applet's I/O handler is no longer called, so
tests on these flags are useless: there is no chance to match them.
This is a complementary patch to "MINOR: tcp-act: Rename "set-{mark,tos}"
to "set-fc-{mark,tos}"", but for the Lua API.
set_mark and set_tos were kept as aliases for set_fc_mark and set_fc_tos
but they were marked as deprecated.
Using this opportunity to reorder set_mark and set_tos by alphabetical
order.
This is a cleanup patch to address cosmetic issues introduced in f034139bc0
("MINOR: lua: Allow reading "proc." scoped vars from LUA core.")
Also taking this opportunity to prefix the function with __LJMP to
indicate that it may longjump.
No backport needed.
As raised by Coverity in GH #2223, f034139bc0 ("MINOR: lua: Allow reading
"proc." scoped vars from LUA core.") causes uninitialized reads due to
smp being passed to vars_get_by_name() without being initialized first.
Indeed, vars_get_by_name() tries to read smp->sess and smp->strm pointers.
As we're only interested in the PROC var scope, it is safe to call
vars_get_by_name() with sess and strm pointers set to NULL, thus we
simply memset smp prior to calling vars_get_by_name() to fix the issue.
This should be backported in 2.9 with f034139bc0.
4e5e2664 ("MINOR: proxy: add findserver_unique_id() and findserver_unique_name()")
added findserver_unique_id() and findserver_unique_name() functions that
were inspired from the historical findserver() function, so unfortunately
they don't perform well when used on large backend farms because they scan
the whole server list linearly.
I was about to provide a patch to optimize such functions when I stumbled
on Baptiste's work:
19a106d24 ("MINOR: server: server_find functions: id, name, best_match")
It turns out Baptiste already implemented helper functions to supersede
the unoptimized findserver() function (at least at runtime when servers
have been assigned their final IDs and inserted in the lookup trees): they
offer more matching options and rely on eb lookups so they are much more
suitable for fast queries. I don't know how I missed that, but they are a
perfect base for the server rid matching functions.
So in this patch, we essentially revert 4e5e2664 to provide the optimized
equivalent functions named server_find_by_id_unique() and
server_find_by_name_unique(), then we force existing findserver_unique_*()
callers to switch to the new functions.
This patch depends on:
- "OPTIM: server: eb lookup for server_find_by_name()"
This could be backported up to 2.8.
This bugfix is the same as the following one:
"BUG/MINOR: ssl_ckch: Wrong OCSP CID after modifying an SSL certficate"
where the OCSP CID had to be reset when updating a certificate.
Must be backported to 2.8.
The proxy's initialization is rather odd. First, init_new_proxy() is
called to zero all the lists and certain values, except those that can
come from defaults, which are initialized by proxy_preset_defaults().
The default server settings are also only set there.
This results in these settings not being set for a number of internal
proxies that do not explicitly call proxy_preset_defaults() after
allocation, such as sink and log forwarders.
This was revealed by last commit 79aa63823 ("MINOR: server: always
initialize pp_tlvs for default servers") which crashes in log parsers
when applied to certain proxies which did not initialize their default
servers.
In theory this should be backported, however it would be desirable to
wait a bit before backporting it, in case certain parts would rely on
these elements not being initialized.
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")
However, as stated in the commit message, there is a trick:
"we can fail to return a protocol in case the caller
accepts an fqdn for use later. This is what servers do and in this
case it is valid to return no protocol"
In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.
While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.
Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
After making it configurable in previous commit "MINOR: lua: Add flags
to configure logging behaviour", this patch changes the default value
of tune.lua.log.stderr from 'on' (unconditionally forward LUA logs to
stderr) to 'auto' (only forward LUA logs to stderr if logging via a
standard logger is disabled, or none is configured for the current context).
Since this is a change in behaviour, it shouldn't be backported.
Until now, messages printed from LUA log functions were sent both to
any logger configured for the current proxy, and additionally to
stderr (in most cases).
This introduces two flags to configure LUA log handling:
- tune.lua.log.loggers to use standard loggers or not
- tune.lua.log.stderr to use stderr, or not, or only conditionally
This addresses github feature request #2316
This can be backported to 2.8 as it doesn't change previous behaviour.
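How the two flags could combine to decide whether a LUA log line goes to stderr is sketched below. The enum, the function and its parameters are hypothetical; only the 'on' and 'auto' semantics are taken from the descriptions above, and 'off' is assumed to mean "never".

```c
/* Possible values for tune.lua.log.stderr in this sketch. */
enum sk_stderr_mode { SK_STDERR_OFF, SK_STDERR_ON, SK_STDERR_AUTO };

/* Hypothetical decision helper.
 * <loggers_enabled> : tune.lua.log.loggers is on
 * <nb_loggers>      : number of loggers configured for the current context
 */
static int sk_log_to_stderr(enum sk_stderr_mode mode,
                            int loggers_enabled, int nb_loggers)
{
    switch (mode) {
    case SK_STDERR_ON:
        return 1;                                /* unconditional */
    case SK_STDERR_AUTO:
        /* only when nothing else will carry the message */
        return !loggers_enabled || nb_loggers == 0;
    default:
        return 0;                                /* never */
    }
}
```

In 'auto' mode a message is thus never duplicated: it reaches stderr only when no standard logger would receive it.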
Because the channel_is_empty() function now only checks the channel's
buffer, we can remove it and rely on co_data() instead. Of course, all tests
must be inverted.
channel_is_empty() is thus removed.
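The inversion of the tests can be shown on a tiny stand-in channel. The struct and the sk_ prefixed helpers are simplified assumptions used only to illustrate how a former emptiness check is rewritten.

```c
#include <stddef.h>

/* Minimal stand-in for a channel: only the pending output size matters. */
struct sk_channel {
    size_t output;  /* number of bytes waiting to be forwarded */
};

/* Stand-in for co_data(): amount of outgoing data in the channel. */
static size_t sk_co_data(const struct sk_channel *chn)
{
    return chn->output;
}

/* A test formerly written "if (channel_is_empty(chn))" becomes
 * "if (!co_data(chn))", and "if (!channel_is_empty(chn))" becomes
 * "if (co_data(chn))".
 */
static int sk_nothing_to_send(const struct sk_channel *chn)
{
    return !sk_co_data(chn);
}
```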
Since last fixes about the lua cosocket, the appctx is no longer initialized
in hlua_socket_new(). The code to deal with error at this stage can be
removed.
This patch should fix the issue #2308.
The appctx used by a lua socket was synchronously initialized after the
appctx creation. The connect itself is performed later. However, this is an
issue because the script may be interrupted between the two operations. In
this case, the stream attached to the appctx is woken up before any
destination is set. The stream will try to connect but without destination,
it fails. When the lua script is rescheduled and the connect is performed,
the connection has already failed and an error is returned.
To fix the issue, we must be sure not to wake up the stream before the
connect. To do so, we must defer the appctx initialization. It is now
performed on connect.
This patch relies on the following commits:
* MINOR: hlua: Test the hlua struct first when the lua socket is connecting
* MINOR: hlua: Save the lua socket's server in its context
* MINOR: hlua: Save the lua socket's timeout in its context
* MINOR: hlua: Don't preform operations on a not connected socket
* MINOR: hlua: Set context's appctx when the lua socket is created
All the series must be backported as far as 2.6.
For the same reason as for the timeout, the server used by a lua socket is now
saved in its context. This will be mandatory to fix issues with the lua
sockets.
When the lua socket timeout is set, it is now saved in its context. If there
is already a stream attached to the appctx, the timeout is then immediately
modified. Otherwise, it is modified when the stream is created, thus during
the appctx initialization.
For now, the appctx is initialized when it is created. But this will change
to fix issues with the lua sockets. Thus, this patch is mandatory.
Nothing prevents someone from creating a lua socket and trying to
receive or to write before the connection is established or after the
shutdown was performed. The same is true when info about the socket is
retrieved.
It is not an issue because this will fail later. But now, we check earlier
whether the socket is connected or not. It is more efficient, and it will
also be mandatory to fix issues with the lua sockets.
The lua socket's context referenced the owning appctx. It was set when the
appctx was initialized. It is now performed when the appctx is created. It
is a small change but this will be required to fix several issues with the
lua sockets.
This commit introduces support for the "http-after-res" action in
hlua, enabling the invocation of a Lua function in a
"http-after-response" rule. With this enhancement, a Lua action can be
registered using the "http-after-res" action type:
core.register_action('myaction', {'http-after-res'}, myaction)
A new "lua.myaction" is created and can be invoked in a
"http-after-response" rule:
http-after-response lua.myaction
This addition provides greater flexibility and extensibility in
handling post-response actions using Lua.
This commit depends on:
- 4457783 ("MINOR: http_ana: position the FINAL flag for http_after_res execution")
Signed-off-by: Sébastien Gross <sgross@haproxy.com>
It's not supported to call lua_resume with <L> and <from> designating
the same lua coroutine. It didn't cause visible bugs so far because
Lua 5.3 used to be more permissive about this, and moreover, yielding
is not involved during the hlua init state.
But this is wrong usage, and the doc clearly specifies that the <from>
argument can be NULL when there is no such coroutine, which is the case
here.
This should be backported to all stable versions.
In hlua_ctx_resume(), we call lua_resume() function like this:
lua_resume(lua->T, hlua_states[lua->state_id], lua->nargs)
Once the call returns, we may call the function again with the same
hlua context when E_YIELD is returned (the execution was interrupted
and may be resumed through another lua_resume() call).
The 3rd argument to lua_resume(), 'nargs', is a hint passed to Lua to
know how many (optional) arguments were pushed on the stack prior to
resuming the execution (arguments that Lua will then expose to the Lua
script).
But here is the catch: we never reset lua->nargs between successive
lua_resume() calls, meaning that next lua_resume() calls will still
inherit from the initial nargs value that was set in hlua ctx prior
to calling hlua_ctx_resume() (our wrapper function) for the first time.
This is problematic because, despite not being explicitly mentioned in
the Lua documentation, passed arguments (to which `nargs` refers) are
already consumed once lua_resume() returns.
This means that we cannot keep calling lua_resume() with non-zero nargs
if we don't push new arguments on the stack prior to resuming lua after
the initial call: nargs is proper to a single lua_resume() invocation.
Despite improper use of lua_resume() for a long time, this didn't cause
visible issues in the past with Lua 5.3, but it is particularly sensitive
starting with Lua 5.4.3 due to debugging hooks improvements that led to
some internal changes (see: lua/lua@58aa09a). Not using nargs properly
now exposes us to undefined behavior when resuming after a yield triggered
from a debugging hook, which may cause running scripts to crash
unexpectedly: for instance with Lua raising errors and complaining about
values being NULL where it should not be the case.
For reference, this issue was initially raised on the Lua mailing list:
http://lua-users.org/lists/lua-l/2023-09/msg00005.html
In this patch, we immediately reset nargs when lua_resume() returns to
prevent any misuse.
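The effect of resetting nargs can be sketched with a mock resume function that consumes its arguments. The names, return codes and the stack-depth counter are all invented for this illustration; the real lua_resume() does not report misuse this way.

```c
enum { SK_YIELD = 1, SK_MISUSE = -1 };

/* Minimal stand-in for an hlua context (not the real struct hlua). */
struct sk_hlua {
    int nargs;  /* arguments pushed for the next resume */
    int depth;  /* number of values currently on the mock stack */
};

/* Mock resume: consumes <nargs> values from the stack, then yields.
 * Asking for more arguments than the stack holds is flagged as misuse
 * (with the real API this is where undefined behavior would start).
 */
static int sk_fake_resume(struct sk_hlua *lua, int nargs)
{
    if (nargs > lua->depth)
        return SK_MISUSE;
    lua->depth -= nargs;  /* arguments are consumed by the call */
    return SK_YIELD;
}

/* Wrapper: nargs only applies to a single resume, so it is reset as soon
 * as the call returns.
 */
static int sk_ctx_resume(struct sk_hlua *lua)
{
    int ret = sk_fake_resume(lua, lua->nargs);

    lua->nargs = 0;
    return ret;
}
```

With the reset in place, resuming again after a yield passes 0 arguments unless the caller explicitly pushed new ones and updated nargs.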
It should be backported to all maintained versions.
When hlua_action error messages were reworked in d5b073cf1
("MINOR: lua: Improve error message"), an error was made for the
E_YIELD case.
Indeed, everywhere else the E_YIELD error is handled, "yield is not allowed"
or a similar error message is reported to the user. But here we currently
have: "aborting Lua processing on expired timeout".
It is quite misleading because this error message often refers to the
HLUA_E_ETMOUT case.
Thus, we now report the proper error message thanks to this patch.
This should be backported to all stable versions.
[on 2.0, the patch needs to be slightly adapted]
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
Only one read access is needed: in the "show map" command callback,
cli_io_handler_map_lookup(), the HA_SPIN_LOCK() call is replaced by
HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK()).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
Organize reference to pattern element of map (struct pat_ref_elt) into an ebtree:
- add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
element (pat_ref_elt struct),
- modify the code to insert these nodes into their ebtrees each time they are
allocated. This is done in pat_ref_append().
Note that the ->head member (struct list) of the map (struct pat_ref) could
have been removed. It is kept because it is still necessary to dump the map
contents from the CLI in the order the map elements were inserted.
This patch also modifies http_action_set_map(), which is the callback used at
least by the "set-map" action. The pat_ref_elt element returned by
pat_ref_find_elt() is no longer ignored, but is reused, if not NULL, by
pat_ref_set() as the first element to look up from. The latter is also modified
to use the ebtree attached to the map in place of the ->head list attached to
each map element (pat_ref_elt struct).
Also modify pat_ref_find_elt() to make it use the ->eb_root map ebtree added to
the map by this patch in place of inspecting all the elements with a strcmp()
call.
Michel Mayen reported that mixing lua actions loaded from 'lua-load'
and 'lua-load-per-thread' directives within a single http/tcp session
yields unexpected results.
When executing action defined in another running context from the one of
the previously executed action (from lua-load, then from
lua-load-per-thread or the opposite, order doesn't matter), it would yield
this kind of error:
"Lua function 'name': [state-id x] runtime error: attempt to call a nil value from ."
He also noted that when loading all actions using the same loading
directive, the issue is gone.
This is due to the fact that for lua actions, fetches and converters, lua
code is being executed from the stream lua context. However, the stream
lua context, which is created on the fly when first executing some lua
code related to the stream, is reused between multiple lua executions.
But the thing is, despite successive executions referring to the same
parent "stream" (which is also assigned to a given thread id), they don't
necessarily depend on the same running context from lua point of view.
Indeed, since the function which is about to be executed could have been
loaded from either 'lua-load' or 'lua-load-per-thread', the function
declaration and related dependencies are defined in a specific stack ID
which is known by calling fcn_ref_to_stack_id() on the given function.
Thus, in order to make streams capable of chaining lua actions, fetches and
converters loaded in different lua stacks, we add a new detection logic
in hlua_stream_ctx_prepare() to be able to recreate the lua context in the
proper stack space when the existing one conflicts with the expected stack
id.
This must be backported to all stable versions.
It depends on:
- "MINOR: hlua: add hlua_stream_prepare helper function"
[for < 2.5, skip the filter part since they didn't exist]
[wt: warning, wait a little bit before backporting too far, we
need to be certain the added BUG_ON() will never trigger]
Stream-dedicated hlua ctx creation and attachment is now performed in
hlua_stream_ctx_prepare() helper function to ease code maintenance.
No functional behavior change should be expected.
Multiple error paths made invalid use of lua_pop():
When the stack is emptied using lua_settop(0), lua_pop() (which is
implemented as a lua_settop() macro) should not be used right after,
because it could lead to invalid reads since the stack is already empty.
Unfortunately, some remnants from the initial lua stack implementation kept
doing so, resulting in haproxy crashes on some lua runtime error paths
from time to time (ie: ERRRUN, ERRMEM).
Moreover, the extra lua_pop() instruction, even if it was safe, is totally
pointless in such case.
This patch removes such unsafe lua_pop() statements when we know that the
stack is already empty.
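Why a pop after emptying the stack is invalid follows from the index arithmetic, sketched here on a mock stack. The sk_ names are invented; only the shape of the pop macro (a settop with a negative index) mirrors how Lua defines lua_pop().

```c
/* Minimal stand-in for a lua stack: only its height matters here. */
struct sk_stack {
    int top;  /* number of values on the stack */
};

/* Mock settop: a non-negative <idx> sets the height, a negative one is
 * relative to the current top. Returns 0 when the resulting height would
 * be negative (where the real function would touch invalid memory).
 */
static int sk_settop(struct sk_stack *s, int idx)
{
    int top = (idx >= 0) ? idx : s->top + idx + 1;

    if (top < 0)
        return 0;
    s->top = top;
    return 1;
}

/* Same shape as Lua's lua_pop() macro: pop <n> values via settop. */
#define sk_pop(s, n) sk_settop((s), -(n) - 1)
```

On an empty stack, popping one value asks for a height of -1, which is exactly the case the removed statements were hitting.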
This must be backported to all stable versions.