After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutr_now() is replaced by
sc_schedule_abort(). The request channel is replaced by the front SC and the
response is replace by the back SC.
Most of calls to channel_abort() are associated to a call to
channel_auto_close(). Others are in areas where the auto close is the
default. So, it is now systematically enabled when an abort is performed on
a channel, as part of channel_abort() function.
As every time strncat() is used, it's wrong, and this one is no exception.
Users often think that the length applies to the destination except it
applies to the source and makes it hard to use correctly. The bug did not
have an impact because the length was preallocated from the sum of all
the individual lengths as measured by strlen() so there was no chance one
of them would change in between. But it could change in the future. Let's
fix it to use memcpy() instead for strings, or byte copies for delimiters.
No backport is needed, though it can be done if it helps to apply other
fixes.
Now that the event handler API is pretty mature, we can expose it in
the lua API.
Introducing the core.event_sub(<event_types>, <cb>) lua function that
takes an array of event types <event_types> as well as a callback
function <cb> as argument.
The function returns a subscription <sub> on success.
Subscription <sub> allows you to manage the subscription from anywhere
in the script.
To this day only the sub->unsub method is implemented.
The following event types are currently supported:
- "SERVER_ADD": when a server is added
- "SERVER_DEL": when a server is removed from haproxy
- "SERVER_DOWN": server states goes from up to down
- "SERVER_UP": server states goes from down to up
As for the <cb> function: it will be called when one of the registered
event types occur. The function will be called with 3 arguments:
cb(<event>,<data>,<sub>)
<event>: event type (string) that triggered the function.
(could be any of the types used in <event_types> when registering
the subscription)
<data>: data associated with the event (specific to each event family).
For "SERVER_" family events, server details such as server name/id/proxy
will be provided.
If the server still exists (not yet deleted), a reference to the live
server is provided to spare you from an additionnal lookup if you need
to have direct access to the server from lua.
<sub> refers to the subscription. In case you need to manage it from
within an event handler.
(It refers to the same subscription that the one returned from
core.event_sub())
Subscriptions are per-thread: the thread that will be handling the
event is the one who performed the subscription using
core.event_sub() function.
Each thread treats events sequentially, it means that if you have,
let's say SERVER_UP, then SERVER_DOWN in a short timelapse, then your
cb function will first be called with SERVER_UP, and once you're done
handling the event, your function will be called again with SERVER_DOWN.
This is to ensure event consitency when it comes to logging / triggering
logic from lua.
Your lua cb function may yield if needed, but you're pleased to process
the event as fast as possible to prevent the event queue from growing up
To prevent abuses, if the event queue for the current subscription goes
over 100 unconsumed events, the subscription will pause itself
automatically for as long as it takes for your handler to catch up.
This would lead to events being missed, so a warning will be emitted in
the logs to inform you about that. This is not something you want to let
happen too often, it may indicate that you subscribed to an event that
is occurring too frequently or/and that your callback function is too
slow to keep up the pace and you should review it.
If you want to do some parallel processing because your callback
functions are slow: you might want to create subtasks from lua using
core.register_task() from within your callback function to perform the
heavy job in a dedicated task and allow remaining events to be processed
more quickly.
Please check the lua documentation for more information.
core.register_task(function) may now take up to 4 additional arguments
that will be passed as-is to the task function.
This could be convenient to spawn sub-tasks from existing functions
supporting core.register_task() without the need to use global
variables to pass some context to the newly created task function.
The new prototype is:
core.register_task(function[, arg1[, arg2[, ...[, arg4]]]])
Implementation remains backward-compatible with existing scripts.
Main lua lock is used at various places in the code.
Most of the time it is used from unprotected lua environments,
in which case the locking is mandatory.
But there are some cases where the lock is attempted from protected
lua environments, meaning that lock is already owned by the current
thread. Thus new locking attempt should be skipped to prevent any
deadlocks from occuring.
To address this, "already_safe" lock hint was implemented in
hlua_ctx_init() function with commit bf90ce1
("BUG/MEDIUM: lua: dead lock when Lua tasks are trigerred")
But this approach is not very safe, for 2 reasons:
First reason is that there are still some code paths that could lead
to deadlocks.
For instance, in register_task(), hlua_ctx_init() is called with
already_safe set to 1 to prevent deadlock from occuring.
But in case of task init failure, hlua_ctx_destroy() will be called
from the same environment (protected environment), and hlua_ctx_destroy()
does not offer the already_safe lock hint.. resulting in a deadlock.
Second reason is that already_safe hint is used to completely skip
SET_LJMP macros (which manipulates the lock internally), resulting
in some logics in the function being unprotected from lua aborts in
case of unexpected errors when manipulating the lua stack (the lock
does not protect against longjmps)
Instead of leaving the locking responsibility to the caller, which is
quite error prone since we must find out ourselves if we are or not in
a protected environment (and is not robust against code re-use),
we move the deadlock protection logic directly in hlua_lock() function.
Thanks to a thread-local lock hint, we can easily guess if the current
thread already owns the main lua lock, in which case the locking attempt
is skipped. The thread-local lock hint is implemented as a counter so
that the lock is properly dropped when the counter reaches 0.
(to match actual lock() and unlock() calls)
This commit depends on "MINOR: hlua: simplify lua locking"
It may be backported to every stable versions.
[prior to 2.5 lua filter API did not exist, filter-related parts
should be skipped]
The check on lua state==0 to know whether locking is required or not can
be performed in a locking wrapper to simplify things a bit and prevent
implementation errors.
Locking from hlua context should now be performed via hlua_lock(L) and
unlocking via hlua_unlock(L)
Using hlua_pushref() everywhere temporary lua objects are involved.
(ie: hlua_checkfunction(), hlua_checktable...)
Those references are expected to be cleared using hlua_unref() when
they are no longer used.
Using hlua_ref() everywhere temporary lua objects are involved.
Those references are expected to be cleared using hlua_unref()
when they are no longer used.
Several error paths were leaking function or table references.
(Obtained through hlua_checkfunction() and hlua_checktable() functions)
Now we properly release the references thanks to hlua_unref() in
such cases.
This commit depends on "MINOR: hlua: add simple hlua reference handling API"
This could be backported in every stable versions although it is not
mandatory as such leaks only occur on rare error/warn paths.
[prior to 2.5 lua filter API did not exist, the hlua_register_filter()
part should be skipped]
hlua init function references were not released during
hlua_post_init_state().
Hopefully, this function is only used during startup so the resulting
leak is not a big deal.
Since each init lua function runs precisely once, it is safe to release
the ref as soon as the function is restored on the stack.
This could be backported to every stable versions.
Please note that this commit depends on "MINOR: hlua: add simple hlua reference handling API"
In core.register_task(): we take a reference to the function passed as
argument in order to push it in the new coroutine substack.
However, once pushed in the substack: the reference is not useful
anymore and should be cleared.
Currently, this is not the case in hlua_register_task().
Explicitly dropping the reference once the function is pushed to the
coroutine's stack to prevent any reference leak (which could contribute
to resource shortage)
This may be backported to every stable versions.
Please note that this commit depends on "MINOR: hlua: add simple hlua reference handling API"
hlua_checktable() and hlua_checkfunction() both return the raw
value of luaL_ref() function call.
As luaL_ref() returns a signed int, both functions should return a signed
int as well to prevent any misuse of the returned reference value.
We're doing this in an attempt to simplify temporary lua objects
references handling.
Adding the new hlua_unref() function to release lua object references
created using luaL_ref(, LUA_REGISTRYINDEX)
(ie: hlua_checkfunction() and hlua_checktable())
Failure to release unused object reference prevents the reference index
from being re-used and prevents the referred ressource from being garbage
collected.
Adding hlua_pushref(L, ref) to replace
lua_rawgeti(L, LUA_REGISTRYINDEX, ref)
Adding hlua_ref(L) to replace luaL_ref(L, LUA_REGISTRYINDEX)
The comment for the hlua_ctx_destroy() function states that the "lua"
struct is not freed.
This is not true anymore since 2c8b54e7 ("MEDIUM: lua: remove Lua struct
from session, and allocate it with memory pools")
Updating the function comment to properly report the actual behavior.
This could be backported in every stable versions with 2c8b54e7
("MEDIUM: lua: remove Lua struct from session, and allocate it with memory pools")
When HAproxy is loaded with a lot of frontends/backends (tested with 300k),
it is slow to start and it uses a lot of memory just for indexing backends
in the lua tables.
This patch uses the internal frontend/backend index of HAProxy in place of
lua table.
HAProxy startup is now quicker as each frontend/backend object is created
on demand and not at init.
This has to come with some cost: the execution of Lua will be a little bit
slower.
register_{init, converters, fetches, action, service, cli, filter} are
meant to run exclusively from body context according to the
documentation (unlike register_task which is designed to work from both
init and runtime contexts)
A quick code inspection confirms that only register_task implements
the required precautions to make it safe out of init context.
Trying to use those register_* functions from a runtime lua task will
lead to a program crash since they all assume that they are running from
the main lua context and with no concurrent runs:
core.register_task(function()
core.register_init(function()
end)
end)
When loaded from the config, the above example would segfault.
To prevent this undefined behavior, we now report an explicit error if
the user tries to use such functions outside of init/body context.
This should be backported in every stable versions.
[prior to 2.5 lua filter API did not exist, the hlua_register_filter()
part should be skipped]
In hlua_process_task: when HLUA_E_ETMOUT was returned by
hlua_ctx_resume(), meaning that the lua task reached
tune.lua.task-timeout (default: none),
we logged "Lua task: unknown error." before stopping the task.
Now we properly handle HLUA_E_ETMOUT to report a meaningful error
message.
In function hlua_hook, a yieldk is performed when function is yieldable.
But the following code in that function seems to assume that the yield
never returns, which is not the case!
Moreover, Lua documentation says that in this situation the yieldk call
must immediately be followed by a return.
This patch adds a return statement after the yieldk call.
It also adds some comments and removes a needless lua_sethook call.
It could be backported to all stable versions, but it is not mandatory,
because even if it is undefined behavior this bug doesn't seem to
negatively affect lua 5.3/5.4 stacks.
There are 3 kinds of applet in lua: The co-sockets, the TCP services and the
HTTP services. The three are refactored to use the SE descriptor instead of
the channel to report error and end-of-stream.
Thanks to the previous patch, it is now possible for applets to not set the
CF_EOI flag on the channels. On this point, the applets get closer to the
muxes.
It was done by hand by callers when a shutdown for read or write was
performed. It is now always handled by the functions performing the
shutdown. This way the callers don't take care of it. This will avoid some
bugs.
Read and write timeouts (.rto and .wto) are now replaced by an unique
timeout, call .ioto. Since the recent refactoring on channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.
From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplify the timers manipulations.
Read and write timeouts concerns the I/O. Thus, it is logical to move it into
the stconn. At the end, the stream is responsible to detect the timeouts. So
it is logcial to have these values in the stconn and not in the SE
descriptor. But it may change depending on the recfactoring.
So, now:
* scf->rto is used instead of req->rto
* scf->wto is used instead of res->wto
* scb->rto is used instead of res->rto
* scb->wto is used instead of req->wto
In bb581423b ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout
before the httpclient"), a new logic was implemented to make sure that
when a lua ctx destroyed, related httpclients are correctly destroyed too
to prevent a such httpclients from being resuscitated on a destroyed lua ctx.
This was implemented by adding a list of httpclients within the lua ctx,
and a new function, hlua_httpclient_destroy_all(), that is called under
hlua_ctx_destroy() and runs through the httpclients list in the lua context
to properly terminate them.
This was done with the assumption that no concurrent Lua garbage collection
cycles could occur on the same ressources, which seems OK since the "lua"
context is about to be freed and is not explicitly being used by other threads.
But when 'lua-load' is used, the main lua stack is shared between multiple
OS threads, which means that all lua ctx in the process are linked to the
same parent stack.
Yet it seems that lua GC, which can be triggered automatically under
lua_resume() or manually through lua_gc(), does not limit itself to the
"coroutine" stack (the stack referenced in lua->T) when performing the cleanup,
but is able to perform some cleanup on the main stack plus coroutines stacks
that were created under the same main stack (via lua_newthread()) as well.
This can be explained by the fact that lua_newthread() coroutines are not meant
to be thread-safe by design.
Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author)
It did not cause other issues so far because most of the time when using
'lua-load', the global lua lock is taken when performing critical operations
that are known to interfere with the main stack.
But here in hlua_httpclient_destroy_all(), we don't run under the global lock.
Now that we properly understand the issue, the fix is pretty trivial:
We could simply guard the hlua_httpclient_destroy_all() under the global
lua lock, this would work but it could increase the contention over the
global lock.
Instead, we switched 'lua->hc_list' which was introduced with bb581423b
from simple list to mt_list so that concurrent accesses between
hlua_httpclient_destroy_all and hlua_httpclient_gc() are properly handled.
The issue was reported by @Mark11122 on Github #2037.
This must be backported with bb581423b ("BUG/MEDIUM: httpclient/lua: crash
when the lua task timeout before the httpclient") as far as 2.5.
In hlua_httpclient_send(), we replace hc->req.url with a new url.
But we forgot to free the original url that was allocated in
hlua_httpclient_new() or in the previous httpclient_send() call.
Because of this, each httpclient request performed under lua scripts would
result in a small leak. When stress-testing a lua action which uses httpclient,
the leak is clearly visible since we're leaking severals Mbytes per minute.
This bug was discovered by chance when trying to reproduce GH issue #2037.
It must be backported up to 2.5
When the payload length cannot be determined, the htx extra field is set to
the magical vlaue ULLONG_MAX. It is not obvious. This a dedicated HTX value
is now used. Now, HTX_UNKOWN_PAYLOAD_LENGTH must be used in this case,
instead of ULLONG_MAX.
These both functions are buggy and don't respect the documentation. They
must wait for more data, if possible.
For Channel.data(), it must happen if not enough data was received orf if no
length was specified and no data was received. The first case is properly
handled but not the second one. An empty string is return instead. In
addition, if there is no data and the channel can't receive more data, 'nil'
value must be returned.
In the same spirit, for Channel.line(), we must try to wait for more data
when no line is found if not enough data was received or if no length was
specified. Here again, only the first case is properly handled. And for this
function too, 'nil' value must be returned if there is no data and the
channel can't receive more data.
This patch is related to the issue #1993. It must be backported as far as
2.5.
CF_READ_NULL flag is not really useful and used. It is a transient event
used to wakeup the stream. As we will see, all read events on a channel may
be resumed to only one and are all used to wake up the stream.
In this patch, we introduce CF_READ_EVENT flag as a replacement to
CF_READ_NULL. There is no breaking change for now, it is just a
rename. Gradually, other read events will be merged with this one.
The lua httpclient cleanup can be called in 2 places, the
hlua_httpclient_gc() and the hlua_httpclient_destroy_all().
A LIST_DELETE() is performed to remove the hlua_hc struct of the list.
However, when the lua task ends and call hlua_ctx_destroy(), it does a
LIST_DELETE() first, and then the gc tries to do a LIST_DELETE() again
in hlua_httpclient_gc(), provoking a crash.
This patch fixes the issue by doing a LIST_DEL_INIT() instead of
LIST_DELETE() in both cases.
Should fix issue #1958.
Must be backported where bb58142 is backported.
Rename the structure "cert_key_and_chain" to "ckch_data" in order to
avoid confusion with the store whcih often called "ckchs".
The "cert_key_and_chain *ckch" were renamed "ckch_data *data", so we now
have store->data instead of ckchs->ckch.
Marked medium because it changes the API.
When the lua task finished before the httpclient that are associated to
it, there is a risk that the httpclient try to task_wakeup() the lua
task which does not exist anymore.
To fix this issue the httpclient used in a lua task are stored in a
list, and the httpclient are destroyed at the end of the lua task.
Must be backported in 2.5 and 2.6.
The HTTPclient callback req_payload callback is set when a request payload
must be streamed. In the lua, this callback is set when a body is passed as
argument in one of httpclient functions (head/get/post/put/delete). However,
there is no reason to set it if body string is empty.
This patch is related to the issue #1898. It may be backported as far as
2.5.
In cd341d531, I added a FIXME comment because I noticed a
lua_pushvalue with 0 index, whereas lua doc states that 0 is never
an acceptable index.
After reviewing and testing the hlua_applet_http_send_response() code,
it turns out that this pushvalue is not even needed.
So it's safer to remove it as it could lead to undefined
behavior (since it is not supported by Lua API) and it grows lua stack
by 1 for no reason.
No backport needed.
In hlua code, we mark every function that may longjump using
MAY_LJMP macro so it's easier to identify them by reading the code.
However, some luaL_checktypes() were performed without the MAY_LJMP.
According to lua doc:
Functions called luaL_check* always raise an error if
the check is not satisfied.
-> Adding the missing MAY_LJMP for those luaLchecktypes() calls.
No backport needed.
Channel.insert(channel, string, [,offset]):
When no offset is provided, hlua_channel_insert_data() inserts
string at the end of incoming data.
This behavior conflicts with the documentation that explicitly says
that the default behavior is to insert the string in front of incoming data.
This patch fixes hlua_channel_insert_data() behavior so that it fully
complies with the documentation.
Thanks to Smackd0wn for noticing it.
This could be backported to 2.6 and 2.5
In hlua_lua2arg_check(), we allow for the first argument to not be
provided, if it has a type we know, this is true for frontend, backend,
and stick table. However, the stick table code was changed. It used
to be deduced from the proxy, but it is now directly provided in struct
args. So setting the proxy there no longer work, and we have to
explicitely set the stick table.
Not doing so will lead the code do use the proxy pointer as a stick
table pointer, which will likely cause crashes.
This should be backported up to 2.0.
In hlua_lua2arg_check(), on failure, before calling free_argp(), make
sure to always mark the failed argument as ARGT_STOP. We only want to
free argument prior to that point, because we did not allocate the
strings after this one, and so we don't want to free them.
This should be backported up to 2.2.
In function hlua_applet_http_send_response(), a pushvalue
is performed with index '0'.
But according to lua doc (https://www.lua.org/manual/5.3/manual.html#4.3):
"Note that 0 is never an acceptable index".
Adding a FIXME comment near to the pushvalue operation
so that this can get some chance to be reviewed later.
No backport needed.
When providing multiple optional arguments with lua-load or
lua-load-per-thread directives, arguments where pushed 1 by 1
to the stack using lua_pushstring() without checking if the stack
could handle it.
This could easily lead to program crash when providing too much
arguments. I can easily reproduce the crash starting from ~50 arguments.
Calling lua_checkstack() before pushing to the stack fixes the crash:
According to lua.org, lua_checkstack() does some housekeeping and
allow the stack to be expanded as long as some memory is available
and the hard limit isn't reached.
When no memory is available to expand the stack or the limit is reached,
lua_checkstacks returns an error: in this case we force hlua_load_state()
to return a meaningfull error instead of crashing.
In practice though, cfgparse complains about too many words
way before such event may occur on a normal system.
TLDR: the ~50 arguments limitation is not an issue anymore.
No backport needed, except if 'MINOR: hlua: Allow argument on
lua-lod(-per-thread) directives' (ae6b568) is backported.
Calling the function with an offset when "offset + len" was superior or equal
to the targeted blk length caused 'v' value to be improperly set.
And because 'v' is directly provided to htx_replace_blk_value(), blk consistency was compromised.
(It seems that blk was overrunning in htx_replace_blk_value() due to
this and header data was overwritten in this case).
This patch adds the missing checks to make the function behave as
expected when offset is set and offset+len is greater or equals to the targeted blk length.
Some comments were added to the function as well.
It may be backported to 2.6 and 2.5