4e5e2664 ("MINOR: proxy: add findserver_unique_id() and findserver_unique_name()")
added findserver_unique_id() and findserver_unique_name() functions that
were inspired from the historical findserver() function, so unfortunately
they don't perform well when used on large backend farms because they scan
the whole server list linearly.
I was about to provide a patch to optimize such functions when I stumbled
on Baptiste's work:
19a106d24 ("MINOR: server: server_find functions: id, name, best_match")
It turns out Baptiste already implemented helper functions to supersed
the unoptimized findserver() function (at least at runtime when servers
have been assigned their final IDs and inserted in the lookup trees): they
offer more matching options and rely on eb lookups so they are much more
suitable for fast queries. I don't know how I missed that, but they are a
perfect base for the server rid matching functions.
So in this patch, we essentially revert 4e5e2664 to provide the optimized
equivalent functions named server_find_by_id_unique() and
server_find_by_name_unique(), then we force existing findserver_unique_*()
callers to switch to the new functions.
This patch depends on:
- "OPTIM: server: eb lookup for server_find_by_name()"
This could be backported up to 2.8.
This bugfix is the same as the following one:
"BUG/MINOR: ssl_ckch: Wrong OCSP CID after modifying an SSL certficate"
where the OCSP CID had to be reset when updating a certificate.
Must be backported to 2.8.
The proxy's initialization is rather odd. First, init_new_proxy() is
called to zero all the lists and certain values, except those that can
come from defaults, which are initialized by proxy_preset_defaults().
The default server settings are also only set there.
This results in these settings not to be set for a number of internal
proxies that do not explicitly call proxy_preset_defaults() after
allocation, such as sink and log forwarders.
This was revealed by last commit 79aa63823 ("MINOR: server: always
initialize pp_tlvs for default servers") which crashes in log parsers
when applied to certain proxies which did not initialize their default
servers.
In theory this should be backported, however it would be desirable to
wait a bit before backporting it, in case certain parts would rely on
these elements not being initialized.
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")
However, as stated into the commit message, there is a trick:
"we can fail to return a protocol in case the caller
accepts an fqdn for use later. This is what servers do and in this
case it is valid to return no protocol"
In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.
While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.
Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
After making it configurable in previous commit "MINOR: lua: Add flags
to configure logging behaviour", this patch changes the default value
of tune.lua.log.stderr from 'on' (unconditionally forward LUA logs to
stderr) to 'auto' (only forward LUA logs to stderr if logging via a
standard logger is disabled, or none is configured for the current context)
Since this is a change in behaviour, it shouldn't be backported
Until now, messages printed from LUA log functions were sent both to
the any logger configured for the current proxy, and additionally to
stderr (in most cases)
This introduces two flags to configure LUA log handling:
- tune.lua.log.loggers to use standard loggers or not
- tune.lua.log.stderr to use stderr, or not, or only conditionally
This addresses github feature request #2316
This can be backported to 2.8 as it doesn't change previous behaviour.
Because channel_is_empty() function does now only check the channel's
buffer, we can remove it and rely on co_data() instead. Of course, all tests
must be inverted.
channel_is_empty() is thus removed.
Since last fixes about the lua cosocket, the appctx is no longer initialized
in hlua_socket_new(). The code to deal with error at this stage can be
removed.
This patch should fix the issue #2308.
Ths appctx used by a lua socket was synchronously initialized after the
appctx creation. The connect itself is performed later. However it is an
issue because the script may be interrupted beteween the two operation. In
this case, the stream attached to the appctx is woken up before any
destination is set. The stream will try to connect but without destination,
it fails. When the lua script is rescheduled and the connect is performed,
the connection has already failed and an error is returned.
To fix the issue, we must be sure to not woken up the stream before the
connect. To do so, we must defer the appctx initilization. It is now perform
on connect.
This patch relies on the following commits:
* MINOR: hlua: Test the hlua struct first when the lua socket is connecting
* MINOR: hlua: Save the lua socket's server in its context
* MINOR: hlua: Save the lua socket's timeout in its context
* MINOR: hlua: Don't preform operations on a not connected socket
* MINOR: hlua: Set context's appctx when the lua socket is created
All the series must be backported as far as 2.6.
For the same reason than the timeout, the server used by a lua socket is now
saved in its context. This will be mandatory to fix issues with the lua
sockets.
When the lua socket timeout is set, it is now saved in its context. If there
is already a stream attached to the appctx, the timeout is then immediately
modified. Otherwise, it is modified when the stream is created, thus during
the appctx initialization.
For now, the appctx is initialized when it is created. But this will change
to fix issues with the lua sockets. Thus, this patch is mandatory.
There is nothing that prevent someone to create a lua socket and try to
receive or to write before the connection was established ot after the
shutdown was performed. The same is true when info about the socket are
retrieved.
It is not an issue because this will fail later. But now, we check the
socket is connected or not earlier. It is more effecient but it will be also
mandatory to fix issue with the lua sockets.
The lua socket's context referenced the owning appctx. It was set when the
appctx was initialized. It is now performed when the appctx is created. It
is a small change but this will be required to fix several issues with the
lua sockets.
This commit introduces support for the "http-after-res" action in
hlua, enabling the invocation of a Lua function in a
"http-after-response" rule. With this enhancement, a Lua action can be
registered using the "http-after-res" action type:
core.register_action('myaction', {'http-after-res'}, myaction)
A new "lua.myaction" is created and can be invoked in a
"http-after-response" rule:
http-after-response lua.myaction
This addition provides greater flexibility and extensibility in
handling post-response actions using Lua.
This commit depends on:
- 4457783 ("MINOR: http_ana: position the FINAL flag for http_after_res execution")
Signed-off-by: Sébastien Gross <sgross@haproxy.com>
It's not supported to call lua_resume with <L> and <from> designating
the same lua coroutine. It didn't cause visible bugs so far because
Lua 5.3 used to be more permissive about this, and moreover, yielding
is not involved during the hlua init state.
But this is wrong usage, and the doc clearly specifies that the <from>
argument can be NULL when there is no such coroutine, which is the case
here.
This should be backported in every stable versions.
In hlua_ctx_resume(), we call lua_resume() function like this:
lua_resume(lua->T, hlua_states[lua->state_id], lua->nargs)
Once the call returns, we may call the function again with the same
hlua context when E_YIELD is returned (the execution was interrupted
and may be resumed through another lua_resume() call).
The 3rd argument to lua_resume(), 'nargs', is a hint passed to Lua to
know how many (optional) arguments were pushed on the stack prior to
resuming the execution (arguments that Lua will then expose to the Lua
script).
But here is the catch: we never reset lua->nargs between successive
lua_resume() calls, meaning that next lua_resume() calls will still
inherit from the initial nargs value that was set in hlua ctx prior
to calling hlua_ctx_resume() (our wrapper function) for the first time.
This is problematic, because despite not being explicitly mentioned in
the Lua documentation, passed arguments (to which `nargs` refer to), are
already consumed once lua_resume() returns.
This means that we cannot keep calling lua_resume() with non-zero nargs
if we don't push new arguments on the stack prior to resuming lua after
the initial call: nargs is proper to a single lua_resume() invocation.
Despite improper use of lua_resume() for a long time, this didn't cause
visible issues in the past with Lua 5.3, but it is particularly sensitive
starting with Lua 5.4.3 due to debugging hooks improvements that led to
some internal changes (see: lua/lua@58aa09a). Not using nargs properly
now exposes us to undefined behavior when resuming after a yield triggered
from a debugging hook, which may cause running scripts to crash
unexpectedly: for instance with Lua raising errors and complaining about
values being NULL where it should not be the case.
For reference, this issue was initially raised on the Lua mailing list:
http://lua-users.org/lists/lua-l/2023-09/msg00005.html
In this patch, we immediately reset nargs when lua_resume() returns to
prevent any misuse.
It should be backported to every maintained versions.
When hlua_action error messages were reworked in d5b073cf1
("MINOR: lua: Improve error message"), an error was made for the
E_YIELD case.
Indeed, everywhere E_YIELD error is handled: "yield is not allowed" or
similar error message is reported to the user. But instead we currently
have: "aborting Lua processing on expired timeout".
It is quite misleading because this error message often refers to the
HLUA_E_ETMOUT case.
Thus, we now report the proper error message thanks to this patch.
This should be backported to all stable versions.
[on 2.0, the patch needs to be slightly adapted]
Replace ->lock type of pat_ref struct by HA_RWLOCK_T.
Replace all calls to HA_SPIN_LOCK() (resp. HA_SPIN_UNLOCK()) by HA_RWLOCK_WRLOCK()
(resp. HA_RWLOCK_WRUNLOCK()) when a write access is required.
There is only one read access which is needed. This is in the "show map" command
callback, cli_io_handler_map_lookup() where a HA_SPIN_LOCK() call is replaced
by HA_RWLOCK_RDLOCK() (resp. HA_SPIN_UNLOCK() by HA_RWLOCK_RDUNLOCK).
Replace HA_SPIN_INIT() calls by HA_RWLOCK_INIT() calls.
Organize reference to pattern element of map (struct pat_ref_elt) into an ebtree:
- add an eb_root member to the map (pat_ref struct) and an ebpt_node to its
element (pat_ref_elt struct),
- modify the code to insert these nodes into their ebtrees each time they are
allocated. This is done in pat_ref_append().
Note that ->head member (struct list) of map (struct pat_ref) is not removed
could have been removed. This is not the case because still necessary to dump
the map contents from the CLI in the order the map elememnts have been inserted.
This patch also modifies http_action_set_map() which is the callback at least
used by "set-map" action. The pat_ref_elt element returned by pat_ref_find_elt()
is no more ignored, but reused if not NULL by pat_ref_set() as first element to
lookup from. This latter is also modified to use the ebtree attached to the map
in place of the ->head list attached to each map element (pat_ref_elt struct).
Also modify pat_ref_find_elt() to makes it use ->eb_root map ebtree added to the
map by this patch in place of inspecting all the elements with a strcmp() call.
Michel Mayen reported that mixing lua actions loaded from 'lua-load'
and 'lua-load-per-thread' directives within a single http/tcp session
yields unexpected results.
When executing action defined in another running context from the one of
the previously executed action (from lua-load, then from
lua-load-per-thread or the opposite, order doesn't matter), it would yield
this kind of error:
"Lua function 'name': [state-id x] runtime error: attempt to call a nil value from ."
He also noted that when loading all actions using the same loading
directive, the issue is gone.
This is due to the fact that for lua actions, fetches and converters, lua
code is being executed from the stream lua context. However, the stream
lua context, which is created on the fly when first executing some lua
code related to the stream, is reused between multiple lua executions.
But the thing is, despite successive executions referring to the same
parent "stream" (which is also assigned to a given thread id), they don't
necessarily depend on the same running context from lua point of view.
Indeed, since the function which is about to be executed could have been
loaded from either 'lua-load' or 'lua-load-per-thread', the function
declaration and related dependencies are defined in a specific stack ID
which is known by calling fcn_ref_to_stack_id() on the given function.
Thus, in order to make streams capable of chaining lua actions, fetches and
converters loaded in different lua stacks, we add a new detection logic
in hlua_stream_ctx_prepare() to be able to recreate the lua context in the
proper stack space when the existing one conflicts with the expected stack
id.
This must be backported in every stable versions.
It depends on:
- "MINOR: hlua: add hlua_stream_prepare helper function"
[for < 2.5, skip the filter part since they didn't exist]
[wt: warning, wait a little bit before backporting too far, we
need to be certain the added BUG_ON() will never trigger]
Stream-dedicated hlua ctx creation and attachment is now performed in
hlua_stream_ctx_prepare() helper function to ease code maintenance.
No functional behavior change should be expected.
Multiple error paths made invalid use of lua_pop():
When the stack is emptied using lua_settop(0), lua_pop() (which is
implemented as a lua_settop() macro) should not be used right after,
because it could lead to invalid reads since the stack is already empty.
Unfortunately, some remnants from initial lua stack implementation kept
doing so, resulting in haproxy crashs on some lua runtime errors paths
from time to time (ie: ERRRUN, ERRMEM).
Moreover, the extra lua_pop() instruction, even if it was safe, is totally
pointless in such case.
Removing such unsafe lua_pop() statements when we know that the stack is
already empty.
This must be backported in every stable versions.
This adds the "core.get_var()" method allow the reading
of "proc." scoped variables outside of TXN or HTTP/TCPApplet.
Fixes: #2212
Signed-off-by: Daan van Gorkum <djvg@djvg.net>
Calling lual_newstate(Init main lua stack) in the hlua_init_state()
function, the return value of lua_newstate() may be NULL (for example
in case of OOM). In this case, L will be NULL, and then crash happens
in lua_getextraspace(). So, we add a check for lua_newstate.
This should be backported at least to 2.4, maybe further.
lua_yieldk ctx argument is of type lua_KContext which is typedefed to
intptr_t when available so it can be used to store pointers.
But the wrapper function hlua_yieldk() passes it as a regular it so it
breaks that promise.
Changing hlua_yieldk() prototype so that ctx argument is of type
lua_KContext.
This bug had no functional impact because ctx argument is not being
actively used so far. This may be backported to all stable versions
anyway.
When an HTTP applet tries to get request data, we must take care to properly
detect the end of the message. It an empty HTX message with the SC_FL_EOI
flag set on the front SC. However, an issue was introduced during the SC
refactoring performed in the 2.8. The backend SC is tested instead of the
frontend one.
Because of this bug, the receive functions hang because the test on
SC_FL_EOI flag never succeeds. Of course, by checking the frontend SC (the
opposite SC to the one attached to the appctx), it works.
This patch should fix the issue #2180. It must be backported to the 2.8.
Fixing hlua_lua2smp() usage in hlua's code since it was assumed that
hlua_lua2smp() makes a standalone smp out of lua data, but it is not
the case.
This is especially true when dealing with lua strings (string is
extracted using lua_tolstring() which returns a pointer to lua string
memory location that may be reclaimed by lua at any time when no longer
used from lua's point of view). Thus, smp generated by hlua_lua2smp() may
only be used from the lua context where the call was initially made, else
it should be explicitly duplicated before exporting it out of lua's
context to ensure safe (standalone) usage.
This should be backported to all stable versions.
When hlua_event_runner() pauses the subscription (ie: if the consumer
can't keep up the pace), hlua_traceback() is used to get the current
lua trace (running context) to provide some info to the user.
However, as hlua_traceback() may raise an error (__LJMP) is set, it is
used within a SET_SAFE_LJMP() / RESET_SAFE_LJMP() combination to ensure
lua errors are properly handled and don't result in unexpected behavior.
But the current usage of SET_SAFE_LJMP() within the function is wrong
since hlua_traceback() will run a second time (unprotected) if the
first (protected) attempt fails. This is undefined behavior and could
even lead to crashes.
Hopefully it is very hard to trigger this code path, thus we can consider
this as a minor bug.
Also using this as an opportunity to enhance the message report to make
it more meaningful to the user.
This should fix GH #2159.
It is a 2.8 specific bug, no backport needed unless c84899c636
("MEDIUM: hlua/event_hdl: initial support for event handlers") gets
backported.
Since 3157222 ("MEDIUM: hlua/applet: Use the sedesc to report and detect
end of processing"), hlua_socket_handler() might spin loop if the hlua
socket is destroyed and some data was left unconsumed in the applet.
Prior to the above commit, the stream was explicitly KILLED
(when ctx->die == 1) so the app couldn't spinloop on unconsumed data.
But since the refactor this is no longer the case.
To prevent unconsumed data from waking the applet indefinitely, we consume
pending data when either one of EOS|ERROR|SHR|SHW flags are set, as it is
done everywhere else this check is performed in the code. Hence it was
probably overlooked in the first place during the refacto.
This bug is 2.8 specific only, so no backport needed.
Exposing a new hlua function, available from body or init contexts, that
forcefully disables the sending of email alerts even if the mailers are
defined in haproxy configuration.
This will help for sending email directly from lua.
(prevent legacy email sending from intefering with lua)
Exposing SERVER_CHECK event through the lua API.
New lua class named ServerEventCheck was added to provide additional
data for SERVER_CHECK event.
Lua documentation was updated accordingly.
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Let's get rid of timeval in storage of internal timestamps so that they
are no longer mistaken for wall clock time. These were exclusively used
subtracted from each other or to/from "now" after being converted to ns,
so this patch removes the tv_to_ns() conversion to use them natively. Two
occurrences of tv_isge() were turned to a regular wrapping subtract.
'when' optional argument is provided to lua event handlers.
It is an integer representing the number of seconds elapsed since Epoch
and may be used in conjunction with lua `os.date()` function to provide
a custom format string.
Changing hlua_event_hdl_cb_data_push_args() return type to void since it
does not return anything useful.
Also changing its name to hlua_event_hdl_cb_push_args() since it does more
than just pushing cb data argument (it also handles event type and mgmt).
Errors catched by the function are reported as lua errors.