haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-14 11:06:56 +02:00

Author	SHA1	Message	Date
Christopher Faulet	9e05c14a41	MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value Till now, h1_parse_cont_len_header() was used during the H1 message parsing and by the lua HTTP applets to parse the content-length header value. But a more generic function was added some years ago doing exactly the same operations. So let's use it instead.	2025-04-22 16:14:47 +02:00
Aurelien DARRAGON	b81ab159a6	BUG/MEDIUM: hlua: fix hlua_applet_{http,tcp}_fct() yield regression (lost data) Jacques Heunis from Bloomberg reported on the mailing list [1] that with haproxy 2.8 up to master, yielding from a Lua tcp service while data was still buffered inside haproxy would eat some data which was definitely lost. He provided the reproducer below which turned out to be really helpful: global log stdout format raw local0 info lua-load haproxy_yieldtest.lua defaults log global timeout connect 10s timeout client 1m timeout server 1m listen echo bind *:9090 mode tcp tcp-request content use-service lua.print_input haproxy_yieldtest.lua: core.register_service("print_input", "tcp", function(applet) core.Info("Start printing input...") while true do local inputs = applet:getline() if inputs == nil or string.len(inputs) == 0 then core.Info("closing input connection") return end core.Info("Received line: "..inputs) core.yield() end end) And the script below: #!/usr/bin/bash for i in $(seq 1 9999); do for j in $(seq 1 50); do echo "${i}_foo_${j}" done sleep 2 done Using it like this: ./test_seq.sh | netcat localhost 9090 We can clearly see the missing data for every "foo" burst (every 2 seconds), as there are holes in the numbering. Thanks to the reproducer, it was quickly found that only versions >= 2.8 were affected, and that in fact this regression was introduced by commit `31572229e` ("MEDIUM: hlua/applet: Use the sedesc to report and detect end of processing"). In fact, in `31572229e` 2 mistakes were made during the refactoring. Indeed, both in hlua_applet_tcp_fct() (which is involved in the reproducer above) and hlua_applet_http_fct(), the request (buffer) is now systematically consumed when returning from the function, which wasn't the case prior to this commit: when HLUA_E_AGAIN is returned, it means a yield was requested and that the processing is not done yet, thus we should not consume any data, like we did prior to the refactoring. Big thanks to Jacques who did a great job reproducing and reporting this issue on the mailing list. [1]: https://www.mail-archive.com/haproxy@formilux.org/msg45778.html It should be backported up to 2.8 with commit `31572229e`	2025-04-17 14:40:34 +02:00
Aurelien DARRAGON	ea3c96369f	BUG/MINOR: hlua: fix invalid errmsg use in hlua_init() errmsg is used with memprintf and friends, thus it must be NULL initialized before being passed to memprintf, else invalid read will occur. However in hlua_init() the errmsg value isn't initialized, let's fix that. This is really minor because it would only cause issues on error paths, yet it may be backported to all stable versions, just in case.	2025-04-10 22:10:26 +02:00
Aurelien DARRAGON	11d4d0957e	MEDIUM: task: make notification_* API thread safe by default Some notification_* functions were not thread safe by default as they assumed only one producer would emit events for registered tasks. While this suited the Lua sockets use-case well, it proved to be a limitation with some other event sources (e.g. the Lua Queue class). Instead of having to deal with both the non thread safe and thread safe variants (_mt suffix), which is error prone, let's make the entire API thread safe regarding the event list. Pruning functions still require that only one thread executes them; with Lua this is always the case because there is one cleanup list per context.	2025-04-03 17:52:50 +02:00
Aurelien DARRAGON	0ffc80d3ba	MINOR: hlua: add AppletTCP:try_receive() This is the non-blocking variant for AppletTCP:receive(). It doesn't take any argument, instead it tries to read as much data as available at once. If no data is available, empty string is returned. Lua documentation was updated.	2025-04-03 17:52:39 +02:00
Aurelien DARRAGON	86d3cfdeeb	MINOR: hlua: split hlua_applet_tcp_recv_yield() in two functions Split hlua_applet_tcp_recv_yield() in order to create hlua_applet_tcp_recv_try() helper function which does a single receive attempt.	2025-04-03 17:52:34 +02:00
Aurelien DARRAGON	c7cbfafa38	MINOR: hlua: core.wait() takes optional delay parameter core.wait() now accepts an optional delay parameter in ms. Once this delay has elapsed, the task is woken up if no event woke it before. Lua documentation was updated.	2025-04-03 17:52:28 +02:00
Aurelien DARRAGON	1e4e5ab4d2	MINOR: hlua: add core.wait() Similar to core.yield(), except that the task is not woken up automatically, instead it waits for events to trigger the task wakeup. Lua documentation was updated.	2025-04-03 17:52:23 +02:00
Ilia Shipitsin	27a6353ceb	CLEANUP: assorted typo fixes in the code, commits and doc	2025-04-03 11:37:25 +02:00
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Aurelien DARRAGON	21601f4a27	BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup() Recent commit `e5e36ce09` ("BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work with applet's buffers") revealed a bug in hlua cli applet handling. Indeed, playing with Willy's lua tetris script on the cli, a segfault would be encountered when forcefully closing the session by sending a CTRL+C on the terminal. In fact the crash was caused by a UAF: while the cli applet was already freed, the lua task responsible for waking it up would still point to it. Thus hlua_applet_wakeup() could be called even if the applet didn't exist anymore. To fix the issue, in hlua_cli_io_release_fct() we must also free the hlua task linked to the applet, like we already do for hlua_applet_tcp_release() and hlua_applet_http_release(). While this bug exists on stable versions (where it should be backported too as a precaution), it only seems to be triggered starting with 3.0.	2025-03-19 17:03:28 +01:00
Aurelien DARRAGON	4651c4edd5	BUG/MINOR: hlua: fix optional timeout argument index for AppletTCP:receive() Baptiste reported that using the new optional timeout argument introduced in `19e48f2` ("MINOR: hlua: add an optional timeout to AppletTCP:receive()") the following error would occur at some point: runtime error: file.lua:lineno: bad argument #-2 to 'receive' (number expected, got light userdata) from [C]: in method 'receive... In fact this is caused by exp_date being retrieved using relative index -1 instead of absolute index 3. Indeed, while using relative index is fine most of the time when we trust the stack, when combined with yielding the top of the stack when resuming from yielding is not necessarily the same as when the function was first called (ie: if some data was pushed to the stack in the yieldable function itself). As such, it is safer to use explicit index to access exp_date variable at position 3 on the stack. It was confirmed that doing so addresses the issue. No backport needed unless `19e48f2` is.	2025-03-18 16:48:32 +01:00
Willy Tarreau	19e48f237f	MINOR: hlua: add an optional timeout to AppletTCP:receive() TCP services might want to be interactive, and without a timeout on receive(), the possibilities are a bit limited. Let's add an optional timeout in the 3rd argument to possibly limit the wait time. In this case if the timeout strikes before the requested size is complete, a possibly incomplete block will be returned.	2025-03-17 16:19:34 +01:00
Aurelien DARRAGON	29b6d8af16	MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion" A better name was found for the option implemented in `ec74438` ("MINOR: hlua: add option to preserve bool type from smp to lua"). Indeed, "tune.lua.preserve-smp-bool {on | off}" wasn't explicit enough nor did it encourage the adoption of the new "fixed" behavior (vs historical behavior which is now considered as a bug). Thus it becomes "tune.lua.bool-sample-conversion { normal | pre-3.1-bug }" which actively encourages users to switch to the new behavior after having patched in-use Lua scripts if needed. From a technical point of view, the logic remains the same, as the option currently defaults to "pre-3.1-bug" to prevent script breakage, and a warning is emitted if the option isn't set explicitly and Lua is used. Documentation and regtests were updated. Must be backported in 3.1 with `ec74438` and `f2838f5` ("REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool")	2024-12-20 17:34:05 +01:00
Aurelien DARRAGON	ec74438273	MINOR: hlua: add option to preserve bool type from smp to lua As discussed in GH #2814, there is an ambiguity in hlua implementation that causes haproxy smp boolean type to be pushed as an integer on the Lua stack. On the other hand, when doing Lua to haproxy smp conversion, the boolean type is properly preserved. Of course this situation is not desirable and can lead to unexpected results. However we cannot simply fix the behavior because in Lua boolean and integer types are completely distinct types and cannot be used interchangeably. So in order to prevent breaking existing scripts logic, in this patch we add a dedicated lua tunable named "tune.lua.smp-preserve-bool" which can take the following values: - "on" : when converting haproxy smp to lua, boolean type is preserved - "off": when converting haproxy smp to lua, boolean is converted to integer (legacy behavior) For now, the tunable defaults to "off" to preserve historical behavior. However, when the option isn't set explicitly and lua is used, a warning will be emitted in order to raise user's awareness about this ambiguity. It is expected that the tunable could default to "on" in future versions, thus it is recommended to avoid setting it to "off" except when using existing Lua scripts that still rely on the old behavior regarding boolean smp to Lua conversion, and that cannot be fixed easily. This should solve issue GH #2814. It may be relevant to backport this in haproxy 3.1.	2024-12-19 13:50:27 +01:00
William Lallemand	acb2c9eb8b	MINOR: ssl: improve HAVE_SSL_OCSP ifdef Allow to build correctly without OCSP. It could be disabled easily with an OpenSSL built with OPENSSL_NO_OCSP, or even with DEFINE="-DOPENSSL_NO_OCSP" on the haproxy make line.	2024-12-19 10:53:05 +01:00
Willy Tarreau	a4f50c69e4	CLEANUP: hlua: use ASSUME_NONNULL() instead of ALREADY_CHECKED() The purpose of the test in hlua_applet_tcp_new() was precisely to declare non-nullity. Let's just do it using ASSUME_NONNULL() now.	2024-12-17 17:47:57 +01:00
Aurelien DARRAGON	70b5cd6794	MINOR: hlua: fix ambiguous hlua usage in hlua_filter_delete() In GH #2804, @Bbulatov reported that the result of hlua_stream_ctx_get() was used and de-referenced without checking if it's NULL in hlua_filter_delete() while other functions used to check for NULL before de-referencing it. In fact hlua_stream_ctx_get() can only return NULL if hlua_stream_ctx_prepare() failed or was not called on the current stream. Now because of the filter's API, since hlua_filter_delete() is mapped as detach method and hlua_filter_new() as attach method, and since hlua_filter_new() is responsible for calling hlua_stream_ctx_prepare(), there's no reason hlua_filter_delete() should be called if hlua_filter_new() failed or wasn't called. Thus we can assume that hlua can never be NULL in hlua_filter_delete(), so we add a BUG_ON() to ensure it is always the case and remove the ambiguity.	2024-12-02 17:22:51 +01:00
Aurelien DARRAGON	31784efad2	MINOR: hlua: add core.get_patref method core.get_patref() method may be used to get a reference to a pattern object (pat_ref struct which is used for maps and acl storage) from Lua by providing the reference name (filename for files, or prefix+name for opt or virtual pattern references). Lua documentation was updated.	2024-11-29 07:22:38 +01:00
Aurelien DARRAGON	3d250b3be8	MINOR: pattern: split pat_ref_set() split pat_ref_set() function in 2 distinct functions. Indeed, since `0844bed7d3` ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)"), pat_ref_set() prototype was updated to include an extra <elt> argument. But the logic behind is not explicit because the function will not only try to set <elt>, but also its duplicate (unlike pat_ref_set_elt() which only tries to update <elt>). Thus, to make it clearer and better distinguish between the key-based lookup version and the elt-based one, restore pat_ref_set() previous prototype and add a dedicated pat_ref_set_elt_duplicate() that takes <elt> as argument and tries to update <elt> and all duplicates.	2024-11-26 16:12:05 +01:00
Aurelien DARRAGON	2ce0db4e4b	OPTION: map/hlua: make core.set_map() lookup more efficient `0844bed7d3` ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)") improved lookup efficiency for set-map http action, but the core.set_map() lua method which is built on the same construct was overlooked. Let's also benefit from this optim as it easily applies.	2024-11-20 16:14:13 +01:00
Aurelien DARRAGON	5d766260f0	MEDIUM: protocol: rely on AF_CUST_ABNS family to recognize ABNS sockets Now that we can easily distinguish regular UNIX socket from ABNS sockets by simply looking at the address family, stop looking at the first byte from addr->sun_path to guess if the socket is an ABNS one or not. Looking at the family is straightforward and will allow to differentiate between upcoming ABNSZ and ABNS (where looking at the first byte from path won't help anymore).	2024-10-29 12:14:37 +01:00
Willy Tarreau	78ac312bbd	MEDIUM: protocol: make abns a custom unix socket address family This is a pre-requisite to adding the abnsz socket address family: in this patch we make use of protocol API rework started by `732913f` ("MINOR: protocol: properly assign the sock_domain and sock_family") in order to implement a dedicated address family for ABNS sockets (based on UNIX parent family). Thanks to this, it will become trivial to implement a new ABNSZ (for abns zero) family which is essentially the same as ABNS but with a slight difference when it comes to path handling (ABNS uses the whole sun_path length, while ABNSZ's path is zero terminated and evaluation stops at 0) It was verified that this patch doesn't break reg-tests and behaves properly (tests performed on the CLI with show sess and show fd). Anywhere relevant, AF_CUST_ABNS is handled alongside AF_UNIX. If no distinction needs to be made, real_family() is used to fetch the proper real family type to handle it properly. Both stream and dgram were converted, so no functional change should be expected for this "internal" rework, except that proto will be displayed as "abns_{stream,dgram}" instead of "unix_{stream,dgram}". Before ("show sess" output): 0x64c35528aab0: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=21,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp= After: 0x619da7ad74c0: proto=abns_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=00 epoch=0 age=0s calls=1 rate=0 cpu=0 lat=0 rq[f=848000h,i=0,an=00h,ax=] rp[f=80008000h,i=0,an=00h,ax=] scf=[8,0h,fd=22,rex=10s,wex=] scb=[8,1h,fd=-1,rex=,wex=] exp=10s rc=0 c_exp= Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>	2024-10-29 12:14:25 +01:00
Aurelien DARRAGON	f88f162868	BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}() To execute sample fetches and converters from lua, hlua API leverages the sample API. Prior to executing the sample func, the arg checker is called from hlua_run_sample_{fetch,conv}() to detect potential errors. However, hlua_run_sample_{fetch,conv}() both pass NULL as <err> argument, but it is wrong for two reasons. First we miss an opportunity to report precise error messages to help the user know what went wrong during the check, and more importantly, some val check functions consider that the <err> pointer is never NULL. This is the case for example with check_crypto_hmac(). Because of this, when such val check functions encounter an error, they will crash the process because they will try to de-reference NULL. This bug was discovered and reported by GH user @JB0925 on #2745. Perhaps val check functions should make sure that the provided <err> pointer is != NULL prior to de-referencing it. But since there are multiple occurrences found in the code and the API isn't clear about that, it is easier to fix the hlua part (caller) for now. To fix the issue, let's always provide a valid <err> pointer when leveraging val_arg() check function pointer, and make use of it in case of error to report relevant message to the user before freeing it. It should be backported to all stable versions.	2024-10-08 12:00:42 +02:00
Aurelien DARRAGON	d0e0105181	BUG/MEDIUM: hlua: make hlua_ctx_renew() safe hlua_ctx_renew() is called from unsafe places where the caller doesn't expect it to LJMP.. however hlua_ctx_renew() makes use of Lua library function that could potentially raise errors, such as lua_newthread(), and it does nothing to catch errors. Because of this, haproxy could unexpectedly crash. This was discovered and reported by GH user @JB0925 on #2745. To fix the issue, let's simply make hlua_ctx_renew() safe by applying the same logic implemented for hlua_ctx_init() or hlua_ctx_destroy(), which is catching Lua errors by leveraging SET_SAFE_LJMP_PARENT() helper. It should be backported to all stable versions.	2024-10-08 12:00:36 +02:00
Aperence	a7b04e383a	MINOR: tools: extend str2sa_range to add an alt parameter Add a new parameter "alt" that will store whether this configuration uses an alternate protocol. This alt pointer will contain a value that can be transparently passed to protocol_lookup to obtain an appropriate protocol structure. This change is needed to allow, for example, the servers to know whether they need to use an alternate protocol or not.	2024-08-30 18:53:49 +02:00
Christopher Faulet	e5e36ce097	BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work with applet's buffers In 3.0, the CLI applet was rewritten to use its own buffers. However, the lua part, used to register CLI commands at runtime, was not updated accordingly. It means the lua CLI commands still try to write in the channel buffers. This is of course totally unexpected and not supported. Because of this bug, the applet hangs instead of returning the command result. The registration of lua CLI commands relies on the lua TCP applets. So the send and receive functions were fixed to use the applet's buffer when it is required and still use the channel buffers otherwise. This way, other lua TCP applets can still run on the legacy mode, without the applet's buffers. This patch must be backported to 3.0.	2024-07-02 10:05:40 +02:00
Aurelien DARRAGON	185d230e2c	BUG/MINOR: hlua: report proper context upon error in hlua_cli_io_handler_fct() As a result of copy pasting, hlua_cli_io_handler_fct() used to report lua exceptions like E_ETMOUT as "Lua converter" instead of "Lua cli". Let's fix that. It could be backported to all stable versions. [ada: for older versions, HLUA_E_BTMOUT case didn't exist so it has to be skipped]	2024-06-26 11:06:24 +02:00
Aurelien DARRAGON	983513d901	DEBUG: hlua: distinguish burst timeout errors from exec timeout errors hlua burst timeout was introduced in `58e36e5b1` ("MEDIUM: hlua: introduce tune.lua.burst-timeout"). It is a safety measure that allows to detect when too much time is spent on a single lua execution (between 2 interruptions/yields), meaning that the current thread is not able to perform other tasks. Such scenario should be avoided because it will cause thread contention which may have negative performance impact and could cause the watchdog to trigger. When the burst timeout is exceeded, the current Lua execution is aborted and a timeout error is reported to the user. Unfortunately, the same error is currently being reported for cumulative (AKA execution) timeout and for burst timeout, which may be confusing to the user. Indeed, "execution timeout" error historically results from the current hlua context exceeding the total (cumulative) time it's allowed to run. It is set per lua context using the dedicated tunables: - tune.lua.session-timeout - tune.lua.task-timeout - tune.lua.service-timeout We've already faced a user report where the user was able to trigger the burst timeout and got "Lua task: execution timeout." error while the user didn't set cumulative timeout. Thus the error was actually confusing because it was indeed the burst timeout which was causing it due to the use of cpu-intensive call from within the task without sufficient manual "yield" keypoints around the cpu-intensive call to ensure it runs on a dedicated scheduler cycle. In this patch we make it so burst timeout related errors are reported as "burst timeout" errors instead of "execution timeout" errors (which in fact became the generic timeout errors catchall with `58e36e5b1`). To do this, hlua_timer_check() now returns a different value depending on whether the exceeded timeout is the burst one or the cumulative one, which allows us to return either HLUA_E_ETMOUT or HLUA_E_BTMOUT in hlua_ctx_resume(). It should improve the situation described in GH #2356 and may possibly be backported with `58e36e5b1` to improve error reporting if it applies without resistance.	2024-06-14 18:25:58 +02:00
Aurelien DARRAGON	2bde0d64dd	CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume() 'lua_insert(lua->T, -lua_gettop(lua->T))' is actually used to rotate the top value with the bottom one, thus the code was overkill and the comment was actually misleading, let's fix that by using explicit equivalent form (absolute index). It may be backported with `5508db9a2` ("BUG/MINOR: hlua: fix unsafe lua_tostring() usage with empty stack") to all stable versions to ease code maintenance.	2024-06-04 16:31:38 +02:00
Aurelien DARRAGON	755c2daf0f	BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path in hlua_ckch_commit_yield() and hlua_ckch_set(), when an error occurs, we enter the error path and try to raise an error from the <err> msg pointer which must be freed afterwards. However, the fact that luaL_error() never returns was overlooked, because of that <err> msg is never freed in such case. To fix the issue, let's use hlua_pushfstring_safe() helper to push the err on the lua stack and then free it before throwing the error using lua_error(). It should be backported up to 2.6 with `30fcca18` ("MINOR: ssl/lua: CertCache.set() allows to update an SSL certificate file")	2024-06-04 16:31:30 +02:00
Aurelien DARRAGON	2be94c008e	CLEANUP: hlua: get rid of hlua_traceback() security checks Thanks to the previous commit, we may now assume that hlua_traceback() won't LJMP, so it's safe to use it from unprotected environment without any precautions.	2024-06-04 16:31:22 +02:00
Aurelien DARRAGON	365ee28510	BUG/MINOR: hlua: prevent LJMP in hlua_traceback() Function is often used on error paths where no precaution is taken against LJMP. Since the function is used on error paths (which include out-of-memory error paths) the function lua_getinfo() could also raise a memory exception, causing the process to crash or improper error handling if the caller isn't prepared against that eventuality. Since the function is only used on rare events (error handling) and is lacking the __LJMP prototype prefix, let's make it safe by protecting the lua_getinfo() call so that hlua_traceback() callers may use it safely now (the function will always succeed, output will be truncated in case of error). This could be backported to all stable versions.	2024-06-04 16:31:15 +02:00
Aurelien DARRAGON	f0e5b825cf	BUG/MINOR: hlua: fix unsafe hlua_pusherror() usage Following previous commit's logic: hlua_pusherror() is mainly used from cleanup paths where the caller isn't protected against LJMPs. Caller was tempted to think that the function was safe because func prototype was lacking the __LJMP prefix. Let's make the function really LJMP-safe by wrapping the sensitive calls under lua_pcall(). This may be backported to all stable versions.	2024-06-04 16:31:09 +02:00
Aurelien DARRAGON	c0a3c1281f	BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP lua_pushfstring() is used in multiple cleanup paths (upon error) to push the error message that will be raised by lua_error(). However this is often done from an unprotected environment, or in the middle of a cleanup sequence, thus we don't want the function to LJMP! (it may cause various issues ranging from memory leaks to crashing the process..) Hopefully this has very few chances of happening but since the use of lua_pushfstring() is limited to error reporting here, it's ok to use our own hlua_pushfstring_safe() implementation with a little overhead to ensure that the function will never LJMP. This could be backported to all stable versions.	2024-06-04 16:31:01 +02:00
Aurelien DARRAGON	6e484996c6	CLEANUP: hlua: use hlua_pusherror() where relevant In hlua_map_new(), when error occurs we use a combination of luaL_where, lua_pushfstring and lua_concat to build the error string before calling lua_error(). It turns out that we already have the hlua_pusherror() macro which is exactly made for that purpose so let's use it. It could be backported to all stable versions to ease code maintenance.	2024-06-04 16:30:55 +02:00
Aurelien DARRAGON	a63f2cde94	CLEANUP: hlua: fix CertCache class comment CLASS_CERTCACHE is used to declare CertCache global object, not Regex one. This copy-paste typo was introduced in `30fcca18` ("MINOR: ssl/lua: CertCache.set() allows to update an SSL certificate file")	2024-06-03 17:00:06 +02:00
Aurelien DARRAGON	4f906a9c38	BUG/MINOR: hlua: use CertCache.set() from various hlua contexts Using CertCache.set() from init context wasn't explicitly supported and caused the process to crash: crash.lua: core.register_init(function() CertCache.set{filename="reg-tests/ssl/set_cafile_client.pem", ocsp=""} end) crash.conf: global lua-load crash.lua listen front bind localhost:9090 ssl crt reg-tests/ssl/set_cafile_client.pem ca-file reg-tests/ssl/set_cafile_interCA1.crt verify none ./haproxy -f crash.conf [NOTICE] (267993) : haproxy version is 3.0-dev2-640ff6-910 [NOTICE] (267993) : path to executable is ./haproxy [WARNING] (267993) : config : missing timeouts for proxy 'front'. | While not properly invalid, you will certainly encounter various problems | with such a configuration. To fix this, please ensure that all following | timeouts are set to a non-zero value: 'client', 'connect', 'server'. [1] 267993 segmentation fault (core dumped) ./haproxy -f crash.conf This is because in hlua_ckch_set/hlua_ckch_commit_yield, we always consider that we're being called from a yield-capable runtime context. As such, hlua_gethlua() is never checked for NULL and we systematically try to wake hlua->task and yield every 10 instances. In fact, if we're called from the body or init context (that is, during haproxy startup), hlua_gethlua() will return NULL, and in this case we shouldn't care about yielding because it is ok to commit all instances at once since haproxy is still starting up. Also, when calling CertCache.set() from a non-yield capable runtime context (such as hlua fetch context), we kept doing as if the yield succeeded, resulting in unexpected function termination (operation would be aborted and the CertCache lock wouldn't be released). Instead, now we explicitly state in the doc that CertCache.set() cannot be used from a non-yield capable runtime context, and we raise a runtime error if it is used that way. These bugs were discovered by reading the code when trying to address Svace report documented by @Bbulatov GH #2586. It should be backported up to 2.6 with `30fcca18` ("MINOR: ssl/lua: CertCache.set() allows to update an SSL certificate file")	2024-06-03 17:00:00 +02:00
Aurelien DARRAGON	231d3d32be	MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction() Based on Willy's idea (from 3.0-dev6 announcement message): in this patch we try to reduce the max latency that can be caused by running lua scripts with default settings. Indeed, by default, hlua engine is allowed to process up to 10k instructions per batch. While this value was found to be the optimal one for a single thread, it turns out that keeping a thread busy for 10k lua instructions could increase thread contention. This is especially true when the script is loaded with 'lua-load', because in that case the current thread owns the main lua lock and prevents other threads from making any progress if they're also waiting on the main lock. Thanks to Thierry Fournier's work, we know that performance-wise we can reach optimal performance by sticking between 500 and 10k instructions per batch. Given that, when the script is loaded using 'lua-load', if no "tune.lua.forced-yield" was set by the user, we automatically divide the default value (10K) by the number of threads haproxy can use to reduce thread contention (given that all threads could compete for the main lua lock). However, we make sure not to return a value below 500, because Thierry's work showed that this would come with a significant performance loss. The historical behavior may still be enforced by setting "tune.lua.forced-yield" to 10000 in the global config section.	2024-05-15 11:59:44 +02:00
Aurelien DARRAGON	e60d9dddf8	MINOR: hlua: add hlua_nb_instruction getter No functional behavior change, but this will ease the work of dynamically computing hlua_nb_instruction value depending on various inputs.	2024-05-15 11:59:37 +02:00
Aurelien DARRAGON	07b2e84bce	BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread (2nd try) While trying to reproduce another crash case involving lua filters reported by @bgrooot on GH #2467, we found out that mixing filters loaded from different contexts ('lua-load' vs 'lua-load-per-thread') for the same stream isn't supported and may even cause the process to crash. Historically, mixing lua-load and lua-load-per-thread for a stream wasn't supported, but this changed thanks to `0913386` ("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread"). However, the above fix didn't consider the lua filters' use-case properly: unlike lua fetches, actions or even services, lua filters don't simply use the stream hlua context as a "temporary" hlua running context to process some hlua code. For fetches, actions.. hlua executions are processed sequentially, so we simply reuse the hlua context from the previous action/fetch to run the next one (this allows to bypass memory allocations and initialization, thus it increases performance), unless we need to run on a different hlua state-id, in which case we perform a reset of the hlua context. But this cannot work with filters: indeed, once registered, a filter will last for the whole stream duration. It means that the filter will rely on the stream hlua context from ->attach() to ->detach(). And here is the catch, if for the same stream we register 2 lua filters from different contexts ('lua-load' + 'lua-load-per-thread'), then we have an issue, because the hlua stream will be re-created each time we switch between runtime contexts, which means each time we switch between the filters (may happen for each stream processing step), and since lua filters rely on the stream hlua to carry context between filtering steps, this context will be lost upon a switch. Given that lua filters code was not designed with that in mind, it would confuse the code and cause unexpected behaviors ranging from lua errors to crashing process. So here we take another approach: instead of re-creating the stream hlua context each time we switch between "global" and "per-thread" runtime context, let's have both of them inside the stream directly as initially suggested by Christopher back then when we talked about the original issue. For this we leverage hlua_stream_ctx_prepare() and hlua_stream_ctx_get() helper functions which return the proper hlua context for a given stream and state_id combination. As for debugging infos reported after ha_panic(), we check for both hlua runtime contexts to check if one of them was active when the panic occurred (only 1 runtime ctx per stream may be active at a given time). This should be backported to all stable versions with `0913386` ("BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread") This commit depends on: - "DEBUG: lua: precisely identify if stream is stuck inside lua or not" [for versions < 2.9 the ha_thread_dump_one() part should be skipped] - "MINOR: hlua: use accessors for stream hlua ctx" For 2.4, the filters API didn't exist. However it may be a good idea to backport it anyway because ->set_priv()/->get_priv() from tcp/http lua applets may also be affected by this bug, plus it will ease code maintenance. Of course, filters-related parts should be skipped in this case.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	aa554be69c	MINOR: hlua: use accessors for stream hlua ctx Change hlua_stream_ctx_prepare() prototype so that it now returns the proper hlua ctx on success instead of returning a boolean. Add hlua_stream_ctx_get() to retrieve hlua ctx out of a given stream. This way we may easily change the storage mechanism for hlua stream in the future without extensive code changes. No backport needed unless a commit depends on it.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	1a2cdf64c9	DEBUG: lua: precisely identify if stream is stuck inside lua or not When ha_panic() is called by the watchdog, we try to guess from ha_task_dump() and ha_thread_dump_one() if the thread was stuck while executing lua from the stream context. However we consider this is the case by simply checking if the stream hlua context was set, but this is not very precise because if the hlua context is set, then it simply means that at least one lua instruction was executed at the stream level, not that the thread was currently executing lua when the panic occurred. This is especially true with filters: one could simply register a lua filter that does nothing, but this will still end up initializing the stream hlua context for each stream. If the thread ends up being stuck during the stream handling, then debug dumping functions will report that the stream was stuck while handling lua, which is not necessarily true, and could in fact confuse us even more. So here we take another approach: we add the BUSY flag to the hlua context. This flag is set by hlua_ctx_resume() around the lua_resume() call; this way we can precisely tell if the thread was handling lua when it was interrupted, and we rely on this flag in debug functions to check if the thread was effectively stuck inside lua or not while processing the stream. No backport needed unless a commit depends on it.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	85d81e4d0a	BUG/MINOR: hlua: fix missing lock in hlua_filter_delete() hlua_filter_delete() calls hlua_unref() on the stream hlua stack, but we should own the lock prior to manipulating the stack. This should be backported up to 2.6.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	ecd8f3bfd7	BUG/MINOR: hlua: missing lock in hlua_filter_new() This is a complementary patch to `8670db7` ("BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume()") for hlua_filter_new(). Indeed, the HLUA_E_ERRMSG case still relies on the lua stack but didn't take the lock to do so. This should be backported up to 2.6.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	4aefffc38c	BUG/MINOR: hlua: segfault when loading the same filter from different contexts Trying to register the same lua filter from global and per-thread context (using 'lua-load' + 'lua-load-per-thread') causes a segmentation fault in hlua_post_init(). This is due to a simple copy paste error as we try to print the function name in the error message (like we do when loading the same lua function from different contexts) instead of the filter name. This should be backported up to 2.6.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	75c8a1bc2d	CLEANUP: hlua: txn class functions may LJMP Clarify that some txn related class functions may LJMP by adding the __LJMP tag to their prototype.	2024-03-04 16:48:51 +01:00
Aurelien DARRAGON	f364f4670b	MINOR: hlua: use SEND_ERR to report errors in hlua_event_runner() Instead of reporting lua errors using ha_alert(), let's use SEND_ERR() helper which will also try to generate a log message according to lua log settings.	2024-03-04 16:48:48 +01:00
Aurelien DARRAGON	e1b0031650	BUG/MINOR: hlua: don't call ha_alert() in hlua_event_subscribe() hlua_event_subscribe() is meant to be called from a protected lua env during init and/or runtime. As such, only hlua_event_sub() makes use of it: when an error happens hlua_event_sub() will already raise a Lua exception. Thus it's not relevant to use ha_alert() there as it could generate log pollution (the error is relevant from the Lua script's point of view, not from haproxy's). This could be backported in 2.8.	2024-03-04 16:48:42 +01:00
Aurelien DARRAGON	8670db7a89	BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume() hlua_ctx_resume() itself can safely be used as-is in a multithreading context because it takes care of taking the lua lock. However, when hlua_ctx_resume() returns, the lock is released and it is thus the caller's responsibility to ensure it owns the lock prior to performing additional manipulations on the Lua stack. Unfortunately, since the early haproxy lua implementation, we used to do it wrong: The most common hlua_ctx_resume() pattern we can find in the code (because it was duplicated over and over) is the following: \|ret = hlua_ctx_resume() \|switch (ret) { \| case HLUA_E_OK: \| break; \| case HLUA_E_ERRMSG: \| break; \| [...] \|} Problem is: for some of the switch cases, we still perform lua stack manipulations. This is the case for HLUA_E_ERRMSG for instance, where we often use lua_tostring() to retrieve the last lua error message on the top of the stack, or sometimes for the HLUA_E_OK case, when we need to perform some lua cleanup logic once the resume ended. But all of this is done WITHOUT the lua lock, so this means that the main lua stack could be accessed simultaneously by concurrent threads when a script was loaded using 'lua-load'. While it is not critical for switch-cases dedicated to error handling (those are not supposed to happen very often), it can be very problematic for stack manipulations occurring in the HLUA_E_OK case under heavy load for instance. In this case, main lua stack corruptions will eventually happen. This is especially true inside hlua_filter_new(), where this bug was known to cause lua stack corruptions under load, leading to lua errors and even crashing the process as reported by @bgrooot in GH #2467. The fix is relatively simple: once hlua_ctx_resume() returns, we should consider that ANY lua stack access should be lua-lock protected. 
If the related lua calls may raise lua errors, then the (RE)SET_SAFE_LJMP combination should be used as usual (it locks the lua stack and catches lua exceptions at the same time), else hlua_{lock,unlock} may be used if no exceptions are expected. This patch should fix GH #2467. It should be backported to all stable versions. [ada: some ctx adj will be required for older versions as event_hdl doesn't exist prior to 2.8 and filters were implemented in 2.5, thus some chunks won't apply]	2024-03-04 16:48:31 +01:00

944 Commits