The commit reverts following commits:
* 83926a04 BUG/MEDIUM: debug/lua: Don't dump the lua stack if not dumpable
* a61789a1 MEDIUM: lua: Use a per-thread counter to track some non-reentrant parts of lua
Instead of relying on a Lua function to print the lua traceback into the
debugger, we are now using our own internal function (hlua_traceback()).
This one does not allocate memory and use a chunk instead. This avoids any
issue with a possible deadlock in the memory allocator because the thread
processing was interrupted during a memory allocation.
This patch relies on the commit "BUG/MEDIUM: debug/lua: Use internal hlua
function to dump the lua traceback". Both must be backported wherever the
patches above are backported, thus as far as 2.0
The separator string is now configurable, passing it as parameter when the
function is called. In addition, the message have been slightly changed to
be a bit more readable.
Some parts of the Lua are non-reentrant. We must be sure to carefully track
these parts to not dump the lua stack when it is interrupted inside such
parts. For now, we only identified the custom lua allocator. If the thread
is interrupted during the memory allocation, we must not try to print the
lua stack wich also allocate memory. Indeed, realloc() is not
async-signal-safe.
In this patch we introduce a thread-local counter. It is incremented before
entering in a non-reentrant part and decremented when exiting. It is only
performed in hlua_alloc() for now.
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
The actconns list creates massive contention on low server counts because
it's in fact a list of streams using a server, all threads compete on the
list's head and it's still possible to see some watchdog panics on 48
threads under extreme contention with 47 threads trying to add and one
thread trying to delete.
Moving this list per thread is trivial because it's only used by
srv_shutdown_streams(), which simply required to iterate over the list.
The field was renamed to "streams" as it's really a list of streams
rather than a list of connections.
There are multiple per-thread lists in the listeners, which isn't the
most efficient in terms of cache, and doesn't easily allow to store all
the per-thread stuff.
Now we introduce an srv_per_thread structure which the servers will have an
array of, and place the idle/safe/avail conns tree heads into. Overall this
was a fairly mechanical change, and the array is now always initialized for
all servers since we'll put more stuff there. It's worth noting that the Lua
code still has to deal with its own deinit by itself despite being in a
global list, because its server is not dynamically allocated.
It's a real pain not to have access to the list of all registered servers,
because whenever there is a need to late adjust their configuration, only
those attached to regular proxies are seen, but not the peers, lua, logs
nor DNS.
What this patch does is that new_server() will automatically add the newly
created server to a global list, and it does so as well for the 1 or 2
statically allocated servers created for Lua. This way it will be possible
to iterate over all of them.
The "socket_tcp" and "socket_ssl" servers had no config file name nor
line number, but this is sometimes annoying during debugging or later
in error messages, while all other places using new_server() or
parse_server() make sure to have a valid file:line set. Let's set
something to address this.
It's been too short for quite a while now and is now full. It's still
time to extend it to 32-bits since we have room for this without
wasting any space, so we now gained 16 new bits for future flags.
The values were not reassigned just in case there would be a few
hidden u16 or short somewhere in which these flags are placed (as
it used to be the case with stream->pending_events).
The patch is tagged MEDIUM because this required to update the task's
process() prototype to use an int instead of a short, that's quite a
bunch of places.
The remaining contention on the server lock solely comes from
sess_change_server() which takes the lock to add and remove a
stream from the server's actconn list. This is both expensive
and pointless since we have mt-lists, and this list is only
used by the CLI's "shutdown server sessions" command!
Let's migrate to an mt-list and remove the need for this costly
lock. By doing so, the request rate increased by ~1.8%.
The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.
This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
Lua requires LLONG_MAX defined with __USE_ISOC99 which is set by
_GNU_SOURCE, not necessarely defined by default on old compiler/glibc.
$ make V=1 TARGET=linux-glibc-legacy USE_THREAD= USE_ACCEPT4= USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_LUA=1
..
cc -Iinclude -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-strict-aliasing -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -DUSE_EPOLL -DUSE_NETFILTER -DUSE_PCRE -DUSE_POLL -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_LUA -DUSE_FUTEX -DUSE_ZLIB -DUSE_CPU_AFFINITY -DUSE_DL -DUSE_RT -DUSE_PRCTL -DUSE_THREAD_DUMP -I/usr/include/openssl101e/ -DUSE_PCRE -I/usr/include -DCONFIG_HAPROXY_VERSION=\"2.4-dev5-73246d-83\" -DCONFIG_HAPROXY_DATE=\"2021/01/21\" -c -o src/hlua.o src/hlua.c
In file included from /usr/local/include/lua.h:15,
from /usr/local/include/lauxlib.h:15,
from src/hlua.c:16:
/usr/local/include/luaconf.h:581:2: error: #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for details)"
..
cc -Iinclude -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-strict-aliasing -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -DUSE_EPOLL -DUSE_NETFILTER -DUSE_PCRE -DUSE_POLL -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_LUA -DUSE_FUTEX -DUSE_ZLIB -DUSE_CPU_AFFINITY -DUSE_DL -DUSE_RT -DUSE_PRCTL -DUSE_THREAD_DUMP -I/usr/include/openssl101e/ -DUSE_PCRE -I/usr/include -DCONFIG_HAPROXY_VERSION=\"2.4-dev5-73246d-83\" -DCONFIG_HAPROXY_DATE=\"2021/01/21\" -c -o src/hlua_fcn.o src/hlua_fcn.c
In file included from /usr/local/include/lua.h:15,
from /usr/local/include/lauxlib.h:15,
from src/hlua_fcn.c:17:
/usr/local/include/luaconf.h:581:2: error: #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for details)"
..
Cc: Thierry Fournier <tfournier@arpalert.org>
hlua_init() uses 'idx' only in openssl related code, while 'i' is used
in shared code and is safe to be reused. This commit replaces the use of
'idx' with 'i'
$ make V=1 TARGET=linux-glibc USE_LUA=1 USE_OPENSSL=
..
cc -Iinclude -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -DUSE_EPOLL -DUSE_NETFILTER -DUSE_POLL -DUSE_THREAD -DUSE_BACKTRACE -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_LUA -DUSE_FUTEX -DUSE_ACCEPT4 -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT -DUSE_PRCTL -DUSE_THREAD_DUMP -I/usr/include/lua5.3 -I/usr/include/lua5.3 -DCONFIG_HAPROXY_VERSION=\"2.4-dev5-37286a-78\" -DCONFIG_HAPROXY_DATE=\"2021/01/21\" -c -o src/hlua.o src/hlua.c
src/hlua.c: In function 'hlua_init':
src/hlua.c:9145:6: warning: unused variable 'idx' [-Wunused-variable]
9145 | int idx;
| ^~~
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
During a configuration check valgrind reports:
==14425== 0 bytes in 106 blocks are definitely lost in loss record 1 of 107
==14425== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14425== by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14425== by 0x443CFC: hlua_alloc (hlua.c:8662)
==14425== by 0x5F72B11: luaM_realloc_ (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F78089: luaH_free (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F707D3: sweeplist (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F710D0: luaC_freeallobjects (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F7715D: close_state (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x443D4C: hlua_deinit (hlua.c:9302)
==14425== by 0x543F88: deinit (haproxy.c:2742)
==14425== by 0x5448E7: deinit_and_exit (haproxy.c:2830)
==14425== by 0x5455D9: init (haproxy.c:2044)
This is due to Lua calling `hlua_alloc()` with `ptr = NULL` and `nsize = 0`.
While `realloc` is supposed to be equivalent `free()` if the size is `0` this
is only required for a non-NULL pointer. Apparently my allocator (or valgrind)
actually allocates a zero size area if the pointer is NULL, possibly taking up
some memory for management structures.
Fix this leak by specifically handling the case where both the pointer and the
size are `0`.
This bug appears to have been introduced with the introduction of the
multi-threaded Lua, thus this fix is specific for 2.4. No backport needed.
It is now possible to set the buffer used by the channel request buffer when
a stream is created. It may be useful if input data are already received,
instead of waiting the first call to the mux rcv_buf() callback. This change
is mandatory to support H1 connection with no stream attached.
For now, the multiplexers don't pass any buffer. BUF_NULL is thus used to
call stream_create_from_cs().
It seems to me that lua_close() must be called on all states at deinit
time, not just the first two ones. This is likely a remnant of commit
59f11be43 ("MEDIUM: lua-thread: Add the lua-load-per-thread directive").
There should likely be some memory leak reports when using Lua without
this fix, though none were observed for now.
No backport is needed as this was merged into 2.4-dev.
Lua dedicated TCP, HTTP and SSL socket and proxies must be initialized
once. Right now, they are initialized from the Lua init state, but since
commit 59f11be43 ("MEDIUM: lua-thread: Add the lua-load-per-thread
directive") this function is called one time per lua context. This
caused some fields to be cleared and overwritten, and pre-allocated
object to be lost. This is why the address sanitizer detected memory
leaks from the socket_ssl server initialization.
Let's move all the state-independent part of the function to the
hlua_init() function to avoid this.
No backport is needed, this is only 2.4-dev.
The goal is to allow execution of one main lua state per thread.
This patch contains the main job. The lua init is done using these
steps:
- "lua-load-per-thread" loads the lua code in the first thread
- it creates the structs
- it stores loaded files
- the 1st step load is completed (execution of hlua_post_init)
and now, we known the number of threads
- we initilize lua states for all remaining threads
- for each one, we load the lua file
- for each one, we execute post-init
Once all is loaded, we control consistency of functions references.
The rules are:
- a function reference cannot be in the shared lua state and in
a per-thread lua state at the same time.
- if a function reference is declared in a per-thread lua state, it
must be declared in all per-thread lua states
The goal is to allow execution of one main lua state per thread.
The array introduces storage of one reference per thread, because each
lua state can have different reference id for a same function. A function
returns the preferred state id according to configuration and current
thread id.
The goal is to allow execution of one main lua state per thread.
"state_from" is a pointer to the parent lua state. "state_id"
is the index of the parent state id in the reference lua states
array. "state_id" is better because the lock is a "== 0" test
which is quick than pointer comparison. In other way, the state_id
index could index other things the the Lua state concerned. I
think to the function references.
The goal is to allow execution of one main lua state per thread.
This function will initialize the struct with other things than 0.
With this function helper, the initialization is centralized and
it prevents mistakes. This patch also keeps a reference to each
declared function in a list. It will be useful in next patches to
control consistency of declared references.
The goal is to allow execution of one main lua state per thread.
The array of states is initialized at the max number of thread +1.
We define the index 0 is the common state shared by all threads
and should be locked. Other index index are dedicated to each
one thread. The old gL now becomes hlua_states[0].
The goal is to allow execution of one main lua state per thread.
This patch opens the way to addition of a per-thread dedicated lua state.
By passing the hlua we can figure the original state that's been used
and decide to lock or not.
The goal is to allow execution of one main lua state per thread.
Stop using locks in init part, we will use only in parts where
the parent lua state is known, so we could take decision about lock
according with the lua parent state.
The goal is to allow execution of one main lua state per thread.
This commit introduces this variable in the core. Lua state initialized
by thread will have access to this variable, which reports the executing
thread. 0 indicates the shared thread. Programs which must be executed
only once can check for core.thread <= 1.
The goal is to allow execution of one main lua state per thread.
This function will be called for each initialized lua state, so
one per thread. The split transforms the lua state variable from
global to local.
The goal is to allow execution of one main lua state per thread.
This function will be called once per thread, using different Lua
states. This patch prepares the work.
The goal is to allow execution of one main lua state per thread.
The function hlua_ctx_init() now gets the original lua state from
its caller. This allows the initialisation of lua_thread (coroutines)
from any master lua state.
The parent lua state is stored in the hlua struct.
This patch is a temporary transition, it will be modified later.
The goal is to allow execution of one main lua state per thread.
This is a preparative work in order to init more than one stack
in the lua-thread objective.
The goal is to allow execution of one main lua state per thread.
Because this struct will be filled after the configuration parser, we
cannot copy the content. The actual state of the Haproxy code doesn't
justify this change, it is an update preparing next steps.
The goal is to no longer use "struct hlua" with global main lua_state.
The usage of the "struct hlua" is no longer required. This patch replaces
this struct by another one.
Now, the usage of runtime Lua phase is separated from the start lua phase.
The goal is to no longer use "struct hlua" with global main lua_state.
This patch returns NULL value when some code tries go get the hlua struct
associated with a task through hlua_gethlua(). This functions is useful
only during runtime because the struct hlua contains only runtime states.
Some Lua functions allowed to yield are called from init environment.
I'm not sure this is a good practice. Maybe it will be clever to
disallow calling this kind of functions.
The goal is no longer using "struct hlua" with global main lua_state.
if somewhere in the code, hlua_ctx_renew() is called with a global Lua
context, we have a serious bug. A crash is better than working with
this bug, so this patch remove a useless control.
In other way, this control were used during hlua_post_init() function.
The function hlua_post_init() used a call to the runtime hlua_ctx_resume()
function. This call no longer exists.
The goal is to no longer use "struct hlua" with global main lua_state.
The hlua_post_init() is executed during start phase, it does not require
yielding nor any advanced runtime error processing. Let's simplify this
by re-implementing the code using lower-level functions which directly
take a state and not an hlua anymore.
The goal is to no longer use "struct hlua" with global main lua_state
and directly take the state instead.
This patch removes the implicit dependency to this struct with
the function hlua_prepend_path()
Let's switch memory accounting to atomics so that the allocator function
may safely be used from concurrent Lua states.
Given that this function is extremely hot on the call path, we try to
optimize it for the most common case, which is:
- no limit
- there's enough memory
The accounting is what is particuarly expensive in threads since all
CPUs compete for a cache line, so when the limit is not used, we don't
want to use accounting. However we need to preserve it during the boot
phase until we may parse a "tune.lua.maxmem" value. For this, we turn
the unlimited "0" value to ~0 at the end of the boot phase to mark the
definite end of accounting. The function then detects this value and
directly jumps to realloc() in this case.
When the limit is enforced however, we use a CAS to check and reserve
our share of memory, and we roll back on failure. The CAS is used both
for increments and decrements so that a single operation is enough to
update the counters.
The function really has the semantics of a realloc() except that it
also passes the old size to help with accounting. No need to special
case the free or malloc, realloc does everything we need.
Lua allows registering multiple sample-fetches, converters, action, cli,
applet/services with the same name. This is absolutely useless since only
the first registration will be used. This patch sends a warning if the case
is encountered.
This pach could be backported until 1.8, with the 3 associated patches:
- MINOR: actions: Export actions lookup functions
- MINOR: actions: add a function returning a service pointer from its name
- MINOR: cli: add a function to look up a CLI service description
Operation luaL_openlibs() and lua_prepend path are processed whithout
the safe context, so in case of failure Haproxy aborts or stops without
error message.
This patch could be backported until 1.8
"lua-load" doesn't check if the expected parameter is present. It tries to
open() directly the argument at second position. So if the filename is
omitted, it tries to load an empty filename.
This patch could be backported until 1.8