Commit Graph

133 Commits

Amaury Denoyelle
83281303f6 MINOR: stats: define stats-file output format support
Prepare the stats functions to handle a new format labelled "stats-file". Its
purpose is to generate a statistics dump with a format close to the CSV
output. Such output will then be used to preload haproxy internal
counters on process startup.

The stats-file output differs from a standard CSV on several points. First,
only an excerpt of all statistics is output. All values that do not
make sense to preload are excluded. For the moment, the stats-file only
lists stats fully defined via the "struct stat_col" method. Contrary to a
CSV, all columns of a stats-file will be filled. As such, an empty field
value is used to mark stats which should not be output.

Some adaptations specific to the stats-file are necessary in
me_generate_field(). First, the stats-file outputs values from the
frontend and backend sides separately, each with its own respective set
of columns. As such, an empty field value is returned if a stat is not
defined for frontend/listener, or for backend/server, when outputting
the other side. Also, as the stats-file does not support empty columns,
stcol_hide() is not used for it.

A minor adjustment was necessary for stats_fill_fe_line() to pass
context flags. This is needed to detect the stat output format. All other
corresponding listener/server/backend functions already have it.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
861370a6d4 MINOR: stats: update ambiguous "metrics" naming to "stat_cols"
The name "metrics" was chosen to represent the various list of haproxy
exposed statistics. However, it is deemed as ambiguous as some stats are
indeed metric in the true sense, but some are not, as highlighted by
various "enum field_origin" values.

Replace it by the new name "stat_cols" for statistic columns. Along with
the already existing notion of stat lines it should better reflect its
purpose.
2024-04-26 10:20:57 +02:00
Amaury Denoyelle
341bf913d4 MINOR: stats: use STAT_F_* prefix for flags
Some flags are defined during statistics generation and output. They use
the prefix STAT_* which is also used for other purposes. Rename them
with the new prefix STAT_F_* to differentiate them from the other
usages.
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
e97375dcab MINOR: stats: use stricter naming stats/field/line
Several different names were used for various purposes in the statistics
implementation. This made the code difficult to understand.

* the stat/stats name is removed wherever a more specific name can be used
* "field" usage is restricted to purely refer to <struct field>, which
  represents a raw stat value
* the "line" naming is used to represent an array of <struct field>
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
8dbb74542f MINOR: stats: rename info stats
Info is used to expose haproxy global metrics, similarly to proxy
statistics and any other module. As such, rename the info indexes using
the SI_I_INF_* prefix. The info variable is also renamed stat_line_info.

Thanks to this, naming is now consistent between info and the other
statistics. It will help integrate it as a "global" statistics
module.
2024-04-22 16:25:18 +02:00
Amaury Denoyelle
8fc0b18087 MINOR: stats: rename proxy stats
This commit is the first of a series which adjusts the naming convention
for the stats module. The objective is to remove ambiguity and better
reflect how stats are implemented, especially since the introduction of
the stats module.

This patch renames elements related to proxy statistics. One of the
main changes is to rename the ST_F_* statistics index prefix to the new
ST_I_PX_* name. This removes the reference to "field", which represents
another concept in the stats module. In the same vein, the global
stat_fields variable is renamed metrics_px.
2024-04-22 16:25:18 +02:00
Willy Tarreau
1a088da7c2 MAJOR: stktable: split the keys across multiple shards to reduce contention
In order to reduce the contention on the table when keys expire quickly,
we're spreading the load over multiple trees. This applies to both keys
and expiration dates. The shard number is calculated from the key value
itself, both when looking up and when setting it.

The "show table" dump on the CLI iterates over all shards, so the
output is not fully sorted; it's only sorted within each shard. The Lua
table dump just does the same. It was verified, using a Lua program that
counts stick-table entries, that it works as intended (the test case is
reproduced here as it's clearly not easy to automate as a vtc):

  function dump_stk()
    local dmp = core.proxies['tbl'].stktable:dump({});
    local count = 0
    for _, __ in pairs(dmp) do
        count = count + 1
    end
    core.Info('Total entries: ' .. count)
  end

  core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0);

  ##
  global
    tune.lua.log.stderr on
    lua-load-per-thread lua-cnttbl.lua

  listen front
    bind :8001
    http-request lua.dump_stk if { path_beg /stk }
    http-request track-sc1 rand(),upper,hex table tbl
    http-request redirect location /

  backend tbl
    stick-table size 100k type string len 12 store http_req_cnt

  ##
  $ h2load -c 16 -n 10000 0:8001/
  $ curl 0:8001/stk

  ## A count close to 100k appears on haproxy's stderr
  ## On the CLI, "show table tbl" | wc will show the same.

Some large parts were reindented only to add a top-level loop to iterate
over shards (e.g. process_table_expire()). Better check the diff using
git show -b.

The number of shards is decided just like for the pools, at build time
based on the max number of threads, so that we can keep a constant. Maybe
this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used,
and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the
measurements made for the pools. It turns out that this value seems to
be the most reasonable one without inflating the struct stktable too
much. By default for 1024 threads the value is 32 and delivers 980k RPS
in a test involving 80 threads, while adding 1kB to the struct stktable
(roughly doubling it). The same test at 64 gives 1008 kRPS and at 128
it gives 1040 kRPS for 8 times the initial size. 16 would be too low
however, with 675k RPS.

The stksess already has a shard number; it's the one used to decide which
peer connection to send the entry to. Maybe we should also store the one
associated with the entry itself instead of recalculating it, though it
does not happen that often. The operation is done by hashing the key using
XXH32().

The peers also take and release the table's lock, but the way it's used
is not very clear yet, so at this point it's certain this will not work.

At this point, this allowed the performance to be completely unlocked on
an 80-thread setup:

 before: 5.4 Gbps, 150k RPS, 80 cores
  52.71%  haproxy    [.] stktable_lookup_key
  36.90%  haproxy    [.] stktable_get_entry.part.0
   0.86%  haproxy    [.] ebmb_lookup
   0.18%  haproxy    [.] process_stream
   0.12%  haproxy    [.] process_table_expire
   0.11%  haproxy    [.] fwrr_get_next_server
   0.10%  haproxy    [.] eb32_insert
   0.10%  haproxy    [.] run_tasks_from_lists

 after: 36 Gbps, 980k RPS, 80 cores
  44.92%  haproxy    [.] stktable_get_entry
   5.47%  haproxy    [.] ebmb_lookup
   2.50%  haproxy    [.] fwrr_get_next_server
   0.97%  haproxy    [.] eb32_insert
   0.92%  haproxy    [.] process_stream
   0.52%  haproxy    [.] run_tasks_from_lists
   0.45%  haproxy    [.] conn_backend_get
   0.44%  haproxy    [.] __pool_alloc
   0.35%  haproxy    [.] process_table_expire
   0.35%  haproxy    [.] connect_server
   0.35%  haproxy    [.] h1_headers_to_hdr_list
   0.34%  haproxy    [.] eb_delete
   0.31%  haproxy    [.] srv_add_to_idle_list
   0.30%  haproxy    [.] h1_snd_buf

WIP: uint64_t -> long

WIP: ulong -> uint

code is much smaller
2024-04-03 17:34:47 +02:00
Tim Duesterhus
f88ea5949c CLEANUP: Reapply strcmp.cocci (2)
This reapplies strcmp.cocci across the whole src/ tree.
2024-04-02 07:27:33 +02:00
Aurelien DARRAGON
3ac79b504a MEDIUM: server: make server_set_inetaddr() updater serializable
The server_set_inetaddr() updater argument is a simple char * string
containing info about the caller responsible for the update.

In this patch, we try to make this argument serializable, that is, make
it so that we can easily export it without having to keep the original
pointer passed by the caller or having to work with strings of variable
lengths.

This was a prerequisite for exposing more updater information through
SERVER_INETADDR event (upcoming patch).

Static strings were simply mapped to a fixed ID that can be converted back
to a string when needed using server_inetaddr_updater_by_to_str(). One
special case was made for the SERVER_INETADDR_UPDATER_DNS_RESOLVER updater
since in this case the updater hint has to be generated from the
corresponding resolver id / nameserver id combination. This was achieved
by saving the nameserver id within the updater struct. Knowing that the
resolver id can be guessed from the server struct directly, it was not
exposed through the updater struct.

This patch depends on:
 - "MINOR: resolvers: add unique numeric id to nameservers"

No functional change should be expected.
2023-12-21 14:22:27 +01:00
Ilya Shipitsin
80813cdd2a CLEANUP: assorted typo fixes in the code and comments
This is the 37th iteration of typo fixes
2023-11-23 16:23:14 +01:00
Willy Tarreau
6cbb5a057b Revert "MAJOR: import: update mt_list to support exponential back-off"
This reverts commit c618ed5ff4.

The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted, because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However, while doing this in the quic_sock code seems to work, doing it
inside the macro shows total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial;
let's roll this back for the time it will take to fix the loop.
2023-09-15 17:13:43 +02:00
Willy Tarreau
c618ed5ff4 MAJOR: import: update mt_list to support exponential back-off
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:

  - mt_list_for_each_entry_safe() is now in upper case to explicitly
    show that it is a macro, and only uses the back element, doesn't
    require a secondary pointer for deletes anymore.

  - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
    to set the list iterator to NULL so that it is not re-inserted
    into the list and the list is spliced there. One must be careful
    because it was usually performed before freeing the element. Now
    instead the element must be nulled before the continue/break.

  - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
    unclear. They were replaced by mt_list_cut_around() and
    mt_list_connect_elem() which more explicitly detach the element
    and reconnect it into the list.

  - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
    in list.h. It may however possibly benefit from being upstreamed.

This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (part of the internal API) and was ifdef'd out in mt_list.h.

A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, while 3.1 is only slightly above 1.1.1:

  - before: 35 Gbps, 3.5 Mpps, 7800% CPU
  - after:  41 Gbps, 4.2 Mpps, 2900% CPU
2023-09-13 11:50:33 +02:00
Aurelien DARRAGON
ee1891ccbe BUG/MINOR: hlua_fcn: potentially unsafe stktable_data_ptr usage
As reported by Coverity in GH #2253, stktable_data_ptr() usage in
hlua_stktable_dump() func is potentially unsafe because
stktable_data_ptr() may return NULL and the returned value is
dereferenced as-is without precautions.

In practice, this should not happen because some error checking was
already performed prior to calling stktable_data_ptr(). But since we're
using the safe stktable_data_ptr() function, all the error checking is
already done within the function, thus all we need to do is check ptr
against NULL instead to protect against NULL dereferences.

This should be backported to every stable version.
2023-08-25 11:52:43 +02:00
Willy Tarreau
7968fe3889 MEDIUM: stick-table: change the ref_cnt atomically
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.

This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
  - the ref_cnt is incremented before write-unlocking on creation, otherwise
    the element could vanish before we can do it
  - the ref_cnt is decremented after write-locking on release
  - for all other cases it's updated out of locks since it's guaranteed by
    the sequence that it cannot vanish
  - checks are done before locking every time it's used to decide
    whether we're going to release the element (saves several write locks)
  - expiration tests are just done using atomic loads, since there's no
    particular ordering constraint there; we just want consistent values.

For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.

For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks, and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads), while
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.

On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
2023-08-11 19:03:35 +02:00
Aurelien DARRAGON
70e10ee5bc BUG/MEDIUM: hlua_fcn/queue: bad pop_wait sequencing
I assumed that the hlua_yieldk() function used in the queue:pop_wait()
function would eventually return when the continuation function
returned.

But this is wrong: the continuation function is simply called back by the
resume after the hlua_yieldk(), which does not return in this case. The
caller is no longer the initial calling function but Lua, so when the
continuation function eventually returns, it does not hand control back
to the C calling function (queue:pop_wait()), like we're used to, but
directly to Lua, which will continue the normal execution of the (Lua)
function that triggered the C function, effectively bypassing the end
of the C calling function.

Because of this, the queue waiting list cleanup never occurs!

This causes some undesirable effects:
 - pop_wait() will slowly leak over time, because the allocated queue
   waiting entry never gets deallocated when the function is finished
 - queue:push() will become slower and slower because the wait list will
   keep growing indefinitely as a result of the previous leak
 - the task that performed at least 1 pop_wait() could suffer from
   useless wakeups because it will stay indefinitely in the queue waiting
   list, so every queue:push() will try to wake the task, even if the
   task is not waiting for new queue items.
 - last but not least, if the task that performed at least 1 pop_wait ends
   or crashes, the next queue:push() will lead to invalid reads and a
   process crash, because it will try to wake up a ghost task that doesn't
   exist anymore.

To fix this, the pop_wait function was reworked with the assumption that
the hlua_yieldk() with continuation function never returns. Indeed, it is
now the continuation function that will take care of the cleanup, instead
of the parent function.

This must be backported in 2.8 with 86fb22c5 ("MINOR: hlua_fcn: add Queue class")
2023-07-17 07:42:52 +02:00
Aurelien DARRAGON
33a8c2842b BUG/MINOR: hlua_fcn/queue: use atomic load to fetch queue size
In hlua_queue_size(), the queue size is loaded as a regular int, but the
queue might be shared by multiple threads that could perform some
atomic pushing or popping attempts in parallel, so we'd better use an
atomic load operation to guarantee consistent readings.

This could be backported in 2.8.
2023-07-11 16:04:39 +02:00
Aurelien DARRAGON
b58bd9794f MINOR: hlua_fcn/mailers: handle timeout mail from mailers section
As the example/lua/mailers.lua script does its best to mimic the c-mailer
implementation, it should support the "timeout mail" directive as well.
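
For reference, a minimal mailers section sketch using this directive
(section and mailer names/addresses are placeholders):

  mailers mymailers
    mailer smtp1 192.0.2.1:25
    timeout mail 20s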

This could be backported in 2.8.
2023-07-10 18:28:08 +02:00
Aurelien DARRAGON
d7d507aa8a CLEANUP: hlua_fcn/queue: make queue:push() easier to read
Adding some spaces and code comments in queue:push() function to make
it easier to read.
2023-05-11 09:23:14 +02:00
Aurelien DARRAGON
c0af7cdba2 BUG/MINOR: hlua_fcn/queue: fix reference leak
When pushing a lua object through the lua Queue class, a new reference is
created from the object so that it can be safely restored when needed.

Likewise, when popping an object from the lua Queue class, the object is
restored at the top of the stack via its reference id.

However, once the object is restored, the related queue entry is removed,
thus the object reference must be dropped to prevent a reference leak.
2023-05-11 09:23:14 +02:00
Aurelien DARRAGON
bd8a94a759 BUG/MINOR: hlua_fcn/queue: fix broken pop_wait()
queue:pop_wait() was broken during a late refactor prior to merge
(due to small modifications to ensure that pop() returns nil on an empty
queue instead of nothing).

Because of this, pop_wait() currently behaves exactly like pop(), resulting
in 100% active CPU when used in a while loop.

Indeed, _hlua_queue_pop() should explicitly return 0 when the queue is
empty, since the pop_wait logic relies on this, and the pushnil should be
handled directly in the queue:pop() function instead.

Adding some comments as well to document this.
2023-05-11 09:23:14 +02:00
Aurelien DARRAGON
86fb22c557 MINOR: hlua_fcn: add Queue class
Adding a new lua class: Queue.

This class provides a generic FIFO storage mechanism that may be shared
between multiple lua contexts to easily pass data between them, as stock
Lua doesn't provide easy methods for passing data between multiple
coroutines.

A new Queue object may be obtained using core.queue()
(it works like core.concat() for the Concat class).

Lua documentation was updated (including some usage examples)
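
A minimal usage sketch (illustrative only; a producer task and a consumer
task sharing one queue):

  local q = core.queue()

  -- producer: push one item per second
  core.register_task(function()
    local i = 0
    while true do
      i = i + 1
      q:push("item-" .. i)
      core.sleep(1)
    end
  end)

  -- consumer: pop_wait() sleeps until an item is available
  core.register_task(function()
    while true do
      local item = q:pop_wait()
      core.Info("popped " .. item)
    end
  end)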
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
717a38d135 MINOR: hlua: expose proxy mailers
Proxy mailers, which are configured using "email-alert" directives
in proxy sections of the configuration, are now exposed
directly in lua thanks to the proxy:get_mailers() method, which
returns a class containing the various mailers settings if email
alerts are configured for the given proxy (else nil is returned).

Both the class and the proxy method were marked as LEGACY since this
feature relies on the mailers configuration, and the goal here is to provide
the last missing bits of information to lua scripts in order to make
them capable of sending email alerts instead of relying on the soon-to-
be deprecated mailers implementation based on checks (see src/mailers.c).

Lua documentation was updated accordingly.
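
An illustrative sketch (the proxy name "mybackend" is an assumption):

  core.register_task(function()
    local px = core.proxies["mybackend"]
    if px ~= nil and px:get_mailers() ~= nil then
      core.Info("email alerts are configured for " .. px:get_name())
    end
  end)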
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
fc84553df8 MINOR: hlua_fcn: add Proxy.get_srv_act() and Proxy.get_srv_bck()
Proxy.get_srv_act: number of active servers that are eligible for LB
Proxy.get_srv_bck: number of backup servers that are eligible for LB
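
A short usage sketch (the proxy name is a placeholder):

  local px = core.proxies["mybackend"]
  core.Info(px:get_name() .. ": " .. px:get_srv_act() .. " active, "
            .. px:get_srv_bck() .. " backup servers eligible for LB")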
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
fc759b4ac2 MINOR: hlua_fcn: add Server.get_pend_conn() and Server.get_cur_sess()
Server.get_pend_conn: number of pending connections to the server
Server.get_cur_sess: number of current sessions handled by the server

Lua documentation was updated accordingly.
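
For illustration (the "servers" attribute of the Proxy class is used to
reach the Server objects; the proxy name is a placeholder):

  local px = core.proxies["mybackend"]
  for name, srv in pairs(px.servers) do
    core.Info(name .. ": " .. srv:get_cur_sess() .. " sessions, "
              .. srv:get_pend_conn() .. " pending connections")
  end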
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
3889efa8e4 MINOR: hlua_fcn: add Server.get_proxy()
Server.get_proxy(): get the proxy to which the server belongs
(or nil if not available)
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
4be36a1337 MINOR: hlua_fcn: add Server.get_trackers()
This function returns an array of the servers that are currently tracking
the server.
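
A small sketch (assuming srv is a Server object and that the returned
array is integer-indexed):

  for _, tracker in ipairs(srv:get_trackers()) do
    core.Info(tracker:get_name() .. " is tracking " .. srv:get_name())
  end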
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
406511a2df MINOR: hlua_fcn: add Server.tracking()
This function returns the currently tracked server, if any.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
7a03dee36f MINOR: hlua_fcn: add Server.is_dynamic()
This function returns true if the current server is dynamic,
meaning that it was instantiated at runtime (i.e. from the cli).
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
c72051d53a MINOR: hlua_fcn: add Server.is_backup()
This function returns true if the current server is a backup server.
2023-05-05 16:28:32 +02:00
Aurelien DARRAGON
862a0fe75a MINOR: hlua_fcn: fix Server.is_draining() return type
Adjusting Server.is_draining() return type from integer to boolean
to comply with the documentation.
2023-05-05 16:28:32 +02:00
Willy Tarreau
c05d30e9d8 MINOR: clock: replace the timeval start_time with start_time_ns
Now that "now" is no more a timeval, there's no point keeping a copy
of it as a timeval, let's also switch start_time to nanoseconds, it
simplifies operations.
2023-04-28 16:08:08 +02:00
Willy Tarreau
69530f59ae MEDIUM: clock: replace timeval "now" with integer "now_ns"
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never null and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".

All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.

The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.

The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.

The start_time used to calculate uptime can still be turned to
nanoseconds now. One open question concerns global_now_ms, which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today,
or in just using global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems; the difference might come from
smaller ones. Better not change anything for now.

One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
2023-04-28 16:08:08 +02:00
Willy Tarreau
d2f61de8c2 BUG/MINOR: hlua: return wall-clock date, not internal date in core.now()
That's hopefully the last one affected by this. It was a bit trickier
because there's the promise in the doc that the date is monotonic, so
we continue to use now-start_time as the uptime value and add it to
start_date to get the current date. It was also emphasized by commit
28360dc ("MEDIUM: clock: force internal time to wrap early after boot"),
causing core.now() to return a date of Mar 20 on Apr 27. No backport is
needed.
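
A quick sketch of the expected behaviour (core.now() is documented to
return a table with "sec" and "usec" fields):

  core.register_task(function()
    local t = core.now()
    core.Info("wall-clock seconds since the epoch: " .. t.sec)
  end)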
2023-04-27 18:44:14 +02:00
Ilya Shipitsin
ccf8012f28 CLEANUP: assorted typo fixes in the code and comments
This is the 36th iteration of typo fixes
2023-04-23 09:44:53 +02:00
Aurelien DARRAGON
1746b56e68 MINOR: server: change srv_op_st_chg_cause storage type
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".

While looking at the current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup, since some leftovers from asynchronous server
state change updates were left behind and resulted in some useless code
duplication, making the whole thing harder to maintain.

Two observations were made:

- by tracking down srv_set_{running, stopped, stopping} usage,
  we can see that the <reason> argument is always a fixed statically
  allocated string.
- check-related state change context (duration, status, code...) is
  not used anymore since srv_append_status() directly extracts the
  values from the server->check. This is pure legacy from when
  the state changes were applied asynchronously.

To prevent code duplication and useless string copies, and to make the
reason/cause more exportable, we now store it as an enum, and we provide
the srv_op_st_chg_cause() function to fetch the related description string.
HEALTH and AGENT causes (check-related) are now explicitly identified so
that consumers like srv_append_op_chg_cause() are able to fetch check info
from the server itself if they need to.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON
223770ddca MINOR: hlua/event_hdl: per-server event subscription
Now that the event_hdl api is properly implemented in hlua, we may add
per-server event subscription in addition to the global event
subscription.

Per-server subscription allows being notified of events related to a
single server. It is useful to track a server's UP/DOWN and DEL events.

It works exactly like core.event_sub(), except that the subscription
will be performed within the server's dedicated subscription list instead
of the global one.
The callback function will only be called for server events affecting
the server from which the subscription was performed.

Regarding the implementation, it is pretty trivial at this point; we add
more doc than code this time.

Usage examples have been added to the (lua) documentation.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
c84899c636 MEDIUM: hlua/event_hdl: initial support for event handlers
Now that the event handler API is pretty mature, we can expose it in
the lua API.

Introducing the core.event_sub(<event_types>, <cb>) lua function, which
takes an array of event types <event_types> as well as a callback
function <cb> as arguments.

The function returns a subscription <sub> on success.
Subscription <sub> allows you to manage the subscription from anywhere
in the script.
To this day only the sub->unsub method is implemented.

The following event types are currently supported:
  - "SERVER_ADD": when a server is added
  - "SERVER_DEL": when a server is removed from haproxy
  - "SERVER_DOWN": server states goes from up to down
  - "SERVER_UP": server states goes from down to up

As for the <cb> function: it will be called when one of the registered
event types occur. The function will be called with 3 arguments:
  cb(<event>,<data>,<sub>)

<event>: event type (string) that triggered the function.
(could be any of the types used in <event_types> when registering
the subscription)

<data>: data associated with the event (specific to each event family).

For "SERVER_" family events, server details such as server name/id/proxy
will be provided.
If the server still exists (not yet deleted), a reference to the live
server is provided to spare you an additional lookup if you need
to have direct access to the server from lua.

<sub> refers to the subscription, in case you need to manage it from
within an event handler.
(It refers to the same subscription as the one returned from
core.event_sub().)

Subscriptions are per-thread: the thread that will be handling the
event is the one that performed the subscription using the
core.event_sub() function.

Each thread treats events sequentially: it means that if you have,
let's say, SERVER_UP then SERVER_DOWN in a short time lapse, your
cb function will first be called with SERVER_UP, and once you're done
handling the event, your function will be called again with SERVER_DOWN.

This is to ensure event consistency when it comes to logging / triggering
logic from lua.

Your lua cb function may yield if needed, but you're advised to process
the event as fast as possible to prevent the event queue from growing.

To prevent abuses, if the event queue for the current subscription goes
over 100 unconsumed events, the subscription will pause itself
automatically for as long as it takes for your handler to catch up.
This would lead to events being missed, so a warning will be emitted in
the logs to inform you about that. This is not something you want to let
happen too often; it may indicate that you subscribed to an event that
occurs too frequently and/or that your callback function is too
slow to keep up the pace, in which case you should review it.

If you want to do some parallel processing because your callback
functions are slow, you might want to create subtasks from lua using
core.register_task() from within your callback function to perform the
heavy job in a dedicated task and allow remaining events to be processed
more quickly.

Please check the lua documentation for more information.
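
A minimal subscription sketch based on the API described above:

  local function on_srv_event(event, data, sub)
    -- <data> carries the server details (name/id/proxy) for SERVER_ events
    core.Info("got event: " .. event)
    if event == "SERVER_DEL" then
      sub:unsub()  -- we no longer care once the server is gone
    end
  end

  core.event_sub({"SERVER_UP", "SERVER_DOWN", "SERVER_DEL"}, on_srv_event)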
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
94ee6632ee MINOR: hlua_fcn: add server->get_rid() method
Server revision ID was recently added to haproxy with 61e3894
("MINOR: server: add srv->rid (revision id) value")

Let's add it to the hlua server class.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
c4b2437037 MEDIUM: hlua_fcn/api: remove some old server and proxy attributes
Since ("MINOR: hlua_fcn: alternative to old proxy and server attributes"):
 - s->name(), s->puid() are superseded by s->get_name() and s->get_puid()
 - px->name(), px->uuid() are superseded by px->get_name() and
   px->get_uuid()

And considering this is now the proper way to retrieve proxy name/uuid
and server name/puid from lua:

We're now removing these legacy attributes, but for backward-compatibility
purposes we will be emulating them and warning the user for some time
before completely dropping their support.

To do this, we first remove old legacy code.
Then we move server and proxy methods out of the metatable to allow
direct elements access without systematically involving the "__index"
metamethod.

This allows us to involve the "__index" metamethod only when the requested
key is missing from the table.

Then we define the relevant hlua_proxy_index and hlua_server_index functions
that will be used as the "__index" metamethod to respectively handle the
"name, uuid" (proxy) or "name, puid" (server) keys, in which case we
warn the user about the need to use the new getter function instead of the
legacy attribute (to prepare for the potential upcoming removal), and we
call the getter function to return the value as if the getter function
was directly called from the script.

Note: Using the legacy variables instead of the getter functions results
in a slight overhead due to the "__index" metamethod indirection, thus
it is recommended to switch to the getter functions right away.

With this commit we're also adding a deprecation notice about legacy
attributes.
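
A simplified Lua rendition of the pattern (the real implementation lives
in C; the warning text is illustrative):

  -- getters are stored directly in the object, so __index only runs
  -- for missing keys such as the legacy "name"/"uuid" attributes
  local function proxy_index(px, key)
    if key == "name" then
      print("Proxy.name is deprecated, please use Proxy.get_name()")
      return px:get_name()
    elseif key == "uuid" then
      print("Proxy.uuid is deprecated, please use Proxy.get_uuid()")
      return px:get_uuid()
    end
  end
  -- setmetatable(proxy_object, { __index = proxy_index })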
2023-04-05 08:58:16 +02:00
Thierry Fournier
1edf36a369 MEDIUM: hlua_fcn: dynamic server iteration and indexing
This patch proposes to enumerate servers using the internal HAProxy list.
Also, remove the SRV_F_NON_PURGEABLE flag which makes the server
non-purgeable each time Lua uses the server.

Removing reg-tests/cli_delete_server_lua.vtc since this test is no
longer relevant (we don't set the SRV_F_NON_PURGEABLE flag anymore)
and we already have a more generic test:
  reg-tests/server/cli_delete_server.vtc

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2023-04-05 08:58:16 +02:00
Thierry Fournier
b0467730a0 MINOR: hlua_fcn: alternative to old proxy and server attributes
This patch adds new lua methods:
- "Proxy.get_uuid()"
- "Proxy.get_name()"
- "Server.get_puid()"
- "Server.get_name()"

These methods are equivalent to their old analog Proxy.{uuid,name}
and Server.{puid,name} attributes, but this will be the new preferred
way to fetch such info, as it duplicates memory only when necessary and
thus reduces the overall memory footprint of the lua Server/Proxy objects.

Legacy attributes (now superseded by the explicit getters) are expected
to be removed some day.
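
Usage sketch (px and srv obtained from core.proxies, as elsewhere):

  core.Info("proxy "  .. px:get_name()  .. " (uuid " .. px:get_uuid() .. ")")
  core.Info("server " .. srv:get_name() .. " (puid " .. srv:get_puid() .. ")")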

Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
2023-04-05 08:58:16 +02:00
Thierry Fournier
467913c84e MEDIUM: hlua: Dynamic list of frontend/backend in Lua
When HAProxy is loaded with a lot of frontends/backends (tested with 300k),
it is slow to start and it uses a lot of memory just for indexing backends
in the lua tables.

This patch uses the internal frontend/backend index of HAProxy in place of
the lua tables.

HAProxy startup is now quicker as each frontend/backend object is created
on demand and not at init.
This comes with some cost: the execution of Lua will be a little bit
slower.
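
An iteration sketch (assuming the lists remain iterable with pairs() as
before; objects are simply instantiated on access):

  core.register_task(function()
    local count = 0
    for _ in pairs(core.backends) do
      count = count + 1
    end
    core.Info("backends: " .. count)
  end)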
2023-04-05 08:58:16 +02:00
Thierry Fournier
599f2311a8 MINOR: hlua: Fix two functions that return nothing useful
Two lua init functions seem to return something useful, but that
is not the case. The function "hlua_concat_init" seems to return
a failure status, but the function never fails. The function
"hlua_fcn_reg_core_fcn" seems to return the number of elements in
the stack, but that is not the case.
2023-04-05 08:58:16 +02:00
Willy Tarreau
76642223f0 MEDIUM: stick-table: switch the table lock to rwlock
Right now a spinlock is used, but most accesses are for reads, so let's
switch the lock to an rwlock and switch all accesses to exclusive locks
for now. There should be no visible difference at this point.
2022-10-12 14:19:05 +02:00
Aurelien DARRAGON
7d00077fd5 BUG/MEDIUM: proxy: ensure pause_proxy() and resume_proxy() own PROXY_LOCK
There was a race involving hlua_proxy_* functions
and some proxy management functions.

pause_proxy() and resume_proxy() can be used directly from lua code,
but that could lead to a race, as lua code didn't make sure the PROXY_LOCK
was owned before calling the proxy functions.

This patch makes sure it won't happen again elsewhere in the code
by locking the PROXY_LOCK directly in the resume and pause proxy functions
so that it's not the caller's responsibility anymore
(based on the stop_proxy() behavior that was already safe prior to the patch).

This should be backported to the stable series.
Note that the API will likely differ in versions < 2.4.
2022-09-09 17:23:01 +02:00
Tim Duesterhus
6eded62f6a CLEANUP: Add missing header to hlua_fcn.c
Found with -Wmissing-prototypes:

    src/hlua_fcn.c:53:5: fatal error: no previous prototype for function 'hlua_checkboolean' [-Wmissing-prototypes]
    int hlua_checkboolean(lua_State *L, int index)
        ^
    src/hlua_fcn.c:53:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    int hlua_checkboolean(lua_State *L, int index)
    ^
    static
    1 error generated.
2022-05-17 11:40:33 +02:00
William Lallemand
82d5f013f9 BUG/MINOR: lua: don't expose internal proxies
Since internal proxies are now in the global proxy list, they are now
reachable from core.proxies, core.backends, core.frontends.

This patch fixes the issue by checking the PR_CAP_INT flag before
exposing them in lua, so the user can't have access to them.

This patch must be backported in 2.5.
2021-11-24 16:14:24 +01:00
Willy Tarreau
63617dbec6 BUILD: idleconns: include missing ebmbtree.h at several places
backend.c and all the muxes started manipulating ebmb_nodes with
the introduction of idle conns, but the types were inherited through
other includes. Let's add ebmbtree.h there.
2021-10-07 01:36:51 +02:00
Willy Tarreau
27539409fd BUILD: hlua: needs to include stream-t.h
It uses the SF_ERR_* error codes and currently gets them via
intermediary includes.
2021-10-07 01:36:51 +02:00
Amaury Denoyelle
86f3707d14 MINOR: server: mark servers referenced by LUA script as non purgeable
Each server that is retrieved by a LUA script is marked as
non-purgeable. Note that for this to work, the script must already
have been executed once.
2021-08-25 15:53:54 +02:00