This function doesn't need to be write-locked. It performs a lookup
of the next update at its index, atomically updates the ref_cnt on
the stksess, updates some shared_table fields on the local thread,
and updates the table's commitupdate. Now that this update is atomic
we don't need to keep the write lock during that period. In addition
this function's callers do not rely on the write lock to be held
either since it was droped during peer_send_updatemsg() anyway.
Now, when the function is entered with a write lock, it's downgraded
to a read lock, otherwise a read lock is grabbed. Updates are looked
up under the read lock and the message is sent without the lock. The
commitupdate is still performed under the read lock (so as not to
break the code too much), and the write lock is re-acquired when
leaving if needed. This allows multiple peers to look up updates in
parallel and to avoid stalling stick-table lookups.
This function maintains the write lock for a while. In practice it does
not need to hold it that long, and some parts could be performed under a
read lock. This patch first drops then re-acquires the write lock at the
function's entry. The purpose is simply to break the end-to-end atomicity
to prove that it has no impact in case something needs to be bisected
later. In fact the write lock is already dropped while calling
peer_send_updatemsg().
The ->commitupdate index doesn't need to be kept consistent with other
operations, it only needs to be correct and to reflect the last known
value. Right now it's updated under the stick-table lock, which is
expensive and maintains this lock longer than needed. Let's move it
outside of the lock, and update it using a CAS. This patch simply
replaces the assignment with a CAS and makes sure all reads are atomic.
On failed CAS we use a simple cpu_relax(), no need for more as there
should not be that much contention here (updates are not that fast).
Due to the ts->ref_cnt being manipulated and checked inside wrlocks,
we continue to have it updated under plenty of read locks, which have
an important cost on many-thread machines.
This patch turns them all to atomic ops and carefully moves them outside
of locks every time this is possible:
- the ref_cnt is incremented before write-unlocking on creation otherwise
the element could vanish before we can do it
- the ref_cnt is decremented after write-locking on release
- for all other cases it's updated out of locks since it's guaranteed by
the sequence that it cannot vanish
- checks are done before locking every time it's used to decide
whether we're going to release the element (saves several write locks)
- expiration tests are just done using atomic loads, since there's no
particular ordering constraint there, we just want consistent values.
For Lua, the loop that is used to dump stick-tables could switch to read
locks only, but this was not done.
For peers, the loop that builds updates in peer_send_teachmsgs is extremely
expensive in write locks and it doesn't seem this is really needed since
the only updated variables are last_pushed and commitupdate, the first
one being on the shared table (thus not used by other threads) and the
commitupdate could likely be changed using a CAS. Thus all of this could
theoretically move under a read lock, but that was not done here.
On a 80-thread machine with a peers section enabled, the request rate
increased from 415 to 520k rps.
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
The peers applet now use the applet API to send message instead of the
channel API. This way, it does not need to take care to request more room if
it fails to put data into the channel's buffer.
This puts an end to the occasional confusion between the "now" date
that is internal, monotonic and not synchronized with the system's
date, and "date" which is the system's date and not necessarily
monotonic. Variable "now" was removed and replaced with a 64-bit
integer "now_ns" which is a counter of nanoseconds. It wraps every
585 years, so if all goes well (i.e. if humanity does not need
haproxy anymore in 500 years), it will just never wrap. This implies
that now_ns is never nul and that the zero value can reliably be used
as "not set yet" for a timestamp if needed. This will also simplify
date checks where it becomes possible again to do "date1<date2".
All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns".
Due to the intricacies between now, global_now and now_offset, all 3
had to be turned to nanoseconds at once. It's not a problem since all
of them were solely used in 3 functions in clock.c, but they make the
patch look bigger than it really is.
The clock_update_local_date() and clock_update_global_date() functions
are now much simpler as there's no need anymore to perform conversions
nor to round the timeval up or down.
The wrapping continues to happen by presetting the internal offset in
the short future so that the 32-bit now_ms continues to wrap 20 seconds
after boot.
The start_time used to calculate uptime can still be turned to
nanoseconds now. One interrogation concerns global_now_ms which is used
only for the freq counters. It's unclear whether there's more value in
using two variables that need to be synchronized sequentially like today
or to just use global_now_ns divided by 1 million. Both approaches will
work equally well on modern systems, the difference might come from
smaller ones. Better not change anyhting for now.
One benefit of the new approach is that we now have an internal date
with a resolution of the nanosecond and the precision of the microsecond,
which can be useful to extend some measurements given that timestamps
also have this resolution.
Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec
part to disappear. At this point, "now" is only used as a timeval in
clock.c where it is updated.
Just like for other applets, we now use the SE descriptor instead of the
channel to report error and end-of-stream. We must just be sure to consume
request data when we are waiting the applet to be released.
It was done by hand by callers when a shutdown for read or write was
performed. It is now always handled by the functions performing the
shutdown. This way the callers don't take care of it. This will avoid some
bugs.
Since 2.6 with commit 34e4085f8 ("MEDIUM: peers: Balance applets across
threads") the initialization of a peers appctx may be postponed with a
wakeup, causing some partially initialized appctx to be visible. The
"show peers" command used to only care about peers without appctx, but
now it must also take care of those with no stconn, otherwise it can
occasionally crash while dumping them.
This fix must be backported to 2.6. Thanks to Patrick Hemmer for
reporting the problem.
CF_READ_NULL flag is not really useful and used. It is a transient event
used to wakeup the stream. As we will see, all read events on a channel may
be resumed to only one and are all used to wake up the stream.
In this patch, we introduce CF_READ_EVENT flag as a replacement to
CF_READ_NULL. There is no breaking change for now, it is just a
rename. Gradually, other read events will be merged with this one.
In process_peer_sync: a check was performed to know whether the peers section
handler should kill itself if the corresponding proxy was not started on the
current process.
This logic was initially implemented in early 1.6 development to prevent
some issues when peers where used in conjunction with nbproc > 1:
f83d3fe00a MEDIUM: init: stop any peers section not bound to the correct process
46dc1ca MEDIUM: peers: unregister peers that were never started
But later in 1.6 dev, a new commit has been introduced:
47c8c029db MEDIUM: init: completely deallocate unused peers
With the latter, the check implemented in 46dc1ca ("MEDIUM: peers: unregister
peers that were never started") will never succeed: it is dead code.
Since nbproc support has been dropped in 2.5, things have changed a bit:
f83d3fe00a logic was moved in mworker_cleanlisteners, but as in 46dc1ca :
peers task is safely destroyed before peers_fe is set to NULL.
Conversely, peers_fe is first set by init_peers_frontend() before peers task
is scheduled by peers_init_sync() in check_config_validity().
Again, it is safe to say that we will never reach !peers->peers_fe
in process_peer_sync(): this self-killing mechanism is not relevant anymore.
--
To cut a long story short: I stumbled on this while tracking down
current signal api usage.
This led me to a signal_unregister_handler() call performed in the
aforementionned dead code. To me this code was potentially unsafe because
signal_unregister_handler() is not thread safe and here it was used within a
task initialized via task_new_anywhere(). So I decided to check how bad this
could be (ie: conditions to be met for this code to run).. and here we are.
If shards are in use, we must fill the shard number on incoming updates,
otherwise some entries are assigned shard number zero, and may be broadcast
everywhere once updated, instead of being sent only to the peers having the
same shard number.
This fixes commit 36d156564 ("MINOR: peers: Support for peer shards"). No
backport is needed.
In peer_treat_updatemsg(), the lower layers of the stick-table code are
reimplemented, and the key length is never really known for an entry
being processed, it depends on the type being parsed and the moment
where it's done. This makes it quite difficult to stuff some shard
number calculation there.
This patch adds a keylen local variable that is always set to the length
of the current key depending on its type. It takes this opportunity for
reducing redudant expressions involving this length and always using the
new variable instead, limiting the risk of errors. Arguably that code
would have been way simpler by creating a dummy stktable_key and passing
it to stksess_new() as done anywhere else, but let's not change all that
a few days before the release.
In issue #1939, Ilya mentions that cppchecks warned about use of "%d" to
report the status state that's locally stored as an unsigned int. While
technically valid, this will never cause any trouble since in the end
what we store there are the applet's states (just a few enum values).
Better use %u anyway to silence this warning.
Since 0909f62266 ("BUG/MEDIUM: peers: messages about unkown tables not
correctly ignored"), the 'sc' variable is no longer used in
peer_treat_updatemsg() and peer_treat_definemsg() functions. So, we must
remove them to avoid compilation warning.
This patch must be backported with the commit above.
Table defintion's messages and update messages are not correctly
ignored if the table is not configured on the local peer.
It is a bug because, receiving those messages, the parser
returns an error and the upper layer considers that the state of the
peer's connection is modified (as it is done in the case of protocol
error) and switch immediatly the automate to process the new state.
But, even if message is silently ignored because the connection's
state doesn't change and we continue to process the next message, some
processing remains not performed: for instance the ALIVE flag is not set
on the peer's connection as it should be done after receiving any valid
messages. This results in a shutdown of the connection when timeout
is elapsed as if no message has been received during this delay.
This patch fix the behavior, those messages are now silently ignored
and the upper layer continue the processing as it is done for any valid
messages.
This bug appears with the code re-work of the peers on 2.0 so
it should be backported until this version.
We considered the resync process is finished if a full resync request
is ended receiving the "resync-finish" message. But in the case of
"shards" each node declared with a "shard" has only a partial view
of the table. And the resync process is ended whereas the original
peer tables content contains only a "shard" of the full content.
This patch allow to retrieve the entire tables requesting a resync
from all different "shards".
To do so we don't commit the end of a resync process receiving a
"resync-finish" if the node is part of "shard", we only flag this
peer and all peers using the same shard as "notup2date" as if we
received a "resync-partial" message, and we re-schedule a request
of a resync as it is done receiving a "resync-partial" message.
Doing this the peers flagged "notup2date" won't be addressed for the
next resync request round and the next resync request will be send to
a shard not yet requested.
Receving a "resync-finish" message we also check if all peers using
"shards" are flagged "notup2date". It meens that all peers have been
addressed and we can considered the resync process is now finished.
Note also that the "resync request" scheduler already handle a timeout
and if we are not able to retrieve a full resync after a delay. The
resync process is ended.
This patch should be backported in all versions handling "shard"
on peer lines.
Add "shards" new keyword for "peers" section to configure the number
of peer shards attached to such secions. This impact all the stick-tables
attached to the section.
Add "shard" new "server" parameter to configure the peers which participate to
all the stick-tables contents distribution. Each peer receive the stick-tables updates
only for keys with this shard value as distribution hash. The "shard" value
is stored in ->shard new server struct member.
cfg_parse_peers() which is the function which is called to parse all
the lines of a "peers" section is modified to parse the "shards" parameter
stored in ->nb_shards new peers struct member.
Add srv_parse_shard() new callback into server.c to pare the "shard"
parameter.
Implement stksess_getkey_hash() to compute the distribution hash for a
stick-table key as the 64-bits xxhash of the key concatenated to the stick-table
name. This function is called by stksess_setkey_shard(), itself
called by the already implemented function which create a new stick-table
key (stksess_new()).
Add ->idlen new stktable struct member to store the stick-table name length
to not have to compute it each time a stick-table key hash is computed.
Right now a spinlock is used, but most accesses are for reads, so let's
switch the lock to an rwlock and switch all accesses to exclusive locks
for now. There should be no visible difference at this point.
On a reload, if the previous resync was not finished, the freshly old worker
must not try to start a new resync. Otherwise, it will compete with the
older wokers, slowing down or blocking the resync. Only an up-to-date woker
must try to perform a local resync.
This patch must be backported as far as 2.0 (and maybe to 1.8 too).
When a worker is stopped, the resync timer is used to limit in time the
connection stage to the new worker to perform the local resync. However,
this timer must be stopped when the resync is in progress and it must be
re-armed if the resync is interrupted (for instance because another
reload). Otherwise, if the resync is a bit long, an old worker may be killed
too early.
This bug was introduce by the commit 160fff665 ("BUG/MEDIUM: peers: limit
reconnect attempts of the old process on reload"). It must be backported as
far as 2.0.
Only the client timeout was set. Nothing prevent a peer applet to stall
during a connect or waiting a message from a remote peer. To avoid any
issue, it is important to also set connection and server timeouts. The
connect timeout is set to 1s and the server timeout is set to 5s.
This patch must be backported to all supported versions.
As seen in GH issue #1770, peers synchronization do not cope well with
very large buffers because by default the only two reasons for stopping
the processing of updates is either that the end was reached or that
the buffer is full. This can cause high latencies, and even rightfully
trigger the watchdog when the operations are numerous and slowed down
by competition on the stick-table lock.
This patch introduces a limit to the number of messages one may send
at once, which now defaults to 200, regardless of the buffer size. This
means taking and releasing the lock up to 400 times in a row, which is
costly enough to let some other parts work.
After some observation this could be backported to 2.6. If so, however,
previous commits "BUG/MEDIUM: applet: fix incorrect check for abnormal
return condition from handler" and "BUG/MINOR: applet: make the call_rate
only count the no-progress calls" must be backported otherwise the call
rate might trigger the looping protection.
When a peer open a new connection to another peer, it is considered as
connected when the hello message is sent. To do so, the peer applet was
relying on CF_WRITE_PARTIAL channel flag. However it is not the right flag
to use. This one is a transient flag. Depending on the scheduling, this flag
may be removed by the stream before the peer has a chance to see
it. Instead, CF_WROTE_DATA flag must be checked.
This patch is related to the issue #1799. It must be backported as far as
2.0.
When peers are configured and HAProxy is reloaded or restarted, a
synchronization is performed between the old process and the new one. To do
so, the old process connects on the new one. If the synchronization fails,
it retries. However, there is no delay and reconnect attempts are not
bounded. Thus, it may loop for a while, consuming all the CPU. Of course, it
is unexpected, but it is possible. For instance, if the local peer is
misconfigured, an infinite loop can be observed if the connection succeeds
but not the synchronization. This prevents the old process to exit, except
if "hard-stop-after" option is set.
To fix the bug, the reconnect is delayed. The local peer already has a
expiration date to delay the reconnects. But it was not used on stopping
mode. So we use it not. Thanks to the previous fix, the reconnect timeout is
shorter in this case (500ms against 5s on running mode). In addition, we
also use the peers resync expiration date to not infinitely retries. It is
accurate because the new process, on its side, use this timeout to switch
from a local resync to a remote resync.
This patch depends on "MINOR: peers: Use a dedicated reconnect timeout when
stopping the local peer". It fixes the issue #1799. It should be backported
as far as 2.0.
When a process is stopped or reload, a dedicated reconnect timeout is now
used. For now, this timeout is not used because the current code retries
immediately to reconnect to perform the local synchronization with the new
local peer, if any.
This patch is required to fix the issue #1799. It should be backported as
far as 2.0 with next fixes.
At several places we need to figure the ID of the first thread allowed
to run a task. Till now this was performed using my_ffsl(t->thread_mask)
but since we now have the thread ID stored into the task, let's use it
instead. This is tagged major because it starts to assume that tid<0 is
strictly equivalent to atleast2(thread_mask), and that as such, among
the allowed threads are the current one.
By having the appctx in argument this function wouldn't have experienced
the previous bug. Better do that now to avoid proliferation of awkward
functions.
In the context of a CLI command, it's particularly not welcome to use
an "appctx" variable that is not the current one. In addition it was
created for use at exactly 6 places in 2 lines. Let's just remove it
and stick to peer->appctx which is used elsewhere in the function and
is unambiguous.
Commit d0a06d52f ("CLEANUP: applet: use applet_put*() everywhere possible")
replaced most accesses to the conn_stream with simpler accesses to the
appctx. Unfortunately, in all the CLI functions using an appctx, one
makes an exception where the appctx is not the caller's but the one being
inspected! When no peers connection is active, the early exit immediately
crashes.
No backport is needed.
There's no more reason for keepin the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
This file contains all the stream-connector functions that are specific
to application layers of type stream. So let's name it accordingly so
that it's easier to figure what's located there.
The alphabetical ordering of include files was preserved.
An equivalent applet_need_more_data() was added as well since that function
is mostly used from applet code. It makes it much clearer that the applet
is waiting for data from the stream layer.
The new name mor eclearly indicates that a stream connector cannot make
any more progress because it needs room in the channel buffer, or that
it may be unblocked because the buffer now has more room available. The
testing function is sc_waiting_room(). This is mostly used by applets.
Note that the flags will change soon.
These functions return the app-layer associated with an stconn, which
is a check, a stream or a stream's task. They're used a lot to access
channels, flags and for waking up tasks. Let's just name them
appropriately for the stream connector.
We're starting to propagate the stream connector's new name through the
API. Most call places of these functions that retrieve the channel or its
buffer are in applets. The local variable names are not changed in order
to keep the changes small and reviewable. There were ~92 uses of cs_ic(),
~96 of cs_oc() (due to co_get*() being less factorizable than ci_put*),
and ~5 accesses to the buffer itself.
This applies the change so that the applet code stops using ci_putchk()
and friends everywhere possible, for the much saferapplet_put*() instead.
The change is mechanical but large. Two or three functions used to have no
appctx and a cs derived from the appctx instead, which was a reminiscence
of old times' stream_interface. These were simply changed to directly take
the appctx. No sensitive change was performed, and the old (more complex)
API is still usable when needed (e.g. the channel is already known).
The change touched roughly a hundred of locations, with no less than 124
lines removed.
It's worth noting that the stats applet, the oldest of the series, could
get a serious lifting, as it's still very channel-centric instead of
propagating the appctx along the chain. Given that this code doesn't
change often, there's no emergency to clean it up but it would look
better.
This renames the "struct conn_stream" to "struct stconn" and updates
the descriptions in all comments (and the rare help descriptions) to
"stream connector" or "connector". This touches a lot of files but
the change is minimal. The local variables were not even renamed, so
there's still a lot of "cs" everywhere.
When creating a new applet for peer outgoing connection, we check
the load on each thread. Threads with least applet count are
preferred.
With this solution we avoid a situation when many outgoing
connections run on the same thread causing significant load on
single CPU core.
In the same way than for the tasks, the applets api was changed to be able
to start a new appctx on a thread subset. For now the feature is
disabled. Only appctx_new_here() is working. But it will be possible to
start an appctx on a specific thread or a subset via a mask.
A .init callback function is defined for the peer_applet applet. This
function finishes the appctx startup by calling appctx_finalize_startup()
and its handles the stream customization.
The session created for frontend applets is now totally owns by the
corresponding appctx. It means the appctx is now responsible to release
it. This removes the hack in stream_free() about frontend applets to be sure
to release the session.
The two functions became exact copies since there's no more special case
for the appctx owner. Let's merge them into a single one, that simplifies
the code.
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2 which deserves being
addressed in separate commit.
This flag is no longer needed now that it must always match the presence
of a destination address on the backend conn_stream. Worse, before previous
patch, if it were to be accidently removed while the address is present, it
could result in a leak of that address since alloc_dst_address() would first
be called to flush it.
Its usage has a long history where addresses were stored in an area shared
with the connection, but as this is no longer the case, there's no reason
for putting this burden onto application-level code that should not focus
on setting obscure flags.
The only place where that made a small difference is in the dequeuing code
in case of queue redistribution, because previously the code would first
clear the flag, and only later when trying to deal with the queue, would
release the address. It's not even certain whether there would exist a
code path going to connect_server() without calling pendconn_dequeue()
first (e.g. retries on queue timeout maybe?).
Now the pendconn_dequeue() code will rely on SF_ASSIGNED to decide to
clear and release the address, since that flag is always set while in
a server's queue, and its clearance implies that we don't want to keep
the address. At least it remains consistent and there's no more risk of
leaking it.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
Remaining flags and associated functions are move in the conn-stream
scope. These flags are added on the endpoint and not the conn-stream
itself. This way it will be possible to get them from the mux or the
applet. The functions to get or set these flags are renamed accordingly with
the "cs_" prefix and updated to manipualte a conn-stream instead of a
stream-interface.
si_shutr(), si_shutw(), si_chk_rcv() and si_chk_snd() are moved in the
conn-stream scope and renamed, respectively, cs_shutr(), cs_shutw(),
cs_chk_rcv(), cs_chk_snd() and manipulate a conn-stream instead of a
stream-interface.
The stream-interface state (SI_ST_*) is now in the conn-stream. It is a
mechanical replacement for now. Nothing special. SI_ST_* and SI_SB_* were
renamed accordingly. Utils functions to manipulate these infos were moved
under the conn-stream scope.
But it could be good to keep in mind that this part should be
reworked. Indeed, at the CS level, we only need to know if it is ready to
receive or to send. The state of conn-stream from INI to EST is only used on
the server side. The client CS is immediately set to EST. Thus current
SI_ST_* states should probably be moved to the stream to reflect the server
connection state during the establishment stage.
Flags to disable lingering and half-close are now handled at the conn-stream
level. Thus SI_FL_NOLINGER and SI_FL_NOHALF stream-int flags are replaced by
CS_FL_NOLINGER and CS_FL_NOHALF conn-stream flags.
The source and destination addresses at the applicative layer are moved from
the stream-interface to the conn-stream. This simplifies a bit the code and
it is a logicial step to remove the stream-interface.
At many places, we now use the new CS functions to get a stream or a channel
from a conn-stream instead of using the stream-interface API. It is the
first step to reduce the scope of the stream-interfaces. The main change
here is about the applet I/O callback functions. Before the refactoring, the
stream-interface was the appctx owner. Thus, it was heavily used. Now, as
far as possible,the conn-stream is used. Of course, it remains many calls to
the stream-interface API.
The conn-stream endpoint is now shared between the conn-stream and the
applet or the multiplexer. If the mux or the applet is created first, it is
responsible to also create the endpoint and share it with the conn-stream.
If the conn-stream is created first, it is the opposite.
When the endpoint is only owned by an applet or a mux, it is called an
orphan endpoint (there is no conn-stream). When it is only owned by a
conn-stream, it is called a detached endpoint (there is no mux/applet).
The last entity that owns an endpoint is responsible to release it. When a
mux or an applet is detached from a conn-stream, the conn-stream
relinquishes the endpoint to recreate a new one. This way, the endpoint
state is never lost for the mux or the applet.
It is a transient commit to prepare next changes. Now, when a conn-stream is
created from an applet or a multiplexer, an endpoint is always provided. In
addition, the API to create a conn-stream was specialized to have one
function per type.
The next step will be to share the endpoint structure.
It is a transient commit to prepare next changes. It is possible to pass a
pre-allocated endpoint to create a new conn-stream. If it is NULL, a new
endpoint is created, otherwise the existing one is used. There no more
change at the conn-stream level.
In the applets, all conn-stream are created with no pre-allocated
endpoint. But for multiplexers, an endpoint is systematically created before
creating the conn-stream.
This change is only significant for the multiplexer part. For the applets,
the context and the endpoint are the same. Thus, there is no much change. For
the multiplexer part, the connection was used to set the conn-stream
endpoint and the mux's stream was the context. But it is a bit strange
because once a mux is installed, it takes over the connection. In a
wonderful world, the connection should be totally hidden behind the mux. The
stream-interface and, in a lesser extent, the stream, still access the
connection because that was inherited from the pre-multiplexer era.
Now, the conn-stream endpoint is the mux's stream (an opaque entity for the
conn-stream) and the connection is the context. Dedicated functions have
been added to attached an applet or a mux to a conn-stream.
The appctx owner is now always a conn-stream. Thus, it can be set during the
appctx allocation. But, to do so, the conn-stream must be created first. It
is not a problem on the server side because the conn-stream is created with
the stream. On the client side, we must take care to create the conn-stream
first.
This change should ease other changes about the applets bootstrapping.
In issue #1184, cppcheck complains about some inconsistent printf
formats. At least the one in peer_prepare_hellomsg() that uses "%u"
for the int "min_ver" is wrong. Let's force other types to make it
happy, though constants cannot cause trouble.
Thanks to all previous changes, it is now possible to move the
stream-interface into the conn-stream. To do so, some SI functions are
removed and their conn-stream counterparts are added. In addition, the
conn-stream is now responsible to create and release the
stream-interface. While the stream-interfaces were inlined in the stream
structure, there is now a pointer in the conn-stream. stream-interfaces are
now dynamically allocated. Thus a dedicated pool is added. It is a temporary
change because, at the end, the stream-interface structure will most
probably disappear.
To be able to move the stream-interface from the stream to the conn-stream,
all access to the SI is done via the conn-stream. This patch is limited to
the peers part.
In the same way the conn-stream has a pointer to the stream endpoint , this
patch adds a pointer to the application entity in the conn-stream
structure. For now, it is a stream or a health-check. It is mandatory to
merge the stream-interface with the conn-stream.
Because appctx is now an endpoint of the conn-stream, there is no reason to
still have the stream-interface as appctx owner. Thus, the conn-stream is
now the appctx owner.
Thanks to previous changes, it is now possible to set an appctx as endpoint
for a conn-stream. This means the appctx is no longer linked to the
stream-interface but to the conn-stream. Thus, a pointer to the conn-stream
is explicitly stored in the stream-interface. The endpoint (connection or
appctx) can be retrieved via the conn-stream.
The approach used for skipping conn_cur in commit db2ab8218 ("MEDIUM:
stick-table: never learn the "conn_cur" value from peers") was wrong,
it only works with simple tables but as soon as frequency counters or
arrays are exchanged after conn_cur, the stream is desynchronized and
incorrect values are read. This is because the fields have a variable
length depending on their types and cannot simply be skipped by a
"continue" statement.
Let's change the approach to make sure we continue to completely parse
these local-only fields, and only drop the value at the moment we're
about to store them, since this is exactly the intent.
A simpler approach could consist in having two sets of stktable_data_ptr()
functions, one for retrieval and one for storage, and to make the store
function return a NULL pointer for local types. For now this doesn't
seem worth the trouble.
This fixes github issue #1497. Thanks to @brenc for the reproducer.
This must be backported to 2.5.
There have been a large number of issues reported with conn_cur
synchronization because the concept is wrong. In an active-passive
setup, pushing the local connections count from the active node to
the passive one will result in the passive node to have a higher
counter than the real number of connections. Due to this, after a
switchover, it will never be able to close enough connections to
go down to zero. The same commonly happens on reloads since the new
process preloads its values from the old process, and if no connection
happens for a key after the value is learned, it is impossible to reset
the previous ones. In active-active setups it's a bit different, as the
number of connections reflects the number on the peer that pushed last.
This patch solves this by marking the "conn_cur" local and preventing
it from being learned from peers. It is still pushed, however, so that
any monitoring system that collects values from the peers will still
see it.
The patch is tiny and trivially backportable. While a change of behavior
in stable branches is never welcome, it remains possible to fix issues
if reports become frequent.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
appctx_new() is exclusively called with tid_bit and it only uses the
mask to pass it to the accompanying task. There is no point requiring
the caller to know about a mask there, nor is there any point in
creating an applet outside of the context of its own thread anyway.
Let's drop this and pass tid_bit to task_new() directly.
This patch adds support of array data_types on the peer protocol.
The table definition message will provide an additionnal parameter
for array data-types: the number of elements of the array.
In case of array of frqp it also provides a second parameter:
the period used to compute freq counter.
The array elements are std_type values linearly encoded in
the update message.
Note: if a remote peer announces an array data_type without
parameters into the table definition message, all updates
on this table will be ignored because we can not
parse update messages consistently.
This patch fixes the computation of the bit of the current data_type
in some part of code of peer protocol where the computation is limited
to 32bits whereas the bitfield of data_types can support 64bits.
Without this patch it could result in bugs when we will define more
than 32 data_types.
Backport is useless because there is currently less than 32 data_types
The relative_pid is always 1. In mworker mode we also have a
child->relative_pid which is always equalt relative_pid, except for a
master (0) or external process (-1), but these types are usually tested
for, except for one place that was amended to carefully check for the
PROC_O_TYPE_WORKER option.
Changes were pretty limited as most usages of relative_pid were for
designating a process in stats output and peers protocol.
Lots of places iterating over nbproc or comparing with nbproc could be
simplified. Further, "bind-process" and "process" parsing that was
already limited to process 1 or "all" or "odd" resulted in a bind_proc
field that was either 0 or 1 during the init phase and later always 1.
All the checks for compatibilities were removed since it's not possible
anymore to run a frontend and a backend on different processes or to
have peers and stick-tables bound on different ones. This is the largest
part of this patch.
The bind_proc field was removed from both the proxy and the receiver
structs.
Since the "process" and "bind-process" directives are still parsed,
configs making use of correct values allowing process 1 will continue
to work.
A memory allocation failure happening during peers_register_table would
have resulted in a crash. This function is only called during init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
There were 102 CLI commands whose help were zig-zagging all along the dump
making them unreadable. This patch realigns all these messages so that the
command now uses up to 40 characters before the delimiting colon. About a
third of the commands did not correctly list their arguments which were
added after the first version, so they were all updated. Some abuses of
the term "id" were fixed to use a more explanatory term. The
"set ssl ocsp-response" command was not listed because it lacked a help
message, this was fixed as well. The deprecated enable/disable commands
for agent/health/server were prominently written as deprecated. Whenever
possible, clearer explanations were provided.
This patch adds miscellenous informative flags raised during the initial
full resync process performed during the reload for debugging purpose.
0x00000010: Timeout waiting for a full resync from a local node
0x00000020: Timeout waiting for a full resync from a remote node
0x00000040: Session aborted learning from a local node
0x00000080: Session aborted learning from a remote node
0x00000100: A local node teach us and was fully up to date
0x00000200: A remote node teach us and was fully up to date
0x00000400: A local node teach us but was partially up to date
0x00000800: A remote node teach us but was partially up to date
0x00001000: A local node was assigned for a full resync
0x00002000: A remote node was assigned for a full resync
0x00004000: A resync was explicitly requested
This patch could be backported on any supported branch