Previous patch 160287b676 ("MEDIUM: pipe/thread: maintain a per-thread
local cache of recently used pipes") didn't replace all pipe counter
updates with atomic ops since some were already under a lock. That is
not a valid reason, since an update made under the lock can still run
in parallel with an atomic update of the same counter. As a result,
pipes_used could occasionally be seen as negative in the stats
(harmless), and slightly more pipes could be allocated than permitted,
thus stealing a few file descriptors that were no longer usable for
connections. Let's use pure atomic ops everywhere these counters are
updated.
No backport is needed.
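The counter updates described above can be sketched as follows. This is
only an illustration using C11 <stdatomic.h> instead of the project's own
atomic macros; pipes_used comes from the message, while pipes_free and
the two helper names are assumptions made for the example.

```c
#include <stdatomic.h>

/* Stand-ins for the pipe counters. Only pipes_used is named in the
 * message above; pipes_free is assumed for the sake of the example.
 */
static atomic_int pipes_used;
static atomic_int pipes_free;

/* Hypothetical helper: a pipe leaves the free list and gets used.
 * Both updates are atomic, so they stay correct even if another thread
 * updates the same counters at the same time, with or without a lock.
 */
static inline void pipe_account_get(void)
{
	atomic_fetch_sub(&pipes_free, 1);
	atomic_fetch_add(&pipes_used, 1);
}

/* Hypothetical helper: a pipe is released back to the free list. */
static inline void pipe_account_put(void)
{
	atomic_fetch_sub(&pipes_used, 1);
	atomic_fetch_add(&pipes_free, 1);
}
```

Mixing a plain "pipes_used--" under a lock with an atomic increment
elsewhere is what allows an update to be lost, hence the transient
negative values.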
In order to completely remove the pipe locking cost and try to reuse
hot pipes, each thread now maintains a local cache of recently used pipes
that is no larger than its share (maxpipes/nbthreads). All extra pipes
are instead returned to the global pool. Allocations are made from the
local cache first, then fall back to the global pool, and only then
create a new pipe.
This completely removes the observed pipe locking cost at high bit rates,
which was still around 5-6%.
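The lookup order described above (local cache, then global pool, then a
new pipe) can be sketched as follows. This is a simplified illustration
and not the actual implementation: the MAXPIPES and NBTHREADS constants,
the pthread mutex and the variable names are assumptions, and the
enforcement of the global pipe limit is left out.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct pipe {
	int prod;            /* write end */
	int cons;            /* read end */
	struct pipe *next;   /* next pipe in a cache or pool list */
};

#define MAXPIPES  8      /* hypothetical global limit */
#define NBTHREADS 4      /* hypothetical number of threads */

static pthread_mutex_t pipes_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pipe *pipes_live;                 /* global pool */
static _Thread_local struct pipe *local_pipes;  /* per-thread cache */
static _Thread_local int local_pipes_nb;

struct pipe *get_pipe(void)
{
	struct pipe *p = local_pipes;
	int fds[2];

	if (p) {                          /* 1: local cache, lock-free */
		local_pipes = p->next;
		local_pipes_nb--;
		p->next = NULL;
		return p;
	}

	pthread_mutex_lock(&pipes_lock);  /* 2: global pool */
	p = pipes_live;
	if (p)
		pipes_live = p->next;
	pthread_mutex_unlock(&pipes_lock);
	if (p) {
		p->next = NULL;
		return p;
	}

	p = malloc(sizeof(*p));           /* 3: create a new pipe */
	if (!p)
		return NULL;
	if (pipe(fds) < 0) {
		free(p);
		return NULL;
	}
	p->cons = fds[0];
	p->prod = fds[1];
	p->next = NULL;
	return p;
}

void put_pipe(struct pipe *p)
{
	if (local_pipes_nb < MAXPIPES / NBTHREADS) {
		p->next = local_pipes;        /* keep it hot in this thread */
		local_pipes = p;
		local_pipes_nb++;
		return;
	}
	pthread_mutex_lock(&pipes_lock);  /* over our share: global pool */
	p->next = pipes_live;
	pipes_live = p;
	pthread_mutex_unlock(&pipes_lock);
}
```

A thread that gets and puts pipes at a steady rate never touches
pipes_lock in this sketch, which is where the locking cost goes away.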
In a quick test involving splicing, we can see that get_pipe() and
put_pipe() together consume up to 12% of the CPU. That's not surprising
considering how much work is performed under the lock, including the
pipe struct allocation, the pipe creation and its initialization. The
same applies to releasing: no lock is needed there to call close() or
to free the struct to the pool.
Moving this work out of the locked section alone was enough to cut the
overhead in half. A better approach would be to have a per-thread pipe
cache, which would also help keep pages hot in the CPU caches.
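The release side of this change can be sketched as follows: only the
list manipulation stays under the lock, while close() and the free are
done outside of it. This is an illustration under assumptions: the
pthread mutex, the use of free() in place of the pool release and the
variable names are not taken from the actual code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct pipe {
	int prod;             /* write end */
	int cons;             /* read end */
	unsigned int data;    /* bytes still pending in the pipe */
	struct pipe *next;    /* next pipe in the free list */
};

static pthread_mutex_t pipes_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pipe *pipes_live;   /* list of reusable pipes */
static int pipes_free;            /* number of pipes in that list */

/* Destroys a pipe. It is not linked anywhere at this point, so neither
 * close() nor the release of the struct needs the lock.
 */
static void kill_pipe(struct pipe *p)
{
	close(p->prod);
	close(p->cons);
	free(p);
}

void put_pipe(struct pipe *p)
{
	if (p->data) {
		/* pending data: the pipe cannot be reused */
		kill_pipe(p);
		return;
	}
	/* only the list insertion is performed under the lock */
	pthread_mutex_lock(&pipes_lock);
	p->next = pipes_live;
	pipes_live = p;
	pipes_free++;
	pthread_mutex_unlock(&pipes_lock);
}
```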
Older compilers don't like to see "inline" placed after the type in a
function declaration; it must be "static inline <type>" only. This
patch touches various areas. The warnings were seen with gcc-3.4.
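As an illustration of the ordering described above, the form below is
the one expected to be accepted everywhere. The function name is
hypothetical and only serves the example.

```c
/* Accepted form: storage class, then "inline", then the return type. */
static inline int pipe_is_empty(unsigned int data)
{
	return data == 0;
}

/* Form that older compilers complain about, shown as a comment only:
 *
 *     static int inline pipe_is_empty(unsigned int data)
 */
```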
This commit replaces the explicit pool creations that are made in
constructors with a pool registration. Not only does this simplify the
pool declarations (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
are no more users of potentially uninitialized pools now.
This was also the opportunity to remove no fewer than 12 constructors
and 6 init functions.
This patch replaces a number of __decl_hathread() followed by HA_SPIN_INIT
or HA_RWLOCK_INIT by the new __decl_spinlock() or __decl_rwlock() which
automatically registers the lock for initialization during the STG_LOCK
init stage. A few static modifiers were lost in the process, but since they
were not essential at all it was not worth extending the API to provide such
a variant.
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids having to wrap such declarations
between #ifdef/#endif. It is used to declare all lock variables.
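One plausible way to define and use such a macro is sketched below. The
definition is an assumption written for illustration and is not copied
from the source tree; the PIPES_LOCK()/PIPES_UNLOCK() helpers, the
pthread mutex and the counter are hypothetical as well.

```c
#ifdef USE_THREAD
#include <pthread.h>
/* threads enabled: the declaration is emitted as-is */
#define __decl_hathread(decl) decl
#define PIPES_LOCK()   pthread_mutex_lock(&pipes_lock)
#define PIPES_UNLOCK() pthread_mutex_unlock(&pipes_lock)
#else
/* threads disabled: the declaration and the lock calls vanish */
#define __decl_hathread(decl)
#define PIPES_LOCK()   do { } while (0)
#define PIPES_UNLOCK() do { } while (0)
#endif

static int pipes_free;

/* No #ifdef is needed around the lock variable itself. */
__decl_hathread(static pthread_mutex_t pipes_lock = PTHREAD_MUTEX_INITIALIZER);

void pipes_free_inc(void)
{
	PIPES_LOCK();
	pipes_free++;
	PIPES_UNLOCK();
}
```

Built without USE_THREAD, this compiles to a plain increment with no
reference to any lock, which is the property a threadless build needs.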
Marcus Rückert reported that commit d8b3b65 ("BUG/MEDIUM: splice/threads:
pipe reuse list was not protected.") broke threadless support. Add the
required #ifdef.
By default, pipes use the system's default size. But sometimes when
using TCP splicing, it can improve performance to increase pipe sizes,
especially if it is suspected that pipes are not filled and that many
calls to splice() are performed. This has an impact on the kernel's
memory footprint, so this must not be changed if the impacts are not
understood.
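On Linux, a pipe can be resized with fcntl(F_SETPIPE_SZ). The sketch
below shows the idea only; the function name and its return convention
are assumptions made for the example, and the fallback definitions of
the fcntl commands are there for build environments whose headers do
not expose them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>

/* Linux-specific fcntl commands, in case the headers hide them */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ (1024 + 7)
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ (1024 + 8)
#endif

/* Hypothetical helper: creates a pipe in <fds> and, if <size> is
 * positive, asks the kernel to resize it. A failed resize is not
 * fatal, the pipe simply keeps its previous size. Returns the
 * resulting capacity in bytes, or -1 if the pipe cannot be created.
 */
int make_sized_pipe(int fds[2], int size)
{
	if (pipe(fds) < 0)
		return -1;
	if (size > 0)
		fcntl(fds[0], F_SETPIPE_SZ, size);
	return fcntl(fds[0], F_GETPIPE_SZ);
}
```

The kernel may round the requested size up and may refuse values above
its configured maximum for unprivileged processes, so the returned
capacity should be checked rather than assumed.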
A new data type has been added: pipes. Some pre-allocated empty pipes
are maintained in a pool for users such as splice which use them a lot
for very short times.
Pipes are allocated using get_pipe() and released using put_pipe().
Pipes which are released with pending data are immediately killed.
The struct pipe is small (16 to 20 bytes) and may even be further
reduced by unifying ->data and ->next.
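The possible size reduction mentioned above can be sketched with a
union. Both layouts below are assumptions made for illustration; only
the ->data and ->next member names come from the text. The idea relies
on a pipe in the free list holding no data and a pipe in use not being
linked, so the two fields are never needed at the same time.

```c
/* Assumed current layout: separate data and next fields. */
struct pipe_plain {
	int data;                  /* bytes pending in the pipe */
	int prod;                  /* write end */
	int cons;                  /* read end */
	struct pipe_plain *next;   /* next pipe in the free list */
};

/* Hypothetical compact layout sharing storage for data and next. */
struct pipe_compact {
	int prod;
	int cons;
	union {
		int data;                   /* valid while the pipe is in use */
		struct pipe_compact *next;  /* valid while in the free list */
	} u;
};
```

With such a layout, u.data has to be reset to zero whenever a pipe is
taken out of the free list, since the storage still holds the old
pointer at that point.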
It would be nice to have a dedicated cleanup task which would watch
pipe usage and destroy a few unused pipes from time to time.