Add a struct server pointer in the mworker_proc struct so we can easily
use it as a target for the mworker proxy.
pcli_prefix_to_pid() is used to find the right PID of the worker
when using a prefix in the CLI. (@master, @#<relative pid> , @<pid>)
pcli_pid_to_server() is used to find the right target server for the
CLI proxy.
The master process does not need all the keywords of the cli, add 2
flags to chose which keyword to use.
It might be useful to activate some of them in a debug mode later...
This patch introduces mworker_cli_proxy_new_listener() which allows the
creation of new listeners for the CLI proxy.
Using this function it is possible to create new listeners from the
program arguments with -Sa <unix_socket>. It is allowed to create
multiple listeners with several -Sa.
This patch implements a listen proxy within the master. It uses the
sockpair of all the workers as servers.
In the current state of the code, the proxy is only doing round robin on
the CLI of the workers. A CLI mode will be needed to know to which CLI
send the requests.
The init code of the mworker_proc structs has been moved before the
init of the listeners.
Each socketpair is now connected to a CLI within the workers, which
allows the master to access their CLI.
The inherited flag of the worker side socketpair is removed so the
socket can be closed in the master.
There's a call there to si_cs_send() while we're supposed to come from
si_cs_io_cb() which has just done it. But in fact we can also come here
as a lower layer callback from ->wake() after a connection is established.
Since most of the time we'll end up here with either no data in the buffer
or a blocked output, let's simply check if we're already susbcribed to send
events before calling si_cs_send().
stream_int_notify() is I/O agnostic and should not wake up the tasklet,
it's up to si_cs_process() to do that, just like si_applet_wake_cb()
does it for the applet.
This one was added by commit 53216e7db ("MEDIUM: connections: Don't
directly mess with the polling from the upper layers.") after the
removal of the conditional cs_want_send() call. But after analysis
it turned out that it's not needed since the si_cs_send() call will
either succeed or subscribe.
Calling si_cs_send() alone is always dangerous because it can result
in the loss of an event if it manages to empty the buffer. Indeed, in
this case it's critical to call si_chk_rcv() on the opposite stream-int.
Given that si_cs_process() takes care of all this, let's call it instead.
All this code could possibly be refined soon to avoid redoing the whole
stream_int_notify() and do it only after a send(), but at the moment it's
not important.
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.
This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
The rationale here is that we should never need to try to send() at the
beginning of process_stream() because :
- if something was pending, it's very unlikely that it was unblocked
and not sent just between the last poll() and the wakeup instant.
- if something pending was recently sent, then we don't have anything
to send anymore.
So at first glance it doesn't seem like there could be any valid case
where trying to send before entering the function brings any benefit.
If a buffer allocation failed, we have SI_FL_WAIT_ROOM set and c_size(buf)
being zero. It's the only moment where we have a new opportunity to try to
allocate this buffer. However we don't want to waste our time trying this
if both are non-null since it indicates missing room without any changed
condition.
Well that's only 3 places (applet.c, stream_interface.c, hlua.c). This
ensures we always clear SI_FL_WAIT_ROOM before setting it on failure,
so that it is granted that SI_FL_WAIT_ROOM always indicates a lack of
room for doing an operation, including the inability to allocate a
buffer for this.
The vars_prune() and vars_init() functions involve locking while most of
the time there is no variable at all in streams nor sessions. Let's check
for emptiness before calling these functions. Simply doing this has
increased the multithreaded performance from 1.5 to 5% depending on the
workload.
While "option prefer-last-server" only applies to non-deterministic load
balancing algorithms, 401/407 responses actually caused haproxy to prefer
the last server unconditionally.
As this breaks deterministic load balancing algorithms like uri, this
patch applies the same condition here.
Should be backported to 1.8 (together with "BUG/MINOR: only mark
connections private if NTLM is detected").
Instead of marking all connections that see a 401/407 response private
(for connection reuse), this patch detects a RFC4559/NTLM authentication
scheme and restricts the private setting to those connections.
This is so we can reuse connections with 401/407 responses with
deterministic load balancing algorithms later (which requires another fix).
This fixes the problem reported here by Elliot Barlas :
https://discourse.haproxy.org/t/unable-to-configure-load-balancing-per-request-over-persistent-connection/3144
Should be backported to 1.8.
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit
95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT as commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.
Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).
This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
With this patch we avoid parsing "max-object-size" with atoi() and we store its
value as an unsigned int to prevent bad implicit conversion issues especially
when we compare it with others unsigned value (content length).
With this patch we check that shctx_init() does not returns 0.
This is possible if the maxblocks argument, which is passed as an
int, is negative due to an implicit conversion.
Must be backported to 1.8.
With this patch we support cache size larger than 2047 (MB) and prevent haproxy from crashing when "total-max-size" is parsed as negative values by atoi().
The limit at parsing time is 4095 MB (UINT_MAX >> 20).
May be backported to 1.8.
This patch adds "max-object-size" option to the cache to limit
the size in bytes of the HTTP objects to be cached. When not provided,
the maximum size of an HTTP object is a 256th of the cache size.
This patch makes the capable of storing HTTP objects larger than a buffer.
It makes usage of the "block by block shared object allocation" new shctx API.
A new pointer to struct shared_block has been added to the cache applet
context to memorize the next block to be used by the HTTP cache I/O handler
http_cache_io_handler() to emit the data. Another member, named "sent" memorize
the number of bytes already sent by this handler. So, to send an object from cache,
http_cache_io_handler() must be called until "sent" counter reaches the size
of this object.
This patch makes shctx capable of storing objects in several parts,
each parts being made of several blocks. There is no more need to
walk through until reaching the end of a row to append new blocks.
A new pointer to a struct shared_block member, named last_reserved,
has been added to struct shared_block so that to memorize the last block which was
reserved by shctx_row_reserve_hot(). Same thing about "last_append" pointer which
is used to memorize the last block used by shctx_row_data_append() to store the data.
Fred reported a random crash related to the pools. This was introduced
by commit e18db9e98 ("MEDIUM: pools: implement a thread-local cache for
pool entries") because the minimum pool item size should have been
increased to 32 bytes to accommodate the 2 double-linked lists.
No backport is needed.
This option makes a proxy use only HTX-compatible muxes instead of the
HTTP-compatible ones for HTTP modes. It must be set on both ends, this
is checked at parsing time.
With the previous connection model, when we purposely decided to stop
receiving in order to avoid polling after a complete request was received
for example, it was needed to set SI_FL_WAIT_ROOM to prevent receive
polling from being re-armed. Now with the new subscription-based model
there is no such thing anymore and there is noone to remove this flag
either. Thus if a request takes more than one packet to come in or
spans over too many packets, this flag will cause it to wait forever.
Let's simply remove this flag now.
This patch should not be backported since older versions still need
that this flag is set here to stop receiving.
We wake up all the streams waiting to send data when we have space available
in the mux buffer. Doing so means we probably wake way too many streams,
because after a few the buffer will probably be full instead. So keep a
list of all the streams that are about to send data, and if we detect that
the buffer is full, unschedule the tasks and put the streams back to the
send_list.
Make sure we call tasklet_free() only after si_release_endpoint(), when the
unsubscribe() method has been called, so that we're sure the mux won't
attempt to access the taslet.
Avoid using conn_xprt_want_send/recv, and totally nuke cs_want_send/recv,
from the upper layers. The polling is now directly handled by the connection
layer, it is activated on subscribe(), and unactivated once we got the event
and we woke the related task.
In h2_recv(), return 1 if we have data available, or if h2_recv_allowed()
failed, to be sure h2_process() is called.
Also don't subscribe if our buffer is full.
When retrying to connect to a server, because the previous connection failed,
make sure if we subscribed to the previous connection, the polling flags will
be true for the new fd.
No backport is needed.
When we're closing a stream, is there's no stream left and a goaway was sent,
close the connection, there's no reason to keep it open.
[wt: it's likely that this is needed in 1.8 as well, though it's unclear
how to trigger this issue, some tests are needed]
Similary to what's been done in 7a6ad88b02d8b74c2488003afb1a7063043ddd2d,
take into account that free_list that free_list is a void **, and so use
a void ** too when attempting to do a CAS.
The purpose is to detect if threads or processes are competing for the
same CPU. This can happen when threads are incorrectly bound, or after a
reload if the previous process still has an important activity. With
threads this situation is problematic because a preempted thread holding
a lock will block other ones waiting for this lock to be released.
A first attempt consisted in measuring the cumulated lost time more
precisely but the system's scheduler is smart enough to try to limit the
thread preemption rate by mostly context switching during poll()'s blank
periods, so most of the time lost is not seen. In essence this is good
because it means a thread is not preempted with a lock held, and even
regarding the rendez-vous point it cannot prevent the other ones from
making progress. But still it happens tens to hundreds of times per
second that a thread might be preempted, so it's still possible to detect
that the situation is happening, thus it's interesting to measure and
report its frequency.
Each time we enter the poller, we check the CPU time spent working and
see if we've lost time doing something else. To limit false positives,
we're only interested in losses of 500 microseconds or more (i.e. half
a clock tick on a 1 kHz system). If so, it indicates that some time was
stolen by another thread or process. Note that we purposely store some
sub-millisecond counters so that under heavy traffic with a 1 kHz clock,
it's still possible to measure something without being subject to the
risk of rounding errors (i.e. if exactly 1 ms is stolen it's possible
that the time difference could often be slightly lower).
This counter of lost CPU time slots time is reported in "show activity"
in numbers of milliseconds of CPU lost per second, per 15s, and total
over the process' life. By definition, the per-second counter cannot
report values larger than 1000 per thread per second and the 15s one
will be limited to 15000/s in the worst case, but it's possible that
peak values exceed such thresholds after long pauses.
The calls to HA_ATOMIC_CAS() on the lockfree version of the pool allocator
were mistakenly done on (void*) for the old value instead of (void **).
While this has no impact on "recent" gcc, it does have one for gcc < 4.7
since the CAS was open coded and it's not possible to assign a temporary
variable of type "void".
No backport is needed, this only affects 1.9.
By placing this code into time.h (tv_entering_poll() and tv_leaving_poll())
we can remove the logic from the pollers and prepare for extending this to
offer more accurate time measurements.
The 4 pollers all contain the same code used to compute the poll timeout.
This is pointless, let's centralize this into fd.h. This also gets rid of
the useless SCHEDULER_RESOLUTION macro which used to work arond a very old
linux 2.2 bug causing select() to wake up slightly before the timeout.
There are as many ways to build the globalfilepathlen variable as branches
in the if/then/else, creating lots of confusion. Address the most obvious
parts, but some polishing definitely is still needed.