The code which enables tunnel mode or TCP transfers is rarely used
and at most once per session. Putting it in an unlikely() clause
reduces the length of the hot path of process_session() which is
already quite long, and also slightly reduces its overall size.
Some measurements show a steady gain of about 0.2% thanks to this.
si_connect() used to only return SI_ST_CON. But it already detect the
connection reuse and is the function which avoids calling connect().
So it already knows the connection is valid and reuse. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.
Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.
Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close() since
we don't use this faster path.
At the moment it is possible in sess_prepare_conn_req() to switch to the
established state when the target is an applet. But sess_update_stream_int()
will soon also have the ability to set the established state via
connect_server() when a connection is reused, leading to a synchronous
connect.
So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which
really matches what's done in the lower layers.
Currently there are 3 places in the code where t_connect is set after
switching to state SI_ST_EST, and a fourth one will soon come. Since
all these places lead to an immediate call to sess_establish() to
complete the session establishment, better move that measurement
there.
As soon as we connect to the server, we want to limit the number of
recvfrom() on the response path because most of the time a single
call will retrieve enough information.
At the moment this is only done in the HTTP response parser, after
some reads have already failed, which is too late. We need to do
that at the earliest possible instant. It was already done for the
request side by frontend_accept() for the first request, and by
http_reset_txn() for the next requests.
Thanks to this change, there are no more failed recvfrom() calls in
keep-alive mode.
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.
The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.
That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.
In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.
The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).
That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
In HTTP keep-alive mode, if we receive a 401, we still have a chance
of being able to send the visitor again to the same server over the
same connection. This is required by some broken protocols such as
NTLM, and anyway whenever there is an opportunity for sending the
challenge to the proper place, it's better to do it (at least it
helps with debugging).
The memset() was put here to corrupt memory for a debugging test,
it's not needed anymore and was unfortunately committed. It does
not harm anyway, it probably just slightly affects performance.
Since recent commit f79c817 (MAJOR: connection: add two new flags to
indicate readiness of control/transport) and the surrounding commits,
the session initialization has been slightly delayed and the control
layer of the connection is not yet initialized when processing the
rules.
We need to move that minimal initialization a bit above.
The bug was introduced with latest changes, no backport is needed.
It's becoming increasingly difficult to ignore unwanted function returns in
debug code with gcc. Now even when you try to work around it, it suggests a
way to write your code differently. For example :
src/frontend.c:187:65: warning: if statement has empty body [-Wempty-body]
if (write(1, trash.str, trash.len) < 0) /* shut gcc warning */;
^
src/frontend.c:187:65: note: put the semicolon on a separate line to silence this warning
1 warning generated.
This is totally unacceptable, this code already had to be written this way
to shut it up in earlier versions. And now it comments the form ? What's the
purpose of the C language if you can't write anymore the code that does what
you want ?
Emeric proposed to just keep a global variable to drain such useless results
so that gcc stops complaining all the time it believes people who write code
are monkeys. The solution is acceptable because the useless assignment is done
only in debug code so it will not impact performance. This patch implements
this, until gcc becomes even "smarter" to detect that we tried to cheat.
SSL and keep-alive will need to be able to fail on allocation errors,
and the stream interface did not allow to report such a cause. The flag
will then be "RC" as already documented.
Some applet users don't need to initialize their applet, they just want
to route the traffic there just as if it were a server. Since applets
are now connected to from session.c, let's simply ensure that when
connecting, the applet in si->end matches the target, and allocate
one there if it's not already done. In case of error, we force the
status code to resource and connection so that it's clear that it
happens because of a memory shortage.
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.
As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.
The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.
The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.
The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.
A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).
si_prepare_conn() is not appropriate in our case as it both initializes and
attaches the connection to the stream interface. Due to the asymmetry between
accept() and connect(), it causes some fields such as the control and transport
layers to be reinitialized.
Now that we can separately initialize these fields using conn_prepare(), let's
break this function to only attach the connection to the stream interface.
Also, by analogy, si_prepare_none() was renamed si_detach(), and
si_prepare_applet() was renamed si_attach_applet().
We don't want to assign the control nor transport layers anymore
at the same time as the data layer, because it prevents one from
keeping existing settings when reattaching a connection to an
existing stream interface.
Let's have conn_attach() replace conn_assign() for this purpose.
Thus, conn_prepare() + conn_attach() do exactly the same as the
previous conn_assign().
Now that we can assign conn->xprt regardless of the initialization state,
we can reintroduce conn_prepare() to set only the protocol, the transport
layer and initialize the transport layer's state.
The first function is used to (re)initialize a stream interface and
the second to force it into a known state. These are intended for
cleaning up the stream interface initialization code in session.c
and peers.c and avoiding future issues with missing initializations.
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :
- on accept() side, the fd is set first, then the ctrl layer then the
transport layer ; upon error, they must be undone in the reverse order,
then the FD must be closed. The FD must not be deleted if the control
layer was not yet initialized ;
- on the connect() side, the fd is set last and there is no reliable way
to know if it has been initialized or not. In practice it's initialized
to -1 first but this is hackish and supposes that local FDs only will
be used forever. Also, there are even less solutions for keeping trace
of the transport layer's state.
Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.
So the proposed solution is to add two flags to the connection :
- CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
and cleared after it's released (fd_delete).
- CO_FL_XPRT_READY is set when the control layer is initialized (xprt->init)
and cleared after it's released (xprt->close).
The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.
The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independantly. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.
In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.
Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.
conn_full_close() now simply calls conn_xprt_close() then conn_full_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.
In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turns. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
Everywhere conn_prepare() is used, the call to conn_init() has already
been done. We can now safely replace all instances of conn_prepare()
with conn_assign() which does not reset the transport layer, and remove
conn_prepare().
In order to reduce the dependency over stream-interfaces, we now
attach the incoming connection to the embryonic session's target
instead of the stream-interface's connection. This means we won't
need to initialize stream interfaces anymore after we implement
dynamic connection allocation. The session's target is reset to
NULL after the session has been converted to a complete session.
The connection will only remain there as a pre-allocated entity whose
goal is to be placed in ->end when establishing an outgoing connection.
All connection initialization can be made on this connection, but all
information retrieved should be applied to the end point only.
This change is huge because there were many users of si->conn. Now the
only users are those who initialize the new connection. The difficulty
appears in a few places such as backend.c, proto_http.c, peers.c where
si->conn is used to hold the connection's target address before assigning
the connection to the stream interface. This is why we have to keep
si->conn for now. A future improvement might consist in dynamically
allocating the connection when it is needed.
This function makes no sense anymore and will cause trouble to convert
the remains of connection/applet to end points. Let's replace it now
with its contents.
The long-term goal is to have a context for applets as an alternative
to the connection and not as a complement. At the moment, the context
is still stored into the stream interface, and we only put a pointer
to the applet's context in si->end, initialize the context with object
type OBJ_TYPE_APPCTX, and this allows us not to allocate an entry when
deciding to switch to an applet.
A special care is taken to never dereference si->conn anymore when
dealing with an applet. That's why it's important that si->end is
always set to the proper type :
si->end == NULL => not connected to anything
*si->end == OBJ_TYPE_APPCTX => connected to an applet
*si->end == OBJ_TYPE_CONN => real connection (server, proxy, ...)
The session management code used to check the applet from the connection's
target. Now it uses the stream interface's end point and does not touch the
connection at all. Similarly, we stop checking the connection's addresses
and file descriptors when reporting the applet's status in the stats dump.
Since last commit, we now have a pointer to the applet in the
applet context. So we don't need the si->release function pointer
anymore, it can be extracted from applet->applet.release. At many
places, the ->release function was still tested for real connections
while it is only limited to applets, so most of them were simply
removed. For the remaining valid uses, a new inline function
si_applet_release() was added to simplify the check and the call.
si_prepare_embedded() was used both to attach an applet and to detach
anything from a stream interface. Split it into si_prepare_none() to
detach and si_prepare_applet() to attach an applet.
si->conn->target is now assigned from within these two functions instead
of their respective callers.
Now that applets work like real connections, there is no reason for
them to evade the response analysers. The stats applet emits valid
HTTP responses, it can flow through the HTTP response analyser just
fine. This now allows http-response/rsprep/rspadd rules to be applied
on top of stats. Cookie insertion does nothing since applets are not
servers and thus do not have a cookie. We can imagine compression to be
applied later if the stats output is emitted in chunks and in HTTP/1.1.
A minor visible effect of this change is that there is no more "-1" in
the timers presented in the logs when viewing the stats, all timers are
real.
Instead of having applets bypass the whole connection process, we now
follow the common path through sess_prepare_conn_req(). It is this
function which detects an applet an sets the output state so SI_ST_EST
instead of initiating a connection to a server. It is made possible
because we now have s->target pointing to the applet.
We used to rely on the stream interface's target to detect an applet
from within the session while trying to process the connection request,
but this is incorrect, as this target is the one currently connected
and not the next one to process. This will make a difference when we
later support keep-alive. The only "official" value indicating where
we want to connect is the session's target, which can be :
- &applet : connect to this applet
- NULL : connect using the normal LB algos
- anything else : direct connection to some entity
Since we're interested in detecting the specific case of applets, it's
OK to make use of s->target then.
Also, applets are being isolated from connections, and as such there
will not be any ->connect method available when an applet is running,
so we can get rid of this test as well.
The commit 37e340c (BUG/MEDIUM: stick: completely remove the unused flag
from the store entries) was incomplete. We also need to ensure that only
the first store-response for a table is applied and that it may coexist
with a possible store-request that was already done on this table.
This patch with the previous one should be backported to 1.4.
The store[] array in the session holds a flag which probably aimed to
differenciate store entries learned from the request from those learned
from the response, and allowing responses to overwrite only the request
ones (eg: have a server set a response cookie which overwrites the request
one).
But this flag is set when a response data is stored, and is never cleared.
So in practice, haproxy always runs with this flag set, meaning that
responses prevent themselves from overriding the request data.
It is desirable anyway to keep the ability not to override data, because
the override is performed only based on the table and not on the key, so
that would mean that it would be impossible to retrieve two different
keys to store into a same table. For example, if a client sets a cookie
and a server another one, both need to be updated in the table in the
proper order. This is especially true when multiple keys may be tracked
on each side into the same table (eg: list of IP addresses in a header).
So the correct fix which also maintains the current behaviour consists in
simply removing this flag and never try to optimize for the overwrite case.
This fix also has the benefit of significantly reducing the session size,
by 64 bytes due to alignment issues caused by this flag!
The bug has been there forever (since 1.4-dev7), so a backport to 1.4
would be appropriate.
Commit 986a9d2d12 moved the source address from the stream interface
to the session, but it did not set the flag on the connection to
report that the source address is known. Thus when logs are enabled,
we had a call to getpeername() which is redundant with the result
from accept(). This patch simply sets the flag.
In session_accept(), if we face a memory allocation error, we try to
emit an HTTP 500 error message in HTTP mode. The problem is that we
must not use http_error_message() for this since it dereferences the
session which can be NULL in this case.
We don't need the session to build the error message anyway since
this function only uses it to retrieve the backend and frontend to
get the most suited error message. Let's pick it ourselves, we're
at the beginning of the session, only the frontend is relevant.
This bug is 1.5-specific.
sc_* sample fetches now take an optional parameter which allows to look
the key in an alternate table. This is convenient to pass multiple
information for the same key at once (eg: have multiple gpc0 for the
same key, or support being fed complementary information from the CLI).
Example :
listen front
bind :8000
tcp-request content track-sc0 src table local-ip
http-response set-header src-id %[sc0_get_gpc0]+%[sc0_get_gpc0(global-ip)]
server dummy 127.0.0.1:8001
backend local-ip
stick-table size 1k type ip store gpc0
backend global-ip
stick-table size 1k type ip store gpc0
One very annoying issue when trying to extend the sticky counters beyond
the current 3 counters is that it requires a massive copy-paste of fetch
functions (we don't have to copy-paste code anymore), just so that the
fetch names exist.
So let's have an alternate form like "sc_*(num)" to allow passing the
counter number as an argument without having to redefine new fetch names.
The MAX_SESS_STKCTR macro defines the number of usable sticky counters,
which defaults to 3.
In preparation of more flexibility in the stick counters, make their
number configurable. It still defaults to 3 which is the minimum
accepted value. Changing the value alone is not sufficient to get
more counters, some bitfields still need to be updated and the TCP
actions need to be updated as well, but this update tries to be
easier, which is nice for experimentation purposes.
smp_fetch_sc0_trackers, smp_fetch_sc1_trackers and smp_fetch_sc2_trackers
were merged into a single function which relies on the fetch name to decide
what to return.
This is also a bug fix for this feature which has never worked till its bogus
introduction by commit "2406db4 MEDIUM: counters: add sc1_trackers/sc2_trackers"
(1.5-dev10).
Instead of returning the value in the sample, it was returned as the fetch
result!
There is no need to backport this fix anyway since it's 1.5-specific and
nobody uses the feature.
smp_fetch_sc0_bytes_out_rate, smp_fetch_sc1_bytes_out_rate, smp_fetch_sc2_bytes_out_rate,
smp_fetch_src_bytes_out_rate and smp_fetch_bytes_out_rate were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_kbytes_out, smp_fetch_sc1_kbytes_out, smp_fetch_sc2_kbytes_out,
smp_fetch_src_kbytes_out and smp_fetch_kbytes_out were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_bytes_in_rate, smp_fetch_sc1_bytes_in_rate, smp_fetch_sc2_bytes_in_rate,
smp_fetch_src_bytes_in_rate and smp_fetch_bytes_in_rate were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_kbytes_in, smp_fetch_sc1_kbytes_in, smp_fetch_sc2_kbytes_in,
smp_fetch_src_kbytes_in and smp_fetch_kbytes_in were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_http_err_rate, smp_fetch_sc1_http_err_rate, smp_fetch_sc2_http_err_rate,
smp_fetch_src_http_err_rate and smp_fetch_http_err_rate were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_http_err_cnt, smp_fetch_sc1_http_err_cnt, smp_fetch_sc2_http_err_cnt,
smp_fetch_src_http_err_cnt and smp_fetch_http_err_cnt were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_http_req_rate, smp_fetch_sc1_http_req_rate, smp_fetch_sc2_http_req_rate,
smp_fetch_src_http_req_rate and smp_fetch_http_req_rate were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_http_req_cnt, smp_fetch_sc1_http_req_cnt, smp_fetch_sc2_http_req_cnt,
smp_fetch_src_http_req_cnt and smp_fetch_http_req_cnt were merged into a single
function which relies on the fetch name to decide what to return.
smp_fetch_sc0_sess_rate, smp_fetch_sc1_sess_rate, smp_fetch_sc2_sess_rate,
smp_fetch_src_sess_rate and smp_fetch_sess_rate were merged into a single
function which relies on the fetch name to decide what to return.