I/O handlers are still delicate to manipulate. They have no type, they're
just raw functions which have no knowledge of themselves. Let's have them
declared as applets once and for all. That way we can have multiple applets
share the same handler functions and we can store their names there. When
we later need to add more parameters (eg: usage stats), we'll be able to
do so in the applets themselves.
The CLI functions have been prefixed with "cli" instead of "stats" as it's
clearly what is going on there.
The applet descriptor in the stream interface should get all the applet
specific data (st0, ...) but this will be done in the next patch so that
we don't pollute this one too much.
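As an illustration only, here is a minimal sketch of what such a descriptor could look like. The structure layout, the field names and the applet names are assumptions made for the example, not the actual HAProxy definitions:

```c
/* Hypothetical stream interface: only holds what the sketch needs. */
struct stream_interface {
    int calls;                      /* number of times a handler ran */
};

/* Hypothetical applet descriptor: a name plus a handler function. */
struct applet {
    const char *name;                            /* human-readable name */
    void (*fct)(struct stream_interface *si);    /* I/O handler */
};

/* One raw handler function ... */
static void cli_io_handler(struct stream_interface *si)
{
    si->calls++;
}

/* ... shared by two distinct applets which only differ by their name. */
static struct applet cli_applet   = { .name = "<CLI>",   .fct = cli_io_handler };
static struct applet stats_applet = { .name = "<STATS>", .fct = cli_io_handler };

/* Runs an applet and returns its name so that the caller can report it. */
static const char *run_applet(const struct applet *a, struct stream_interface *si)
{
    a->fct(si);
    return a->name;
}
```

Any extra per-applet parameter (eg: usage stats) would then simply become a new field of the descriptor.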
When a connection error is encountered on a server and the server's
connection pool is full, pending connections are not woken up because
the current connection is still accounted for on the server, so it
still appears full. This becomes visible on a server which has
"maxconn 1" because the pending connections will only be able to
expire in the queue.
Now we take care of releasing our current connection before trying to
offer it to another pending request, so that the server can accept a
next connection.
This patch should be backported to 1.4.
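The ordering issue can be sketched as follows. The server structure and both function names are hypothetical and heavily simplified; only the order of operations matters here:

```c
/* Hypothetical, simplified server: HAProxy's real structures differ. */
struct server {
    int maxconn;    /* maximum number of concurrent connections */
    int cur_conns;  /* connections currently accounted for */
    int pending;    /* requests waiting in the queue */
};

/* Dequeues one pending request if the server has a free slot.
 * Returns 1 if a request was woken up, 0 otherwise.
 */
static int offer_slot_to_pending(struct server *s)
{
    if (!s->pending || s->cur_conns >= s->maxconn)
        return 0;
    s->pending--;
    s->cur_conns++;
    return 1;
}

/* Connection error handling: the failed connection is released first,
 * otherwise the server still looks full and nobody gets woken up.
 */
static int handle_conn_error(struct server *s)
{
    s->cur_conns--;
    return offer_slot_to_pending(s);
}
```

With "maxconn 1", calling offer_slot_to_pending() before the decrement never wakes anybody up, which is the behaviour described above.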
HTTP pipelining currently needs to monitor the response buffer to wait
for some free space to be able to send a response. It was not possible
for the HTTP analyser to be called based on response buffer activity.
Now we introduce a new buffer flag BF_WAKE_ONCE which is set when the
HTTP request analyser is set on the response buffer and some activity
is detected. This is not clean at all but is one of the only ways to fix
the issue before we make it possible to register events for analysers.
Also it appeared that one realign condition did not cover all cases.
Analysers were re-evaluated when some flags were still present in the
buffers, even if they had not changed since previous pass, resulting
in a waste of CPU cycles.
Ensuring that the flags have changed has saved some useless calls :
function min calls per session (before -> after)
http_request_forward_body 5 -> 4
http_response_forward_body 3 -> 2
http_sync_req_state 10 -> 8
http_sync_res_state 8 -> 6
http_resync_states 8 -> 6
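The check itself boils down to comparing the flags with those seen on the previous pass. A minimal sketch, assuming a hypothetical buffer reduced to its flags and a caller-provided mask of the flags worth watching:

```c
/* Hypothetical buffer: only the flags matter for this sketch. */
struct buffer {
    unsigned int flags;
};

/* Returns non-zero when analysers deserve another pass, i.e. when at least
 * one of the watched flags differs from the value seen on the previous pass.
 */
static int must_reevaluate(const struct buffer *b, unsigned int prev_flags,
                           unsigned int watched)
{
    return ((b->flags ^ prev_flags) & watched) != 0;
}
```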
The stream_sock's accept() used to close the FD upon error, but this
was also sometimes performed by the frontend's accept() called via the
session's accept(). Those interlaced calls were also responsible for the
spaghetti-looking error unrolling code in session.c and stream_sock.c.
Now the frontend must not close the FD anymore, the session is responsible
for that. It also takes care of just closing the FD or also removing from
the FD lists, depending on its state. The socket-level accept() does not
have to care about that anymore.
Some Alert() messages were remaining in the accept() path, where they
would have no chance of being detected. Remove some of them (the impossible
ones) and replace the relevant ones with send_log() so that the admin
has a chance to catch them.
Enhance pattern convs and fetch argument parsing: fetch and conv callbacks now use typed args.
Add more details to the error messages emitted when parsing a pattern expression.
Update existing pattern convs and fetches to the new prototype.
Create stick table key type "binary".
Manage truncation and padding if the pattern's fetch-converted result doesn't match the table key size.
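The truncation and padding rule can be pictured with the sketch below. The function name is hypothetical, and padding with zero bytes is an assumption of this example:

```c
#include <string.h>

/* Copies <len> bytes of <src> into a key of exactly <key_size> bytes.
 * Longer inputs are truncated, shorter ones are padded with zero bytes
 * (zero padding is an assumption of this sketch).
 */
static void fit_binary_key(unsigned char *key, size_t key_size,
                           const unsigned char *src, size_t len)
{
    if (len > key_size)
        len = key_size;
    memcpy(key, src, len);
    memset(key + len, 0, key_size - len);
}
```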
If a read shutdown is encountered on the first packet of a connection
right after the data and the last analyser is unplugged at the same
time, then that last data chunk may never be forwarded. In practice,
right now it cannot happen on requests due to the way they're scheduled,
nor can it happen on responses due to the way their analysers work.
But this behaviour has been observed with new response analysers being
developed.
The reason is that when the read shutdown is encountered and an analyser
is present, data cannot be forwarded but the BF_SHUTW_NOW flag is set.
After that, the analyser gets called and unplugs itself, hoping that
process_session() will automatically forward the data. This does not
happen due to BF_SHUTW_NOW.
Simply removing the test on this flag is not enough because then aborted
requests still get forwarded, due to the forwarding code undoing the
abort.
The solution here consists in checking BF_SHUTR_NOW instead of BF_SHUTW_NOW.
BF_SHUTR_NOW is only set on aborts and remains set until ->shutr() is called.
This is enough to catch recent aborts but not prevent forwarding in other
cases. Maybe a new special buffer flag "BF_ABORT" might be desirable in the
future.
This patch does not need to be backported because older versions don't
have the analyser which makes the problem appear.
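To make the difference between the two tests more concrete, here is a reduced sketch of the forwarding condition. The flag values are arbitrary and the real condition in process_session() involves more terms; only the choice of the flag being tested is illustrated:

```c
/* Hypothetical flag values; the real ones are defined with the buffers. */
#define BF_SHUTR_NOW 0x01   /* abort: the read side must be shut ASAP */
#define BF_SHUTW_NOW 0x02   /* the write side must be shut once data left */

/* Returns non-zero if pending data may still be forwarded once no analyser
 * is attached anymore. Only a recent abort (BF_SHUTR_NOW) blocks forwarding,
 * a pending write shutdown (BF_SHUTW_NOW) alone does not.
 */
static int may_forward(unsigned int flags, unsigned int analysers)
{
    return !analysers && !(flags & BF_SHUTR_NOW);
}
```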
This counter is incremented for each incoming connection and each active
listener, and is used to prevent haproxy from stopping upon SIGUSR1. It
will thus be possible for some tasks to increment this counter in order
to prevent haproxy from dying until they have completed their job.
The assumption that there was a 1:1 relation between tracked counters and
the frontend/backend role was wrong. It is perfectly possible to track the
track-fe-counters from the backend and the track-be-counters from the
frontend. Thus, in order to reduce confusion, let's remove this useless
{fe,be} reference and simply use {1,2} instead. The keywords have also been
renamed in order to limit confusion. The ACL rule action now becomes
"track-sc{1,2}". The ACLs are now "sc{1,2}_*" instead of "trk{fe,be}_*".
That means that we can reasonably document "sc1" and "sc2" (sticky counters
1 and 2) as sort of patterns that are available during the whole session's
life and use them just like any other pattern.
Having a single tracking pointer for both frontend and backend counters
does not work. Instead let's have one for each. The keyword has changed
to "track-be-counters" and "track-fe-counters", and the ACL "trk_*"
changed to "trkfe_*" and "trkbe_*".
This patch adds support for the following session counters :
- http_req_cnt : HTTP request count
- http_req_rate: HTTP request rate
- http_err_cnt : HTTP request error count
- http_err_rate: HTTP request error rate
The equivalent ACLs have been added to check the tracked counters
for the current session or the counters of the current source.
This counter may be used to track anything. Two sets of ACLs are available
to manage it, one gets its value, and the other one increments its value
and returns it. In the second case, the entry is created if it did not
exist.
Thus it is possible for example to mark a source as being an abuser and
to keep it marked as long as it does not wait for the entry to expire :
# The rules below use gpc0 to track abusers, and reject them if
# a source has been marked as such. The track-counters statement
# automatically refreshes the entry which will not expire until a
# 1-minute silence is respected from the source. The second rule
# evaluates the second part if the first one is true, so GPC0 will
# be increased once the conn_rate is above 100/5s.
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request track-counters src
tcp-request reject if { trk_get_gpc0 gt 0 }
tcp-request reject if { trk_conn_rate gt 100 } { trk_inc_gpc0 gt 0}
Alternatively, it is possible to let the entry expire even in presence of
traffic by swapping the check for gpc0 and the track-counters statement :
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request reject if { src_get_gpc0 gt 0 }
tcp-request track-counters src
tcp-request reject if { trk_conn_rate gt 100 } { trk_inc_gpc0 gt 0}
It is also possible not to track counters at all, but entry lookups will
then be performed more often :
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request reject if { src_get_gpc0 gt 0 }
tcp-request reject if { src_conn_rate gt 100 } { src_inc_gpc0 gt 0}
The '0' at the end of the counter name is there because if we find that more
counters may be useful, other ones will be added.
This function looks up a key, updates its expiration date, or creates
it if it was not found. acl_fetch_src_updt_conn_cnt() was updated to
make use of it.
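The "look up, refresh or create" behaviour can be sketched with a tiny fixed-size table. The types, names and linear search below are assumptions made to keep the example short; the real stick tables are organized differently:

```c
#define TABLE_SIZE 8   /* arbitrary size for the sketch */

/* Hypothetical entry: an integer key and an expiration date. */
struct entry {
    int used;
    unsigned int key;
    unsigned int expire;
};

struct table {
    struct entry entries[TABLE_SIZE];
    unsigned int expire_delay;   /* lifetime granted on each update */
};

/* Looks <key> up. If found, its expiration date is refreshed, otherwise a
 * new entry is created. Returns NULL only when the table is full.
 */
static struct entry *table_update_key(struct table *t, unsigned int key,
                                      unsigned int now)
{
    struct entry *free_slot = 0;
    int i;

    for (i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &t->entries[i];

        if (e->used && e->key == key) {
            e->expire = now + t->expire_delay;
            return e;
        }
        if (!e->used && !free_slot)
            free_slot = e;
    }
    if (free_slot) {
        free_slot->used = 1;
        free_slot->key = key;
        free_slot->expire = now + t->expire_delay;
    }
    return free_slot;
}
```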
These counters maintain incoming and outgoing byte rates in a stick-table,
over a period which is defined in the configuration (2 ms to 24 days).
They can be used to detect service abuse and enforce certain bandwidth
limits per source address for instance, and block if the rate is
exceeded. Since 32-bit counters are used to compute the rates, it is important
not to use too long periods so that we don't have to deal with rates above
4 GB per period.
Example :
# block if more than 5 Megs retrieved in 30 seconds from a source.
stick-table type ip size 200k expire 1m store bytes_out_rate(30s)
tcp-request track-counters src
tcp-request reject if { trk_bytes_out_rate gt 5000000 }
# cause a 15 seconds pause to requests from sources in excess of 2 megs/30s
tcp-request inspect-delay 15s
tcp-request content accept if { trk_bytes_out_rate gt 2000000 } WAIT_END
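The 32-bit constraint mentioned above can be checked with simple arithmetic. The helper below is hypothetical and only computes the longest period, in whole seconds, that a given throughput can sustain without exceeding what a 32-bit byte counter can hold:

```c
#include <stdint.h>

/* Longest period, in whole seconds, during which a source sending at
 * <bytes_per_sec> cannot wrap a 32-bit byte counter. Returns 0 if even
 * one second is too long.
 */
static uint32_t max_safe_period(uint64_t bytes_per_sec)
{
    const uint64_t limit = UINT64_C(1) << 32;   /* 4 GB */

    if (!bytes_per_sec)
        return UINT32_MAX;
    return (uint32_t)((limit - 1) / bytes_per_sec);
}
```

For instance a source able to sustain 125000000 bytes per second (1 Gbps) should not be measured over more than 34 seconds.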
These counters maintain incoming connection rates and session rates
in a stick-table, over a period which is defined in the configuration
(2 ms to 24 days). They can be used to detect service abuse and
enforce a certain accept rate per source address for instance, and
block if the rate is exceeded.
Example :
# block if more than 50 requests per 5 seconds from a source.
stick-table type ip size 200k expire 1m store conn_rate(5s),sess_rate(5s)
tcp-request track-counters src
tcp-request reject if { trk_conn_rate gt 50 }
# cause a 3 seconds pause to requests from sources in excess of 20 requests/5s
tcp-request inspect-delay 3s
tcp-request content accept if { trk_sess_rate gt 20 } WAIT_END
Most of the time we'll want to check the connection count of the
criterion we're currently tracking. So instead of duplicating the
src* tests, let's add trk_conn_cnt to report the total number of
connections from the stick table entry currently being tracked.
A nice part of the code was factored, and we should do the same
for the other criteria.
The new "bytes_in_cnt" and "bytes_out_cnt" session counters have been
added. They're automatically updated when session counters are updated.
They can be matched with the "src_kbytes_in" and "src_kbytes_out" ACLs
which apply to the volume per source address. This can be used to deny
access to service abusers.
The new "conn_cur" session counter has been added. It is automatically
updated upon "track XXX" directives, and the entry is touched at the
moment we increment the value so that we don't consider further counter
updates as real updates, otherwise we would end up updating upon completion,
which may not be desired. Some other event counters (eg: HTTP
requests) will probably have to be updated upon each event though.
This counter can be matched against current session's source address using
the "src_conn_cur" ACL.
It was not normal to have counter fetches in proto_tcp.c. The only
reason was that the key based on the source address was fetched there,
but now we have split the key extraction and data processing, we must
move that to a more appropriate place. Session seems OK since the
counters are all manipulated from here.
Also, since we're precisely counting number of connections with these
ACLs, we rename them src_conn_cnt and src_updt_conn_cnt. This is not
a problem right now since no version was emitted with these keywords.
This patch adds the ability to set a pointer in the session to an
entry in a stick table which holds various counters related to a
specific pattern.
Right now the syntax matches the target syntax and only the "src"
pattern can be specified, to track counters related to the session's
IPv4 source address. There is a special function to extract it and
convert it to a key. But the goal is to be able to later support as
many patterns as for the stick rules, and get rid of the specific
function.
The "track-counters" directive may only be set in a "tcp-request"
statement right now. Only the first one applies. Later we'll probably
support multi-criteria tracking for a single session and we'll have to
name tracking pointers.
No counter is updated right now, only the refcount is. Some subsequent
patches will have to bring that feature.
Sometimes it's necessary to be able to perform some "layer 6" analysis
in the backend. TCP request rules were not available till now, although
documented in the diagram. Enable them in backend now.
Since the BF_READ_ATTACHED bug was fixed, a new issue surfaced. When
a connection closes on the return path in tunnel mode while the request
input is already closed, the request analyser which is waiting for a
state change never gets woken up so it never closes the request output.
This causes stuck sessions to remain indefinitely.
One way to reliably reproduce the issue is the following (note that the
client expects a keep-alive but not the server) :
server: printf "HTTP/1.0 303\r\n\r\n" | nc -lp8080
client: printf "GET / HTTP/1.1\r\n\r\n" | nc 127.1 2500
The reason for the issue is that we don't wake the analysers up on
stream interface state changes. So the least intrusive and most reliable
thing to do is to consider stream interface state changes to call the
analysers.
We just need to remember what state each series of analysers have seen
and check for the differences. In practice, that works.
A later improvement could consist in being able to let analysers
state what they're interested in monitoring :
- left SI's state
- right SI's state
- request buffer flags
- response buffer flags
That could help having only one set of analysers and call them once
status changes.
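The principle of remembering what the analysers have seen can be sketched as below. The session layout is hypothetical and the list of states is reduced to three values for the example:

```c
/* Hypothetical, reduced list of stream interface states. */
enum si_state { SI_ST_INI, SI_ST_EST, SI_ST_CLO };

struct session {
    enum si_state si_state[2];       /* current state of both sides */
    enum si_state seen_state[2];     /* states seen by the last pass */
    unsigned int flags, seen_flags;  /* buffer flags, current and last seen */
};

/* Returns non-zero if analysers must be called, then records what they
 * are about to see so that the next call only reports new changes.
 */
static int analysers_need_call(struct session *s)
{
    int changed = s->flags != s->seen_flags ||
                  s->si_state[0] != s->seen_state[0] ||
                  s->si_state[1] != s->seen_state[1];

    s->seen_flags = s->flags;
    s->seen_state[0] = s->si_state[0];
    s->seen_state[1] = s->si_state[1];
    return changed;
}
```

A close on the return path changes one side's state without touching the buffer flags watched by the request analysers, and this is now enough to call them.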
This will be used when an I/O handler running in a stream interface
needs to establish a connection somewhere. We want the session
processor to evaluate both I/O handlers, depending on which side has
one. Doing so also requires that stream_int_update_embedded() wakes
the session up only when the other side is established or has closed,
for instance in order to handle connection errors without looping
indefinitely during the connection setup time.
The session processor still relies on BF_READ_ATTACHED being set,
though we must do whatever is required to remove this dependency.
When a connection is closed on a stream interface, some iohandlers
will need to be informed in order to release some resources. This
normally happens upon a shutr+shutw. It is the equivalent of the
fd_delete() call which is done for real sockets, except that this
time we release internal resources.
It can also be used with real sockets because it does not cost
anything else and might one day be useful.
(cherry picked from commit 61ba936e6858dfcf9964d25870726621d8188fb9)
[ note: the bug was finally not present in 1.5-dev but at least we
have to reset store_count to be compatible with 1.4 ]
Commit d6e9e3b5e320b957e6c491bd92d91afad30ba638 caused recently created
entries to be removed as soon as they were created, breaking stickiness.
It is not clear whether a use-after-free was possible or not in this case.
This bug was reported by Ben Congleton and narrowed down by Hervé Commowick,
both of whom also tested the fix. Thanks to them !
When an entry already exists, we just need to update its expiration
timer. Let's have a dedicated function for that instead of spreading
open code everywhere.
This change also ensures that an update of an existing sticky session
really leads to an update of its expiration timer, which was apparently
not the case till now. This point needs to be checked in 1.4.
Till now sticky sessions only held server IDs. Now there are other
data types so it is not acceptable anymore to overwrite the server ID
when writing something. The server ID must then only be written from
the caller when appropriate. Doing this has also led to separate
lookup and storage.
pattern.c depended on stick_table while in fact it should be the opposite.
So we move from pattern.c everything related to stick_tables and invert the
dependency. That way the code becomes more logical and intuitive.
Right now we're only able to store a server ID in a sticky session.
The goal is to be able to store anything whose size is known at startup
time. For this, we store the extra data before the stksess pointer,
using a negative offset. It will then be easy to cumulate multiple
data provided they each have their own offset.
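The negative offset layout can be sketched this way. The structure is reduced to one field and the three helpers are hypothetical; the only point is that the extra data live right before the address handed to the callers:

```c
#include <stdlib.h>

/* Hypothetical sticky session; the real struct stksess has more fields. */
struct stksess {
    unsigned int expire;
};

/* Allocates a sticky session preceded by <data_size> bytes of extra data
 * and returns a pointer to the session itself, not to the allocated area.
 * <data_size> must be a multiple of the session's alignment.
 */
static struct stksess *stksess_new(size_t data_size)
{
    char *area = calloc(1, data_size + sizeof(struct stksess));

    return area ? (struct stksess *)(area + data_size) : NULL;
}

/* Returns the data stored <offset> bytes before the session pointer. */
static void *stksess_data(struct stksess *ts, size_t offset)
{
    return (char *)ts - offset;
}

/* Releases the whole area, which starts <data_size> bytes before <ts>. */
static void stksess_free(struct stksess *ts, size_t data_size)
{
    free((char *)ts - data_size);
}
```

Each data type only needs to know its own offset, which can be computed once at startup since all sizes are known at that time.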
The frontend's connection was accounted for once the session was
instantiated. This was problematic because the early ACLs weren't
able to correctly account for the number of concurrent connections.
Now we count the connection once it is assigned to the frontend.
It also brings the nice advantage of being more symmetrical, because
the stream_sock's accept() does not have to account for that anymore,
only the session's accept() does.
Now we're able to reject connections very early, so we need to use a
different counter for the connections that are received and the ones
that are accepted and converted into sessions, so that the rate limits
can still apply to the accepted ones. The session rate must still be
used to compute the rate limit, so that we can reject undesired traffic
without affecting the rate.
A new function session_accept() is now called from the lower layer to
instantiate a new session. Once the session is instantiated, the upper
layer's frontend_accept() is called. This one can be service-dependent.
That way, we have a 3-phase accept() sequence :
1) protocol-specific, session-less accept(), which is pointed to by
the listener. It defaults to the generic stream_sock_accept().
2) session_accept() which relies on a frontend but not necessarily
for use in a proxy (eg: stats or any future service).
3) frontend_accept() which performs the accept for the service
offered by the frontend. It defaults to frontend_accept() which
is really what is used by a proxy.
The TCP/HTTP proxies have been moved to this mode so that we can now rely on
frontend_accept() for any type of session initialization relying on a frontend.
The next step will be to convert the stats to use the same system.
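The chaining of the three phases can be pictured with function pointers. All structures below are hypothetical and stripped down to what the sequence needs; the real functions obviously do much more work and report errors:

```c
/* Hypothetical, heavily simplified types. */
struct session;

struct frontend {
    int (*accept)(struct session *s);            /* phase 3, per service */
};

struct listener {
    struct frontend *frontend;
    int (*accept)(struct listener *l, int fd);   /* phase 1, per protocol */
};

struct session {
    int fd;
    struct frontend *fe;
    int phases;   /* number of phases completed, demonstration only */
};

/* Last session created; static storage keeps the sketch allocation-free. */
static struct session last_session;

/* Phase 3: what a proxy would do once the session exists. */
static int frontend_accept(struct session *s)
{
    s->phases++;
    return 1;
}

/* Phase 2: instantiates the session, then calls the frontend's accept(). */
static int session_accept(struct listener *l, int fd)
{
    struct session *s = &last_session;

    s->fd = fd;
    s->fe = l->frontend;
    s->phases = 2;
    return s->fe->accept(s);
}

/* Phase 1: session-less accept, pointed to by the listener. */
static int stream_sock_accept(struct listener *l, int fd)
{
    return session_accept(l, fd);
}
```

A service other than a proxy would only have to provide its own phase 3 function in its frontend.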
The conn_retries attribute is now assigned when switching from SI_ST_INI
to SI_ST_REQ. This eliminates one of the last dependencies on the backend
in the frontend's accept() function.
The conn_retries still lies in the session and its initialization depends
on the backend when it may not yet be known. Let's first move it to the
stream interface.
It was particularly embarrassing that the server timeout was assigned
to buffers during an accept() just to be potentially changed later in
case of a use_backend rule. The frontend side has nothing to do with
server timeouts.
Now we initialize them right after the connect() succeeds. Later this
should change for a unique stream-interface timeout setting only.
Calling sess_establish() upon a successful connect() was essential, but
it was not clearly stated whether it was necessary for an access to an
I/O handler or not. While it would be desired, having it automatically
add the response analyzers is quite a problem, and it breaks HTTP stats.
The solution is thus not to call it for now and to perform the few response
initializations as needed.
For the long term, we need to find a way to specify the analyzers to install
during a stream_int_register_handler() if any.
If a "stick store-request" rule is present, an entry is preallocated during
the request. However, if there is no response due to an error or to a redir
mode server, we never release it.
The BF_READ_ATTACHED flag was created to wake analysers once after
a connection was established. It turns out that this flag is never
cleared once set, so even if there is no event, some analysers are
still evaluated for no reason.
The bug was introduced with commit ea38854d34.
It may cause slightly increased CPU usage during data transfers, maybe
even quite noticeable when transferring transfer-encoded data,
due to the fact that the request analysers are being checked for every
chunk.
This fix must be backported in 1.4 after all non-reg tests have been
completed.
This is used to disable persistence depending on some conditions (for
example using an ACL matching static files or a specific User-Agent).
You can see it as a complement to "force-persist".
In the configuration file, the force-persist/ignore-persist declaration
order defines the rules' priority.
Used with the "appsession" keyword, it can also help reduce memory usage,
as the session won't be hashed when the persistence is ignored.
The following patch fixed an issue but brought another one :
296897 [MEDIUM] connect to servers even when the input has already been closed
The new issue is that when a connection is inspected and aborted using
TCP inspect rules, now it is sent to the server before being closed. So
that test is not satisfying. A probably better way is not to prevent a
connection from establishing if only BF_SHUTW_NOW is set but BF_SHUTW
is not. That way, the BF_SHUTW flag is not set if the request has any
data pending, which still fixes the stats issue, but does not let any
empty connection pass through.
Also, as a safety measure, we extend buffer_abort() to automatically
disable the BF_AUTO_CONNECT flag. While it appears to always be OK,
it is by pure luck, so better safe than sorry.
The BF_AUTO_CLOSE flag prevented a connection from establishing on
a server if the other side's input channel was already closed. This
is wrong because there may be pending data to be sent.
This was causing an issue with stats, as noticed and reported by
Cyril Bonté. Since the stats are now handled as a server, sometimes
concurrent accesses were causing one of the connections to send the
shutdown(write) before the connection to the stats function was
established, which aborted it early.
This fix causes the BF_AUTO_CLOSE flag to be checked only when the
connection on the outgoing stream interface has reached an established
state. That way we're still able to connect, send the request then
close.
This duplicate test should have been removed with the loop rework but was forgotten.
It was harmless, but disassembly shows that it prevents gcc from correctly optimizing
the loop.
Often we need to understand why some transfers were aborted or what
constitutes server response errors. With those two counters, it is
now possible to detect an unexpected transfer abort during a data
phase (eg: too short HTTP response), and to know what part of the
server response errors may in fact be assigned to aborted transfers.
Some people have reported seeing "SL" flags in their logs quite often while
this should never happen. The reason was that when a server error is detected,
we close the connection to that server and when we decide what state we were
in, we see the connection is closed, and deduce it was the last data transfer,
which is wrong. We should report DATA if the previous state was an established
state, which this patch does.
Now logs correctly report "SD" and not "SL" when a server resets a connection
before the end of the transfer.
It is wrong to merge FE and BE stats for a proxy because when we consult a
BE's stats, it reflects the FE's stats even though the BE has received no
traffic. The most common example happens with listen instances, where the
backend gets credited for all the traffic even when a use_backend rule makes
use of another backend.
This is used to force access to down servers for some requests. This
is useful when validating that a change on a server correctly works
before enabling the server again.
The initial code's intention was to loop on the analysers as long
as an analyser is added by another one. [This code was wrong due to
the while(0) which breaks even on a continue statement, but the
initial intention must be changed too]. In fact we should limit the
number of times we loop on analysers in order to limit latency.
Using maxpollevents as a limit makes sense since this tunable is
used for the exact same purposes. We may add another tunable later
if that ever makes sense, though it's very unlikely.
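A bounded loop of this kind could look like the sketch below. The callback type and the demonstration analyser are hypothetical; in the scenario described above, <max_loops> would be the value of maxpollevents:

```c
/* Hypothetical analyser pass: returns non-zero if it enabled at least one
 * analyser, or 0 if it did not add any.
 */
typedef unsigned int (*analyser_fct)(void *ctx);

/* Calls <run> as long as it keeps enabling analysers, but never more than
 * <max_loops> times so that a single session cannot add too much latency.
 * Returns the number of passes performed.
 */
static int run_analysers(analyser_fct run, void *ctx, int max_loops)
{
    int loops = 0;

    while (loops < max_loops) {
        loops++;
        if (!run(ctx))
            break;
    }
    return loops;
}

/* Demonstration analyser which asks for another pass <*ctx> times. */
static unsigned int countdown_analyser(void *ctx)
{
    int *remaining = ctx;

    return (*remaining)-- > 0;
}
```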
That patch was incorrect because under some circumstances, the
capture memory could be freed by session_free() and then again
by http_end_txn(), causing a double free and an eventual segfault.
The pool use count was also reported wrong due to this bug.
The cleanup code was removed from session_free() to remain only
in http_end_txn().
Several HTTP analysers used to set those flags to values that
were useful but without considering the possibility that they
were not called again to clean what they did. First, replace
direct flag manipulation with more explicit macros. Second,
enforce a rule stating that any buffer which changes one of
these flags from the default must restore it after completion,
so that other analysers see correct flags.
With both this fix and the previous one about analyser bits,
we should not see any more stuck sessions.
A request analyser may very well be added while processing a response
(eg: end of an HTTP keep-alive response). It's very dangerous to only
rely on flags that ought to change in order to loop back, so let's
correctly detect a possible new analyser addition instead of guessing.
With the introduction of keep-alive, we have created situations
where an analyser can add other analysers to the current list,
which are behind it, which have already been processed once, and
which are needed immediately because without them there will be
no more I/O activity. This is typically the case for enabling
reading of a new request after preparing for a new request.
Instead of creating specific cases for some analysers (there was
already one such before), we now use a little bit of algorithmics
to create an ordered bit chain supporting priorities and fast
operations.
Another advantage of this new construction is that it's not a
real loop anymore, so if an analyser is unknown, it will not
loop but just ignore it.
Note that it is easy to skip multiple analysers at once now in
order to speed up the checking a bit. Some test code has shown
only a minor gain though.
This change has been carefully re-read and has no obvious reason
to cause a regression. However it has been tagged "major"
because the fact that it runs the analysers correctly might
trigger an old sleeping bug somewhere in one of the analysers.
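The idea of a priority-ordered bit set in which an analyser may re-arm another one located behind it can be illustrated as follows. This is a simplified sketch and not the construction used by the patch: it still uses a loop, and the analyser numbering, the handlers and the MAX_CALLS bound are assumptions of the example:

```c
#include <stddef.h>

#define NB_ANALYSERS 4   /* arbitrary for this sketch */
#define MAX_CALLS    8   /* safety bound against ping-pong between analysers */

struct chan {
    unsigned int analysers;   /* bit N set: analyser N is enabled */
    int order[MAX_CALLS];     /* demonstration only: call order */
    int calls;
};

typedef void (*analyser_fct)(struct chan *c);

static void run_analysers(struct chan *c, analyser_fct handlers[NB_ANALYSERS])
{
    unsigned int todo = c->analysers;

    while (todo && c->calls < MAX_CALLS) {
        unsigned int bit = todo & -todo;   /* lowest bit = highest priority */
        unsigned int before = c->analysers;
        int idx = 0;

        while ((1U << idx) != bit)
            idx++;
        todo &= ~bit;
        if (idx >= NB_ANALYSERS || !handlers[idx])
            continue;                      /* unknown analyser: ignored */
        c->order[c->calls++] = idx;
        handlers[idx](c);
        todo &= c->analysers;              /* drop the ones just disabled */
        todo |= c->analysers & ~before;    /* queue the ones just enabled */
    }
}

/* Analyser 0: waits for a request, then disables itself. */
static void wait_request(struct chan *c)
{
    c->analysers &= ~1U;
}

/* Analyser 2: ends the transaction and re-arms analyser 0 behind it. */
static void end_transaction(struct chan *c)
{
    c->analysers = (c->analysers & ~4U) | 1U;
}
```

Starting with analysers 0 and 2 enabled, the calls happen in the order 0, 2, 0: the analyser re-armed by the end of the transaction is run immediately instead of waiting for some I/O activity which would never come.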
Doing this helps us flush the system buffers from all unread data. This
avoids having orphans when clients suddenly get off the net without
reading their entire response.
The "forceclose" option used to close the output channel to the
server once it started to respond. While this happened to work with
most servers, some of them considered this as a connection abort and
immediately stopped responding.
Now that we're aware of the end of a request and response, we're able
to trivially handle this option and properly close both sides when the
server's response is complete.
During this change it appeared that forwarding could be allowed when
the BF_SHUTW_NOW flag was set on a buffer, which obviously is not
acceptable and was causing some trouble. This has been fixed too and
is the reason for the MEDIUM status on this patch.
The body parser will be used in close and keep-alive modes. It follows
the stream to keep in sync with both the request and the response message.
Both chunked transfer-coding and content-length are supported according to
RFC2616.
The multipart/byterange encoding has not yet been implemented and if not
seconded by any of the two other ones, will be forwarded till the close,
as requested by the specification.
Both the request and the response analysers converge into an HTTP_MSG_DONE
state where it will be possible to force a close (option forceclose) or to
restart with a fresh new transaction and maintain keep-alive.
This change is important. All tests are OK but any possible behaviour
change with "option httpclose" might find its root here.
This code really belongs to the http part since it's transaction-specific.
This will also make it easier to later reinitialize a transaction in order
to support keepalive.
We used to apply a limit to each buffer's size in order to leave
some room to rewrite headers, then we used to remove this limit
once the session switched to a data state.
Proceeding that way becomes a problem with keepalive because we
have to know when to stop reading too much data into the buffer
so that we can leave some room again to process next requests.
The principle we adopt here consists in only relying on to_forward+send_max.
Indeed, both of those data define how many bytes will leave the buffer.
So as long as their sum is larger than maxrewrite, we can safely
fill the buffers. If they are smaller, then we refrain from filling
the buffer. This means that we won't risk to fill buffers when
reading last data chunk followed by a POST request and its contents.
The only impact identified so far is that we must ensure that the
BF_FULL flag is correctly dropped when starting to forward. Right
now this is OK because nobody inflates to_forward without using
buffer_forward().
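The rule can be summarized by the sketch below. The buffer descriptor is reduced to three hypothetical fields and the function name is an assumption; <maxrewrite> is expected not to be larger than the buffer size:

```c
/* Hypothetical buffer descriptor reduced to what the rule needs. */
struct buffer {
    unsigned int size;        /* total buffer size */
    unsigned int to_forward;  /* bytes scheduled to be forwarded */
    unsigned int send_max;    /* bytes already present and ready to leave */
};

/* Returns how many bytes the buffer may hold at most: the full size when
 * enough data are about to leave, otherwise the size minus the room
 * reserved for rewriting headers.
 */
static unsigned int buffer_max_len(const struct buffer *b,
                                   unsigned int maxrewrite)
{
    /* the 64-bit sum avoids wrapping when to_forward is very large */
    if ((unsigned long long)b->to_forward + b->send_max > maxrewrite)
        return b->size;
    return b->size - maxrewrite;
}
```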
Implement decreasing health based on observing communication between
HAProxy and servers.
Changes in this version 2:
- documentation
- close race between a started check and health analysis event
- don't force fastinter if it is not set
- better names for options
- layer4 support
Changes in this version 3:
- add stats
- port to the current 1.4 tree
This patch extends and corrects the functionality introduced by
"Collect & provide http response codes received from servers":
- responses are now also accounted for frontends
- backend's and frontend's counters are incremented based
on responses sent to client, not received from servers
The code part which waits for an HTTP response has been extracted
from the old function. We now have two analysers and the second one
may re-enable the first one when an 1xx response is encountered.
This has been tested and works.
The calls to stream_int_return() that were remaining in the wait
analyser have been converted to stream_int_retnclose().
This patch has 2 goals :
1. I wanted to test the appsession feature with a small PHP code,
using PHPSESSID. The problem is that when PHP gets an unknown session
id, it creates a new one with this ID. So, when sending an unknown
session to PHP, persistence is broken : haproxy won't see any new
cookie in the response and will never attach this session to a
specific server.
This also happens when you restart haproxy : the internal hash becomes
empty and all sessions lose their persistence (load balancing the
requests on all backend servers, creating a new session on each one).
For a user, it's like the service is unusable.
The patch modifies the code to make haproxy also learn the persistence
from the client : if no session is sent from the server, then the
session id found in the client part (using the URI or the client cookie)
is used to associate the server that gave the response.
As it's probably not a feature usable in all cases, I added an option
to enable it (by default it's disabled). The syntax of appsession becomes :
appsession <cookie> len <length> timeout <holdtime> [request-learn]
This helps haproxy repair the persistence (with the risk of losing its
session at the next request, as the user will probably not be load
balanced to the same server the first time).
2. This patch also tries to reduce the memory usage.
Here is a little example to explain the current behaviour :
- Take a Tomcat server where /session.jsp is valid.
- Send a request using a cookie with an unknown value AND a path
parameter with another unknown value :
curl -b "JSESSIONID=12345678901234567890123456789012" http://<haproxy>/session.jsp;jsessionid=00000000000000000000000000000001
(I know, it's unexpected to have a request like that on a live service)
Here, haproxy finds the URI session ID and stores it in its internal
hash (with no server associated). But it also finds the cookie session
ID and stores it again.
- As a result, session.jsp sends a new session ID also stored in the
internal hash, with a server associated.
=> For 1 request, haproxy has stored 3 entries, with only 1 which will be usable
The patch modifies the behaviour to store only 1 entry (maximum).
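The two behaviours described above (learning from the request when the
server sets no session, and storing at most one entry per request) can be
sketched as follows. This is only an illustration under simplifying
assumptions : the fixed-size table, struct sess_entry, lookup_server() and
learn_session() are hypothetical names and are not the appsession code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16
#define SID_LEN     32

/* Hypothetical persistence entry : session id -> server name. */
struct sess_entry {
    char sid[SID_LEN + 1];
    const char *server;
};

static struct sess_entry table[MAX_ENTRIES];
static size_t nb_entries;

/* Returns the server bound to <sid>, or NULL if the id is unknown. */
static const char *lookup_server(const char *sid)
{
    for (size_t i = 0; i < nb_entries; i++)
        if (strcmp(table[i].sid, sid) == 0)
            return table[i].server;
    return NULL;
}

/* Called once the response is known. <resp_sid> is the session id set by
 * the server (NULL if none), <req_sid> is the one sent by the client (NULL
 * if none). At most one entry is stored per request. Returns 1 if an entry
 * was stored, 0 otherwise.
 */
static int learn_session(const char *resp_sid, const char *req_sid,
                         const char *server, int request_learn)
{
    const char *sid = resp_sid;

    assert(server != NULL);
    if (!sid && request_learn)
        sid = req_sid;                /* fall back to the client's id */
    if (!sid || strlen(sid) > SID_LEN)
        return 0;
    if (lookup_server(sid) || nb_entries >= MAX_ENTRIES)
        return 0;                     /* already known, or table full */

    strcpy(table[nb_entries].sid, sid);
    table[nb_entries].server = server;
    nb_entries++;
    return 1;
}
```

With request_learn set to 0, an id seen only in the request is never
stored, which matches the default (disabled) setting of the option.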
This patch makes it possible to collect and provide separate statistics
for each socket. It can be very useful if you would like to distinguish
between traffic generated by local and remote users or between different
types of remote clients (peerings, domestic, foreign).
Currently no "Session rate" is supported, but adding it should be possible
if we find it useful.
We can get rid of the stats analyser by moving all the stats code
to a stream interface applet. Besides being cleaner, it provides new
advantages such as the ability to process requests and responses
from the same function and work only with simple state machines.
There's no need for any hijack hack anymore.
The direct advantages for the user are the interactive mode and the
ability to chain several commands delimited by a semi-colon. Now if
the user types "prompt", he gets a prompt from which he can send
as many requests as he wants. All outputs are terminated by a
blank line followed by a new prompt, so this can be used from
external tools too.
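As an illustration of command chaining, here is a minimal sketch of how
a line could be split on semi-colons. split_commands() is a hypothetical
helper written for this example, not the parser used by the applet, and
the command strings in the usage note are just sample input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits <line> in place on ';' and stores up to <max> command pointers
 * in <cmds>, skipping leading spaces and empty commands. Returns the
 * number of commands found.
 */
static size_t split_commands(char *line, char **cmds, size_t max)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && cmds != NULL);
    while (p && n < max) {
        char *end = strchr(p, ';');

        if (end)
            *end = '\0';              /* terminate the current command */
        while (*p == ' ')
            p++;
        if (*p)
            cmds[n++] = p;
        p = end ? end + 1 : NULL;     /* move past the delimiter */
    }
    return n;
}
```

Given the line "show info; show stat;;prompt", this helper yields three
commands and ignores the empty one between the two semi-colons.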
The code is not very clean, it needs some rework, but some of the
dirty parts are due to the remnants of the hijack mode used
in the old functions we call.
The old AN_REQ_STATS_SOCK analyser flag is now unused and has been
removed.
Currently, it's up to process_session() to call the internal tasks
if any are associated to the task being processed. If such a task
is referenced, we don't use ->update() in process_session(), but
only ->iohandler(), which itself is free to use ->update() to
complete its work.
It is also important to understand that an I/O handler may wake the
task up again, for instance because it tries to send data to the
other stream interface, which itself will wake the task up. So
after returning from ->iohandler(), we must check if the task has
been sent back to the runqueue, and if so, immediately return.
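The control flow described above can be sketched as follows. The structs
are reduced to what the example needs, and in_runqueue, updates,
count_update(), waking_iohandler() and process_one() are hypothetical
names; the real process_session() does much more than this.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int in_runqueue;   /* non-zero once the task has been woken up again */
};

struct stream_interface {
    struct task *owner;
    void (*iohandler)(struct stream_interface *si); /* NULL if no embedded task */
    void (*update)(struct stream_interface *si);
    int updates;       /* number of ->update() calls, for the example only */
};

static void count_update(struct stream_interface *si)
{
    si->updates++;
}

/* An I/O handler which ends up waking its own task, for instance because
 * it sent data to the other stream interface.
 */
static void waking_iohandler(struct stream_interface *si)
{
    si->owner->in_runqueue = 1;
}

/* Returns 1 if processing ran to its end, 0 if it returned early because
 * the task was put back into the run queue by the I/O handler.
 */
static int process_one(struct stream_interface *si)
{
    assert(si != NULL && si->owner != NULL);
    if (!si->iohandler) {
        si->update(si);               /* no embedded task : plain update */
        return 1;
    }
    si->iohandler(si);                /* free to call ->update() itself */
    if (si->owner->in_runqueue)
        return 0;                     /* woken up again : return at once */
    return 1;
}
```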
We had to add a new stream_interface flag : SI_FL_DONT_WAKE. This flag
is used to indicate that a stream interface is being updated and that
no wake up should be sent to its owner. This will be required for tasks
embedded into stream interfaces. Otherwise, we could have the
owner task send wakeups to itself during status updates, thus
preventing the state from converging. As long as a stream_interface's
status is being monitored and adjusted, there is no reason to wake it
up again, as we know its changes will be seen and considered.
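A minimal sketch of the idea, assuming a stripped-down stream interface.
The flag value, the wakeups counter and the two helper functions are
made up for the example and are not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SI_FL_DONT_WAKE 0x0001   /* value chosen for the example only */

struct stream_interface {
    unsigned int flags;
    int wakeups;                 /* wakeups delivered to the owner task */
};

/* Wakes the owner unless the interface is currently being updated. */
static void si_wake_owner(struct stream_interface *si)
{
    assert(si != NULL);
    if (si->flags & SI_FL_DONT_WAKE)
        return;
    si->wakeups++;
}

/* Updates the interface's status. Any wakeup attempted meanwhile is
 * dropped, because the caller is already looking at the changes.
 */
static void si_update(struct stream_interface *si)
{
    si->flags |= SI_FL_DONT_WAKE;
    si_wake_owner(si);           /* stands for a status change during update */
    si->flags &= ~SI_FL_DONT_WAKE;
}
```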
In TCP, we don't want to forward chunks of data, we want to forward
indefinitely. This patch introduces a special value for the amount
of data to be forwarded. When buffer_forward() is called with
BUF_INFINITE_FORWARD, it configures the buffer to never stop
forwarding until the end.
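The sentinel approach can be sketched as below. The value given to
BUF_INFINITE_FORWARD, the to_forward field and buffer_consume() are
assumptions made for the example; only the buffer_forward() and
BUF_INFINITE_FORWARD names come from this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BUF_INFINITE_FORWARD UINT_MAX  /* sentinel; the real value may differ */

struct buffer {
    unsigned int to_forward;           /* bytes still to be forwarded */
};

/* Schedules <bytes> more bytes for forwarding. */
static void buffer_forward(struct buffer *buf, unsigned int bytes)
{
    assert(buf != NULL);
    if (buf->to_forward == BUF_INFINITE_FORWARD)
        return;                                   /* already unlimited */
    if (bytes == BUF_INFINITE_FORWARD)
        buf->to_forward = BUF_INFINITE_FORWARD;
    else if (bytes >= BUF_INFINITE_FORWARD - buf->to_forward)
        buf->to_forward = BUF_INFINITE_FORWARD - 1; /* saturate, stay finite */
    else
        buf->to_forward += bytes;
}

/* Accounts for <bytes> forwarded bytes. The infinite value never decreases. */
static void buffer_consume(struct buffer *buf, unsigned int bytes)
{
    assert(buf != NULL);
    if (buf->to_forward == BUF_INFINITE_FORWARD)
        return;
    buf->to_forward -= (bytes < buf->to_forward) ? bytes : buf->to_forward;
}
```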
An abort during a connect would go to the SI_ST_CLO state without
the buffers shut. This was causing some sessions to never end if
they would abort before the connect request was initiated. This
bug has been introduced after 1.4-dev2.
The doc has been extended to reflect that too.
The BF_EMPTY flag was once used to indicate an empty buffer. However,
it was used half the time as meaning the buffer is empty for the reader,
and half the time as meaning there is nothing left to send.
"nothing to send" is only indicated by "->send_max=0 && !pipe". Once
we fix this, we discover that the flag is not used anymore. So the
flag has been renamed BF_OUT_EMPTY and means exactly the condition
above, ie, there is nothing to send.
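Expressed as code, the condition looks like the sketch below. The struct
layout, the flag value and buffer_refresh_out_empty() are hypothetical;
only the "->send_max=0 && !pipe" test comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define BF_OUT_EMPTY 0x0001      /* value chosen for the example only */

struct pipe {
    int data;                    /* bytes held in the pipe */
};

struct buffer {
    unsigned int flags;
    unsigned int send_max;       /* bytes allowed to leave the buffer */
    struct pipe *pipe;           /* non-NULL when spliced data is pending */
};

/* Recomputes BF_OUT_EMPTY : set only when there is nothing to send. */
static void buffer_refresh_out_empty(struct buffer *buf)
{
    assert(buf != NULL);
    if (buf->send_max == 0 && !buf->pipe)
        buf->flags |= BF_OUT_EMPTY;
    else
        buf->flags &= ~BF_OUT_EMPTY;
}
```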
Doing so has allowed us to remove some unused tests for emptiness,
but also to uncover a number of situations where the flag
was not correctly set or tested.
The BF_WRITE_ENA buffer flag became very complex to deal with, because
it was used to :
- enable automatic connection
- enable close forwarding
- enable data forwarding
The last point was not very true anymore since we introduced ->send_max,
but still the test remained everywhere. This was causing issues such as
the inability to connect without forwarding data, the inability to prevent
closing when data was forwarded, etc...
This patch clarifies the situation by getting rid of this multi-purpose
flag and replacing it with :
- data forwarding based only on ->send_max || ->pipe ;
- a new BF_AUTO_CONNECT flag to allow automatic connection and only
that ;
- ability to perform an automatic connection when ->send_max or ->pipe
indicate that data is waiting to leave the buffer ;
- a new BF_AUTO_CLOSE flag to let the producer automatically set the
BF_SHUTW_NOW flag when it gets a BF_SHUTR.
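The split listed above can be sketched as three independent tests. Flag
values, the struct layout and the helper names are assumptions made for
the example, not the actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values are chosen for the example only. */
#define BF_SHUTR        0x01
#define BF_SHUTW_NOW    0x02
#define BF_AUTO_CONNECT 0x04
#define BF_AUTO_CLOSE   0x08

struct pipe {
    int data;
};

struct buffer {
    unsigned int flags;
    unsigned int send_max;
    struct pipe *pipe;
};

/* Data forwarding depends only on ->send_max and ->pipe. */
static int buffer_has_data_to_forward(const struct buffer *b)
{
    assert(b != NULL);
    return b->send_max || b->pipe;
}

/* A connection is attempted when explicitly allowed, or when data is
 * waiting to leave the buffer.
 */
static int buffer_should_connect(const struct buffer *b)
{
    return (b->flags & BF_AUTO_CONNECT) || buffer_has_data_to_forward(b);
}

/* Producer side : on a read shutdown, request the write shutdown only if
 * automatic close forwarding is enabled.
 */
static void buffer_shutr(struct buffer *b)
{
    b->flags |= BF_SHUTR;
    if (b->flags & BF_AUTO_CLOSE)
        b->flags |= BF_SHUTW_NOW;
}
```

With separate flags, it becomes possible to connect without forwarding
data, or to forward data without propagating the close.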
During this cleanup, it was discovered that some tests were performed
twice, or that the BF_HIJACK flag was still tested, which is not needed
anymore since ->send_max replaced it. These places have been fixed too.
These cleanups have also revealed a few areas where the other flags
such as BF_EMPTY are not cleanly used. This will be an opportunity for
a second patch.
This flag was incorrectly used as meaning "close immediately",
while it needs to say "close ASAP". ASAP here means when unsent
data pending in the buffer are sent. This helps clean up some
dirty tricks where the buffer output was checking the BF_SHUTR
flag combined with EMPTY and other such things. Now we have
clearly defined semantics :
- producer sets SHUTR and *may* set SHUTW_NOW if WRITE_ENA is
set, otherwise leave it to the session processor to set it.
- consumer only checks SHUTW_NOW to decide whether or not to
call shutw().
This also induced very minor changes at some locations which were
not protected against buffer changes while the SHUTW_NOW flag was
set. Now we prevent send_max from changing when the flag is set.
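A sketch of the "close ASAP" semantics on the consumer side follows. It
assumes that "preventing send_max from changing" means that no new data
gets scheduled once the flag is set; the flag values and both helpers are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values are chosen for the example only. */
#define BF_SHUTW     0x01
#define BF_SHUTW_NOW 0x02

struct buffer {
    unsigned int flags;
    unsigned int send_max;       /* bytes still waiting to be sent */
};

/* Schedules <bytes> more bytes for sending. Nothing is accepted once a
 * close has been requested. Returns the number of bytes accepted.
 */
static unsigned int buffer_schedule(struct buffer *b, unsigned int bytes)
{
    assert(b != NULL);
    if (b->flags & (BF_SHUTW | BF_SHUTW_NOW))
        return 0;
    b->send_max += bytes;
    return bytes;
}

/* Consumer : sends up to <room> bytes, then closes only if a close was
 * requested and no unsent data remains.
 */
static void buffer_consume(struct buffer *b, unsigned int room)
{
    unsigned int sent = (room < b->send_max) ? room : b->send_max;

    assert(b != NULL);
    b->send_max -= sent;
    if ((b->flags & BF_SHUTW_NOW) && b->send_max == 0) {
        b->flags &= ~BF_SHUTW_NOW;
        b->flags |= BF_SHUTW;    /* stands for the call to shutw() */
    }
}
```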
Several tests have been run without any unexpected behaviour detected.
Some more cleanups are needed, as it clearly appears that some tests
could be removed with stricter semantics.
Tarpit was broken by recent splitting of analysers. It would still
let the connection go to the server due to a missing buffer_write_dis().
Also, it was performed too late (after content switching rules).
We used to call stream_sock_data_finish() directly at the end of
a session update, but if we want to support non-socket interfaces,
we need to have this function configurable. Now we access it via
->update().
The new tune.bufsize and tune.maxrewrite global directives allow one to
change the buffer size and the maxrewrite size. Right now, setting bufsize
too low will block stats sockets which will not be able to write at all.
An error check must be added to buffer_write_chunk() so that if it
cannot write its message to an empty buffer, it causes the caller to abort.
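One possible shape for such a check is sketched below. The return codes,
the struct layout and write_chunk_checked() are hypothetical and do not
reflect the actual signature of buffer_write_chunk(); the point is only
to distinguish "retry later" from "will never fit".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct buffer {
    size_t size;                 /* configured size, at most sizeof(data) */
    size_t len;                  /* bytes currently stored */
    char data[64];
};

enum {
    WR_OK    =  0,               /* message copied */
    WR_RETRY = -1,               /* no room right now, try again later */
    WR_NEVER = -2                /* larger than an empty buffer : abort */
};

static int write_chunk_checked(struct buffer *b, const char *msg, size_t msglen)
{
    assert(b != NULL && b->size <= sizeof(b->data) && b->len <= b->size);
    if (msglen > b->size)
        return WR_NEVER;
    if (msglen > b->size - b->len)
        return WR_RETRY;
    memcpy(b->data + b->len, msg, msglen);
    b->len += msglen;
    return WR_OK;
}
```

A caller receiving WR_NEVER knows that waiting is pointless and can abort
instead of blocking forever on a buffer that is too small.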
The first step towards dynamic buffer size consists in removing
all static definitions of the buffer size. Instead, we store a
buffer's size in itself. Right now they're all preinitialized
to BUFSIZE, but we will change that.
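Storing the size in the buffer itself can be sketched as follows. The
struct layout, the BUFSIZE value used here and the helpers buffer_new()
and buffer_room() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUFSIZE 16384            /* default used for the example */

struct buffer {
    size_t size;                 /* usable size of data[] */
    size_t len;                  /* bytes currently stored */
    char data[];                 /* storage allocated along with the struct */
};

/* Allocates a buffer able to hold <size> bytes, or returns NULL. */
static struct buffer *buffer_new(size_t size)
{
    struct buffer *b = malloc(sizeof(*b) + size);

    if (!b)
        return NULL;
    b->size = size;
    b->len = 0;
    return b;
}

/* Free space is computed from the buffer's own size, not from a constant. */
static size_t buffer_room(const struct buffer *b)
{
    assert(b != NULL && b->len <= b->size);
    return b->size - b->len;
}
```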
sess_establish() used to resort to protocol-specific guesses
in order to set rep->analysers. This is no longer needed as it
gets set from the frontend and the backend as a copy of what
was defined in the configuration.
The remains of the stats socket code have no place in proto_uxst
anymore and must move to dumpstats. The code is much cleaner and more
structured. It was also an opportunity to rename AN_REQ_UNIX_STATS
as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific
either.
The last item referring to stats in proto_uxst is the setting of the
task's nice value which should in fact come from the listener.
process_session() is now ready to handle unix stats sockets. This
first step works and old code has not been removed. A cleanup is
required. The stats handler is not unix socket-centric anymore and
should move to dumpstats.c.
When a stream interface has no connect() function, it means it is
immediately connected, so we don't need any connection request.
This will be used with unix sockets.
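A minimal sketch of that rule, assuming a reduced state enum. The state
names other than the idea of "established", as well as example_connect()
and si_request_connect(), are chosen for the example only.

```c
#include <assert.h>
#include <stddef.h>

enum si_state { SI_ST_INI, SI_ST_CON, SI_ST_EST };

struct stream_interface {
    enum si_state state;
    int (*connect)(struct stream_interface *si); /* NULL : nothing to connect */
};

/* Example connect() which only records that a connection is pending. */
static int example_connect(struct stream_interface *si)
{
    si->state = SI_ST_CON;
    return 0;
}

/* Without a connect() function, the interface is established at once and
 * no connection request is issued.
 */
static void si_request_connect(struct stream_interface *si)
{
    assert(si != NULL);
    if (!si->connect) {
        si->state = SI_ST_EST;
        return;
    }
    si->connect(si);
}
```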
In order to merge the unix session handling code, we have to maintain
the number of per-listener connections in the session. This was only
performed for unix sockets till now.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
The new statement "persist rdp-cookie" enables RDP cookie
persistence. The RDP cookie is then extracted from the RDP
protocol, and compared against available servers. If a server
matches the RDP cookie, then it gets the connection.
In case of switching from TCP to HTTP, we want the HTTP request timeout
to be properly initialized. For this, we have to jump to the analyser
without breaking out of the loop or waiting for incoming data. The way
it is done right now is not particularly clean but it works.
A cleaner method might involve pushing function pointers into a circular
list.
The HTTP processing has been split into 7 steps, one of which
is no longer HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
Some stream analysers might become generic enough to be called
for several bits. So we cannot have the analyser bit hard coded
into the analyser itself. Let's make the caller inform the callee.
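The calling convention can be sketched as follows. The two bit names and
generic_analyser() are hypothetical; the point is only that the analyser
clears whichever bit the caller passed instead of a hard-coded one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analyser bits, for the example only. */
#define AN_EXAMPLE_A 0x01
#define AN_EXAMPLE_B 0x02

struct buffer {
    unsigned int analysers;      /* bits of the analysers still to run */
};

/* Generic analyser : it does not know which bit it was registered under,
 * so the caller passes it in <an_bit>. Returns 1 once it is done.
 */
static int generic_analyser(struct buffer *req, unsigned int an_bit)
{
    assert(req != NULL && an_bit != 0);
    req->analysers &= ~an_bit;   /* remove ourselves from the list */
    return 1;
}
```

The same function can then be registered under several bits without any
change to its body.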