When a connection error is encountered on a server and the server's
connection pool is full, pending connections are not woken up because
the current connection is still accounted for on the server, so it
still appears full. This becomes visible on a server which has
"maxconn 1" because the pending connections will only be able to
expire in the queue.
Now we take care of releasing our current connection before trying to
offer it to another pending request, so that the server can accept a
next connection.
This patch should be backported to 1.4.
HTTP pipelining currently needs to monitor the response buffer to wait
for some free space to be able to send a response. It was not possible
for the HTTP analyser to be called based on response buffer activity.
Now we introduce a new buffer flag BF_WAKE_ONCE which is set when the
HTTP request analyser is set on the response buffer and some activity
is detected. This is not clean at all but once of the only ways to fix
the issue before we make it possible to register events for analysers.
Also it appeared that one realign condition did not cover all cases.
Analysers were re-evaluated when some flags were still present in the
buffers, even if they had not changed since previous pass, resulting
in a waste of CPU cycles.
Ensuring that the flags have changed has saved some useless calls :
function min calls per session (before -> after)
http_request_forward_body 5 -> 4
http_response_forward_body 3 -> 2
http_sync_req_state 10 -> 8
http_sync_res_state 8 -> 6
http_resync_states 8 -> 6
The stream_sock's accept() used to close the FD upon error, but this
was also sometimes performed by the frontend's accept() called via the
session's accept(). Those interlaced calls were also responsible for the
spaghetti-looking error unrolling code in session.c and stream_sock.c.
Now the frontend must not close the FD anymore, the session is responsible
for that. It also takes care of just closing the FD or also removing from
the FD lists, depending on its state. The socket-level accept() does not
have to care about that anymore.
Some Alert() messages were remaining in the accept() path, which they
would have no chance to be detected. Remove some of them (the impossible
ones) and replace the relevant ones with send_log() so that the admin
has a chance to catch them.
Enhance pattern convs and fetch argument parsing, now fetchs and convs callbacks used typed args.
Add more details on error messages on parsing pattern expression function.
Update existing pattern convs and fetchs to new proto.
Create stick table key type "binary".
Manage Truncation and padding if pattern's fetch-converted result don't match table key size.
If a read shutdown is encountered on the first packet of a connection
right after the data and the last analyser is unplugged at the same
time, then that last data chunk may never be forwarded. In practice,
right now it cannot happen on requests due to the way they're scheduled,
nor can it happen on responses due to the way their analysers work.
But this behaviour has been observed with new response analysers being
developped.
The reason is that when the read shutdown is encountered and an analyser
is present, data cannot be forwarded but the BF_SHUTW_NOW flag is set.
After that, the analyser gets called and unplugs itself, hoping that
process_session() will automatically forward the data. This does not
happen due to BF_SHUTW_NOW.
Simply removing the test on this flag is not enough because then aborted
requests still get forwarded, due to the forwarding code undoing the
abort.
The solution here consists in checking BF_SHUTR_NOW instead of BF_SHUTW_NOW.
BF_SHUTR_NOW is only set on aborts and remains set until ->shutr() is called.
This is enough to catch recent aborts but not prevent forwarding in other
cases. Maybe a new special buffer flag "BF_ABORT" might be desirable in the
future.
This patch does not need to be backported because older versions don't
have the analyser which make the problem appear.
This counter is incremented for each incoming connection and each active
listener, and is used to prevent haproxy from stopping upon SIGUSR1. It
will thus be possible for some tasks in increment this counter in order
to prevent haproxy from dying until they have completed their job.
The assumption that there was a 1:1 relation between tracked counters and
the frontend/backend role was wrong. It is perfectly possible to track the
track-fe-counters from the backend and the track-be-counters from the
frontend. Thus, in order to reduce confusion, let's remove this useless
{fe,be} reference and simply use {1,2} instead. The keywords have also been
renamed in order to limit confusion. The ACL rule action now becomes
"track-sc{1,2}". The ACLs are now "sc{1,2}_*" instead of "trk{fe,be}_*".
That means that we can reasonably document "sc1" and "sc2" (sticky counters
1 and 2) as sort of patterns that are available during the whole session's
life and use them just like any other pattern.
Having a single tracking pointer for both frontend and backend counters
does not work. Instead let's have one for each. The keyword has changed
to "track-be-counters" and "track-fe-counters", and the ACL "trk_*"
changed to "trkfe_*" and "trkbe_*".
This patch adds support for the following session counters :
- http_req_cnt : HTTP request count
- http_req_rate: HTTP request rate
- http_err_cnt : HTTP request error count
- http_err_rate: HTTP request error rate
The equivalent ACLs have been added to check the tracked counters
for the current session or the counters of the current source.
This counter may be used to track anything. Two sets of ACLs are available
to manage it, one gets its value, and the other one increments its value
and returns it. In the second case, the entry is created if it did not
exist.
Thus it is possible for example to mark a source as being an abuser and
to keep it marked as long as it does not wait for the entry to expire :
# The rules below use gpc0 to track abusers, and reject them if
# a source has been marked as such. The track-counters statement
# automatically refreshes the entry which will not expire until a
# 1-minute silence is respected from the source. The second rule
# evaluates the second part if the first one is true, so GPC0 will
# be increased once the conn_rate is above 100/5s.
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request track-counters src
tcp-request reject if { trk_get_gpc0 gt 0 }
tcp-request reject if { trk_conn_rate gt 100 } { trk_inc_gpc0 gt 0}
Alternatively, it is possible to let the entry expire even in presence of
traffic by swapping the check for gpc0 and the track-counters statement :
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request reject if { src_get_gpc0 gt 0 }
tcp-request track-counters src
tcp-request reject if { trk_conn_rate gt 100 } { trk_inc_gpc0 gt 0}
It is also possible not to track counters at all, but entry lookups will
then be performed more often :
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request reject if { src_get_gpc0 gt 0 }
tcp-request reject if { src_conn_rate gt 100 } { src_inc_gpc0 gt 0}
The '0' at the end of the counter name is there because if we find that more
counters may be useful, other ones will be added.
This function looks up a key, updates its expiration date, or creates
it if it was not found. acl_fetch_src_updt_conn_cnt() was updated to
make use of it.
These counters maintain incoming and outgoing byte rates in a stick-table,
over a period which is defined in the configuration (2 ms to 24 days).
They can be used to detect service abuse and enforce a certain bandwidth
limits per source address for instance, and block if the rate is passed
over. Since 32-bit counters are used to compute the rates, it is important
not to use too long periods so that we don't have to deal with rates above
4 GB per period.
Example :
# block if more than 5 Megs retrieved in 30 seconds from a source.
stick-table type ip size 200k expire 1m store bytes_out_rate(30s)
tcp-request track-counters src
tcp-request reject if { trk_bytes_out_rate gt 5000000 }
# cause a 15 seconds pause to requests from sources in excess of 2 megs/30s
tcp-request inspect-delay 15s
tcp-request content accept if { trk_bytes_out_rate gt 2000000 } WAIT_END
These counters maintain incoming connection rates and session rates
in a stick-table, over a period which is defined in the configuration
(2 ms to 24 days). They can be used to detect service abuse and
enforce a certain accept rate per source address for instance, and
block if the rate is passed over.
Example :
# block if more than 50 requests per 5 seconds from a source.
stick-table type ip size 200k expire 1m store conn_rate(5s),sess_rate(5s)
tcp-request track-counters src
tcp-request reject if { trk_conn_rate gt 50 }
# cause a 3 seconds pause to requests from sources in excess of 20 requests/5s
tcp-request inspect-delay 3s
tcp-request content accept if { trk_sess_rate gt 20 } WAIT_END
Most of the time we'll want to check the connection count of the
criterion we're currently tracking. So instead of duplicating the
src* tests, let's add trk_conn_cnt to report the total number of
connections from the stick table entry currently being tracked.
A nice part of the code was factored, and we should do the same
for the other criteria.
The new "bytes_in_cnt" and "bytes_out_cnt" session counters have been
added. They're automatically updated when session counters are updated.
They can be matched with the "src_kbytes_in" and "src_kbytes_out" ACLs
which apply to the volume per source address. This can be used to deny
access to service abusers.
The new "conn_cur" session counter has been added. It is automatically
updated upon "track XXX" directives, and the entry is touched at the
moment we increment the value so that we don't consider further counter
updates as real updates, otherwise we would end up updating upon completion,
which may not be desired. Probably that some other event counters (eg: HTTP
requests) will have to be updated upon each event though.
This counter can be matched against current session's source address using
the "src_conn_cur" ACL.
It was not normal to have counter fetches in proto_tcp.c. The only
reason was that the key based on the source address was fetched there,
but now we have split the key extraction and data processing, we must
move that to a more appropriate place. Session seems OK since the
counters are all manipulated from here.
Also, since we're precisely counting number of connections with these
ACLs, we rename them src_conn_cnt and src_updt_conn_cnt. This is not
a problem right now since no version was emitted with these keywords.
This patch adds the ability to set a pointer in the session to an
entry in a stick table which holds various counters related to a
specific pattern.
Right now the syntax matches the target syntax and only the "src"
pattern can be specified, to track counters related to the session's
IPv4 source address. There is a special function to extract it and
convert it to a key. But the goal is to be able to later support as
many patterns as for the stick rules, and get rid of the specific
function.
The "track-counters" directive may only be set in a "tcp-request"
statement right now. Only the first one applies. Probably that later
we'll support multi-criteria tracking for a single session and that
we'll have to name tracking pointers.
No counter is updated right now, only the refcount is. Some subsequent
patches will have to bring that feature.
Sometimes it's necessary to be able to perform some "layer 6" analysis
in the backend. TCP request rules were not available till now, although
documented in the diagram. Enable them in backend now.
Since the BF_READ_ATTACHED bug was fixed, a new issue surfaced. When
a connection closes on the return path in tunnel mode while the request
input is already closed, the request analyser which is waiting for a
state change never gets woken up so it never closes the request output.
This causes stuck sessions to remain indefinitely.
One way to reliably reproduce the issue is the following (note that the
client expects a keep-alive but not the server) :
server: printf "HTTP/1.0 303\r\n\r\n" | nc -lp8080
client: printf "GET / HTTP/1.1\r\n\r\n" | nc 127.1 2500
The reason for the issue is that we don't wake the analysers up on
stream interface state changes. So the least intrusive and most reliable
thing to do is to consider stream interface state changes to call the
analysers.
We just need to remember what state each series of analysers have seen
and check for the differences. In practice, that works.
A later improvement later could consist in being able to let analysers
state what they're interested to monitor :
- left SI's state
- right SI's state
- request buffer flags
- response buffer flags
That could help having only one set of analysers and call them once
status changes.
This will be used when an I/O handler running in a stream interface
needs to establish a connection somewhere. We want the session
processor to evaluate both I/O handlers, depending on which side has
one. Doing so also requires that stream_int_update_embedded() wakes
the session up only when the other side is established or has closed,
for instance in order to handle connection errors without looping
indefinitely during the connection setup time.
The session processor still relies on BF_READ_ATTACHED being set,
though we must do whatever is required to remove this dependency.
When a connection is closed on a stream interface, some iohandlers
will need to be informed in order to release some resources. This
normally happens upon a shutr+shutw. It is the equivalent of the
fd_delete() call which is done for real sockets, except that this
time we release internal resources.
It can also be used with real sockets because it does not cost
anything else and might one day be useful.
(cherry picked from commit 61ba936e6858dfcf9964d25870726621d8188fb9)
[ note: the bug was finally not present in 1.5-dev but at least we
have to reset store_count to be compatible with 1.4 ]
Commit d6e9e3b5e320b957e6c491bd92d91afad30ba638 caused recently created
entries to be removed as soon as they were created, breaking stickiness.
It is not clear whether a use-after-free was possible or not in this case.
This bug was reported by Ben Congleton and narrowed down by Hervé Commowick,
both of whom also tested the fix. Thanks to them !
When an entry already exists, we just need to update its expiration
timer. Let's have a dedicated function for that instead of spreading
open code everywhere.
This change also ensures that an update of an existing sticky session
really leads to an update of its expiration timer, which was apparently
not the case till now. This point needs to be checked in 1.4.
Till now sticky sessions only held server IDs. Now there are other
data types so it is not acceptable anymore to overwrite the server ID
when writing something. The server ID must then only be written from
the caller when appropriate. Doing this has also led to separate
lookup and storage.
pattern.c depended on stick_table while in fact it should be the opposite.
So we move from pattern.c everything related to stick_tables and invert the
dependency. That way the code becomes more logical and intuitive.
Right now we're only able to store a server ID in a sticky session.
The goal is to be able to store anything whose size is known at startup
time. For this, we store the extra data before the stksess pointer,
using a negative offset. It will then be easy to cumulate multiple
data provided they each have their own offset.
The frontend's connection was accounted for once the session was
instanciated. This was problematic because the early ACLs weren't
able to correctly account for the number of concurrent connections.
Now we count the connection once it is assigned to the frontend.
It also brings the nice advantage of being more symmetrical, because
the stream_sock's accept() does not have to account for that anymore,
only the session's accept() does.
Now we're able to reject connections very early, so we need to use a
different counter for the connections that are received and the ones
that are accepted and converted into sessions, so that the rate limits
can still apply to the accepted ones. The session rate must still be
used to compute the rate limit, so that we can reject undesired traffic
without affecting the rate.
A new function session_accept() is now called from the lower layer to
instanciate a new session. Once the session is instanciated, the upper
layer's frontent_accept() is called. This one can be service-dependant.
That way, we have a 3-phase accept() sequence :
1) protocol-specific, session-less accept(), which is pointed to by
the listener. It defaults to the generic stream_sock_accept().
2) session_accept() which relies on a frontend but not necessarily
for use in a proxy (eg: stats or any future service).
3) frontend_accept() which performs the accept for the service
offerred by the frontend. It defaults to frontend_accept() which
is really what is used by a proxy.
The TCP/HTTP proxies have been moved to this mode so that we can now rely on
frontend_accept() for any type of session initialization relying on a frontend.
The next step will be to convert the stats to use the same system for the stats.
The conn_retries attribute is now assigned when switching from SI_ST_INI
to SI_ST_REQ. This eliminates one of the last dependencies on the backend
in the frontend's accept() function.
The conn_retries still lies in the session and its initialization depends
on the backend when it may not yet be known. Let's first move it to the
stream interface.
It was particularly embarrassing that the server timeout was assigned
to buffers during an accept() just to be potentially changed later in
case of a use_backend rule. The frontend side has nothing to do with
server timeouts.
Now we initialize them right after the connect() succeeds. Later this
should change for a unique stream-interface timeout setting only.
Calling sess_establish() upon a successful connect() was essential, but
it was not clearly stated whether it was necessary for an access to an
I/O handler or not. While it would be desired, having it automatically
add the response analyzers is quite a problem, and it breaks HTTP stats.
The solution is thus not to call it for now and to perform the few response
initializations as needed.
For the long term, we need to find a way to specify the analyzers to install
during a stream_int_register_handler() if any.
If a "stick store-request" rule is present, an entry is preallocated during
the request. However, if there is no response due to an error or to a redir
mode server, we never release it.
The BF_READ_ATTACHED flag was created to wake analysers once after
a connection was established. It turns out that this flag is never
cleared once set, so even if there is no event, some analysers are
still evaluated for no reason.
The bug was introduced with commit ea38854d34675d5472319c453b7027af42fe8aab.
It may cause slightly increased CPU usages during data transfers, maybe
even quite noticeable once when transferring transfer-encoded data,
due to the fact that the request analysers are being checked for every
chunk.
This fix must be backported in 1.4 after all non-reg tests have been
completed.
This is used to disable persistence depending on some conditions (for
example using an ACL matching static files or a specific User-Agent).
You can see it as a complement to "force-persist".
In the configuration file, the force-persist/ignore-persist declaration
order define the rules priority.
Used with the "appsesion" keyword, it can also help reducing memory usage,
as the session won't be hashed the persistence is ignored.
The following patch fixed an issue but brought another one :
296897 [MEDIUM] connect to servers even when the input has already been closed
The new issue is that when a connection is inspected and aborted using
TCP inspect rules, now it is sent to the server before being closed. So
that test is not satisfying. A probably better way is not to prevent a
connection from establishing if only BF_SHUTW_NOW is set but BF_SHUTW
is not. That way, the BF_SHUTW flag is not set if the request has any
data pending, which still fixes the stats issue, but does not let any
empty connection pass through.
Also, as a safety measure, we extend buffer_abort() to automatically
disable the BF_AUTO_CONNECT flag. While it appears to always be OK,
it is by pure luck, so better safe than sorry.