For large farms where servers are regularly added or removed, picking
a random server from the pool can ensure faster load transitions than
when using round-robin and less traffic surges on the newly added
servers than when using leastconn.
This commit introduces "balance random". It internally uses a random as
the key to the consistent hashing mechanism, thus all features available
in consistent hashing such as weights and bounded load via hash-balance-
factor are usable. It is extremely convenient because one common concern
when using random is what happens when a server is hammered a bit too
much. Here that can trivially be avoided, like in the configuration below :
backend bk0
balance random
hash-balance-factor 110
server-template s 1-100 127.0.0.1:8000 check inter 1s
Note that while "balance random" internally relies on a hash algorithm,
it holds the same properties as round-robin and as such is compatible with
reusing an existing server connection with "option prefer-last-server".
Commit 522eea7 ("MINOR: ssl: Handle sending early data to server.") added
a dependency on SRV_SSL_O_EARLY_DATA which only exists when USE_OPENSSL
is defined (which is probably not the best solution) and breaks the build
when ssl is not enabled. Just add an ifdef USE_OPENSSL around the block
for now.
This adds a new keyword on the "server" line, "allow-0rtt", if set, we'll try
to send early data to the server, as long as the client sent early data, as
in case the server rejects the early data, we no longer have them, and can't
resend them, so the only option we have is to send back a 425, and we need
to be sure the client knows how to interpret it correctly.
All the references to connections in the data path from streams and
stream_interfaces were changed to use conn_streams. Most functions named
"something_conn" were renamed to "something_cs" for this. Sometimes the
connection still is what matters (eg during a connection establishment)
and were not always renamed. The change is significant and minimal at the
same time, and was quite thoroughly tested now. As of this patch, all
accesses to the connection from upper layers go through the pass-through
mux.
For HTTP/2 and QUIC, we'll need to deal with multiplexed streams inside
a connection. After quite a long brainstorming, it appears that the
connection interface to the existing streams is appropriate just like
the connection interface to the lower layers. In fact we need to have
the mux layer in the middle of the connection, between the transport
and the data layer.
A mux can exist on two directions/sides. On the inbound direction, it
instanciates new streams from incoming connections, while on the outbound
direction it muxes streams into outgoing connections. The difference is
visible on the mux->init() call : in one case, an upper context is already
known (outgoing connection), and in the other case, the upper context is
not yet known (incoming connection) and will have to be allocated by the
mux. The session doesn't have to create the new streams anymore, as this
is performed by the mux itself.
This patch introduces this and creates a pass-through mux called
"mux_pt" which is used for all new connections and which only
calls the data layer's recv,send,wake() calls. One incoming stream
is immediately created when init() is called on the inbound direction.
There should not be any visible impact.
Note that the connection's mux is purposely not set until the session
is completed so that we don't accidently run with the wrong mux. This
must not cause any issue as the xprt_done_cb function is always called
prior to using mux's recv/send functions.
A lock for LB parameters has been added inside the proxy structure and atomic
operations have been used to update server variables releated to lb.
The only significant change is about lb_map. Because the servers status are
updated in the sync-point, we can call recalc_server_map function synchronously
in map_set_server_status_up/down function.
For now, we have a list of each type per thread. So there is no need to lock
them. This is the easiest solution for now, but not the best one because there
is no sharing between threads. An idle connection on a thread will not be able
be used by a stream on another thread. So it could be a good idea to rework this
patch later.
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
srv_queue([<backend>/]<server>) : integer
Returns an integer value corresponding to the number of connections currently
pending in the designated server's queue. If <backend> is omitted, then the
server is looked up in the current backend. It can sometimes be used together
with the "use-server" directive to force to use a known faster server when it
is not much loaded. See also the "srv_conn", "avg_queue" and "queue" sample
fetch methods.
The server state and weight was reworked to handle
"pending" values updated by checks/CLI/LUA/agent.
These values are commited to be propagated to the
LB stack.
In further dev related to multi-thread, the commit
will be handled into a sync point.
Pending values are named using the prefix 'next_'
Current values used by the LB stack are named 'cur_'
2 places were using an open-coded implementation of this function to count
available servers. Note that the avg_queue_size() fetch didn't check that
the proxy was in STOPPED state so it would possibly return a wrong server
count here but that wouldn't impact the returned value.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
This is like the nbsrv() sample fetch function except that it works as
a converter so it can count the number of available servers of a backend
name retrieved using a sample fetch or an environment variable.
Signed-off-by: Nenad Merdanovic <nmerdan@haproxy.com>
Keeping the address and the port in the same field causes a lot of problems,
specifically on the DNS part where we're forced to cheat on the family to be
able to keep the port. This causes some issues such as some families not being
resolvable anymore.
This patch first moves the service port to a new field "svc_port" so that the
port field is never used anymore in the "addr" field (struct sockaddr_storage).
All call places were adapted (there aren't that many).
when using "option prefer-last-server", we may not always stay on
the same backend if option balance told us otherwise.
For example, backend may change in the following cases:
balance hdr()
balance rdp-cookie
balance source
balance uri
balance url_param
[wt: backport this to 1.7 and 1.6]
According to nbsrv() documentation this fetcher should return "an
integer value corresponding to the number of usable servers".
In case backend is disabled none of servers is usable, so I believe
fetcher should return 0.
This patch should be backported to 1.7, 1.6, 1.5.
Now we exclusively use xprt_get(XPRT_RAW) instead of &raw_sock or
xprt_get(XPRT_SSL) for &ssl_sock. This removes a bunch of #ifdef and
include spread over a number of location including backend, cfgparse,
checks, cli, hlua, log, server and session.
These 2 patches add ability to fetch frontend/backend name in your
logic, so they can be used later to make routing decisions (fe_name) or
taking some actions based on backend which responded to request (be_name).
In our case we needed a fetcher to be able to extract information we
needed from frontend name.
When a backend doesn't use any known LB algorithm, backend_lb_algo_str()
returns NULL. It used to cause "nil" to be printed in the stats dump
since version 1.4 but causes 1.7 to try to parse this NULL to encode
it as a CSV string, causing a crash on "show stat" in this case.
The only situation where this can happen is when "transparent" or
"dispatch" are used in a proxy, in which case the LB algorithm is
BE_LB_ALGO_NONE. Thus now we explicitly report "none" when this
situation is detected, and we preventively report "unknown" if any
unknown algorithm is detected, which may happen if such an algo is
added in the future and the function is not updated.
This fix must be backported to 1.7 and may be backported as far as
1.4, though it has less impact there.
We'll have to use srv_set_admin_flag() to propagate some server flags
during the startup, and we don't want the resulting actions to cause
warnings, logs nor e-mail alerts to be generated since we're just applying
the config or a state file. So let's condition these notifications to the
fact that we're starting.
For active servers, this is the sum of the eweights of all active
servers before this one in the backend, and
[srv->cumulative_weight .. srv_cumulative_weight + srv_eweight) is a
space occupied by this server in the range [0 .. lbprm.tot_wact), and
likewise for backup servers with tot_wbck. This allows choosing a
server or a range of servers proportional to their weight, by simple
integer comparison.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
The "sni" server directive does some bad stuff on many occasions because
it works on a sample of type string and limits len to size-1 by hand. The
problem is that size used to be zero on many occasions before the recent
changes to smp_dup() and that it effectively results in setting len to -1
and writing the zero byte *before* the string (and not terminating the
string).
This patch makes use of the recently introduced smp_make_safe() to address
this issue.
This fix must be backported to 1.6.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily.
The problem is that earlier commit 87b0966 ("REORG/MAJOR: session:
rename the "session" entity to "stream"") had split the session and
stream resulting in the possibility for smp->strm to be NULL before
the stream was initialized. This is what happens in tcp-request
connection rulesets, as discovered by Baptiste.
The sample fetch functions must now check that smp->strm is valid
before using it. An alternative could consist in using a dummy stream
with nothing in it to avoid some checks but it would only result in
deferring them to the next step anyway, and making it harder to detect
that a stream is valid or the dummy one.
There is still an issue with variables which requires a complete
independant fix. They use strm->sess to find the session with strm
possibly NULL and passed as an argument. All call places indirectly
use smp->strm to build strm. So the problem is there but the API needs
to be changed to remove this duplicate argument that makes it much
harder to know what pointer to use.
This fix must be backported to 1.6, as well as the next one fixing
variables.
There is a bug in connect_server() : we use si_attach_conn() to offer
the current session's connection to the session we're stealing the
connection from. Unfortunately, si_attach_conn() uses the standard data
connection operations while here we need to use the idle connection
operations.
This results in a situation where when the server's idle timeout strikes,
the read0 is silently ignored, causes the response channel to be shut down
for reads, and the connection remains attached. Next attempt to send a
request when using this connection simply results in nothing being done
because we try to send over an already closed connection. Worse, if the
client aborts, then no timeout remains at all and the session waits
forever and remains assigned to the server.
A more-or-less easy way to reproduce this bug is to have two concurrent
streams each connecting to a different server with "http-reuse aggressive",
typically a cache farm using a URL hash :
stream1: GET /1 HTTP/1.1
stream2: GET /2 HTTP/1.1
stream1: GET /2 HTTP/1.1
wait for the server 1's connection to timeout
stream2: GET /1 HTTP/1.1
The connection hangs here, and "show sess all" shows a closed connection
with a SHUTR on the response channel.
The fix is very simple though not optimal. It consists in calling
si_idle_conn() again after attaching the connection. But in practise
it should not be done like this. The real issue is that there's no way
to cleanly attach a connection to a stream interface without changing
the connection's operations. So the API clearly needs to be revisited
to make such operations easier.
Many thanks to Yves Lafon from W3C for providing lots of useful dumps
and testing patches to help figure the root cause!
This fix must be backported to 1.6.
This was the first transparent proxy technology supported by haproxy
circa 2005 but it was obsoleted in 2007 by Tproxy 4.0 which removed a
lot of the earlier versions' shortcomings and was finally merged into
the kernel. Since nobody has been using cttproxy for many years now
and nobody has even just tried to compile the files, it's time to
remove it. The doc was updated as well.
The union name "data" is a little bit heavy while we read the source
code because we can read "data.data.sint". The rename from "data" to "u"
makes the read easiest like "data.u.sint".
This patch remove the struct information stored both in the struct
sample_data and in the striuct sample. Now, only thestruct sample_data
contains data, and the struct sample use the struct sample_data for storing
his own data.
This strategy is less extreme than "always", it only dispatches first
requests to validated reused connections, and moves a connection from
the idle list to the safe list once it has seen a second request, thus
proving that it could be reused.
The "safe" mode consists in picking existing connections only when
processing a request that's not the first one from a connection. This
ensures that in case where the server finally times out and closes, the
client can decide to replay idempotent requests.
Now instead of closing the existing connection attached to the
stream interface, we first check if the one we pick was attached to
another stream interface, in which case the connections are swapped
if possible (eg: if the current connection is not private). That way
the previous connection remains attached to an existing session and
significantly increases the chances of being reused.
In connect_server(), if we don't have a connection attached to the
stream-int, we first look into the server's idle_conns list and we
pick the first one there, we detach it from its owner if it had one.
If we used to have a connection, we close it.
This mechanism works well but doesn't scale : as servers increase,
the likeliness that the connection attached to the stream interface
doesn't match the server and gets closed increases.
This flag is set on an outgoing connection when this connection gets
some properties that must not be shared with other connections, such
as dynamic transparent source binding, SNI or a proxy protocol header,
or an authentication challenge from the server. This will be needed
later to implement connection reuse.
Since we now always call this function with the reuse parameter cleared,
let's simplify the function's logic as it cannot return the existing
connection anymore. The savings on this inline function are appreciable
(240 bytes) :
$ size haproxy.old haproxy.new
text data bss dec hex filename
1020383 40816 36928 1098127 10c18f haproxy.old
1020143 40816 36928 1097887 10c09f haproxy.new
connect_server() already does most of the check that is done again in
si_alloc_conn(), so let's simply reuse the existing connection instead
of calling the function again. It will also simplify the connection
reuse.
Indeed, for reuse to be set, it also requires srv_conn to be valid. In the
end, the only situation where we have to release the existing connection
and allocate a new one is when reuse == 0.
objt_server() is called multiple times at various places while some
places already make use of srv for this. Let's move the call at the
top of the function and use it all over the place.
This patch removes the 32 bits unsigned integer and the 32 bit signed
integer. It replaces these types by a unique type 64 bit signed.
This makes easy the usage of integer and clarify signed and unsigned use.
With the previous version, signed and unsigned are used ones in place of
others, and sometimes the converter loose the sign. For example, divisions
are processed with "unsigned", if one entry is negative, the result is
wrong.
Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.
There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new more natural, unified "int()" function. These functions were only
introduced in the latest 1.6-dev2 so there's no impact on regular
deployments.
The new "sni" server directive takes a sample fetch expression and
uses its return value as a hostname sent as the TLS SNI extension.
A typical use case consists in forwarding the front connection's SNI
value to the server in a bridged HTTPS forwarder :
sni ssl_fc_sni
This patch removes the structs "session", "stream" and "proxy" from
the sample-fetches and converters function prototypes.
This permits to remove some weight in the prototype call.
The get_server_ph_post() function assumes that the buffer is contiguous.
While this is true for all the header part, it is not necessarily true
for the end of data the fit in the reserve. In this case there's a risk
to read past the end of the buffer for a few hundred bytes, and possibly
to crash the process if what follows is not mapped.
The fix consists in truncating the analyzed length to the length of the
contiguous block that follows the headers.
A config workaround for this bug would be to disable balance url_param.
This fix must be backported to 1.5. It seems 1.4 did have the check.
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.
So as of now, sample fetch functions will be called with this :
- sess = NULL, strm = NULL : never
- sess = valid, strm = NULL : tcp-req connection
- sess = valid, strm = valid, strm->txn = NULL : tcp-req content
- sess = valid, strm = valid, strm->txn = valid : http-req / http-res
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.
The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
Now this one is dynamically allocated. It means that 280 bytes of memory
are saved per TCP stream, but more importantly that it will become
possible to remove the l7 pointer from fetches and converters since
it will be deduced from the stream and will support being null.
A lot of care was taken because it's easy to forget a test somewhere,
and the previous code used to always trust s->txn for being valid, but
all places seem to have been visited.
All HTTP fetch functions check the txn first so we shouldn't have any
issue there even when called from TCP. When branching from a TCP frontend
to an HTTP backend, the txn is properly allocated at the same time as the
hdr_idx.
When s->si[0].end was dereferenced as a connection or anything in
order to retrieve information about the originating session, we'll
now use sess->origin instead so that when we have to chain multiple
streams in HTTP/2, we'll keep accessing the same origin.
Just like for the listener, the frontend is session-wide so let's move
it to the session. There are a lot of places which were changed but the
changes are minimal in fact.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
These 4 combinations are needlessly complicated since the session already
has direct access to the associated stream interfaces without having to
check an indirect pointer.
The purpose of these two macros will be to pass via the session to
find the relevant stream interfaces so that we don't need to store
the ->cons nor ->prod pointers anymore. Currently they're only defined
so that all references could be removed.
Note that many places need a second pass of clean up so that we don't
have any chn_prod(&s->req) anymore and only &s->si[0] instead, and
conversely for the 3 other cases.
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to validate that a channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
When the destination IP is dynamically set, we can't use the "target"
to define the proto. This patch ensures that we always use the protocol
associated with the address family. The proto field was removed from
the server and check structs.
balance hdr(<name>) provides on option 'use_domain_only' to match only the
domain part in a header (designed for the Host header).
Olivier Fredj reported that the hashes were not the same for
'subdomain.domain.tld' and 'domain.tld'.
This is because the pointer was rewinded one step to far, resulting in a hash
calculated against wrong values :
- '.domai' for 'subdomain.domain.tld'
- ' domai' for 'domain.tld' (beginning with the space in the header line)
Another special case is when no dot can be found in the header : the hash will
be calculated against an empty string.
The patch addresses both cases : 'domain' will be used to compute the hash for
'subdomain.domain.tld', 'domain.tld' and 'domain' (using the whole header value
for the last case).
The fix must be backported to haproxy 1.5 and 1.4.
The path "MAJOR: namespace: add Linux network namespace support" doesn't
permit to use internal data producer like a "peers synchronisation"
system. The result is a segfault when the internal application starts.
This patch fix the commit b3e54fe387
It is introduced in 1.6dev version, it doesn't need to be backported.
This patch makes it possible to create binds and servers in separate
namespaces. This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).
The setup is something like this:
net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/
The proxy is configured to make server connections through haproxy and sending
the expected source/target addresses to haproxy using the proxy protocol.
The network namespace setup on the haproxy node is something like this:
= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =
= 8< =
$ cat haproxy.cfg
frontend clients
bind 127.0.0.1:50022 namespace 1 transparent
default_backend scb
backend server
mode tcp
server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =
A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.
A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.
For more documentation please read the documentation included in the patch
itself.
Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
Commit 98634f0 ("MEDIUM: backend: Enhance hash-type directive with an
algorithm options") cleaned up the hashing code by using a centralized
function. A bug appeared in get_server_uh() which is the URI hashing
function. Prior to the patch, the function would stop hashing on the
question mark, or on the trailing slash of a maximum directory count.
Consecutive to the patch, this last character is included into the
hash computation. This means that :
GET /0
GET /0?
Are not hashed similarly. The following configuration reproduces it :
mode http
balance uri
server s1 0.0.0.0:1234 redir /s1
server s2 0.0.0.0:1234 redir /s2
Many thanks to Vedran Furac for reporting this issue. The fix must
be backported to 1.5.
When we were generating a hash, it was done using an unsigned long. When the hash was used
to select a backend, it was sent as an unsigned int. This made it difficult to predict which
backend would be selected.
This patch updates get_hash, and the hash methods to use an unsigned int, to remain consistent
throughout the codebase.
This fix should be backported to 1.5 and probably in part to 1.4.
Checks.c has become a total mess. A number of proxy or server maintenance
and queue management functions were put there probably because they were
used there, but that makes the code untouchable. And that's without saying
that their names does not always relate to what they really do!
So let's do a first pass by moving these ones :
- set_backend_down() => backend.c
- redistribute_pending() => queue.c:pendconn_redistribute()
- check_for_pending() => queue.c:pendconn_grab_from_px()
- shutdown_sessions => server.c:srv_shutdown_sessions()
- shutdown_backup_sessions => server.c:srv_shutdown_backup_sessions()
All of them were moved at once.
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Now we introduce srv->admin and srv->prev_admin which are bitfields
containing one bit per source of administrative status (maintenance only
for now). For the sake of backwards compatibility we implement a single
source (ADMF_FMAINT) but the code already checks any source (ADMF_MAINT)
where the STF_MAINTAIN bit was previously checked. This will later allow
us to add ADMF_IMAINT for maintenance mode inherited from tracked servers.
Along doing these changes, it appeared that some places will need to be
revisited when implementing the inherited bit, this concerns all those
modifying the ADMF_FMAINT bit (enable/disable actions on the CLI or stats
page), and the checks to report "via" on the stats page. But currently
the code is harmless.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
We used to call srv_is_usable() with either the current state and weights
or the previous ones. This causes trouble for future changes, so let's first
split it in two variants :
- srv_is_usable(srv) considers the current status
- srv_was_usable(srv) considers the previous status
We used to have is_addr() in place to validate sometimes the existence
of an address, sometimes a valid IPv4 or IPv6 address. Replace them
carefully so that is_inet_addr() is used wherever we can only use an
IPv4/IPv6 address.
The RDP cookie extractor compares the 32-bit address from the request
to the address of each server in the farm without first checking that
the server's address is IPv4. This is a leftover from the IPv4 to IPv6
conversion. It's harmless as it's unlikely that IPv4 and IPv6 servers
will be mixed in an RDP farm, but better fix it.
This patch does not need to be backported.
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions. It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read. Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections. Support for PROXY protocol V1 remains unchanged.
Finn Arne Gangstad suggested that we should have the ability to break
keep-alive when the target server has reached its maxconn and that a
number of connections are present in the queue. After some discussion
around his proposed patch, the following solution was suggested : have
a per-proxy setting to fix a limit to the number of queued connections
on a server after which we break keep-alive. This ensures that even in
high latency networks where keep-alive is beneficial, we try to find a
different server.
This patch is partially based on his original proposal and implements
this configurable threshold.
All the code inherited from version 1.1 still holds a lot ot sessions
called "t" because in 1.1 they were tasks. This naming is very annoying
and sometimes even confusing, for example in code involving tables.
Let's get rid of this once for all and before 1.5-final.
Nothing changed beyond just carefully renaming these variables.
Cyril Bont reported that the "lastsess" field of a stats-only backend
was never updated. In fact the same is true for any applet and anything
not a server. Also, lastsess was not updated for a server reusing its
connection for a new request.
Since the goal of this field is to report recent activity, it's better
to ensure that all accesses are reported. The call has been moved to
the code validating the session establishment instead, since everything
passes there.
http_body_rewind() returns the number of bytes to rewind before buf->p to
find the message's body. It relies on http_hdr_rewind() to find the beginning
and adds msg->eoh + msg->eol which are always safe.
http_data_rewind() does the same to get the beginning of the data, which
differs from above when a chunk is present. It uses the function above and
adds msg->sol.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
http_uri_rewind() returns the number of bytes to rewind before buf->p to
find the URI. It relies on http_hdr_rewind() to find the beginning and
is just here to simplify operations.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
http_hdr_rewind() returns the number of bytes to rewind before buf->p to
find the beginning of headers. At the moment it's not exact as it still
relies on buf->o, assuming that no other data from a past message were
pending there, but it's what was done till there.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
http_body_bytes() returns the number of bytes of the current message body
present in the buffer. It is compatible with being called before and after
the headers are forwarded.
This is done to centralize further ->sov changes.
We used to have msg->sov updated for every chunk that was parsed. The issue
is that we want to be able to rewind after chunks were parsed in case we need
to redispatch a request and perform a new hash on the request or insert a
different server header name.
Currently, msg->sov and msg->next make parallel progress. We reached a point
where they're always equal because msg->next is initialized from msg->sov,
and is subtracted msg->sov's value each time msg->sov bytes are forwarded.
So we can now ensure that msg->sov can always be replaced by msg->next for
every state after HTTP_MSG_BODY where it is used as a position counter.
This allows us to keep msg->sov untouched whatever the number of chunks that
are parsed, as is needed to extract data from POST request (eg: url_param).
However, we still need to know the starting position of the data relative to
the body, which differs by the chunk size length. We use msg->sol for this
since it's now always zero and unused in the body.
So with this patch, we have the following situation :
- msg->sov = msg->eoh + msg->eol = size of the headers including last CRLF
- msg->sol = length of the chunk size if any. So msg->sov + msg->sol = DATA.
- msg->next corresponds to the byte being inspected based on the current
state and is always >= msg->sov before starting to forward anything.
Since sov and next are updated in case of header rewriting, a rewind will
fix them both when needed. Of course, ->sol has no reason for changing in
such conditions, so it's fine to keep it relative to msg->sov.
In theory, even if a redispatch has to be performed, a transformation
occurring on the request would still work because the data moved would
still appear at the same place relative to bug->p.
This is the continuation of previous patch. Now that full buffers are
not rejected anymore, let's wait for at least the advertised chunk or
body length to be present or the buffer to be full. When either
condition is met, the message processing can go forward.
Thus we don't need to use url_param_post_limit anymore, which was passed
in the configuration as an optionnal <max_wait> parameter after the
"check_post" value. This setting was necessary when the feature was
implemented because there was no support for parsing message bodies.
The argument is now silently ignored if set in the configuration.
Finn Arne Gangstad reported that commit 6b726adb35 ("MEDIUM: http: do
not report connection errors for second and further requests") breaks
support for serving static files by abusing the errorfile 503 statement.
Indeed, a second request over a connection sent to any server or backend
returning 503 would silently be dropped.
The proper solution consists in adding a flag on the session indicating
that the server connection was reused, and to only avoid the error code
in this case.
Since 1.5-dev20, we have a working server-side keep-alive and an option
"prefer-last-server" to indicate that we explicitly want to reuse the
same server as the last one. Unfortunately this breaks the redispatch
feature because assign_server() insists on reusing the same server as
the first one attempted even if the connection failed to establish.
A simple solution consists in only considering the last connection if
it was connected. Otherwise there is no reason for being interested in
reusing the same server.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.
This attempts to address the requirement in the ROADMAP
- add a last activity date for each server (req/resp) that will be
displayed in the stats. It will be useful with soft stop.
The stats page reports this as time elapsed since last session. This
change does not adequately address the requirement for long running
session (websocket, RDP... etc).
In HTTP keep-alive mode, if we receive a 401, we still have a chance
of being able to send the visitor again to the same server over the
same connection. This is required by some broken protocols such as
NTLM, and anyway whenever there is an opportunity for sending the
challenge to the proper place, it's better to do it (at least it
helps with debugging).
If we reuse a server-side connection, we must not reinitialize its context nor
try to enable send_proxy. At the moment HTTP keep-alive over SSL fails on the
first attempt because the SSL context was cleared, so it only worked after a
retry.
When the load balancing algorithm in use is not deterministic, and a previous
request was sent to a server to which haproxy still holds a connection, it is
sometimes desirable that subsequent requests on a same session go to the same
server as much as possible. Note that this is different from persistence, as
we only indicate a preference which haproxy tries to apply without any form
of warranty. The real use is for keep-alive connections sent to servers. When
this option is used, haproxy will try to reuse the same connection that is
attached to the server instead of rebalancing to another server, causing a
close of the connection. This can make sense for static file servers. It does
not make much sense to use this in combination with hashing algorithms.
This commit allows an existing server-side connection to be reused if
it matches the same target. Basic controls are performed ; right now
we do not allow to reuse a connection when dynamic source binding is
in use or when the destination address or port is dynamic (eg: proxy
mode). Later we'll have to also disable connection sharing when PROXY
protocol is being used or when non-idempotent requests are processed.
When allocating a new connection, only the caller knows whether it's
acceptable to reuse the previous one or not. Let's pass this information
to si_alloc_conn() which will do the cleanup if the connection is not
acceptable.
Having the check state partially stored in the server doesn't help.
Some functions such as srv_getinter() rely on the server being checked
to decide what check frequency to use, instead of relying on the check
being configured. So let's get rid of SRV_CHECKED and SRV_AGENT_CHECKED
and only use the check's states instead.
Till now the send_proxy_ofs field remained in the stream interface,
but since the dynamic allocation of the connection, it makes a lot
of sense to move that into the connection instead of the stream
interface, since it will not be statically allocated for each
session.
Also, it turns out that moving it to the connection fils an alignment
hole on 64 bit architectures so it does not consume more memory, and
removing it from the stream interface was an opportunity to correctly
reorder fields and reduce the stream interface's size from 160 to 144
bytes (-10%). This is 32 bytes saved per session.
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.
As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.
The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.
The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.
The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.
A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).
si_prepare_conn() is not appropriate in our case as it both initializes and
attaches the connection to the stream interface. Due to the asymmetry between
accept() and connect(), it causes some fields such as the control and transport
layers to be reinitialized.
Now that we can separately initialize these fields using conn_prepare(), let's
break this function to only attach the connection to the stream interface.
Also, by analogy, si_prepare_none() was renamed si_detach(), and
si_prepare_applet() was renamed si_attach_applet().
The connection will only remain there as a pre-allocated entity whose
goal is to be placed in ->end when establishing an outgoing connection.
All connection initialization can be made on this connection, but all
information retrieved should be applied to the end point only.
This change is huge because there were many users of si->conn. Now the
only users are those who initialize the new connection. The difficulty
appears in a few places such as backend.c, proto_http.c, peers.c where
si->conn is used to hold the connection's target address before assigning
the connection to the stream interface. This is why we have to keep
si->conn for now. A future improvement might consist in dynamically
allocating the connection when it is needed.
This function makes no sense anymore and will cause trouble to convert
the remains of connection/applet to end points. Let's replace it now
with its contents.
A very old bug resulting from some code refactoring causes
assign_server_address() to refrain from retrieving the destination
address from the client-side connection when transparent mode is
enabled and we're connecting to a server which has address 0.0.0.0.
The impact is low since such configurations are unlikely to ever
be encountered. The fix should be backported to older branches.