Commit Graph

914 Commits

Author SHA1 Message Date
Willy Tarreau
02bce8be01 MAJOR: http: update connection mode configuration
At the very beginning of haproxy, there was "option httpclose" to make
haproxy add a "Connection: close" header in both directions to invite
both sides to agree on closing the connection. It did not work with some
rare products, so "option forceclose" was added to do the same and actively
close the connection. Then client-side keep-alive was supported, so option
http-server-close was introduced. Now we have keep-alive with a fourth
option, not to mention the implicit tunnel mode.

The connection configuration has become a total mess because all the
options above may be combined together, despite almost everyone thinking
they cancel each other, as judging from the common problem reports on the
mailing list. Unfortunately, re-reading the doc shows that it's not clear
at all that options may be combined, and the opposite seems more obvious
since they're compared. The most common issue is options being set in the
defaults section that are not negated in other sections, but are just
combined when the user expects them to be overloaded. The migration to
keep-alive by default will only make things worse.

So let's start to address the first problem. A transaction can only work in
5 modes today :
  - tunnel : haproxy doesn't bother with what follows the first req/resp
  - passive close : option http-close
  - forced close : option forceclose
  - server close : option http-server-close with keep-alive on the client side
  - keep-alive   : option http-keep-alive, end to end

All 16 combination for each section fall into one of these cases. Same for
the 256 combinations resulting from frontend+backend different modes.

With this patch, we're doing something slightly different, which will not
change anything for users with valid configs, and will only change the
behaviour for users with unsafe configs. The principle is that these options
may not combined anymore, and that the latest one always overrides all the
other ones, including those inherited from the defaults section. The "no
option xxx" statement is still supported to cancel one option and fall back
to the default one. It is mainly needed to ignore defaults sections (eg:
force the tunnel mode). The frontend+backend combinations have not changed.

So for examplen the following configuration used to put the connection
into forceclose :

    defaults http
        mode http
        option httpclose

    frontend foo.
        option http-server-close

  => http-server-close+httpclose = forceclose before this patch! Now
     the frontend's config replaces the defaults config and results in
     the more expected http-server-close.

All 25 combinations of the 5 modes in (frontend,backend) have been
successfully tested.

In order to prepare for upcoming changes, a new "option http-tunnel" was
added. It currently only voids all other options, and has the lowest
precedence when mixed with another option in another frontend/backend.
2014-01-30 03:14:29 +01:00
Emeric Brun
850efd5149 MEDIUM: ssl: Set verify 'required' as global default for servers side.
If no CA file specified on a server line, the config parser will show an error.

Adds an cmdline option '-dV' to re-set verify 'none' as global default on
servers side (previous behavior).

Also adds 'ssl-server-verify' global statement to set global default to
'none' or 'required'.

WARNING: this changes the default verify mode from "none" to "required" on
the server side, and it *will* break insecure setups.
2014-01-29 17:08:15 +01:00
Willy Tarreau
cc08d2c9ff MEDIUM: counters: stop relying on session flags at all
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.

The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.

Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.

A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
2014-01-28 23:34:45 +01:00
Willy Tarreau
f3338349ec BUG/MEDIUM: counters: flush content counters after each request
One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking
L7 information") brought support for tracking L7 information in tcp-request
content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters:
correctly unbind the counters tracked by the backend") used to flush the
backend counters after processing a request.

While that earliest patch was correct at the time, it became wrong after
the second patch was merged. The code does what it says, but the concept
is flawed. "TCP request content" rules are evaluated for each HTTP request
over a single connection. So if such a rule in the frontend decides to
track any L7 information or to track L4 information when an L7 condition
matches, then it is applied to all requests over the same connection even
if they don't match. This means that a rule such as :

     tcp-request content track-sc0 src if { path /index.html }

will count one request for index.html, and another one for each of the
objects present on this page that are fetched over the same connection
which sent the initial matching request.

Worse, it is possible to make the code do stupid things by using multiple
counters:

     tcp-request content track-sc0 src if { path /foo }
     tcp-request content track-sc1 src if { path /bar }

Just sending two requests first, one with /foo, one with /bar, shows
twice the number of requests for all subsequent requests. Just because
both of them persist after the end of the request.

So the decision to flush backend-tracked counters was not the correct
one. In practice, what is important is to flush countent-based rules
since they are the ones evaluated for each request.

Doing so requires new flags in the session however, to keep track of
which stick-counter was tracked by what ruleset. A later change might
make this easier to maintain over time.

This bug is 1.5-specific, no backport to stable is needed.
2014-01-28 21:40:28 +01:00
Willy Tarreau
e43d5323c6 MEDIUM: listener: apply a limit on the session rate submitted to SSL
Just like the previous commit, we sometimes want to limit the rate of
incoming SSL connections. While it can be done for a frontend, it was
not possible for a whole process, which makes sense when multiple
processes are running on a system to server multiple customers.

The new global "maxsslrate" setting is usable to fix a limit on the
session rate going to the SSL frontends. The limits applies before
the SSL handshake and not after, so that it saves the SSL stack from
expensive key computations that would finally be aborted before being
accounted for.

The same setting may be changed at run time on the CLI using
"set rate-limit ssl-session global".
2014-01-28 15:50:10 +01:00
Willy Tarreau
93e7c006c1 MEDIUM: listener: add support for limiting the session rate in addition to the connection rate
It's sometimes useful to be able to limit the connection rate on a machine
running many haproxy instances (eg: per customer) but it removes the ability
for that machine to defend itself against a DoS. Thus, better also provide a
limit on the session rate, which does not include the connections rejected by
"tcp-request connection" rules. This permits to have much higher limits on
the connection rate without having to raise the session rate limit to insane
values.

The limit can be changed on the CLI using "set rate-limit sessions global",
or in the global section using "maxsessrate".
2014-01-28 15:49:27 +01:00
Willy Tarreau
baf5b9b445 CLEANUP: connection: fix comments in connection.h to reflect new behaviour.
The polling has substantially changed, better fix the comments.
2014-01-26 00:42:31 +01:00
Willy Tarreau
310987a038 MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.

For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.

Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
2014-01-26 00:42:30 +01:00
Willy Tarreau
f817e9f473 MAJOR: polling: rework the whole polling system
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.

This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.

Several new state manipulation functions have been introduced or
adapted :
  - fd_want_{recv,send} : enable receiving/sending on the FD regardless
    of its state (sets the ACTIVE flag) ;

  - fd_stop_{recv,send} : stop receiving/sending on the FD regardless
    of its state (clears the ACTIVE flag) ;

  - fd_cant_{recv,send} : report a failure to receive/send on the FD
    corresponding to EAGAIN (clears the READY flag) ;

  - fd_may_{recv,send}  : report the ability to receive/send on the FD
    as reported by poll() (sets the READY flag) ;

Some functions are used to report the current FD status :

  - fd_{recv,send}_active
  - fd_{recv,send}_ready
  - fd_{recv,send}_polled

Some functions were removed :
  - fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()

The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
knows it can try to access the file descriptor to get this information.

In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.

The following pollers have been updated :

   ev_select() : done, built, tested on Linux 3.10
   ev_poll()   : done, built, tested on Linux 3.10
   ev_epoll()  : done, built, tested on Linux 3.10 & 3.13
   ev_kqueue() : done, built, tested on OpenBSD 5.2
2014-01-26 00:42:30 +01:00
Willy Tarreau
15a4dec87e REORG: polling: rename "spec_e" to "state" and "spec_p" to "cache"
We're completely changing the way FDs will be polled. There will be no
more speculative I/O since we'll know the exact FD state, so these will
only be cached events.

First, let's fix a few field names which become confusing. "spec_e" was
used to store a speculative I/O event state. Now we'll store the whole
R/W states for the FD there. "spec_p" was used to store a speculative
I/O cache position. Now let's clearly call it "cache".
2014-01-26 00:42:29 +01:00
Willy Tarreau
69a41fa8a3 CLEANUP: polling: rename "spec_e" to "state"
We're completely changing the way FDs will be polled. First, let's fix
a few field names which become confusing. "spec_e" was used to store a
speculative I/O event state. Now we'll store the whole R/W states for
the FD there.
2014-01-26 00:42:28 +01:00
Willy Tarreau
1f0da2485e BUG/MEDIUM: unique_id: HTTP request counter is not stable
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.

A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.

It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.

So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instanciated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").

This feature is 1.5-specific, no backport to 1.4 is needed.
2014-01-25 11:07:06 +01:00
Willy Tarreau
45b34e8abc MINOR: connection: add more error codes to report connection errors
It is quite often that an connection error only reports "socket error" with
no more information. This is especially problematic with health checks where
many causes are possible, including resource exhaustion which do not lead to
a valid errno code. So let's add explicit codes to cover these cases.
2014-01-24 16:15:04 +01:00
Willy Tarreau
d7ad9f5b0d MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.

The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.

That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.

In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.

The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).

That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
2013-12-31 18:37:36 +01:00
Willy Tarreau
983eb31fd1 BUG/MINOR: channel: CHN_INFINITE_FORWARD must be unsigned
This value is stored as unsigned in chn->to_forward. Having it defined
as signed makes it impossible to pass channel_forward() a previously
saved value because the argument will be zero-extended during the
conversion to long long, while the test will be performed using sign
extension. There is no impact on existing code right now.
2013-12-28 21:33:37 +01:00
Willy Tarreau
068621e4ad MINOR: http: try to stick to same server after status 401/407
In HTTP keep-alive mode, if we receive a 401, we still have a chance
of being able to send the visitor again to the same server over the
same connection. This is required by some broken protocols such as
NTLM, and anyway whenever there is an opportunity for sending the
challenge to the proper place, it's better to do it (at least it
helps with debugging).
2013-12-23 15:12:44 +01:00
Willy Tarreau
9420b1271d MINOR: http: add option prefer-last-server
When the load balancing algorithm in use is not deterministic, and a previous
request was sent to a server to which haproxy still holds a connection, it is
sometimes desirable that subsequent requests on a same session go to the same
server as much as possible. Note that this is different from persistence, as
we only indicate a preference which haproxy tries to apply without any form
of warranty. The real use is for keep-alive connections sent to servers. When
this option is used, haproxy will try to reuse the same connection that is
attached to the server instead of rebalancing to another server, causing a
close of the connection. This can make sense for static file servers. It does
not make much sense to use this in combination with hashing algorithms.
2013-12-16 02:23:54 +01:00
Willy Tarreau
ad38acedaa MEDIUM: connection: centralize handling of nolinger in fd management
Right now we see many places doing their own setsockopt(SO_LINGER).
Better only do it just before the close() in fd_delete(). For this
we add a new flag on the file descriptor, indicating if it's safe or
not to linger. If not (eg: after a connect()), then the setsockopt()
call is automatically performed before a close().

The flag automatically turns to safe when receiving a read0.
2013-12-16 02:23:52 +01:00
Willy Tarreau
3343432fcd MINOR: checks: add a flag to indicate what check is an agent
Currently to know if a check is an agent, we compare its pointer to its
servers' agent pointer. Better have a flag in its state to indicate this.
2013-12-14 16:02:20 +01:00
Willy Tarreau
33a08db932 MINOR: checks: add a PAUSED state for the checks
Health checks can now be paused. This is the status they get when the
server is put into maintenance mode, which is more logical than relying
on the server's state at some places. It will be needed to allow agent
checks to run when health checks are disabled (currently not possible).
2013-12-14 16:02:20 +01:00
Willy Tarreau
ff5ae35b9f MINOR: checks: use check->state instead of srv->state & SRV_CHECKED
Having the check state partially stored in the server doesn't help.
Some functions such as srv_getinter() rely on the server being checked
to decide what check frequency to use, instead of relying on the check
being configured. So let's get rid of SRV_CHECKED and SRV_AGENT_CHECKED
and only use the check's states instead.
2013-12-14 16:02:19 +01:00
Willy Tarreau
2e10f5a759 MINOR: checks: replace state DISABLED with CONFIGURED and ENABLED
At the moment, health checks and agent checks are tied : no agent
check is emitted if no health check is enabled. Other parameters
are considered in the condition for letting checks run. It will
help us selectively enable checks (agent and regular checks) to be
know whether they're enabled/disabled and configured or not. Now
we can already emit an error when trying to enable an unconfigured
agent.
2013-12-14 16:02:19 +01:00
Willy Tarreau
2c115e5047 MINOR: checks: rename the state flags
The flag CHK_STATE_RUNNING is misleading as one may believe it means
the state is enabled (just like SRV_RUNNING). Let's rename these two
flags CHK_ST_INPROGRESS and CHK_ST_DISABLED.
2013-12-14 16:02:19 +01:00
Willy Tarreau
6aaa1b87cf MINOR: checks: use an enum instead of flags to report a check result
We used to have up to 4 sets of flags which were almost all exclusive
to report a check result. And the names were inherited from the old
server states, adding to the confusion. Let's replace that with an
enum handling only the possible combinations :

   SRV_CHK_UNKNOWN                   => CHK_RES_UNKNOWN
   SRV_CHK_FAILED                    => CHK_RES_FAILED
   SRV_CHK_PASSED                    => CHK_RES_PASSED
   SRV_CHK_PASSED | SRV_CHK_DISABLE  => CHK_RES_CONDPASS
2013-12-14 16:02:19 +01:00
Willy Tarreau
8e85ad5211 REORG: checks: retrieve the check-specific defines from server.h to checks.h
After the move of checks from servers to autonomous checks, we need a
massive cleanup and reordering as it's becoming increasingly difficult
to find the definitions of types and enums.

Nothing was changed, blocks were just moved.
2013-12-14 16:02:18 +01:00
Willy Tarreau
1a53a3af13 MINOR: checks: improve handling of the servers tracking chain
Server tracking uses the same "tracknext" list for servers tracking
another one and for the servers being tracked. This caused an issue
which was fixed by commit f39c71c ([CRITICAL] fix server state tracking:
it was O(n!) instead of O(n)), consisting in ensuring that a server is
being checked before walking down the list, so that we don't propagate
the up/down information via servers being part of the track chain.

But the root cause is the fact that all servers share the same list.
The correct solution consists in having a list head for the tracked
servers and a list of next tracking servers. This simplifies the
propagation logic, especially for the case where status changes might
be passed to individual servers via the CLI.
2013-12-14 16:02:18 +01:00
Thierry FOURNIER
c0e0d7b7cf MEDIUM: map: dynamic manipulation of maps
This patch adds map manipulation commands to the socket interface.

add map <map> <key> <value>
  Add the value <value> in the map <map>, at the entry corresponding to
  the key <key>. This command does not verify if the entry already
  exists.

clear map <map>
  Remove entries from the map <map>

del map <map> <key>
  Delete all the map entries corresponding to the <key> value in the map
  <map>.

set map <map> <key> <value>
  Modify the value corresponding to each key <key> in a map <map>. The
  new value is <value>.

show map [<map>]
  Dump info about map converters. Without argument, the list of all
  available maps are returned. If a <map> is specified, is content is
  dumped.
2013-12-12 15:58:30 +01:00
Thierry FOURNIER
0b2fe4a5cd MINOR: pattern: add support for compiling patterns for lookups
With this patch, patterns can be compiled for two modes :
  - match
  - lookup

The match mode is used for example in ACLs or maps. The lookup mode
is used to lookup a key for pattern maintenance. For example, looking
up a network is different from looking up one address belonging to
this network.

A special case is made for regex. In lookup mode they return the input
regex string and do not compile the regex.
2013-12-12 15:44:02 +01:00
Thierry FOURNIER
799c042daa MINOR: regex: Change the struct containing regex
This change permits to remove the typedef. The original regex structs
are set in haproxy's struct.
2013-12-12 15:42:58 +01:00
Thierry FOURNIER
7148ce6ef4 MEDIUM: pattern: Extract the index process from the pat_parse_*() functions
Now, the pat_parse_*() functions parses the incoming data. The input
"pattern" struct can be preallocated. If the parser needs to add some
buffers, it allocates memory.

The function pattern_register() runs the call to the parser, process
the key indexation and associate the "sample_storage" used by maps.
2013-12-12 15:42:11 +01:00
Thierry FOURNIER
e3ded59706 MEDIUM: acl: Last patch change the output type
This patch remove the compatibility check from the input type and the
match method. Now, it checks if a casts from the input type to output
type exists and the pattern_exec_match() function apply casts before
each pattern matching.
2013-12-12 15:42:11 +01:00
Thierry FOURNIER
cc0e0b3dbb MINOR: pattern: Each pattern sets the expected input type
This is used later for increasing the compability with incoming
sample types. When multiple compatible types are supported, one
is arbitrarily used (eg: UINT).
2013-12-12 11:07:33 +01:00
Willy Tarreau
9ba813cd69 CLEANUP: check: server port is unsigned
Baptiste Assmann reported some confusing printf() output of the server
port since it's declared signed. Better turn it to unsigned.

There's no need to backport this, it's only used in 16-bit places.
2013-12-10 23:32:30 +01:00
Willy Tarreau
2d400bb931 MINOR: stream_interface: add reporting of ressouce allocation errors
SSL and keep-alive will need to be able to fail on allocation errors,
and the stream interface did not allow to report such a cause. The flag
will then be "RC" as already documented.
2013-12-09 17:12:18 +01:00
Willy Tarreau
05efc0f33a DIET/MINOR: task: reduce struct task size by 8 bytes
Just by reordering the struct task, we could shrink it by 8 bytes from
120 to 112 bytes. A careful reordering allowed each part to be located
closer to the hot parts it's used with, resulting in another performance
increase of about 0.5%.
2013-12-09 16:06:22 +01:00
Willy Tarreau
5735d7e2a2 MINOR: http: use an enum for the auth method in http_auth_data
This method now takes a single byte, with 7 bytes left to be used
after it. No savings were gained but at least now we have an enum.
2013-12-09 16:06:22 +01:00
Willy Tarreau
3770f23a3a MINOR: http: switch the http state to an enum
This reduces its size which is not reused by anything else. However it
will significantly improve the debugger's output since we'll now get
real state values.

The default case had to be enabled in the parsers because gcc tries
to optimize the switch/case and noticed some values were missing from
the enums and emitted a warning.
2013-12-09 16:06:22 +01:00
Willy Tarreau
c8987b3664 DIET/MINOR: http: reduce the size of struct http_txn by 8 bytes
Here again we had some oversized and misaligned entries. The method
and the status don't need 4 bytes each, and there was a hole after
the status that does not exist anymore. That's 8 additional bytes
saved from http_txn and as much for the session.

Also some fields were slightly moved to present better memory access
patterns resulting in a steady 0.5% performance increase.
2013-12-09 16:06:22 +01:00
Willy Tarreau
721854f0ac DIET/MINOR: stream-int: rearrange a few fields in struct stream_interface to save 8 bytes
The current and previous states are now packed enums instead of ints. This will
also help in gdb. The flags have been turned to 16-bit instead of 32 since only
10 are used. This resulted in saving 8 bytes per streamm interface, or 16 per
session.
2013-12-09 16:06:21 +01:00
Willy Tarreau
2518db4bfa DIET/MINOR: session: reduce the struct session size by 8 bytes
Move uniq_id upper to fill a hole and kill one. Another hole remains
after store_count.
2013-12-09 16:06:21 +01:00
Willy Tarreau
8379c17adf DIET/MINOR: proxy: rearrange a few fields in struct proxy to save 16 bytes
Turn the proxy state to a packed enum (1 char), same for the proxy mode,
and store the capabitilies as a char. These 3 ints can now fill the hole
after obj_type and save 8 bytes in the proxy struct. Moving the maxconn
value just after, which is frequently accessed and was in a block of 3
ints saved another 8 bytes.
2013-12-09 16:06:21 +01:00
Willy Tarreau
f6502c5062 DIET/MINOR: listener: rearrange a few fields in struct listener to save 16 bytes
Pack the listener state to 1 char, store it as an enum instead of an
int (more gdb-friendly), and move a few fields around to fill holes.

The <nice> field can only be -1024..1024 so it was stored as a signed
short and completes well with obj_type and li_state.

Doing this has reduced the struct listener from 376 to 360 bytes (4.2%).
2013-12-09 16:06:21 +01:00
Willy Tarreau
ad5281ca04 DIET/MINOR: connection: rearrange a few fields to save 8 bytes in the struct
By moving the error code to 8 bits the send_proxy_ofs to 16 bits, and
moving them just after the obj_type, we can save 8 bytes in the struct
connection, down from 328 to 320.
2013-12-09 16:06:15 +01:00
Willy Tarreau
939478d04d DIET/MINOR: obj: pack the obj_type enum to 8 bits
Taking 32-bit in each struct just to store an obj_type is a waste
considering the very small amount of possible values. Let's force
it to be as small as possible (1 char) and we'll be able to move
some structs around to save some space.
2013-12-09 16:06:08 +01:00
Willy Tarreau
0a23bcb8be MAJOR: stream-interface: dynamically allocate the applet context
From now on, a call to stream_int_register_handler() causes a call
to si_alloc_appctx() and returns an initialized appctx for the
current stream interface. If one was previously allocated, it is
released. If the stream interface was attached to a connection, it
is released as well.

The appctx are allocated from the same pools as the connections, because
they're substantially smaller in size, and we can't have both a connection
and an appctx on an interface at any moment.

In case of memory shortage, the call may return NULL, which is already
handled by all consumers of stream_int_register_handler().

The field appctx was removed from the stream interface since we only
rely on the endpoint now. On 32-bit, the stream_interface size went down
from 108 to 44 bytes. On 64-bit, it went down from 144 to 64 bytes. This
represents a memory saving of 160 bytes per session.

It seems that a later improvement could be to move the call to
stream_int_register_handler() to session.c for most cases.
2013-12-09 15:40:23 +01:00
Willy Tarreau
57cd3e46b9 MEDIUM: connection: merge the send_proxy and local_send_proxy calls
We used to have two very similar functions for sending a PROXY protocol
line header. The reason is that the default one relies on the stream
interface to retrieve the other end's address, while the "local" one
performs a local address lookup and sends that instead (used by health
checks).

Now that the send_proxy_ofs is stored in the connection and not the
stream interface, we can make the local_send_proxy rely on it and
support partial sends. This also simplifies the code by removing the
local_send_proxy function, making health checks use send_proxy_ofs,
resulting in the removal of the CO_FL_LOCAL_SPROXY flag, and the
associated test in the connection handler. The other flag,
CO_FL_SI_SEND_PROXY was renamed without the "SI" part so that it
is clear that it is not dedicated anymore to a usage with a stream
interface.
2013-12-09 15:40:23 +01:00
Willy Tarreau
b8020cefed MEDIUM: connection: move the send_proxy offset to the connection
Till now the send_proxy_ofs field remained in the stream interface,
but since the dynamic allocation of the connection, it makes a lot
of sense to move that into the connection instead of the stream
interface, since it will not be statically allocated for each
session.

Also, it turns out that moving it to the connection fils an alignment
hole on 64 bit architectures so it does not consume more memory, and
removing it from the stream interface was an opportunity to correctly
reorder fields and reduce the stream interface's size from 160 to 144
bytes (-10%). This is 32 bytes saved per session.
2013-12-09 15:40:23 +01:00
Willy Tarreau
32e3c6a607 MAJOR: stream interface: dynamically allocate the outgoing connection
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.

As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.

The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.

The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.

The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.

A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).
2013-12-09 15:40:23 +01:00
Willy Tarreau
f79c8171b2 MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :

  - on accept() side, the fd is set first, then the ctrl layer then the
    transport layer ; upon error, they must be undone in the reverse order,
    then the FD must be closed. The FD must not be deleted if the control
    layer was not yet initialized ;

  - on the connect() side, the fd is set last and there is no reliable way
    to know if it has been initialized or not. In practice it's initialized
    to -1 first but this is hackish and supposes that local FDs only will
    be used forever. Also, there are even less solutions for keeping trace
    of the transport layer's state.

Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.

So the proposed solution is to add two flags to the connection :

  - CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
    and cleared after it's released (fd_delete).

  - CO_FL_XPRT_READY is set when the control layer is initialized (xprt->init)
    and cleared after it's released (xprt->close).

The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.

The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independantly. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.

In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.

Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.

conn_full_close() now simply calls conn_xprt_close() then conn_full_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.

In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turns. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
2013-12-09 15:40:23 +01:00
Willy Tarreau
f8a49eab4f MEDIUM: session: attach incoming connection to target on embryonic sessions
In order to reduce the dependency over stream-interfaces, we now
attach the incoming connection to the embryonic session's target
instead of the stream-interface's connection. This means we won't
need to initialize stream interfaces anymore after we implement
dynamic connection allocation. The session's target is reset to
NULL after the session has been converted to a complete session.
2013-12-09 15:40:22 +01:00