Commit Graph

389 Commits

Author SHA1 Message Date
Willy Tarreau
9ad7bd48d2 MEDIUM: session: use the pointer to the origin instead of s->si[0].end
When s->si[0].end was dereferenced as a connection or anything in
order to retrieve information about the originating session, we'll
now use sess->origin instead so that when we have to chain multiple
streams in HTTP/2, we'll keep accessing the same origin.
2015-04-06 11:34:29 +02:00
Willy Tarreau
e36cbcb3b0 MEDIUM: stream: move the frontend's pointer to the session
Just like for the listener, the frontend is session-wide so let's move
it to the session. There are a lot of places which were changed but the
changes are minimal in fact.
2015-04-06 11:23:58 +02:00
Willy Tarreau
fb0afa77c9 MEDIUM: stream: move the listener's pointer to the session
The listener is session-specific, move it there.
2015-04-06 11:23:57 +02:00
Willy Tarreau
e7dff02dd4 REORG/MEDIUM: stream: rename stream flags from SN_* to SF_*
This is in order to keep things consistent.
2015-04-06 11:23:57 +02:00
Willy Tarreau
87b09668be REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.

In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.

The files stream.{c,h} were added and session.{c,h} removed.

The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.

Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.

Once all changes are completed, we should see approximately this :

   L7 - http_txn
   L6 - stream
   L5 - session
   L4 - connection | applet

There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.

Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-06 11:23:56 +02:00
Willy Tarreau
73796535a9 REORG/MEDIUM: channel: only use chn_prod / chn_cons to find stream-interfaces
The purpose of these two macros will be to pass via the session to
find the relevant stream interfaces so that we don't need to store
the ->cons nor ->prod pointers anymore. Currently they're only defined
so that all references could be removed.

Note that many places need a second pass of clean up so that we don't
have any chn_prod(&s->req) anymore and only &s->si[0] instead, and
conversely for the 3 other cases.
2015-03-11 20:41:47 +01:00
Willy Tarreau
22ec1eadd0 REORG/MAJOR: move session's req and resp channels back into the session
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to validate that a channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
2015-03-11 20:41:46 +01:00
Thierry FOURNIER
bc4c1ac6ad MEDIUM: http/tcp: permit to resume http and tcp custom actions
Later, the processing of some actions needs to be interrupted and resumed
later. This patch permit to resume the actions. The actions that needs
to run with the resume mode are not yet avalaible. It will be soon with
Lua patches. So the code added by this patch is untestable for the moment.

The list of "tcp_exec_req_rules" cannot resme because is called by the
unresumable function "accept_session".
2015-02-28 23:12:33 +01:00
Thierry FOURNIER
cc87a11842 MEDIUM: tcp: add register keyword system.
This patch introduces an action keyword registration system for TCP
rulesets similar to what is available for HTTP rulesets. This sytem
will be useful with lua.
2015-02-28 23:12:32 +01:00
Thierry FOURNIER
f41a809dc9 MINOR: sample: add private argument to the struct sample_fetch
The add of this private argument is to prepare the integration
of the lua fetchs.
2015-02-28 23:12:31 +01:00
Willy Tarreau
2af207a5f5 MEDIUM: tcp: implement tcp-ut bind option to set TCP_USER_TIMEOUT
On Linux since 2.6.37, it's possible to set the socket timeout for
pending outgoing data, with an accuracy of 1 millisecond. This is
pretty handy to deal with dead connections to clients and or servers.

For now we only implement it on the frontend side (bind line) so
that when a client disappears from the net, we're able to quickly
get rid of its connection and possibly release a server connection.
This can be useful with long-lived connections where an application
level timeout is not suited because long pauses are expected (remote
terminals, connection pools, etc).

Thanks to Thijs Houtenbos and John Eckersberg for the suggestion.
2015-02-04 00:54:40 +01:00
Willy Tarreau
529c13933b BUG/MAJOR: namespaces: conn->target is not necessarily a server
create_server_socket() used to dereference objt_server(conn->target),
but if the target is not a server (eg: a proxy) then it's NULL and we
get a segfault. This can be reproduced with a proxy using "dispatch"
with no server, even when namespaces are disabled, because that code
is not #ifdef'd. The fix consists in first checking if the target is
a server.

This fix does not need to be backported, this is 1.6-only.
2014-12-24 13:47:55 +01:00
KOVACS Krisztian
b3e54fe387 MAJOR: namespace: add Linux network namespace support
This patch makes it possible to create binds and servers in separate
namespaces.  This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).

The setup is something like this:

net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/

The proxy is configured to make server connections through haproxy and sending
the expected source/target addresses to haproxy using the proxy protocol.

The network namespace setup on the haproxy node is something like this:

= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =

= 8< =
$ cat haproxy.cfg
frontend clients
  bind 127.0.0.1:50022 namespace 1 transparent
  default_backend scb

backend server
  mode tcp
  server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =

A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.

A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.

For more documentation please read the documentation included in the patch
itself.

Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
2014-11-21 07:51:57 +01:00
Willy Tarreau
5e0d0e046a BUG/MEDIUM: tcp: don't use SO_ORIGINAL_DST on non-AF_INET sockets
There's an issue when using SO_ORIGINAL_DST to retrieve the original
destination of a connection's address before being translated by
Netfilter's DNAT/REDIRECT or the old TPROXY. SO_ORIGINAL_DST is
able to retrieve an IPv4 address when the original destination was
IPv4 mapped into IPv6. At first glance it's not a big deal, but it
is for logging and for the proxy protocol, because we then have
two different address families for the source and destination. In
this case, the proxy protocol correctly detects the issue and emits
"UNKNOWN".

In order to fix this, we perform getsockname() first, and only if
the address family is AF_INET, then we perform the getsockopt() call.

This fix must be backported to 1.5, and probably even to 1.4 and 1.3.
2014-10-29 21:46:01 +01:00
Willy Tarreau
fb20e4668d BUG/MEDIUM: tcp: fix outgoing polling based on proxy protocol
During a tcp connection setup in tcp_connect_server(), we check if
there are pending data to start polling for writes immediately. We
also use the same test to know if we can disable the quick ack and
merge the first data packet with the connection's ACK. This last
case is also valid for the proxy protocol.

The problem lies in the way it's done, as the "data" variable is
improperly completed with the presence of the proxy protocol, resulting
in the connection being polled for data writes if the proxy protocol is
enabled. It's not a big issue per se, except that the proxy protocol
uses the fact that we're polling for data to know if it can use MSG_MORE.

This causes no problem on HTTP/HTTPS, but with banner protocols, it
introduces a 200ms delay if the server waits for the PROXY header.

This has been caused by the connection management changes introduced in
1.5-dev12, specifically commit a1a7474 ("MEDIUM: proxy-proto: don't use
buffer flags in conn_si_send_proxy()"), so this fix must be backported
to 1.5.
2014-10-24 12:09:12 +02:00
Willy Tarreau
e1cfc1f2b4 BUG/MINOR: config: do not accept more track-sc than configured
MAX_SESS_STKCTR allows one to define the number of stick counters that can
be used in parallel in track-sc* rules. The naming of this macro creates
some confusion because the value there is sometimes used as a max instead
of a count, and the config parser accepts values from 0 to MAX_SESS_STKCTR
and the processing ignores anything tracked on the last one. This means
that by default, track-sc3 is allowed and ignored.

This fix must be backported to 1.5 where the problem there only affects
TCP rules.
2014-10-17 11:53:05 +02:00
Willy Tarreau
3986b9c140 MEDIUM: config: report it when tcp-request rules are misplaced
A config where a tcp-request rule appears after an http-request rule
might seem valid but it is not. So let's report a warning about this
since this case is hard to detect by the naked eye.
2014-09-16 15:43:24 +02:00
Willy Tarreau
6bcb0a84e7 BUG/MAJOR: tcp: fix a possible busy spinning loop in content track-sc*
As a consequence of various recent changes on the sample conversion,
a corner case has emerged where it is possible to wait forever for a
sample in track-sc*.

The issue is caused by the fact that functions relying on sample_process()
don't all exactly work the same regarding the SMP_F_MAY_CHANGE flag and
the output result. Here it was possible to wait forever for an output
sample from stktable_fetch_key() without checking the SMP_OPT_FINAL flag.
As a result, if the client connects and closes without sending the data
and haproxy expects a sample which is capable of coming, it will ignore
this impossible case and will continue to wait.

This change adds control for SMP_OPT_FINAL before waiting for extra data.
The various relevant functions have been better documented regarding their
output values.

This fix must be backported to 1.5 since it appeared there.
2014-07-30 08:56:35 +02:00
Willy Tarreau
092d865c53 MEDIUM: listener: implement a per-protocol pause() function
In order to fix the abstact socket pause mechanism during soft restarts,
we'll need to proceed differently depending on the socket protocol. The
pause_listener() function already supports some protocol-specific handling
for the TCP case.

This commit makes this cleaner by adding a new ->pause() function to the
protocol struct, which, if defined, may be used to pause a listener of a
given protocol.

For now, only TCP has been adapted, with the specific code moved from
pause_listener() to tcp_pause_listener().
2014-07-08 01:13:34 +02:00
Willy Tarreau
1b71eb581e BUG/MEDIUM: counters: fix track-sc* to wait on unstable contents
I've been facing multiple configurations which involved track-sc* rules
in tcp-request content without the "if ..." to force it to wait for the
contents, resulting in random behaviour with contents sometimes retrieved
and sometimes not.

Reading the doc doesn't make it clear either that the tracking will be
performed only if data are already there and that waiting on an ACL is
the only way to avoid this.

Since this behaviour is not natural and we now have the ability to fix
it, this patch ensures that if input data are still moving, instead of
silently dropping them, we naturally wait for them to stabilize up to
the inspect-delay. This way it's not needed anymore to implement an
ACL-based condition to force to wait for data, eventhough the behaviour
is not changed for when an ACL is present.

The most obvious usage will be when track-sc is followed by any HTTP
sample expression, there's no need anymore for adding "if HTTP".

It's probably worth backporting this to 1.5 to avoid further configuration
issues. Note that it requires previous patch.
2014-06-25 17:26:54 +02:00
Willy Tarreau
b5975defba MINOR: stick-table: make stktable_fetch_key() indicate why it failed
stktable_fetch_key() does not indicate whether it returns NULL because
the input sample was not found or because it's unstable. It causes trouble
with track-sc* rules. Just like with sample_fetch_string(), we want it to
be able to give more information to the caller about what it found. Thus,
now we use the pointer to a sample passed by the caller, and fill it with
the information we have about the sample. That way, even if we return NULL,
the caller has the ability to check whether a sample was found and if it is
still changing or not.
2014-06-25 17:17:53 +02:00
Willy Tarreau
18bf01e900 MEDIUM: tcp: add a new tcp-request capture directive
This new directive captures the specified fetch expression, converts
it to text and puts it into the next capture slot. The capture slots
are shared with header captures so that it is possible to dump all
captures at once or selectively in logs and header processing.

The purpose is to permit logs to contain whatever payload is found in
a request, for example bytes at a fixed location or the SNI of forwarded
SSL traffic.
2014-06-13 16:45:53 +02:00
Willy Tarreau
9cf8d3f46b MINOR: protocols: use is_inet_addr() when only INET addresses are desired
We used to have is_addr() in place to validate sometimes the existence
of an address, sometimes a valid IPv4 or IPv6 address. Replace them
carefully so that is_inet_addr() is used wherever we can only use an
IPv4/IPv6 address.
2014-05-10 01:26:37 +02:00
Thierry FOURNIER
eeaa951726 MINOR: configuration: File and line propagation
This patch permits to communicate file and line of the
configuration file at the configuration parser.
2014-03-17 18:06:08 +01:00
Thierry FOURNIER
0d6ba513a5 MINOR: pattern: store configuration reference for each acl or map pattern.
This patch permit to add reference for each pattern reference. This is
useful to identify the acl listed.
2014-03-17 18:06:07 +01:00
Lukas Tribus
7640e72a31 MINOR: set IP_FREEBIND on IPv6 sockets in transparent mode
Lets set IP_FREEBIND on IPv6 sockets as well, this works since Linux 3.3
and doesn't require CAP_NET_ADMIN privileges (IPV6_TRANSPARENT does).

This allows unprivileged users to bind to non-local IPv6 addresses, which
can be useful when setting up the listening sockets or when connecting
to backend servers with a specific, non-local source IPv6 address (at that
point we usually dropped root privileges already).
2014-03-03 21:31:10 +01:00
Willy Tarreau
cc08d2c9ff MEDIUM: counters: stop relying on session flags at all
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.

The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.

Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.

A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
2014-01-28 23:34:45 +01:00
Willy Tarreau
f3338349ec BUG/MEDIUM: counters: flush content counters after each request
One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking
L7 information") brought support for tracking L7 information in tcp-request
content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters:
correctly unbind the counters tracked by the backend") used to flush the
backend counters after processing a request.

While that earliest patch was correct at the time, it became wrong after
the second patch was merged. The code does what it says, but the concept
is flawed. "TCP request content" rules are evaluated for each HTTP request
over a single connection. So if such a rule in the frontend decides to
track any L7 information or to track L4 information when an L7 condition
matches, then it is applied to all requests over the same connection even
if they don't match. This means that a rule such as :

     tcp-request content track-sc0 src if { path /index.html }

will count one request for index.html, and another one for each of the
objects present on this page that are fetched over the same connection
which sent the initial matching request.

Worse, it is possible to make the code do stupid things by using multiple
counters:

     tcp-request content track-sc0 src if { path /foo }
     tcp-request content track-sc1 src if { path /bar }

Just sending two requests first, one with /foo, one with /bar, shows
twice the number of requests for all subsequent requests. Just because
both of them persist after the end of the request.

So the decision to flush backend-tracked counters was not the correct
one. In practice, what is important is to flush countent-based rules
since they are the ones evaluated for each request.

Doing so requires new flags in the session however, to keep track of
which stick-counter was tracked by what ruleset. A later change might
make this easier to maintain over time.

This bug is 1.5-specific, no backport to stable is needed.
2014-01-28 21:40:28 +01:00
Willy Tarreau
3c72872da1 CLEANUP: connection: use conn_ctrl_ready() instead of checking the flag
It's easier and safer to rely on conn_ctrl_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !ctrl have been removed, as the
CTRL_READY flag itself guarantees ctrl is set.
2014-01-26 00:42:31 +01:00
Willy Tarreau
fd803bb4d7 MEDIUM: connection: add check for readiness in I/O handlers
The recv/send callbacks must check for readiness themselves instead of
having their callers do it. This will strengthen the test and will also
ensure we never refrain from calling a handshake handler because a
direction is being polled while the other one is ready.
2014-01-26 00:42:30 +01:00
Willy Tarreau
e1f50c4b02 MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
We simply remove these functions and replace their calls with the
appropriate ones :

  - if we're in the data phase, we can simply report wait on the FD
  - if we're in the socket phase, we may also have to signal the
    desire to read/write on the socket because it might not be
    active yet.
2014-01-26 00:42:30 +01:00
Willy Tarreau
f817e9f473 MAJOR: polling: rework the whole polling system
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.

This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.

Several new state manipulation functions have been introduced or
adapted :
  - fd_want_{recv,send} : enable receiving/sending on the FD regardless
    of its state (sets the ACTIVE flag) ;

  - fd_stop_{recv,send} : stop receiving/sending on the FD regardless
    of its state (clears the ACTIVE flag) ;

  - fd_cant_{recv,send} : report a failure to receive/send on the FD
    corresponding to EAGAIN (clears the READY flag) ;

  - fd_may_{recv,send}  : report the ability to receive/send on the FD
    as reported by poll() (sets the READY flag) ;

Some functions are used to report the current FD status :

  - fd_{recv,send}_active
  - fd_{recv,send}_ready
  - fd_{recv,send}_polled

Some functions were removed :
  - fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()

The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
knows it can try to access the file descriptor to get this information.

In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.

The following pollers have been updated :

   ev_select() : done, built, tested on Linux 3.10
   ev_poll()   : done, built, tested on Linux 3.10
   ev_epoll()  : done, built, tested on Linux 3.10 & 3.13
   ev_kqueue() : done, built, tested on OpenBSD 5.2
2014-01-26 00:42:30 +01:00
Willy Tarreau
9ce7013429 MEDIUM: tcp: report connection error at the connection level
Now when a connection error happens, it is reported in the connection
so that upper layers know exactly what happened. This is particularly
useful with health checks and resources exhaustion.
2014-01-24 16:15:04 +01:00
Willy Tarreau
3bd3e57a9b MEDIUM: tcp: report in tcp_drain() that lingering is already disabled on close
When an incoming shutdown or error is detected, we know that we
can safely close without disabling lingering. Do it in tcp_drain()
so that we don't have to do it from each and every caller.
2014-01-20 22:27:17 +01:00
Willy Tarreau
7f4bcc312d MINOR: protocol: improve the proto->drain() API
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
  < 0 if EAGAIN was met
  = 0 if some data remain
  > 0 if a shutdown was received
2014-01-20 22:27:16 +01:00
Willy Tarreau
ad38acedaa MEDIUM: connection: centralize handling of nolinger in fd management
Right now we see many places doing their own setsockopt(SO_LINGER).
Better only do it just before the close() in fd_delete(). For this
we add a new flag on the file descriptor, indicating if it's safe or
not to linger. If not (eg: after a connect()), then the setsockopt()
call is automatically performed before a close().

The flag automatically turns to safe when receiving a read0.
2013-12-16 02:23:52 +01:00
Willy Tarreau
975c1784c8 MINOR: sample: make sample_parse_expr() use memprintf() to report parse errors
Doing so ensures that we're consistent between all the functions in the whole
chain. This is important so that we can extract the argument parsing from this
function.
2013-12-12 23:16:54 +01:00
Willy Tarreau
57cd3e46b9 MEDIUM: connection: merge the send_proxy and local_send_proxy calls
We used to have two very similar functions for sending a PROXY protocol
line header. The reason is that the default one relies on the stream
interface to retrieve the other end's address, while the "local" one
performs a local address lookup and sends that instead (used by health
checks).

Now that the send_proxy_ofs is stored in the connection and not the
stream interface, we can make the local_send_proxy rely on it and
support partial sends. This also simplifies the code by removing the
local_send_proxy function, making health checks use send_proxy_ofs,
resulting in the removal of the CO_FL_LOCAL_SPROXY flag, and the
associated test in the connection handler. The other flag,
CO_FL_SI_SEND_PROXY was renamed without the "SI" part so that it
is clear that it is not dedicated anymore to a usage with a stream
interface.
2013-12-09 15:40:23 +01:00
Willy Tarreau
1ec74bf660 MINOR: connection: check for send_proxy during the connect(), not the SI
It's cleaner to check for a pending send_proxy_ofs while establishing
the connection (which already checks it anyway) and not in the stream
interface.
2013-12-09 15:40:23 +01:00
Willy Tarreau
f79c8171b2 MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :

  - on accept() side, the fd is set first, then the ctrl layer then the
    transport layer ; upon error, they must be undone in the reverse order,
    then the FD must be closed. The FD must not be deleted if the control
    layer was not yet initialized ;

  - on the connect() side, the fd is set last and there is no reliable way
    to know if it has been initialized or not. In practice it's initialized
    to -1 first but this is hackish and supposes that local FDs only will
    be used forever. Also, there are even less solutions for keeping trace
    of the transport layer's state.

Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.

So the proposed solution is to add two flags to the connection :

  - CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
    and cleared after it's released (fd_delete).

  - CO_FL_XPRT_READY is set when the control layer is initialized (xprt->init)
    and cleared after it's released (xprt->close).

The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.

The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independantly. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.

In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.

Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.

conn_full_close() now simply calls conn_xprt_close() then conn_full_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.

In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turns. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
2013-12-09 15:40:23 +01:00
Willy Tarreau
b363a1f469 MAJOR: stream-int: stop using si->conn and use si->end instead
The connection will only remain there as a pre-allocated entity whose
goal is to be placed in ->end when establishing an outgoing connection.
All connection initialization can be made on this connection, but all
information retrieved should be applied to the end point only.

This change is huge because there were many users of si->conn. Now the
only users are those who initialize the new connection. The difficulty
appears in a few places such as backend.c, proto_http.c, peers.c where
si->conn is used to hold the connection's target address before assigning
the connection to the stream interface. This is why we have to keep
si->conn for now. A future improvement might consist in dynamically
allocating the connection when it is needed.
2013-12-09 15:40:22 +01:00
Willy Tarreau
26f4a04744 MEDIUM: connection: set the socket shutdown flags on socket errors
When we get a hard error from a syscall indicating the socket is dead,
it makes sense to set the CO_FL_SOCK_WR_SH and CO_FL_SOCK_RD_SH flags
to indicate that the socket may not be used anymore. It will ease the
error processing in health checks where the state of socket is very
important. We'll also be able to avoid some setsockopt(nolinger) after
an error.

For now, the rest of the code is not impacted because CO_FL_ERROR is
always tested prior to these flags.
2013-12-04 23:50:36 +01:00
Willy Tarreau
f12a20ebce BUG/MINOR: tcp: check that no error is pending during a connect probe
The tcp_connect_probe() function may be called upon I/O activity when
no recv/send callbacks were called (eg: recv not possible, nothing to
send). It only relies on connect() to observe the connection establishment
progress but that does not work when some network errors are pending on
the socket (eg: a delayed connection refused).

For this reason we need to run a getsockopt() in the case where the
poller reports FD_POLL_ERR on the socket. We use this opportunity to
update errno so that the conn->data->wake() function has all relevant
info when it sees CO_FL_ERROR.

At the moment no code is impacted by this bug because recv polling is
always enabled during a connect, so recvfrom() always sees the error
first. But this may change with the health check cleanup.

No backport is needed.
2013-12-04 23:50:10 +01:00
Willy Tarreau
0cba607400 MINOR: acl/pattern: use types different from int to clarify who does what.
We now have the following enums and all related functions return them and
consume them :

   enum pat_match_res {
	PAT_NOMATCH = 0,         /* sample didn't match any pattern */
	PAT_MATCH = 3,           /* sample matched at least one pattern */
   };

   enum acl_test_res {
	ACL_TEST_FAIL = 0,           /* test failed */
	ACL_TEST_MISS = 1,           /* test may pass with more info */
	ACL_TEST_PASS = 3,           /* test passed */
   };

   enum acl_cond_pol {
	ACL_COND_NONE,		/* no polarity set yet */
	ACL_COND_IF,		/* positive condition (after 'if') */
	ACL_COND_UNLESS,	/* negative condition (after 'unless') */
   };

It's just in order to avoid doubts when reading some code.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
a65b343eee MEDIUM: pattern: rename "acl" prefix to "pat"
This patch just renames functions, types and enums. No code was changed.
A significant number of files were touched, especially the ACL arrays,
so it is likely that some external patches will not apply anymore.

One important thing is that we had to split ACL_PAT_* into two groups :
  - ACL_TEST_{PASS|MISS|FAIL}
  - PAT_{MATCH|UNMATCH}

A future patch will enforce enums on all these places to avoid confusion.
2013-12-02 23:31:33 +01:00
Willy Tarreau
0bb166be5e MINOR: tcp: don't use tick_add_ifset() when timeout is known to be set
These two useless tests propably result from a copy-paste. The test is
performed in the condition to enter the block.
2013-11-04 18:12:20 +01:00
Willy Tarreau
44778ad87d BUG/MEDIUM: tcp: do not skip tracking rules on second pass
The track-sc* tcp rules are bogus. The test to verify if the
tracked counter was already assigned is performed in the same
condition as the test for the action. The effect is that a
rule which tracks a counter that is already being tracked
is implicitly converted to an accept because the default
rule is an accept.

This bug only affects 1.5-dev releases.
2013-10-30 19:29:21 +01:00
Willy Tarreau
cc1e04b1e8 MINOR: tcp: add new "close" action for tcp-response
This new action immediately closes the connection with the server
when the condition is met. The first such rule executed ends the
rules evaluation. The main purpose of this action is to force a
connection to be finished between a client and a server after an
exchange when the application protocol expects some long time outs
to elapse first. The goal is to eliminate idle connections which
take signifiant resources on servers with certain protocols.
2013-09-11 23:28:51 +02:00
Willy Tarreau
b4c8493a9f MINOR: session: make the number of stick counter entries more configurable
In preparation of more flexibility in the stick counters, make their
number configurable. It still defaults to 3 which is the minimum
accepted value. Changing the value alone is not sufficient to get
more counters, some bitfields still need to be updated and the TCP
actions need to be updated as well, but this update tries to be
easier, which is nice for experimentation purposes.
2013-08-01 21:17:14 +02:00
Willy Tarreau
ef38c39287 MEDIUM: sample: systematically pass the keyword pointer to the keyword
We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
  - fetch functions will solely rely on the keyword string
  - some other smp_fetch_* users do not have the pointer to the original
    keyword and were forced to pass NULL.

So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
2013-08-01 21:17:13 +02:00
Willy Tarreau
dc13c11c1e BUG/MEDIUM: prevent gcc from moving empty keywords lists into BSS
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.

The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.

The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.

One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.

A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
2013-06-21 23:29:02 +02:00
Willy Tarreau
be4a3eff34 MEDIUM: counters: use sc0/sc1/sc2 instead of sc1/sc2/sc3
It was a bit inconsistent to have gpc start at 0 and sc start at 1,
so make sc start at zero like gpc. No previous release was issued
with sc3 anyway, so no existing setup should be affected.
2013-06-17 15:04:07 +02:00
Willy Tarreau
6d4e4e8dd2 MEDIUM: acl: remove a lot of useless ACLs that are equivalent to their fetches
The following 116 ACLs were removed because they're redundant with their
fetch function since last commit which allows the fetch function to be
used instead for types BOOL, INT and IP. Most places are now left with
an empty ACL keyword list that was not removed so that it's easier to
add other ACLs later.

always_false, always_true, avg_queue, be_conn, be_id, be_sess_rate, connslots,
nbsrv, queue, srv_conn, srv_id, srv_is_up, srv_sess_rate, res.comp, fe_conn,
fe_id, fe_sess_rate, dst_conn, so_id, wait_end, http_auth, http_first_req,
status, dst, dst_port, src, src_port, sc1_bytes_in_rate, sc1_bytes_out_rate,
sc1_clr_gpc0, sc1_conn_cnt, sc1_conn_cur, sc1_conn_rate, sc1_get_gpc0,
sc1_gpc0_rate, sc1_http_err_cnt, sc1_http_err_rate, sc1_http_req_cnt,
sc1_http_req_rate, sc1_inc_gpc0, sc1_kbytes_in, sc1_kbytes_out, sc1_sess_cnt,
sc1_sess_rate, sc1_tracked, sc1_trackers, sc2_bytes_in_rate,
sc2_bytes_out_rate, sc2_clr_gpc0, sc2_conn_cnt, sc2_conn_cur, sc2_conn_rate,
sc2_get_gpc0, sc2_gpc0_rate, sc2_http_err_cnt, sc2_http_err_rate,
sc2_http_req_cnt, sc2_http_req_rate, sc2_inc_gpc0, sc2_kbytes_in,
sc2_kbytes_out, sc2_sess_cnt, sc2_sess_rate, sc2_tracked, sc2_trackers,
sc3_bytes_in_rate, sc3_bytes_out_rate, sc3_clr_gpc0, sc3_conn_cnt,
sc3_conn_cur, sc3_conn_rate, sc3_get_gpc0, sc3_gpc0_rate, sc3_http_err_cnt,
sc3_http_err_rate, sc3_http_req_cnt, sc3_http_req_rate, sc3_inc_gpc0,
sc3_kbytes_in, sc3_kbytes_out, sc3_sess_cnt, sc3_sess_rate, sc3_tracked,
sc3_trackers, src_bytes_in_rate, src_bytes_out_rate, src_clr_gpc0,
src_conn_cnt, src_conn_cur, src_conn_rate, src_get_gpc0, src_gpc0_rate,
src_http_err_cnt, src_http_err_rate, src_http_req_cnt, src_http_req_rate,
src_inc_gpc0, src_kbytes_in, src_kbytes_out, src_sess_cnt, src_sess_rate,
src_updt_conn_cnt, table_avl, table_cnt, ssl_c_ca_err, ssl_c_ca_err_depth,
ssl_c_err, ssl_c_used, ssl_c_verify, ssl_c_version, ssl_f_version, ssl_fc,
ssl_fc_alg_keysize, ssl_fc_has_crt, ssl_fc_has_sni, ssl_fc_use_keysize,
2013-06-11 21:22:58 +02:00
Willy Tarreau
4f0d919bd4 MEDIUM: tcp: add "tcp-request connection expect-proxy layer4"
This configures the client-facing connection to receive a PROXY protocol
header before any byte is read from the socket. This is equivalent to
having the "accept-proxy" keyword on the "bind" line, except that using
the TCP rule allows the PROXY protocol to be accepted only for certain
IP address ranges using an ACL. This is convenient when multiple layers
of load balancers are passed through by traffic coming from public
hosts.
2013-06-11 20:40:55 +02:00
Willy Tarreau
2b57cb8f30 MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :

  19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
  19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
  19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257

After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.

The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.

In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :

  19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
  19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
  19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
  19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257

This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
2013-06-10 20:33:23 +02:00
Willy Tarreau
e25c917af8 MEDIUM: counters: add support for tracking a third counter
We're often missin a third counter to track base, src and base+src at
the same time. Here we introduce track_sc3 to have this third counter.
It would be wise not to add much more counters because that slightly
increases the session size and processing time though the real issue
is more the declaration of the keywords in the code and in the doc.
2013-05-29 00:37:16 +02:00
Willy Tarreau
d5ca9abb0d MINOR: counters: make it easier to extend the amount of tracked counters
By properly affecting the flags and values, it becomes easier to add
more tracked counters, for example for experimentation. It also slightly
reduces the code and the number of tests. No counters were added with
this patch.
2013-05-28 17:43:40 +02:00
Pieter Baauw
1eb7592bba MINOR: tproxy: add support for OpenBSD
OpenBSD uses (SOL_SOCKET, SO_BINDANY) to enable transparent
proxy on a socket.

This patch adds support for the relevant setsockopt() calls.
2013-05-11 08:03:50 +02:00
Pieter Baauw
ff30b6667b MINOR: tproxy: add support for FreeBSD
FreeBSD uses (IPPROTO_IP, IP_BINDANY) and (IPPROTO_IPV6, IPV6_BINDANY)
to enable transparent proxy on a socket.

This patch adds support for the relevant setsockopt() calls.
2013-05-11 08:03:43 +02:00
Pieter Baauw
d551fb5a8d REORG: tproxy: prepare the transparent proxy defines for accepting other OSes
This patch does not change the logic of the code, it only changes the
way OS-specific defines are tested.

At the moment the transparent proxy code heavily depends on Linux-specific
defines. This first patch introduces a new define "CONFIG_HAP_TRANSPARENT"
which is set every time the defines used by transparent proxy are present.
This also means that with an up-to-date libc, it should not be necessary
anymore to force CONFIG_HAP_LINUX_TPROXY during the build, as the flags
will automatically be detected.

The CTTPROXY flags still remain separate because this older API doesn't
work the same way.

A new line has been added in the version output for haproxy -vv to indicate
what transparent proxy support is available.
2013-05-11 08:03:37 +02:00
Willy Tarreau
33c60dece5 MINOR: tcp: report the erroneous word in tcp-request track*
We used to report the word after the fetch.
2013-04-12 08:26:32 +02:00
Willy Tarreau
deaec2fda3 BUG/MINOR: tcp: fix error reporting for TCP rules
tcp-request swapped two output words in the error message, making it meaningless.
2013-04-11 17:24:53 +02:00
Willy Tarreau
a4312fa28e MAJOR: sample: maintain a per-proxy list of the fetch args to resolve
While ACL args were resolved after all the config was parsed, it was not the
case with sample fetch args because they're almost everywhere now.

The issue is that ACLs now solely rely on sample fetches, so their args
resolving doesn't work anymore. And many fetches involving a server, a
proxy or a userlist don't work at all.

The real issue is that at the bottom layers we have no information about
proxies, line numbers, even ACLs in order to report understandable errors,
and that at the top layers we have no visibility over the locations where
fetches are referenced (think log node).

After failing multiple unsatisfying solutions attempts, we now have a new
concept of args list. The principle is that every proxy has a list head
which contains a number of indications such as the config keyword, the
context where it's used, the file and line number, etc... and a list of
arguments. This list head is of the same type as the elements, so it
serves as a template for adding new elements. This way, it is filled from
top to bottom by the callers with the information they have (eg: line
numbers, ACL name, ...) and the lower layers just have to duplicate it and
add an element when they face an argument they cannot resolve yet.

Then at the end of the configuration parsing, a loop passes over each
proxy's list and resolves all the args in sequence. And this way there is
all necessary information to report verbose errors.

The first immediate benefit is that for the first time we got very precise
location of issues (arg number in a keyword in its context, ...). Second,
in order to do this we had to parse log-format and unique-id-format a bit
earlier, so that was a great opportunity for doing so when the directives
are encountered (unless it's a default section). This way, the recorded
line numbers for these args are the ones of the place where the log format
is declared, not the end of the file.

Userlists report slightly more information now. They're the only remaining
ones in the ACL resolving function.
2013-04-03 02:13:02 +02:00
Willy Tarreau
93fddf1dbc MEDIUM: acl: have a pointer to the keyword name in acl_expr
The acl_expr struct used to hold a pointer to the ACL keyword. But since
we now have all the relevant pointers, we don't need that anymore, we just
need the pointer to the keyword as a string in order to return warnings
and error messages.

So let's change this in order to remove the dependency on the acl_keyword
struct from acl_expr.

During this change, acl_cond_kw_conflicts() used to return a pointer to an
ACL keyword but had to be changed to return a const char* for the same reason.
2013-04-03 02:13:01 +02:00
Willy Tarreau
d86e29d2a1 CLEANUP: acl: remove unused references to ACL_USE_*
Now that acl->requires is not used anymore, we can remove all references
to it as well as all ACL_USE_* flags.
2013-04-03 02:13:00 +02:00
Willy Tarreau
a91d0a583c MAJOR: acl: convert all ACL requires to SMP use+val instead of ->requires
The ACLs now use the fetch's ->use and ->val to decide upon compatibility
between the place where they are used and where the information are fetched.
The code is capable of reporting warnings about very fine incompatibilities
between certain fetches and an exact usage location, so it is expected that
some new warnings will be emitted on some existing configurations.

Two degrees of detection are provided :
  - detecting ACLs that never match
  - detecting keywords that are ignored

All tests show that this seems to work well, though bugs are still possible.
2013-04-03 02:13:00 +02:00
Willy Tarreau
25320b2906 MEDIUM: proxy: remove acl_requires and just keep a flag "http_needed"
Proxy's acl_requires was a copy of all bits taken from ACLs, but we'll
get rid of ACL flags and only rely on sample fetches soon. The proxy's
acl_requires was only used to allocate an HTTP context when needed, and
was even forced in HTTP mode. So better have a flag which exactly says
what it's supposed to be used for.
2013-04-03 02:13:00 +02:00
Willy Tarreau
c48c90dfa5 MAJOR: acl: remove the arg_mask from the ACL definition and use the sample fetch's
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.

The parsing is still performed by the ACL code though.
2013-04-03 02:12:58 +02:00
Willy Tarreau
8ed669b12a MAJOR: acl: make all ACLs reference the fetch function via a sample.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.

In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.

A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
2013-04-03 02:12:58 +02:00
Willy Tarreau
d4c33c8889 MEDIUM: samples: move payload-based fetches and ACLs to their own file
The file acl.c is a real mess, it both contains functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.

So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and rely on everything
that is protocol-independant. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).

As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :

  always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
  req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end

The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.

The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
2013-04-03 02:12:57 +02:00
Willy Tarreau
80aca90ad2 MEDIUM: samples: use new flags to describe compatibility between fetches and their usages
Samples fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).

Some fetches are also dependant on the context (eg: "hdr" uses request or
response depending where it's involved, causing some abiguity).

In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.

So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).

The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.

Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.

Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.

The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
2013-04-03 02:12:56 +02:00
Willy Tarreau
e0db1e8946 MEDIUM: acl: remove flag ACL_MAY_LOOKUP which is improperly used
This flag is used on ACL matches that support being looking up patterns
in trees. At the moment, only strings and IPs support tree-based lookups,
but the flag is randomly set also on integers and binary data, and is not
even always set on strings nor IPs.

Better get rid of this mess by only relying on the matching function to
decide whether or not it supports tree-based lookups, this is safer and
easier to maintain.
2013-04-03 02:12:56 +02:00
Willy Tarreau
40aa070c51 MAJOR: listener: support inheriting a listening fd from the parent
Using the address syntax "fd@<num>", a listener may inherit a file
descriptor that the caller process has already bound and passed as
this number. The fd's socket family is detected using getsockname(),
and the usual initialization is performed through the existing code
for that family, but the socket creation is skipped.

Whether the parent has performed the listen() call or not is not
important as this is detected.

For UNIX sockets, we immediately clear the path after preparing a
socket so that we never remove it in case an abort would happen due
to a late error during startup.
2013-03-11 01:30:01 +01:00
Lukas Tribus
0defb90784 DOC: tfo: bump required kernel to linux-3.7
Support for server side TFO was actually introduced in linux-3.7,
linux-3.6 just has client support.

This patch fixes documentation and a code comment about the
kernel requirement. It also fixes a wrong tfo related code
comment in src/proto_tcp.c.
2013-02-14 00:03:04 +01:00
Willy Tarreau
8ab505bdef CLEANUP: tcp/unix: remove useless NULL check in {tcp,unix}_bind_listener()
errmsg may only be NULL if errlen is zero. Clarify this in the comment too.

Reported-by: Dinko Korunic <dkorunic@reflected.net>
2013-01-24 16:19:18 +01:00
Willy Tarreau
d486ef5045 BUG/MINOR: connection: remove a few synchronous calls to polling updates
There were a few synchronous calls to polling updates in some functions
called from the connection handler. These ones are not needed and should
be replaced by more efficient and more debugable asynchronous calls.
2012-12-10 17:03:52 +01:00
Willy Tarreau
b54b6ca483 BUG/MINOR: proto_tcp: bidirectional fetches not supported anymore in track-sc1/2
Sample fetch capabilities indicate when the fetch may be used and not
what it requires, so we need to check if a fetch is compatible with
the direction we want and not if it works backwards.
2012-12-09 17:04:41 +01:00
Willy Tarreau
598718a7ab BUG/MINOR: proto_tcp: fix parsing of "table" in track-sc1/2
Recent commit 5d5b5d8e left the "table" argument in the list of
arguments to parse.
2012-12-09 16:57:27 +01:00
Willy Tarreau
20d46a5a95 CLEANUP: session: use an array for the stick counters
The stick counters were in two distinct sets of struct members,
causing some code to be duplicated. Now we use an array, which
enables some processing to be performed in loops. This allowed
the code to be shrunk by 700 bytes.
2012-12-09 15:57:16 +01:00
Willy Tarreau
5d5b5d8eaf MEDIUM: proto_tcp: add support for tracking L7 information
Until now it was only possible to use track-sc1/sc2 with "src" which
is the IPv4 source address. Now we can use track-sc1/sc2 with any fetch
as well as any transformation type. It works just like the "stick"
directive.

Samples are automatically converted to the correct types for the table.

Only "tcp-request content" rules may use L7 information, and such information
must already be present when the tracking is set up. For example it becomes
possible to track the IP address passed in the X-Forwarded-For header.

HTTP request processing now also considers tracking from backend rules
because we want to be able to update the counters even when the request
was already parsed and tracked.

Some more controls need to be performed (eg: samples do not distinguish
between L4 and L6).
2012-12-09 14:08:47 +01:00
Willy Tarreau
a4380b4f15 CLEANUP: proto_tcp: use the same code to bind servers and backends
The tproxy and source binding code has now be factored out for
servers and backends. A nice effect is that the code now supports
having backends use source port ranges, though the config does not
support it yet. This change has reduced the executable by around
700 bytes.
2012-12-09 10:05:37 +01:00
Willy Tarreau
ef9a360555 MEDIUM: connection: introduce "struct conn_src" for servers and proxies
Both servers and proxies share a common set of parameters for outgoing
connections, and since they're not stored in a similar structure, a lot
of code is duplicated in the connection setup, which is one sensible
area.

Let's first define a common struct for these settings and make use of it.
Next patches will de-duplicate code.

This change also fixes a build breakage that happens when USE_LINUX_TPROXY
is not set but USE_CTTPROXY is set, which seem to be very unlikely
considering that the issue was introduced almost 2 years ago an never
reported.
2012-12-09 10:04:39 +01:00
Willy Tarreau
b1719517b7 BUG/MEDIUM: tcp: process could theorically crash on lack of source ports
When connect() fails with EAGAIN or EADDRINUSE, an error message is
sent to logs and uses srv->id to indicate the server name (this is
very old code). Since version 1.4, it is possible to have srv == NULL,
so the message could cause a crash when connect() returns EAGAIN or
EADDRINUSE. However in practice this does not happen because on lack
of source ports, EADDRNOTAVAIL is returned instead, so this code is
never called.

This fix consists in not displaying the server name anymore, and in
adding the test for EADDRNOTAVAIL.

Also, the log level was lowered from LOG_EMERG to LOG_ERR in order
not to spam all consoles when source ports are missing for a given
target.

This fix should be backported to 1.4.
2012-12-08 23:07:33 +01:00
Willy Tarreau
fc8f1f0382 BUG/MINOR: tcp: set the ADDR_TO_SET flag on outgoing connections
tcp_connect_server() resets all of the connection's flags. This means
that an outgoing connection does not have the ADDR_TO_SET flag
eventhough the address is set.

The first impact is that logging the outgoing address or displaying
it on the CLI while dumping sessions will result in an extra call to
getpeername().

But there is a nastier impact. If such a lookup happens *after* the
first connect() attempt and this one fails, the destination address
is corrupted by the call to getsockname(), and subsequent connection
retries will fail with socket errors.

For now we fix this by making tcp_connect_server() set the flag. But
we'll soon need a function to initialize an outgoing connection with
appropriate address and flags before calling the connect() function.
2012-12-08 18:53:44 +01:00
Willy Tarreau
77e3af9e6f MINOR: tcp: add support for the "v4v6" bind option
Commit 9b6700f added "v6only". As suggested by Vincent Bernat, it is
sometimes useful to have the opposite option to force binding to the
two protocols when the system is configured to bind to v6 only by
default. This option does exactly this. v6only still has precedence.
2012-11-24 15:07:23 +01:00
Willy Tarreau
9b6700f673 MINOR: tcp: add support for the "v6only" bind option
This option forces a socket to bind to IPv6 only when it uses the
default address (eg: ":::80").
2012-11-24 12:20:28 +01:00
Willy Tarreau
f0837b259b MEDIUM: tcp: add explicit support for delayed ACK in connect()
Commit 24db47e0 tried to improve support for delayed ACK upon connect
but it was incomplete, because checks with the proxy protocol would
always enable polling for data receive and there was no way of
distinguishing data polling and delayed ack.

So we add a distinct delack flag to the connect() function so that
the caller decides whether or not to use a delayed ack regardless
of pending data (eg: when send-proxy is in use). Doing so covers all
combinations of { (check with data), (sendproxy), (smart-connect) }.
2012-11-24 10:24:27 +01:00
Willy Tarreau
24db47e0cc MEDIUM: checks: avoid waking the application up for pure TCP checks
Pure TCP checks only use the SYN/ACK in return to a SYN. By forcing
the system to use delayed ACKs, it is possible to send an RST instead
of the ACK and thus ensure that the application will never be needlessly
woken up. This avoids error logs or counters on checked components since
the application is never made aware of this connection which dies in the
network stack.
2012-11-23 14:18:39 +01:00
Willy Tarreau
6b0a850503 BUG/MEDIUM: checks: mark the check as stopped after a connect error
Health checks currently still use the connection's fd to know whether
a check is running (this needs to change). When a health check
immediately fails during connect() because of a lack of local resource
(eg: port), we failed to unset the fd, so each time the process_chk
woken up after such an error, it believed a check was still running
and used to close the fd again instead of starting a new check. This
could result in other connections being closed because they were
assigned the same fd value.

The bug is only marked medium because when this happens, the system
is already in a bad state.

A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
2012-11-23 09:03:29 +01:00
Willy Tarreau
3fdb366885 MAJOR: connection: replace struct target with a pointer to an enum
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.

This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.

In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.

A lot of code was touched by massive replaces, but the changes are not
that important.
2012-11-12 00:42:33 +01:00
Willy Tarreau
f2943dccd0 MAJOR: session: detach the connections from the stream interfaces
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.

Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().

This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
2012-10-26 20:15:20 +02:00
Willy Tarreau
5f2877a7dd BUG/MEDIUM: tcp: transparent bind to the source only when address is set
Thomas Heil reported that health checks did not work anymore when a backend
or server has "usesrc clientip". This is because the source address is not
set and tcp_bind_socket() tries to bind to that address anyway.

The solution consists in explicitly clearing the source address in the checks
and to make tcp_bind_socket() avoid binding when the address is not set. This
also has an indirect benefit that a useless bind() syscall will be avoided
when using "source 0.0.0.0 usesrc clientip" in health checks.
2012-10-26 20:04:27 +02:00
Willy Tarreau
9b28e03b66 MAJOR: channel: replace the struct buffer with a pointer to a buffer
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.

Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
2012-10-13 09:07:52 +02:00
Willy Tarreau
697d85045a CLEANUP: tcp: use 'chn' instead of 'buf' or 'b' for channel pointer names
Same as previous patches, avoid confusion in local variable names.
2012-10-12 23:53:39 +02:00
Willy Tarreau
1c862c5920 MEDIUM: tcp: enable TCP Fast Open on systems which support it
If TCP_FASTOPEN is defined, then the "tfo" option is supported on
"bind" lines to enable TCP Fast Open (linux >= 3.6).
2012-10-05 16:22:35 +02:00
Willy Tarreau
f7bc57ca6e REORG: connection: rename the data layer the "transport layer"
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.

In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.

The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.

A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
2012-10-04 22:26:09 +02:00
Cyril Bonté
3aaba440a2 BUILD: fix compilation error with DEBUG_FULL
Recent changes in structures broke the compilation when using DEBUG_FULL.
Let's update apply the changes also to the variables used in DPRINTF calls.
2012-09-24 20:36:39 +02:00
Willy Tarreau
eb6cead1de MINOR: standard: make memprintf() support a NULL destination
Doing so removes many checks that were systematically made because
the callees don't know if the caller passed a valid pointer.
2012-09-24 10:53:16 +02:00
Willy Tarreau
4348fad1c1 MAJOR: listeners: use dual-linked lists to chain listeners with frontends
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
  - frontends have a dual-linked list of bind_conf
  - frontends have a dual-linked list of listeners
  - bind_conf have a dual-linked list of listeners
  - listeners have a pointer to their bind_conf

This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
2012-09-20 16:48:07 +02:00
Willy Tarreau
28a47d6408 MINOR: config: pass the file and line to config keyword parsers
This will be needed when we need to create bind config settings.
2012-09-18 20:02:48 +02:00
Willy Tarreau
51fb7651c4 MINOR: listener: add a scope field in the bind keyword lists
This scope is used to report what the keywords are used for (eg: TCP,
UNIX, ...). It is now reported by bind_dump_kws().
2012-09-18 18:27:14 +02:00
Willy Tarreau
4479124cda MEDIUM: config: move the "bind" TCP parameters to proto_tcp
Now proto_tcp.c is responsible for the 4 settings it handles :
  - defer-accept
  - interface
  - mss
  - transparent

These ones do not need to be handled in cfgparse anymore. If support for a
setting is disabled by a missing build option, then cfgparse correctly
reports :

  [ALERT] 255/232700 (2701) : parsing [echo.cfg:114] : 'bind' : 'transparent' option is not implemented in this version (check build options).
2012-09-15 22:33:16 +02:00
Willy Tarreau
d1d5454180 REORG: split "protocols" files into protocol and listener
It was becoming confusing to have protocols and listeners in the same
files, split them.
2012-09-15 22:29:32 +02:00
Willy Tarreau
184636e3e7 BUG: tcp: close socket fd upon connect error
When the data layer fails to initialize (eg: out of memory for SSL), we
must close the socket fd we just allocated.
2012-09-06 14:04:41 +02:00
Willy Tarreau
dd2f85eb3b CLEANUP: includes: fix includes for a number of users of fd.h
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.

By removing its useless includes, a long dependency chain broke
but could easily be fixed.
2012-09-03 20:49:14 +02:00
Willy Tarreau
40ff59d820 CLEANUP: fd: remove fdtab->flags
These flags were added for TCP_CORK. They were only set at various places
but never checked by any user since TCP_CORK was replaced with MSG_MORE.
Simply get rid of this now.
2012-09-03 20:49:14 +02:00
Willy Tarreau
15678efc45 MEDIUM: connection: add an ->init function to data layer
SSL need to initialize the data layer before proceeding with data. At
the moment, this data layer is automatically initialized from itself,
which will not be possible once we extract connection from sessions
since we'll only create the data layer once the handshake is finished.

So let's have the application layer initialize the data layer before
using it.
2012-09-03 20:47:34 +02:00
Willy Tarreau
64ee491309 MINOR: tcp: replace tcp_src_to_stktable_key with addr_to_stktable_key
Make it more obvious that this function does not depend on any knowledge
of the session. This is important to plan for TCP rules that can run on
connection without any initialized session yet.
2012-09-03 20:47:34 +02:00
Willy Tarreau
14f8e86da5 MEDIUM: proto_tcp: remove any dependence on stream_interface
The last uses of the stream interfaces were in tcp_connect_server() and
could easily and more appropriately be moved to its callers, si_connect()
and connect_server(), making a lot more sense.

Now the function should theorically be usable for health checks.

It also appears more obvious that the file is split into two distinct
parts :
  - the protocol layer used at the connection level
  - the tcp analysers executing tcp-* rules and their samples/acls.
2012-09-03 20:47:34 +02:00
Willy Tarreau
93b0f4f6c6 MEDIUM: stream_interface: remove CAP_SPLTCP/CAP_SPLICE flags
These ones are implicitly handled by the connection's data layer, no need
to rely on them anymore and reaching them maintains undesired dependences
on stream-interface.
2012-09-03 20:47:34 +02:00
Willy Tarreau
986a9d2d12 MAJOR: connection: move the addr field from the stream_interface
We need to have the source and destination addresses in the connection.
They were lying in the stream interface so let's move them. The flags
SI_FL_FROM_SET and SI_FL_TO_SET have been moved as well.

It's worth noting that tcp_connect_server() almost does not use the
stream interface anymore except for a few flags.

It has been identified that once we detach the connection from the SI,
it will probably be needed to keep a copy of the server-side addresses
in the SI just for logging purposes. This has not been implemented right
now though.
2012-09-03 20:47:34 +02:00
Willy Tarreau
3cefd521fa REORG: connection: move the target pointer from si to connection
The target is per connection and is directly used by the connection, so
we need it there. It's not needed anymore in the SI however.
2012-09-03 20:47:34 +02:00
Willy Tarreau
8263d2b259 CLEANUP: channel: use "channel" instead of "buffer" in function names
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.

In concerns the following ones (new names) :

unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)

Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.

The "buffer" memory pool was also renamed "channel".
2012-09-03 20:47:33 +02:00
Willy Tarreau
03cdb7c678 CLEANUP: channel: usr CF_/CHN_ prefixes instead of BF_/BUF_
Get rid of these confusing BF_* flags. Now channel naming should clearly
be used everywhere appropriate.

No code was changed, only a renaming was performed. The comments about
channel operations was updated.
2012-09-03 20:47:33 +02:00
Willy Tarreau
3bf1b2b816 MAJOR: channel: stop relying on BF_FULL to take action
This flag is quite complex to get right and updating it everywhere is a
major pain, especially since the buffer/channel split. This is the first
step of getting rid of it. Instead now it's dynamically computed whenever
needed.
2012-09-03 20:47:33 +02:00
Willy Tarreau
8e21bb9e52 MAJOR: channel: remove the BF_OUT_EMPTY flag
This flag was very problematic because it was composite in that both changes
to the pipe or to the buffer had to cause this flag to be updated, which is
not always simple (eg: there may not even be a channel attached to a buffer
at all).

There were not that many users of this flags, mostly setters. So the flag got
replaced with a macro which reports whether the channel is empty or not, by
checking both the pipe and the buffer.

One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
output size is more useful than checking the flag's replacement.
2012-09-03 20:47:32 +02:00
Willy Tarreau
c7e4238df0 REORG: buffers: split buffers into chunk,buffer,channel
Many parts of the channel definition still make use of the "buffer" word.
2012-09-03 20:47:32 +02:00
Willy Tarreau
96199b1016 MAJOR: stream-interface: restore splicing mechanism
The splicing is now provided by the data-layer rcv_pipe/snd_pipe functions
which in turn are called by the stream interface's recv and send callbacks.

The presence of the rcv_pipe/snd_pipe functions is used to attest support
for splicing at the data layer. It looks like the stream-interface's
SI_FL_CAP_SPLICE flag does not make sense anymore as it's used as a proxy
for the pointers above.

It also appears that we call chk_snd() from the recv callback and then
try to call it again in update_conn(). It is very likely that this last
function will progressively slip into the recv/send callbacks in order
to avoid duplicate check code.

The code works right now with and without splicing. Only raw_sock provides
support for it and it is automatically selected when the various splice
options are set. However it looks like splice-auto doesn't enable it, which
possibly means that the streamer detection code does not work anymore, or
that it's only called at a time where it's too late to enable splicing (in
process_session).
2012-09-03 20:47:31 +02:00
Willy Tarreau
75bf2c925f REORG: sock_raw: rename the files raw_sock*
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.

The #include directive was also removed from a number of files which do
not need it anymore.
2012-09-02 21:54:56 +02:00
Willy Tarreau
572bf9095d REORG/MAJOR: extract "struct buffer" from "struct channel"
At the moment, the struct is still embedded into the struct channel, but
all the functions have been updated to use struct buffer only when possible,
otherwise struct channel. Some functions would likely need to be splitted
between a buffer-layer primitive and a channel-layer function.

Later the buffer should become a pointer in the struct buffer, but doing so
requires a few changes to the buffer allocation calls.
2012-09-02 21:54:56 +02:00
Willy Tarreau
7421efb85f REORG/MAJOR: use "struct channel" instead of "struct buffer"
This is a massive rename. We'll then split channel and buffer.

This change needs a lot of cleanups. At many locations, the parameter
or variable is still called "buf" which will become ambiguous. Also,
the "struct channel" is still defined in buffers.h.
2012-09-02 21:54:55 +02:00
Willy Tarreau
afad0e0f80 MAJOR: make use of conn_{data|sock}_{poll|stop|want}* in connection handlers
This is a second attempt at getting rid of FD_WAIT_*. Now the situation is
much better since native I/O handlers can directly manipulate the FD using
fd_{poll|want|stop}_* and the connection handlers manipulate connection-level
flags using the conn_{data|sock}_* equivalent.

Proceeding this way ensures that the connection flags always reflect the
reality even after data<->handshake switches.
2012-09-02 21:53:12 +02:00
Willy Tarreau
f9dabecd03 MEDIUM: connection: make use of the new polling functions
Now the connection handler, the handshake callbacks and the I/O callbacks
make use of the connection-layer polling functions to enable or disable
polling on a file descriptor.

Some changes still need to be done to avoid using the FD_WAIT_* constants.
2012-09-02 21:53:11 +02:00
Willy Tarreau
49b046dddf MAJOR: fd: replace all EV_FD_* macros with new fd_*_* inline calls
These functions have a more explicity meaning and will offer provisions
for explicit polling.

EV_FD_ISSET() has been left for now as it is still in use in checks.
2012-09-02 21:53:11 +02:00
Willy Tarreau
0b0c097a3a MINOR: rearrange tcp_connect_probe() and fix wrong return codes
Sometimes we returned the need for polling while it was not needed. Remove
some of the spaghetti in the function.
2012-09-02 21:53:09 +02:00
Willy Tarreau
8f8c92fe93 MAJOR: connection: add a new CO_FL_CONNECTED flag
This new flag is used to indicate that the connection was already
connected. It can be used by I/O handlers to know that a connection
has just completed. It is used by stream_sock_update_conn(), allowing
the sock_opt handlers not to manipulate the SI timeout nor the
BF_WRITE_NULL flag anymore.
2012-09-02 21:53:09 +02:00
Willy Tarreau
3c55ec2020 MEDIUM: stream_interface: centralize the SI_FL_ERR management
It's better to have only stream_sock_update_conn() handle the conversion
of the CO_FL_ERROR flag to SI_FL_ERR than having it in each and every I/O
callback.
2012-09-02 21:53:09 +02:00
Willy Tarreau
239d7189fc MEDIUM: stream_interface: pass connection instead of fd in sock_ops
The sock_ops I/O callbacks made use of an FD till now. This has become
inappropriate and the struct connection is much more useful. It also
fixes the race condition introduced by previous change.
2012-09-02 21:53:08 +02:00
Willy Tarreau
fd31e53139 MAJOR: remove the stream interface and task management code from sock_*
The socket data layer code must only focus on moving data between a
socket and a buffer. We need a special stream interface handler to
update the stream interface and the file descriptor status.

At the moment the code works but suffers from a race condition caused
by its API : the read/write callbacks still make use of the fd instead
of using the connection. And when a double shutdown is performed, a call
to ->write() after ->read() processed an error results in dereferencing
a NULL fdtab[]->owner. This is only a temporary issue which doesn't need
to be fixed now since this will automatically go away when the functions
change to use the connection instead.
2012-09-02 21:53:08 +02:00
Willy Tarreau
076be25ab8 CLEANUP: remove the now unused fdtab direct I/O callbacks
They were all left to NULL since last commit so we can safely remove them
all now and remove the temporary dual polling logic in pollers.
2012-09-02 21:51:29 +02:00
Willy Tarreau
2da156fe5e MAJOR: tcp: remove the specific I/O callbacks for TCP connection probes
Use a single tcp_connect_probe() instead of tcp_connect_write() and
tcp_connect_read(). We call this one only when no data layer function
have been processed, so this is a fallback to test for completion of
a connection attempt.

With this done, we don't have the need for any direct I/O callback
anymore.

The function still relies on ->write() to wake the stream interface up,
so it's not finished.
2012-09-02 21:51:29 +02:00
Willy Tarreau
2c6be84b3a MEDIUM: connection: extract the send_proxy callback from proto_tcp
This handshake handler must be independant, so move it away from
proto_tcp. It has a dedicated connection flag. It is tested before
I/O handlers and automatically removes the CO_FL_WAIT_L4_CONN flag
upon success.

It also sets the BF_WRITE_NULL flag on the stream interface and
stops the SI timeout. However it does not perform the task_wakeup(),
and relies on the data handler to do so for now. The SI wakeup will
have to be moved elsewhere anyway.
2012-09-02 21:51:28 +02:00
Willy Tarreau
8018471f44 MINOR: fd: make fdtab->owner a connection and not a stream_interface anymore
It is more convenient with a connection here and will abstract stream_interface
more easily.
2012-09-02 21:51:28 +02:00
Willy Tarreau
d2274c6536 MAJOR: connection: replace direct I/O callbacks with the connection callback
Almost all direct I/O callbacks have been changed to use the connection
callback instead. Only the TCP connection validation remains.
2012-09-02 21:51:28 +02:00
Willy Tarreau
aece46a44d MEDIUM: protocols: use the generic I/O callback for accept callbacks
This one is used only on read events, and it was easy to convert to
use the new I/O callback.
2012-09-02 21:51:27 +02:00
Willy Tarreau
4e6049e553 MINOR: fd: add a new I/O handler to fdtab
This one will eventually replace both cb[] handlers. At the moment it
is not used yet.
2012-09-02 21:51:27 +02:00
Willy Tarreau
505e34a36d MAJOR: get rid of fdtab[].state and use connection->flags instead
fdtab[].state was only used to know whether a connection was in progress
or an error was encountered. Instead we now use connection->flags to store
a flag for both. This way, connection management will be able to update the
connection status on I/O.
2012-09-02 21:51:26 +02:00
Willy Tarreau
ed8f614078 REORG/MEDIUM: fd: get rid of FD_STLISTEN
This state was only used so that ev_sepoll did not match FD_STERROR, which
changed in previous patch. We can now safely remove this state.
2012-09-02 21:51:25 +02:00
David du Colombier
65c1796c4a MINOR: IPv6 support for transparent proxy
Set socket option IPV6_TRANSPARENT on binding
to enable transparent proxy on IPv6.
This option is available from Linux 2.6.37.
2012-07-31 07:53:42 +02:00
Willy Tarreau
96596aeead MEDIUM: fd/si: move peeraddr from struct fdinfo to struct connection
The destination address is purely a connection thing and not an fd thing.
It's also likely that later the address will be stored into the connection
and linked to by the SI.

struct fdinfo only keeps the pointer to the port range and the local port
for now. All of this also needs to move to the connection but before this
the release of the port range must move from fd_delete() to a new function
dedicated to the connection.
2012-06-08 22:59:52 +02:00
Willy Tarreau
fb7508aefb REORG/MINOR: stream_interface: move si->fd to struct connection
The socket fd is used only when in socket mode and with a connection.
2012-05-21 16:47:54 +02:00
Willy Tarreau
73b013b070 MINOR: stream_interface: introduce a new "struct connection" type
We start to move everything needed to manage a connection to a special
entity "struct connection". We have the data layer operations and the
control operations there. We'll also have more info in the future such
as file descriptors and applet contexts, so that in the end it becomes
detachable from the stream interface, which will allow connections to
be reused between sessions.

For now on, we start with minimal changes.
2012-05-21 16:31:45 +02:00
Willy Tarreau
a190d591fc REORG: move the send-proxy code to tcp_connect_write()
It is much better and more efficient to consider that the send-proxy
feature is part of the protocol layer than part of the data layer.
Now the connection is considered established once the send-proxy line
has been sent.

This way the data layer doesn't have to care anymore about this specific
part.

The tcp_connect_write() function now automatically calls the data layer
write() function once the connection is established, which saves calls
to epoll_ctl/epoll_wait/process_session.

It's starting to look more and more obvious that tcp_connect_read() and
tcp_connect_write() are not TCP-specific but only socket-specific and as
such should probably move, along with some functions from protocol.c, to
a socket-specific file (eg: stream_sock).

It would be nice to be able to support autonomous listeners to parse the
proxy protocol before accepting a connection, so that we get rid of it
at the session layer and to support using these informations in the
tcp-request connection rules.
2012-05-20 18:35:19 +02:00
Willy Tarreau
8ae52cb144 BUG/MINOR: stop connect timeout when connect succeeds
If the connect succeeds exactly at the same millisecond as the connect
timeout is supposed to strike, the timeout is still considered while
data may have already be sent. This results in a new connection attempt
with no data and with the response being lost.

Note that in practice the only real-world situation where this is observed
is when connect timeouts are extremely low, too low for safe operations.
This bug was encountered with a 1ms connect timeout.

It is also present on 1.4 and needs to be fixed there too.
2012-05-20 10:38:46 +02:00
Willy Tarreau
be0688c64d MEDIUM: stream_interface: remove the si->init
Calling the init() function in sess_establish was a bad idea, it is
too late to allow it to fail on lack of resource and does not help at
all. Remove it for now before it's used.
2012-05-18 15:15:26 +02:00
Willy Tarreau
b147a8382a CLEANUP: fd: remove unused cb->b pointers in the struct fdtab
These pointers were used to hold pointers to buffers in the past, but
since we introduced the stream interface, they're no longer used but
they were still sometimes set.

Removing them shrink the struct fdtab from 32 to 24 bytes on 32-bit machines,
and from 52 to 36 bytes on 64-bit machines, which is a significant saving. A
quick tests shows a steady 0.5% performance gain, probably due to the better
cache efficiency.
2012-05-13 00:35:44 +02:00
Willy Tarreau
eeda90e68c MAJOR: fd: remove the need for the socket layer to recheck the connection
Up to now, if an outgoing connection had no data to send, the socket layer
had to perform a connect() again to check for establishment. This is not
acceptable for SSL, and will cause problems with socketpair(). Some socket
layers will also need an initializer before sending data (eg: SSL).

The solution consists in moving the connect() test to the protocol layer
(eg: TCP) and to make it hold the fd->write callback until the connection
is validated. At this point, it will switch the write callback to the
socket layer's write function. In fact we need to hold both read and write
callbacks to ensure the socket layer is never called before being initialized.

This intermediate callback is used only if there is a socket init function
or if there are no data to send.

The socket layer does not have any code to check for connection establishment
anymore, which makes sense.
2012-05-11 20:18:26 +02:00
Willy Tarreau
59b9479667 BUG/MEDIUM: stream_interface: restore get_src/get_dst
Commit e164e7a removed get_src/get_dst setting in the stream interfaces but
forgot to set it in proto_tcp. Get the feature back because we need it for
logging, transparent mode, ACLs etc... We now rely on the stream interface
direction to know what syscall to use.

One benefit of doing it this way is that we don't use getsockopt() anymore
on outgoing stream interfaces nor on UNIX sockets.
2012-05-11 16:48:10 +02:00
Willy Tarreau
c63190d429 REORG: use the name sock_raw instead of stream_sock
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
2012-05-11 14:23:52 +02:00
Willy Tarreau
0a3dd74c9c MEDIUM: cfgparse: use the new error reporting framework for remaining cfg_keywords
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
2012-05-08 21:28:17 +02:00