Commit Graph

631 Commits

Willy Tarreau
086735a688 BUG/MINOR: tasks: make sure wakeup events are properly reported to subscribers
The tasks API was changed in 1.9-dev1 with commit 9f6af3322 ("MINOR: tasks:
Change the task API so that the callback takes 3 arguments."), causing the
task's state not to be usable anymore and to have been replaced with an
explicit argument in the callee. The task's state doesn't contain any trace
of the wakeup cause anymore. But there were two places where the old task's
state remained in use :
  - sessions, used to more accurately report timeouts in logs when seeing
    TASK_WOKEN_TIMEOUT ;
  - peers, used to finish resynchronization when seeing TASK_WOKEN_SIGNAL

This commit fixes both occurrences by making sure we don't access task->state
directly (should we rename it by the way ?).

No backport is needed.
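
For illustration only, here is the pattern the fix enforces, with simplified
stand-in types and flags rather than the real HAProxy definitions: the wakeup
cause is read from the state argument passed to the callback, never from
task->state.

  #include <stdio.h>

  /* simplified stand-ins for the real task flags and structure */
  #define TASK_WOKEN_TIMEOUT  0x01
  #define TASK_WOKEN_SIGNAL   0x02

  struct task {
      unsigned short state;   /* no longer carries the wakeup cause */
      void *context;
  };

  /* new-style callback: the wakeup cause arrives as an explicit argument */
  static struct task *session_expire(struct task *t, void *ctx, unsigned short state)
  {
      if (state & TASK_WOKEN_TIMEOUT)    /* right: use the argument */
          printf("session timed out\n");
      /* the pre-fix, now invalid pattern was: if (t->state & TASK_WOKEN_TIMEOUT) */
      (void)ctx;
      return t;
  }

  int main(void)
  {
      struct task t = { .state = 0, .context = NULL };
      session_expire(&t, t.context, TASK_WOKEN_TIMEOUT);
      return 0;
  }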
2018-11-05 17:15:21 +01:00
Willy Tarreau
35b51c6e5b REORG: http: move the HTTP semantics definitions to http.h/http.c
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.

This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example, the h1_char_classes array, once thought to be h1-specific, is in
fact dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.

Certain string definitions pre-dated the introduction of indirect
strings (ist), so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table-based lookup (typically 14 ns vs
16 ns before).
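
As an illustration of what an ist-based method table enables, here is a
minimal, self-contained sketch; the struct and macro below are simplified
stand-ins, not the exact definitions from http.c:

  #include <stdio.h>
  #include <string.h>

  /* simplified stand-in for HAProxy's indirect string ("ist") */
  struct ist { const char *ptr; size_t len; };
  #define IST(s) { (s), sizeof(s) - 1 }

  enum http_meth { HTTP_METH_GET, HTTP_METH_HEAD, HTTP_METH_POST, HTTP_METH_OTHER };

  /* known methods, indexed by enum value */
  static const struct ist http_known_methods[] = {
      [HTTP_METH_GET]  = IST("GET"),
      [HTTP_METH_HEAD] = IST("HEAD"),
      [HTTP_METH_POST] = IST("POST"),
  };

  /* compare the length first, then the bytes: short and cache friendly */
  static enum http_meth find_http_meth(const char *str, size_t len)
  {
      for (size_t i = 0; i < sizeof(http_known_methods) / sizeof(http_known_methods[0]); i++)
          if (http_known_methods[i].len == len &&
              memcmp(http_known_methods[i].ptr, str, len) == 0)
              return (enum http_meth)i;
      return HTTP_METH_OTHER;
  }

  int main(void)
  {
      printf("%d\n", find_http_meth("POST", 4));  /* prints 2 (HTTP_METH_POST) */
      return 0;
  }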
2018-09-11 10:30:25 +02:00
Willy Tarreau
be373150c7 MINOR: connection: make the initialization more consistent
Sometimes a connection is prepared before the target is set, sometimes
after. There's no real rule since the few functions involved operate on
different and independent fields. Soon we'll benefit from knowing the
target at the connection layer, in order to figure out the associated proxy
and retrieve the various parameters (timeouts etc). This patch slightly
reorders a few calls to conn_prepare() so that we can make sure that the
target is always known to the mux.
2018-09-06 11:45:30 +02:00
Willy Tarreau
590a0514f2 BUG/MEDIUM: session: fix reporting of handshake processing time in the logs
The handshake processing time used to be stored per stream, which was
valid when there was exactly one stream per session. With H2 and
multiplexing it's not the case anymore and the reported handshake times
are wrong in the logs as it's computed between the TCP accept() and the
stream creation. Let's first move the handshake time where it belongs, which
is the session.

However, this is not enough because we don't want to report an excessive
idle time either for H2 (since many requests use the connection).

So the solution used here is to have the stream retrieve sess->tv_accept
and the handshake duration when the stream is created, and let the mux
immediately reset them. This way, the handshake time becomes zero for the
second and subsequent requests in H2 (which was already the case in H1),
and the idle time exactly counts how long the connection remained unused
while it could be used, so in H1 it runs from the end of the previous
response and in H2 it runs from the end of the previous request since the
channel is already available.

This patch will need to be backported to 1.8.
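
A rough sketch of the handover described above; the structures and field
names are invented for the example and do not match the real session/stream
members:

  #include <sys/time.h>

  /* illustrative structures only; the real members differ */
  struct sess_sketch {
      struct timeval tv_accept;   /* when the connection was accepted    */
      long t_handshake;           /* handshake duration, in microseconds */
  };

  struct strm_sketch {
      struct timeval tv_accept;
      long t_handshake;
  };

  /* at stream creation: copy the times, then reset them in the session so
   * the second and subsequent H2 streams report a zero handshake time */
  void stream_pick_up_times(struct strm_sketch *s, struct sess_sketch *sess)
  {
      s->tv_accept      = sess->tv_accept;
      s->t_handshake    = sess->t_handshake;
      sess->t_handshake = 0;    /* only the first stream pays for the handshake */
  }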
2018-09-05 16:30:23 +02:00
Olivier Houchard
fde2a09a15 BUG/MEDIUM: sessions: Don't use t->state.
In session_expire_embryonic(), don't use t->state, use the "state" argument
instead, as t->state has been cleared by the time we're called.
2018-08-16 19:25:56 +02:00
Christopher Faulet
7ce0c891ab MEDIUM: mux: Use the mux protocol specified on bind/server lines
To do so, mux choices are split to handle incoming and outgoing connections in a
different way. The protocol specified on the bind/server line is used in
priority. Then, for frontend connections, the ALPN is retrieved and used to
choose the best mux. For backend connections, there is no ALPN. Finally, if no
protocol is specified and no protocol matches the ALPN, we fall back on a
default mux, preferring the first mux with exactly the same mode.
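
Very roughly, the selection can be pictured as follows; the structures and
the registration table are simplified for the example and differ from the
real mux registration API:

  #include <stdio.h>
  #include <string.h>

  /* simplified stand-in: a mux entry advertises a token (ALPN or explicit
   * "proto" name) and the proxy mode it was designed for */
  struct mux_entry {
      const char *token;   /* e.g. "h2", or "" for a catch-all default */
      int mode_http;       /* 1 = HTTP mode, 0 = TCP mode */
      const char *name;
  };

  static const struct mux_entry muxes[] = {
      { "h2", 1, "mux_h2" },
      { "",   1, "mux_h1" },   /* default HTTP mux */
      { "",   0, "mux_pt" },   /* default pass-through mux */
  };

  /* pick the mux: explicit proto first, then ALPN, then the first default
   * whose mode matches the proxy mode */
  const char *pick_mux(const char *forced, const char *alpn, int mode_http)
  {
      const char *want = forced ? forced : alpn;
      size_t i;

      if (want)
          for (i = 0; i < sizeof(muxes) / sizeof(muxes[0]); i++)
              if (muxes[i].mode_http == mode_http && !strcmp(muxes[i].token, want))
                  return muxes[i].name;

      for (i = 0; i < sizeof(muxes) / sizeof(muxes[0]); i++)   /* fall back */
          if (!*muxes[i].token && muxes[i].mode_http == mode_http)
              return muxes[i].name;
      return NULL;
  }

  int main(void)
  {
      printf("%s\n", pick_mux(NULL, "h2", 1));   /* -> mux_h2 */
      printf("%s\n", pick_mux(NULL, NULL, 0));   /* -> mux_pt */
      return 0;
  }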
2018-08-08 10:42:08 +02:00
Willy Tarreau
83061a820e MAJOR: chunks: replace struct chunk with struct buffer
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
2018-07-19 16:23:43 +02:00
Willy Tarreau
843b7cbe9d MEDIUM: chunks: make the chunk struct's fields match the buffer struct
Chunks are only a subset of a buffer (a non-wrapping version with no head
offset). Despite this we still carry a lot of duplicated code between
buffers and chunks. Replacing chunks with buffers would significantly
reduce the maintenance efforts. This first patch renames the chunk's
fields to match the name and types used by struct buffers, with the goal
of isolating the code changes from the declaration changes.

Most of the changes were made with spatch using this coccinelle script :

  @rule_d1@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.str
  + chunk.area

  @rule_d2@
  typedef chunk;
  struct chunk chunk;
  @@
  - chunk.len
  + chunk.data

  @rule_i1@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->str
  + chunk->area

  @rule_i2@
  typedef chunk;
  struct chunk *chunk;
  @@
  - chunk->len
  + chunk->data

Some minor updates to 3 HTTP functions had to be performed so that they take
size_t instead of int, in order to match the unsigned length here.
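
The field mapping performed by the script can be summed up as follows; the
layouts are shown for illustration and only approximate the real structures:

  #include <stddef.h>

  /* before: the historical chunk */
  struct old_chunk {
      char *str;    /* start of the data */
      int   len;    /* amount of data    */
      int   size;   /* allocated size    */
  };

  /* after: same information, named and typed like struct buffer */
  struct new_chunk {
      size_t size;  /* allocated size          */
      char  *area;  /* was ->str               */
      size_t data;  /* was ->len, now a size_t */
      size_t head;  /* always 0 for a chunk (no wrapping, no head offset) */
  };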
2018-07-19 16:23:43 +02:00
Dave Chiluk
8618a6a5e2 MINOR: Some spelling cleanup in the comments.
Signed-off-by: Dave Chiluk <chiluk+haproxy@indeed.com>
2018-06-21 20:43:52 +02:00
Olivier Houchard
9f6af33222 MINOR: tasks: Change the task API so that the callback takes 3 arguments.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
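
In other words, the callback prototype changes roughly as below, and the
scheduler snapshots the context and state before the call (simplified sketch,
not the real scheduler code):

  /* old style: the handler re-read everything from the task itself      */
  /*   void (*process)(struct task *t);                                  */

  /* new style: context and state are passed as arguments, so the
   * scheduler can update the task atomically while the handler works
   * on the copied values */
  struct task;
  typedef struct task *(*task_fn)(struct task *t, void *context, unsigned short state);

  struct task {
      task_fn        process;
      void          *context;
      unsigned short state;
  };

  /* sketch of the call site inside a scheduler loop */
  static inline struct task *run_one_task(struct task *t)
  {
      void *ctx = t->context;            /* copy before calling          */
      unsigned short state = t->state;   /* may change concurrently      */
      t->state = 0;
      return t->process(t, ctx, state);  /* the handler only sees copies */
  }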
2018-05-26 19:23:57 +02:00
Christopher Faulet
fe234281d6 BUG/MINOR: listener: Don't decrease actconn twice when a new session is rejected
When a freshly created session is rejected, for any reason, during the accept in
the function "session_accept_fd", the variable "actconn" is decreased twice. The
first time when the rejected session is released, then in the function
"listener_accpect", because of the failure. So it is possible to have an
negative value for actconn. Note that, in this case, we will also have a negatve
value for the current number of connections on the listener rejecting the
session (actconn and l->nbconn are in/decreased in same time).

It is easy to reproduce the bug with this small configuration:

  global
      stats socket /tmp/haproxy

  listen test
      bind *:12345
      tcp-request connection reject if TRUE

A "show info" on the stat socket, after a connection attempt, will show a very
high value (the unsigned representation of -1).

To fix the bug, if the function "session_accept_fd" returns an error, it
decrements the right counters and "listener_accept" leaves them untouched.

This patch must be backported to 1.8.
2018-03-23 16:21:50 +01:00
Emeric Brun
1738e86771 BUG/MINOR: session: Fix tcp-request session failure if handshake.
Some sample fetches check if the session is established using
the flag CO_FL_CONNECTED. But in some cases, when a handshake
is performed, this flag is set too late, after the tcp-request
session rules have been processed.

This fix moves the raising of the flag to the beginning of the
conn_complete_session function, which processes the tcp-request
session rules.

This fix must be backported to 1.8 (and perhaps 1.7)
2018-03-06 14:04:45 +01:00
Willy Tarreau
bafbe01028 CLEANUP: pools: rename all pool functions and pointers to remove this "2"
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
2017-11-24 17:49:53 +01:00
Willy Tarreau
3e13cbafe2 MEDIUM: session: make use of the connection's destroy callback
Now we don't remove the session when a stream dies, instead we
detach the stream and let the mux decide to release the connection
and call session_free() instead.
2017-10-31 18:03:24 +01:00
Willy Tarreau
4f0c64cad7 MINOR: session: release the listener with the session, not the stream
Since multiple streams can share one session attached to one listener,
the listener_release() call must be done in session_free() and not in
stream_free(), otherwise we end up with a negative count in H2.
2017-10-31 18:03:24 +01:00
Willy Tarreau
436d333124 MEDIUM: connection: add a destroy callback
This callback will be used to release upper layers when a mux is in
use. Given that the mux can be asynchronously deleted, we need a way
to release the extra information such as the session.

This callback will be called directly by the mux upon releasing
everything and before the connection itself is released, so that
the callee can find its information inside the connection if needed.

The way it currently works is not perfect, and most likely this should
instead become a mux release callback, but for now we have no easy way
to add mux-specific stuff, and since there's one mux per connection,
it works fine this way.
2017-10-31 18:03:24 +01:00
Willy Tarreau
2e0b2b5f83 MEDIUM: session: use the ALPN token and proxy mode to select the mux
When an incoming connection is made on an HTTP mode frontend, the
session now looks up the mux to use based on the ALPN token and the
proxy mode. This will allow easier mux registration, and we don't
need to hard-code the mux_pt_ops anymore.
2017-10-31 18:03:23 +01:00
Willy Tarreau
53a4766e40 MEDIUM: connection: start to introduce a mux layer between xprt and data
For HTTP/2 and QUIC, we'll need to deal with multiplexed streams inside
a connection. After quite a long brainstorming, it appears that the
connection interface to the existing streams is appropriate just like
the connection interface to the lower layers. In fact we need to have
the mux layer in the middle of the connection, between the transport
and the data layer.

A mux can exist on two directions/sides. On the inbound direction, it
instantiates new streams from incoming connections, while on the outbound
direction it muxes streams into outgoing connections. The difference is
visible on the mux->init() call : in one case, an upper context is already
known (outgoing connection), and in the other case, the upper context is
not yet known (incoming connection) and will have to be allocated by the
mux. The session doesn't have to create the new streams anymore, as this
is performed by the mux itself.

This patch introduces this and creates a pass-through mux called
"mux_pt" which is used for all new connections and which only
calls the data layer's recv,send,wake() calls. One incoming stream
is immediately created when init() is called on the inbound direction.
There should not be any visible impact.

Note that the connection's mux is purposely not set until the session
is completed so that we don't accidentally run with the wrong mux. This
must not cause any issue as the xprt_done_cb function is always called
prior to using mux's recv/send functions.
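
The shape of such a mux layer can be sketched as follows; this is a
simplified stand-in for the real mux_ops interface, only meant to show where
it sits between the transport and the data layers:

  #include <stddef.h>

  struct connection;   /* owns a transport (xprt) and a mux        */
  struct conn_stream;  /* what the data layer (stream) talks to    */

  /* simplified mux interface: the inbound side creates streams itself,
   * the outbound side is given an already-known upper context */
  struct mux_ops {
      int    (*init)(struct connection *conn);              /* attach to a connection */
      size_t (*rcv_buf)(struct conn_stream *cs, void *buf, size_t count);
      size_t (*snd_buf)(struct conn_stream *cs, const void *buf, size_t count);
      int    (*wake)(struct connection *conn);               /* I/O event notification */
      void   (*detach)(struct conn_stream *cs);              /* a stream goes away     */
  };

  /* a pass-through mux maps one connection to exactly one stream and
   * forwards rcv_buf/snd_buf/wake straight to the underlying transport */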
2017-10-31 18:03:23 +01:00
Christopher Faulet
ff8abcd31d MEDIUM: threads/proxy: Add a lock per proxy and atomically update proxy vars
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
2017-10-31 13:58:30 +01:00
Christopher Faulet
8d8aa0d681 MEDIUM: threads/listeners: Make listeners thread-safe
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because accesses to
these queues are unusual, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for the others.
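
The pattern is essentially the one below, written here with C11 atomics and a
plain pthread mutex as stand-ins for HAProxy's own atomic and locking macros:

  #include <pthread.h>
  #include <stdatomic.h>

  /* shared counters become atomics: no lock needed for a simple increment */
  static atomic_int actconn = 0;
  static atomic_int jobs = 0;

  /* listener queues keep a single lock because they are rarely touched */
  static pthread_mutex_t listener_queue_lock = PTHREAD_MUTEX_INITIALIZER;

  void account_new_connection(void)
  {
      atomic_fetch_add(&actconn, 1);
      atomic_fetch_add(&jobs, 1);
  }

  void requeue_listener(void /* struct listener */ *l)
  {
      pthread_mutex_lock(&listener_queue_lock);
      /* ... add <l> to the global or per-proxy queue ... */
      (void)l;
      pthread_mutex_unlock(&listener_queue_lock);
  }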
2017-10-31 13:58:30 +01:00
Emeric Brun
c60def8368 MAJOR: threads/task: handle multithread on task scheduler
2 global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.

For many tasks, all threads are granted. This must be your first intention
when you create a new task, unless you have a good reason to make a task sticky
on some threads. It is then the responsibility of the process callback to lock
what has to be locked in the task context.

Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
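
A minimal sketch of the idea, with invented names (the real task struct and
thread identifiers differ):

  #include <stdint.h>

  /* simplified: each task carries a bit mask of the threads allowed to run it */
  struct task_sketch {
      uint64_t thread_mask;            /* bit i set => thread i may process it */
      /* ... */
  };

  #define ANY_THREAD_MASK  (~0ULL)     /* "all threads granted", the common default */

  /* a task tied to the thread that created the session */
  static inline void task_bind_to_thread(struct task_sketch *t, unsigned int thr)
  {
      t->thread_mask = 1ULL << thr;
  }

  /* inside the scheduler, a thread only picks tasks whose mask includes it */
  static inline int runnable_on(const struct task_sketch *t, unsigned int thr)
  {
      return (t->thread_mask & (1ULL << thr)) != 0;
  }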
2017-10-31 13:58:30 +01:00
Olivier Houchard
c2aae74f01 MEDIUM: ssl: Handle early data with OpenSSL 1.1.1
When compiled with Openssl >= 1.1.1, before attempting to do the handshake,
try to read any early data. If any early data is present, then we'll create
the session, read the data, and handle the request before we're doing the
handshake.

For this, we add a new connection flag, CO_FL_EARLY_SSL_HS, which is not
part of the CO_FL_HANDSHAKE set, allowing to proceed with a session even
before an SSL handshake is completed.

As early data do have security implications, we let the origin server know
the request comes from early data by adding the "Early-Data" header, as
specified in this draft from the HTTP working group :

    https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-replay
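
With OpenSSL >= 1.1.1 the read side looks roughly like this;
SSL_read_early_data() is the real OpenSSL API, but the surrounding function
and the error handling are simplified for the sketch:

  #include <openssl/ssl.h>

  /* try to consume early data before running the handshake; returns the
   * number of bytes placed into <buf>, or 0 if there was none */
  size_t try_read_early_data(SSL *ssl, char *buf, size_t len)
  {
      size_t readbytes = 0;
      int ret = SSL_read_early_data(ssl, buf, len, &readbytes);

      if (ret == SSL_READ_EARLY_DATA_SUCCESS && readbytes > 0) {
          /* the request came over 0-RTT: the proxy will tag it with an
           * "Early-Data: 1" header before passing it to the server */
          return readbytes;
      }
      /* SSL_READ_EARLY_DATA_FINISH: no (more) early data, proceed with the
       * normal handshake; SSL_READ_EARLY_DATA_ERROR: check SSL_get_error() */
      return 0;
  }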
2017-10-27 10:54:05 +02:00
Willy Tarreau
5b78a9dd04 MINOR: session: use conn_full_close() instead of conn_force_close()
We simply disable tracking before calling it.
2017-10-22 09:54:17 +02:00
Olivier Houchard
1a0545f3d7 REORG: connection: rename CO_FL_DATA_* -> CO_FL_XPRT_*
These flags are not exactly for the data layer, they instead indicate
what is expected from the transport layer. Since we're going to split
the connection between the transport and the data layers to insert a
mux layer, it's important to have a clear idea of what each layer does.

All conn_data_* functions used to manipulate these flags were renamed to
conn_xprt_*.
2017-10-22 09:54:15 +02:00
Willy Tarreau
bf08beb2a3 MINOR: session: remove the list of streams from struct session
Commit bcb86ab ("MINOR: session: add a streams field to the session
struct") added this list of streams that is not needed anymore. Let's
get rid of it now.
2017-10-08 22:32:05 +02:00
Willy Tarreau
0bf6fa5e40 MEDIUM: session: count the frontend's connections at a single place
There are several places where we see feconn++, feconn--, totalconn++ and
an increment on the frontend's number of connections and connection rate.
This is done exactly once per session in each direction, so better take
care of this counter in the session and simplify the callers. At least it
ensures a better symmetry. It also ensures consistency as till now the
lua/spoe/peers frontend didn't have these counters properly set, which can
be useful at least for troubleshooting.
2017-09-15 11:49:52 +02:00
Willy Tarreau
0c4ed35225 MEDIUM: session: factor out duplicated code for conn_complete_session
session_accept_fd() may either successfully complete a session creation,
or defer it to conn_complete_session() depending on whether a handshake
remains to be performed or not. The problem is that all the code after
the handshake was duplicated between the two functions.

This patch makes session_accept_fd() synchronously call
conn_complete_session() to finish the session creation. It is only needed
to check if the session's task has to be released or not at the end, which
is fairly minimal. This way there is now a single place where the sessions
are created.
2017-09-15 11:49:52 +02:00
Willy Tarreau
eaa7e44ad7 MINOR: session: small cleanup of conn_complete_session()
Commit 8e3c6ce ("MEDIUM: connection: get rid of data->init() which was
not for data") simplified conn_complete_session() but introduced a
confusing check which cannot happen on CO_FL_HANDSHAKE. Make it clear
that this call is final and will either succeed and complete the
session or fail.
2017-09-15 11:49:52 +02:00
Willy Tarreau
05f5047d40 MINOR: listener: new function listener_release
Instead of duplicating some sensitive listener-specific code in the
session and in the stream code, let's call listener_release() when
releasing a connection attached to a listener.
2017-09-15 11:49:52 +02:00
Willy Tarreau
6f5e4b98df MEDIUM: session: take care of incrementing/decrementing jobs
Each user of a session increments/decrements the jobs variable at its
own place, resulting in a real mess and inconsistencies between them.
Let's have session_new() increment jobs and session_free() decrement
it.
2017-09-15 11:49:52 +02:00
Willy Tarreau
5790eb0a76 MINOR: stream: provide a new stream creation function for connections
The purpose will be to create new streams for a given connection so
that we can later abstract this from a mux.
2017-08-30 07:06:39 +02:00
Willy Tarreau
0b74eae1f1 MEDIUM: session: add a pointer to a struct task in the session
The session may need to enforce a timeout when waiting for a handshake.
Till now we used a trick to avoid allocating a pointer, we used to set
the connection's owner to the task and set the task's context to the
session, so that it was possible to circle between all of them. The
problem is that we'll really need to pass the pointer to the session
to the upper layers during initialization and that the only place to
store it is conn->owner, which is squatted for this trick.

So this patch moves the struct task* into the session where it should
always have been and ensures conn->owner points to the session until
the data layer is properly initialized.
2017-08-30 07:05:49 +02:00
Willy Tarreau
87787acf72 MEDIUM: stream: make stream_new() allocate its own task
Currently a task is allocated in session_new() and serves two purposes :
  - either the handshake is complete and it is offered to the stream via
    the second arg of stream_new()

  - or the handshake is not complete and it's diverted to be used as a
    timeout handler for the embryonic session and repurposed once we land
    into conn_complete_session()

Furthermore, the task's process() function was taken from the listener's
handler in conn_complete_session() prior to being replaced by a call to
stream_new(). This will become a serious mess with the mux.

Since it's impossible to have a stream without a task, this patch removes
the second arg from stream_new() and makes this function allocate its own
task. In session_accept_fd(), we now only allocate the task if needed for
the embryonic session and delete it later.
2017-08-30 07:05:04 +02:00
Willy Tarreau
8e3c6ce75a MEDIUM: connection: get rid of data->init() which was not for data
The ->init() callback of the connection's data layer was only used to
complete the session's initialisation since sessions and streams were
split apart in 1.6. The problem is that it creates a big confusion in
the layers' roles as the session has to register a dummy data layer
when waiting for a handshake to complete, then hand it off to the
stream which will replace it.

The real need is to notify that the transport has finished initializing.
This should enable a better splitting between these layers.

This patch thus introduces a connection-specific callback called
xprt_done_cb() which informs about handshake successes or failures. With
this, data->init() can disappear, CO_FL_INIT_DATA as well, and we don't
need to register a dummy data->wake() callback to be notified of errors.
2017-08-30 07:04:04 +02:00
Willy Tarreau
585744bf2e REORG/MEDIUM: connection: introduce the notion of connection handle
Till now connections used to rely exclusively on file descriptors. It
was planned in the past that alternative solutions would be implemented,
leading to member "union t" presenting sock.fd only for now.

With QUIC, the connection will need to continue to exist but will not
rely on a file descriptor but a connection ID.

So this patch introduces a "connection handle" which is either a file
descriptor or a connection ID, to replace the existing "union t". We've
now removed the intermediate "struct sock" which was never used. There
is no functional change at all, though the struct connection was inflated
by 32 bits on 64-bit platforms due to alignment.
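
Conceptually, the handle becomes a small union like the sketch below; field
names are illustrative, not the exact ones from the tree:

  #include <stdint.h>

  /* a connection is no longer necessarily backed by a file descriptor:
   * the handle is either an fd (socket transports) or a connection ID (QUIC) */
  union conn_handle {
      int      fd;       /* socket-based transports          */
      uint64_t conn_id;  /* QUIC-style connection identifier */
  };

  struct connection_sketch {
      union conn_handle handle;
      /* ... flags, transport ops, mux, owner ... */
  };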
2017-08-24 19:30:04 +02:00
Willy Tarreau
f92a73d2fc MEDIUM: session: do not free a session until no stream references it
We now refrain from clearing a session's variables, counters, and from
releasing it as long as at least one stream references it. For now it
never happens but with H2 this will be mandatory to avoid double frees.
2017-08-18 13:26:35 +02:00
Willy Tarreau
bcb86abaca MINOR: session: add a streams field to the session struct
This will be used to hold the list of streams belonging to a given session.
2017-08-18 13:26:35 +02:00
Willy Tarreau
9b82d941c5 MEDIUM: stream: make stream_new() always set the target and analysers
It doesn't make sense that stream_new() sets neither the target nor the
analysers and that the caller has to do it even if it doesn't know
about streams (eg: in session_accept_fd()). This causes trouble for
H2 where the applet handling the protocol cannot properly change
this information during its init phase.

Let's ensure it's always set and that the callers don't set it anymore.

Note: peers and lua don't use analysers and that's properly handled.
2017-06-27 14:38:02 +02:00
Emeric Brun
5f77fef34e MINOR: task/stream: tasks related to a stream must be init by the caller.
task_wakeup() was called in stream_new(), but the task/stream
wasn't fully initialized yet. task_wakeup() must now be called
explicitly by the caller once the task/stream is initialized.
2017-06-27 14:38:02 +02:00
Willy Tarreau
de40d798de CLEANUP: connection: completely remove CO_FL_WAKE_DATA
Since it's only set and never tested anymore, let's remove it.
2017-03-19 12:18:27 +01:00
Willy Tarreau
a261e9b094 CLEANUP: connection: remove all direct references to raw_sock and ssl_sock
Now we exclusively use xprt_get(XPRT_RAW) instead of &raw_sock or
xprt_get(XPRT_SSL) for &ssl_sock. This removes a bunch of #ifdef and
include spread over a number of location including backend, cfgparse,
checks, cli, hlua, log, server and session.
2016-12-22 23:26:38 +01:00
Willy Tarreau
c95bad5013 MEDIUM: move listener->frontend to bind_conf->frontend
Historically, all listeners have a pointer to the frontend. But since
the introduction of SSL, we now have an intermediary layer called
bind_conf corresponding to a "bind" line. It makes no sense to have
the frontend on each listener given that it's the same for all
listeners belonging to a same bind_conf. Also certain parts like
SSL can only operate on bind_conf and need the frontend.

This patch fixes this by moving the frontend pointer from the listener
to the bind_conf. The extra indirection is quite cheap given that the
places where this is used are very scarce.
2016-12-22 23:26:38 +01:00
Willy Tarreau
71a8c7c49e MINOR: listener: move the transport layer pointer to the bind_conf
A mistake was made when the socket layer was cut into proto and
transport: the transport was attached to the listener while all
listeners in a single "bind" line always have exactly the same
transport. It doesn't seem obvious but this is the reason why there
are so many #ifdefs USE_OPENSSL in cfgparse : a lot of operations
have to be open-coded because cfgparse only manipulates bind_conf
and we don't have the information of the transport layer here.

Very little code makes use of the transport layer, mainly session
setup and log. These places can afford an extra pointer indirection
(the listener points to the bind_conf). This change is thus very small,
it saves a little bit of memory (8B per listener) and makes the code
more flexible.
2016-12-22 23:26:37 +01:00
Willy Tarreau
92b10c954d BUG/MAJOR: stream: fix session abort on resource shortage
In 1.6-dev2, commit 32990b5 ("MEDIUM: session: remove the task pointer
from the session") introduced a bug which can sometimes crash the process
on resource shortage. When stream_complete() returns -1, it has already
reattached the connection to the stream, then kill_mini_session() is
called and still expects to find the task in conn->owner. Note that
since this commit, the code has moved a bit and is now in stream_new()
but the problem remains the same.

Given that we already know the task around these places, let's simply
pass the task to kill_mini_session().

The conditions currently at risk are :
  - failure to initialize filters for the new stream (lack of memory or
    any filter returning < 0 on attach())
  - failure to attach filters (any filter returning < 0 on stream_start())
  - frontend's accept() returning < 0 (allocation failure)

This fix is needed in 1.7 and 1.6.
2016-12-04 20:16:52 +01:00
Willy Tarreau
397131093f REORG: tcp-rules: move tcp rules processing to their own file
There's no more reason to keep tcp rules processing inside proto_tcp.c
given that there is nothing in common there except these 3 letters : tcp.
The tcp rules are in fact connection, session and content processing rules.
Let's move them to "tcp-rules" and let them live their life there.
2016-11-25 15:57:38 +01:00
Willy Tarreau
8e0bb0ae16 MINOR: connection: add names for transport and data layers
This makes debugging easier and avoids having to put ugly checks
against certain well-known internal struct pointers.
2016-11-24 16:58:12 +01:00
Willy Tarreau
620408f406 MEDIUM: tcp: add registration and processing of TCP L5 rules
This commit introduces "tcp-request session" rules. These are very
much like "tcp-request connection" rules except that they're processed
after the handshake, so it is possible to consider SSL information and
addresses rewritten by the proxy protocol header in actions. This is
particularly useful to track proxied sources as this was not possible
before, given that tcp-request content rules are processed after each
HTTP request. Similarly it is possible to assign the proxied source
address or the client's cert to a variable.
2016-10-21 18:19:24 +02:00
Willy Tarreau
7d9736fb5d CLEANUP: tcp rules: mention everywhere that tcp-conn rules are L4
This is in order to make integration of tcp-request-session cleaner :
- tcp_exec_req_rules() was renamed tcp_exec_l4_rules()
- LI_O_TCP_RULES was renamed LI_O_TCP_L4_RULES
  (LI_O_*'s horrible indent was also fixed and a provision was left
   for L5 rules).
2016-10-21 18:19:24 +02:00
Bertrand Jacquin
93b227db95 MINOR: listener: add the "accept-netscaler-cip" option to the "bind" keyword
When a NetScaler application switch is used as an L3+ switch, information
regarding the original IP and TCP headers is lost as a new TCP
connection is created between the NetScaler and the backend server.

NetScaler provides a feature to insert in the TCP data the original data
that can then be consumed by the backend server.

Specifications and documentation from NetScaler:
  https://support.citrix.com/article/CTX205670
  https://www.citrix.com/blogs/2016/04/25/how-to-enable-client-ip-in-tcpip-option-of-netscaler/

When CIP is enabled on the NetScaler, then a TCP packet is inserted just after
the TCP handshake. This is composed as:

  - CIP magic number : 4 bytes
    Both sender and receiver have to agree on a magic number so that
    they both handle the incoming data as a NetScaler Client IP insertion
    packet.

  - Header length : 4 bytes
Defines the length of the remaining data.

  - IP header : >= 20 bytes if IPv4, 40 bytes if IPv6
    Contains the header of the last IP packet sent by the client during TCP
    handshake.

  - TCP header : >= 20 bytes
    Contains the header of the last TCP packet sent by the client during TCP
    handshake.
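
Based only on the layout described above, a parser for the inserted block
would look roughly like this; the expected magic value and the byte order of
the fields are assumptions of the sketch:

  #include <stdint.h>
  #include <string.h>
  #include <arpa/inet.h>

  /* layout described above: 4-byte magic, 4-byte length, then the client's
   * original IP and TCP headers captured during the handshake */
  int parse_netscaler_cip(const uint8_t *buf, size_t len, uint32_t expected_magic,
                          const uint8_t **ip_hdr, uint32_t *ip_tcp_len)
  {
      uint32_t magic, hdr_len;

      if (len < 8)
          return -1;                       /* need at least magic + length */
      memcpy(&magic, buf, 4);
      memcpy(&hdr_len, buf + 4, 4);
      magic   = ntohl(magic);              /* network byte order is an assumption */
      hdr_len = ntohl(hdr_len);

      if (magic != expected_magic || len < 8 + hdr_len || hdr_len < 40)
          return -1;                       /* 40 = minimal IPv4 + TCP headers */

      *ip_hdr     = buf + 8;               /* start of the embedded IP header */
      *ip_tcp_len = hdr_len;
      return 0;
  }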
2016-06-20 23:02:47 +02:00
Christopher Faulet
d7c9196ae5 MAJOR: filters: Add filters support
This patch adds support for filters in HAProxy. The main idea is to have a
way to "easily" extend HAProxy by adding some "modules", called filters, that
will be able to change HAProxy behavior in a programmatic way.

To do so, many entry points have been added to the code to let filters hook into
different steps of the processing. A filter must define a flt_ops structure
(see include/types/filters.h for details). This structure contains all available
callbacks that a filter can define:

struct flt_ops {
       /*
        * Callbacks to manage the filter lifecycle
        */
       int  (*init)  (struct proxy *p);
       void (*deinit)(struct proxy *p);
       int  (*check) (struct proxy *p);

        /*
         * Stream callbacks
         */
        void (*stream_start)     (struct stream *s);
        void (*stream_accept)    (struct stream *s);
        void (*session_establish)(struct stream *s);
        void (*stream_stop)      (struct stream *s);

       /*
        * HTTP callbacks
        */
       int  (*http_start)         (struct stream *s, struct http_msg *msg);
       int  (*http_start_body)    (struct stream *s, struct http_msg *msg);
       int  (*http_start_chunk)   (struct stream *s, struct http_msg *msg);
       int  (*http_data)          (struct stream *s, struct http_msg *msg);
       int  (*http_last_chunk)    (struct stream *s, struct http_msg *msg);
       int  (*http_end_chunk)     (struct stream *s, struct http_msg *msg);
       int  (*http_chunk_trailers)(struct stream *s, struct http_msg *msg);
       int  (*http_end_body)      (struct stream *s, struct http_msg *msg);
       void (*http_end)           (struct stream *s, struct http_msg *msg);
       void (*http_reset)         (struct stream *s, struct http_msg *msg);
       int  (*http_pre_process)   (struct stream *s, struct http_msg *msg);
       int  (*http_post_process)  (struct stream *s, struct http_msg *msg);
       void (*http_reply)         (struct stream *s, short status,
                                   const struct chunk *msg);
};

To declare and use a filter, in the configuration, the "filter" keyword must be
used in a listener/frontend section:

  frontend test
    ...
    filter <FILTER-NAME> [OPTIONS...]

The filter referenced by the <FILTER-NAME> must declare a configuration parser
on its own name to fill the flt_ops and filter_conf fields in the proxy's
structure. An example will be provided later to make it perfectly clear.

For now, filters cannot be used in a backend section. But this is only a matter of
time. Documentation will also be added later. This is the first commit of a long
list about filters.

It is possible to have several filters on the same listener/frontend. These
  filters are stored in an array of at most MAX_FILTERS elements (defined in
include/types/filters.h). Again, this will be replaced later by a list of
filters.

The filter API has been highly refactored. Main changes are:

* Now, HA supports an infinite number of filters per proxy. To do so, filters
  are stored in list.

* Because filters are stored in a list, the filter state has been moved from the
  channel structure to the filter structure. This is cleaner because there is no
  more info about filters in the channel structure.

* It is possible to define filters on backends only. For such filters,
  stream_start/stream_stop callbacks are not called. Of course, it is possible
  to mix frontend and backend filters.

* Now, TCP streams are also filtered. All callbacks without the 'http_' prefix
  are called for all kinds of streams. In addition, 2 new callbacks were added to
  filter data exchanged through a TCP stream:

    - tcp_data: it is called when new data are available or when old unprocessed
      data are still waiting.

    - tcp_forward_data: it is called when some data can be consumed.

* New callbacks attached to channel were added:

    - channel_start_analyze: it is called when a filter is ready to process data
      exchanged through a channel. 2 new analyzers (a frontend and a backend)
      are attached to channels to call this callback. For a frontend filter, it
      is called before any other analyzer. For a backend filter, it is called
      when a backend is attached to a stream. So some processing cannot be
      filtered in that case.

    - channel_analyze: it is called before each analyzer attached to a channel,
      except analyzers responsible for data sending.

    - channel_end_analyze: it is called when all other analyzers have finished
      their processing. A new analyzer is attached to channels to call this
      callback. For a TCP stream, this is always the last one called. For an HTTP
      one, the callback is called when a request/response ends, so it is called
      one time for each request/response.

* 'session_established' callback has been removed. Everything that is done in
  this callback can be handled by 'channel_start_analyze' on the response
  channel.

* 'http_pre_process' and 'http_post_process' callbacks have been replaced by
  'channel_analyze'.

* 'http_start' callback has been replaced by 'http_headers'. This new one is
  called just before headers sending and parsing of the body.

* 'http_end' callback has been replaced by 'channel_end_analyze'.

* It is possible to set a forwarder for TCP channels. It was already possible to
  do it for HTTP ones.

* Forwarders can partially consume forwardable data. For this reason a new
  HTTP message state was added before HTTP_MSG_DONE : HTTP_MSG_ENDING.

Now all filters can define corresponding callbacks (http_forward_data
and tcp_forward_data). Each filter owns 2 offsets relative to buf->p, next and
forward, to track, respectively, input data already parsed but not forwarded yet
by the filter and parsed data considered as forwarded by the filter. At any time,
we have the guarantee that a filter cannot parse or forward more input than
previous ones. And, of course, it cannot forward more input than it has
parsed. 2 macros have been added to retrieve these offsets: FLT_NXT and FLT_FWD.

In addition, 2 functions have been added to change the 'next size' and the
'forward size' of a filter. When a filter parses input data, it can alter these
data, so the size of these data can vary. This action has an effect on all
previous filters that must be handled. To do so, the function
'filter_change_next_size' must be called, passing the size variation. In the
same spirit, if a filter alter forwarded data, it must call the function
'filter_change_forward_size'. 'filter_change_next_size' can be called in
'http_data' and 'tcp_data' callbacks and only these ones. And
'filter_change_forward_size' can be called in 'http_forward_data' and
'tcp_forward_data' callbacks and only these ones. The data changes are the
filter's responsibility, but with some limitations. It must not change already
parsed/forwarded data or data that previous filters have not parsed/forwarded
yet.

Because filters can be used on backends, when the backend is set for a
stream, we add filters defined for this backend to the filter list of the
stream. But we must only do that when the backend and the frontend of the stream
are not the same. Otherwise the same filters are added a second time, leading to
undefined behavior.

The HTTP compression code had to be moved.

This simplifies the http_response_forward_body function. To do so, the way the data
are forwarded has changed. Now, a filter (and only one) can forward data. In a
commit to come, this limitation will be removed to let all filters take part in
data forwarding. There are 2 new functions that filters should use to deal with
this feature:

 * flt_set_http_data_forwarder: This function sets the filter (using its id)
   that will forward data for the specified HTTP message. It is possible if it
   was not already set by another filter _AND_ if no data was yet forwarded
   (msg->msg_state <= HTTP_MSG_BODY). It returns -1 if an error occurs.

 * flt_http_data_forwarder: This function returns the filter id that will
   forward data for the specified HTTP message. If there is no forwarder set, it
   returns -1.

When an HTTP data forwarder is set for the response, the HTTP compression is
disabled. Of course, this is not definitive.
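
To make the registration concrete, a do-nothing filter would roughly declare
itself as below; the structure is a trimmed stand-in and the configuration
parser registration step is omitted (see include/types/filters.h in the tree
for the real interface):

  #include <stdio.h>

  struct proxy;
  struct stream;

  /* trimmed stand-in: only the lifecycle and stream hooks shown above */
  struct flt_ops_sketch {
      int  (*init)(struct proxy *p);
      void (*deinit)(struct proxy *p);
      void (*stream_start)(struct stream *s);
      void (*stream_stop)(struct stream *s);
      /* ... HTTP and TCP data callbacks would go here ... */
  };

  static int  my_init(struct proxy *p)          { (void)p; return 0; }
  static void my_stream_start(struct stream *s) { (void)s; fprintf(stderr, "stream started\n"); }

  /* a minimal "trace"-style filter: only the callbacks it cares about are
   * filled in, unset callbacks are simply skipped for this filter */
  static struct flt_ops_sketch my_filter_ops = {
      .init         = my_init,
      .stream_start = my_stream_start,
  };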
2016-02-09 14:53:15 +01:00
Willy Tarreau
ebcd4844e8 MEDIUM: vars: move the session variables to the session, not the stream
It's important that the session-wide variables are in the session and not
in the stream.
2015-06-19 11:59:02 +02:00
Willy Tarreau
73b65acd46 MINOR: stream: pass the pointer to the origin explicitly to stream_new()
We don't pass sess->origin anymore but the pointer to the previous step. Now
it should be much easier to chain elements together once applets are moved out
of streams. Indeed, the session is only used for configuration and not for the
dynamic chaining anymore.
2015-04-08 18:26:29 +02:00
Willy Tarreau
678be62981 MEDIUM: session: adjust the connection flags before stream_new()
It's not the stream's job to manipulate the connection's flags, it's
more related to the session that accepted the new connection. And the
only case where we have to do it conditionally is based on the frontend
which is known from the session, thus it makes sense to do it there.
2015-04-08 18:18:15 +02:00
Willy Tarreau
042cd75bc2 MINOR: session: maintain the session count stats in the session, not the stream
This has nothing to do in the stream, as we'll face absurdities when chaining
multiple streams. The session is where it must be accounted for.
2015-04-08 18:10:49 +02:00
Willy Tarreau
d1769b8b9a MEDIUM: stream: don't rely on the session's listener anymore in stream_new()
When the stream is instantiated from an applet, it doesn't necessarily
have a listener. The listener was sparsely used there, just to retrieve
the task function, update the listeners' stats, and set the analysers
and default target, both of which are often zero from applets. Thus
these elements are now initialized with default values that the caller
is free to change if desired.
2015-04-06 11:37:35 +02:00
Willy Tarreau
f9d1bc6d9a MEDIUM: frontend: move the fd-specific settings to session_accept_fd()
The frontend is generic and does not depend on a file descriptor,
so applying some socket options to the incoming fd is not its role.
Let's move the setsockopt() calls earlier in session_accept_fd()
where others are done as well.
2015-04-06 11:37:35 +02:00
Willy Tarreau
02d863866d MEDIUM: stream: return the stream upon accept()
The function was called stream_accept_session(), let's rename it
stream_new() and make it return the newly allocated pointer. It's
more convenient for some callers who need it.
2015-04-06 11:37:34 +02:00
Willy Tarreau
18b95a4b27 MINOR: session: set the CO_FL_CONNECTED flag on the connection once ready
If we know there's no handshake, we must set the flag on the connection,
it's not the job of the stream initializer to do it.
2015-04-06 11:37:33 +02:00
Willy Tarreau
64beab202c MINOR: session: make use of session_new() when creating a new session
It's better than open-coding it.
2015-04-06 11:37:33 +02:00
Willy Tarreau
c38f71cfcd MINOR: session: introduce session_new()
This one creates a new session and does the minimum initialization.
2015-04-06 11:37:33 +02:00
Willy Tarreau
9903f0e1a2 REORG: session: move the session parts out of stream.c
This concerns everything related to accepting a new session and
expiring the embryonic session. There's still a hard-coded call
to stream_accept_session() which could be set somewhere in the
frontend, but for now it's not a problem.
2015-04-06 11:37:32 +02:00
Willy Tarreau
bb2ef12a60 MEDIUM: session: update the session's stick counters upon session_free()
Whenever session_free() is called, any possible stick counter stored in
the session will be synchronized.
2015-04-06 11:37:31 +02:00
Willy Tarreau
11c3624c32 MINOR: session: implement session_free() and use it everywhere
We want to call this one everywhere we have to kill a session so
that future parts we move to the session can be released from there.
2015-04-06 11:37:30 +02:00
Willy Tarreau
b1ec8c4a59 MINOR: session: start to reintroduce struct session
There is now a pointer to the session in the stream, which is NULL
for now. The session pool is created as well. Some parts will move
from the stream to the session now.
2015-04-06 11:23:57 +02:00
Willy Tarreau
87b09668be REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.

In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.

The files stream.{c,h} were added and session.{c,h} removed.

The session will be reintroduced later and a few parts of the stream
will progressively be moved over there. It will more or less contain
only what we need in an embryonic session.

Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.

Once all changes are completed, we should see approximately this :

   L7 - http_txn
   L6 - stream
   L5 - session
   L4 - connection | applet

There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.

Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-06 11:23:56 +02:00
Willy Tarreau
10b688f2b4 MEDIUM: listener: store the default target per listener
This will be useful later to state that some listeners have to use
certain decoders (typically an HTTP/2 decoder) regardless of the
regular processing applied to other listeners. For now it simply
defaults to the frontend's default target, and it is used by the
session.
2015-03-13 16:45:37 +01:00
Willy Tarreau
f87ab94e3b MINOR: proxy: store the default target into the frontend's configuration
Some services such as peers and CLI pre-set the target applet immediately
during accept(), and for this reason they're forced to have a dedicated
accept() function which does not even properly follow everything the regular
one does (eg: sndbuf/rcvbuf/linger/nodelay are not set, etc).

Let's store the default target when known into the frontend's config so that
it's session_accept() which automatically sets it.
2015-03-13 16:23:00 +01:00
Willy Tarreau
78955f4c8b MEDIUM: session: simplify receive buffer allocator to only use the channel
Now that we can get the session from the channel, let's simplify the
prototype of session_alloc_recv_buffer() to only require the channel.
Both the caller and the function are now simplified.
2015-03-11 20:41:47 +01:00
Willy Tarreau
103197d597 CLEANUP: session: don't use si_{ic,oc} when we know the session.
During the connection establishment, we needlessly rely on pointer
dereferences.
2015-03-11 20:41:47 +01:00
Willy Tarreau
7b8c4f9661 CLEANUP: session: don't needlessly pass a pointer to the stream-int
All functions dealing with connection establishment currently use a
pointer to the stream interface. Now we know it cannot change and is
always s->si[1].
2015-03-11 20:41:47 +01:00
Willy Tarreau
8f128b41ec CLEANUP: session: use local variables to access channels / stream ints
In process_session, we had around 300 accesses to channels and stream-ints
from the session. Not only does this inflate the code due to the large offsets
from the original pointer, but readability can be improved. Let's have 4
local variables for the channels and stream-ints.
2015-03-11 20:41:47 +01:00
Willy Tarreau
350f487300 CLEANUP: session: simplify references to chn_{prod,cons}(&s->{req,res})
These 4 combinations are needlessly complicated since the session already
has direct access to the associated stream interfaces without having to
check an indirect pointer.
2015-03-11 20:41:47 +01:00
Willy Tarreau
81cd90069a MEDIUM: channel: remove now unused ->prod and ->cons pointers
Nothing uses them anymore.
2015-03-11 20:41:47 +01:00
Willy Tarreau
ef573c0f22 MEDIUM: channel: add a new flag "CF_ISRESP" for the response channel
This flag designates the response channel. This will be used to know
what channel we're seeing and finding our way back to the session.
2015-03-11 20:41:47 +01:00
Willy Tarreau
73796535a9 REORG/MEDIUM: channel: only use chn_prod / chn_cons to find stream-interfaces
The purpose of these two macros will be to pass via the session to
find the relevant stream interfaces so that we don't need to store
the ->cons nor ->prod pointers anymore. Currently they're only defined
so that all references could be removed.

Note that many places need a second pass of clean up so that we don't
have any chn_prod(&s->req) anymore and only &s->si[0] instead, and
conversely for the 3 other cases.
2015-03-11 20:41:47 +01:00
Willy Tarreau
819d332dfd MEDIUM: stream-int: remove any reference to the owner
si->owner is not used anymore now, so let's remove any reference to it.
2015-03-11 20:41:46 +01:00
Willy Tarreau
07373b8660 MEDIUM: stream-int: use si_task() to retrieve the task from the stream int
We go back to the session to get the owner. Here again it's very easy
and is just a matter of relative offsets. Since the owner always exists
and always points to the session's task, we can remove some unneeded
tests.
2015-03-11 20:41:46 +01:00
Willy Tarreau
a2df3fa251 MEDIUM: stream-interface: remove now unused pointers to channels
Everyone must now use si_ic() / si_oc() to find the relevant channels,
the pointers have been totally removed.
2015-03-11 20:41:46 +01:00
Willy Tarreau
a5f5d8dc69 MEDIUM: stream-int: add a flag indicating which side the SI is on
This new flag "SI_FL_ISBACK" is set only on the back SI and is cleared
on the front SI. That way it's possible only by looking at the SI to
know what side it is.
2015-03-11 20:41:46 +01:00
Willy Tarreau
2bb4a96f8f REORG/MEDIUM: stream-int: introduce si_ic/si_oc to access channels
We'll soon remove direct references to the channels from the stream
interface since everything belongs to the same session, so let's
first not dereference si->ib / si->ob anymore and use macros instead.
2015-03-11 20:41:46 +01:00
Willy Tarreau
a27dc19eda CLEANUP: remove now unused channel pool
The channels are now part of the struct session. Their pool is
not needed anymore.
2015-03-11 20:41:46 +01:00
Willy Tarreau
22ec1eadd0 REORG/MAJOR: move session's req and resp channels back into the session
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to validate that a channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
2015-03-11 20:41:46 +01:00
Thierry FOURNIER
a718b29b6d MINOR: lua: remove some #define
The #define compilation directives are centralized in the hlua
include files. This permits removing some #ifdefs from the main
HAProxy code.
2015-03-04 17:58:52 +01:00
Thierry FOURNIER
05ac42455f MEDIUM: lua: Lua initialisation "on demand"
Currently, the Lua context is always initialized in each
session, even if the session doesn't use Lua. This
behavior causes a 5% performance loss.

This patch initializes Lua only if it is used by the
session. The initialization is now on demand.
2015-02-28 23:12:37 +01:00
Thierry FOURNIER
65f34c6367 MINOR: lua: txn: create class TXN associated with the transaction.
This class of functions permits access to all the functions
associated with the transaction, like HTTP headers, HAProxy internal
fetches, etc.

This patch puts the skeleton of this class. The class will be
enhanced later.
2015-02-28 23:12:34 +01:00
Thierry FOURNIER
bc4c1ac6ad MEDIUM: http/tcp: permit to resume http and tcp custom actions
Later, the processing of some actions will need to be interrupted and resumed.
This patch permits resuming the actions. The actions that need
to run with the resume mode are not yet available. They will come soon with
the Lua patches. So the code added by this patch is untestable for the moment.

The list of "tcp_exec_req_rules" cannot resume because it is called by the
unresumable function "accept_session".
2015-02-28 23:12:33 +01:00
Thierry FOURNIER
f41a809dc9 MINOR: sample: add private argument to the struct sample_fetch
This private argument is added to prepare the integration
of the Lua fetches.
2015-02-28 23:12:31 +01:00
Thierry FOURNIER
b83862dd74 MEDIUM: channel: wake up any request analyzer on response activity
This behavior already exists for the "WAIT_HTTP" analyzer;
this patch just extends the system to any analyzer that would
be woken up on response activity.
2015-02-28 23:12:31 +01:00
Thierry FOURNIER
2e05a8c742 MEDIUM: task: call session analyzers if the task is woken by a message.
When a task received a message from another one, its analysers
were not called if there was no I/O activity.
2015-02-28 23:12:30 +01:00
Willy Tarreau
a24adf0795 MAJOR: session: only wake up as many sessions as available buffers permit
We've already experimented with three wake up algorithms when releasing
buffers : the first naive one used to wake up far too many sessions,
causing many of them not to get any buffer. The second approach which
was still in use prior to this patch consisted in waking up either 1
or 2 sessions depending on the number of FDs we had released. And this
was still inaccurate. The third one tried to cover the accuracy issues
of the second and took into consideration the number of FDs the sessions
would be willing to use, but most of the time we ended up waking up too
many of them for nothing, or deadlocking by lack of buffers.

This patch completely removes the need to allocate two buffers at once.
Instead it splits allocations into critical and non-critical ones and
implements a reserve in the pool for this. The deadlock situation happens
when all buffers are allocated for requests pending in a maxconn-limited
server queue, because then there's no more way to allocate buffers for
responses, and these responses are critical to release the servers'
connections in order to release the pending requests. In fact maxconn on
a server creates a dependence between sessions and particularly between
oldest session's responses and latest session's requests. Thus, it is
mandatory to get a free buffer for a response in order to release a
server connection which will permit to release a request buffer.

Since we definitely have non-symmetrical buffers, we need to implement
this logic in the buffer allocation mechanism. What this commit does is
implement a reserve of buffers which can only be allocated for responses
and that will never be allocated for requests. This is made possible by
the requester indicating how much margin it wants to leave after the
allocation succeeds. Thus it is a cooperative allocation mechanism : the
requester (process_session() in general) prefers not to get a buffer in
order to respect other's need for response buffers. The session management
code always knows if a buffer will be used for requests or responses, so
that is not difficult :

  - either there's an applet on the initiator side and we really need
    the request buffer (since currently the applet is called in the
    context of the session)

  - or we have a connection and we really need the response buffer (in
    order to support building and sending an error message back)

This reserve ensures that we don't take all allocatable buffers for
requests waiting in a queue. The downside is that all the extra buffers
are really allocated to ensure they can be allocated. But with small
values it is not an issue.

With this change, we don't observe any more deadlocks even when running
with maxconn 1 on a server under severely constrained memory conditions.

The code becomes a bit tricky, it relies on the scheduler's run queue to
estimate how many sessions are already expected to run so that it doesn't
wake up everyone with too few resources. A better solution would probably
consist in having two queues, one for urgent requests and one for normal
requests. A failed allocation for a session dealing with an error, a
connection event, or the need for a response (or request when there's an
applet on the left) would go to the urgent request queue, while other
requests would go to the other queue. Urgent requests would be served
from 1 entry in the pool, while the regular ones would be served only
according to the reserve. Despite not yet having this, it works
remarkably well.

This mechanism is quite efficient, we don't perform too many wake up calls
anymore. For 1 million sessions elapsed during massive memory contention,
we observe about 4.5M calls to process_session() compared to 4.0M without
memory constraints. Previously we used to observe up to 16M calls, which
roughly means 12M failures.

During a test run under high memory constraints (limit enforced to 27 MB
instead of the 58 MB normally needed), performance used to drop by 53% prior
to this patch. Now with this patch instead it *increases* by about 1.5%.

The best effect of this change is that by limiting the memory usage to about
2/3 to 3/4 of what is needed by default, it's possible to increase performance
by up to about 18% mainly due to the fact that pools are reused more often
and remain hot in the CPU cache (observed on regular HTTP traffic with 20k
objects, buffers.limit = maxconn/10, buffers.reserve = limit/2).

Below is an example of scenario which used to cause a deadlock previously :
  - connection is received
  - two buffers are allocated in process_session() then released
  - one is allocated when receiving an HTTP request
  - the second buffer is allocated then released in process_session()
    for request parsing then connection establishment.
  - poll() says we can send, so the request buffer is sent and released
  - process session gets notified that the connection is now established
    and allocates two buffers then releases them
  - all other sessions do the same till one cannot get the request buffer
    without hitting the margin
  - and now the server responds. stream_interface allocates the response
    buffer and manages to get it since it's higher priority being for a
    response.
  - but process_session() cannot allocate the request buffer anymore

  => We could end up with all buffers used by responses so that none may
     be allocated for a request in process_session().

When the applet processing leaves the session context, the test will have
to be changed so that we always allocate a response buffer regardless of
the left side (eg: H2->H1 gateway). A final improvement would consist in
being able to only retry the failed I/O operation without waking up a
task, but to date all experiments to achieve this have proven not to be
reliable enough.
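
The cooperative allocation can be pictured as follows: the caller states how
many buffers must remain available after its own allocation, so request-side
allocations pass a non-zero margin while response-side ones pass zero. This is
an illustrative stand-in, not the real pool code:

  #include <stdlib.h>

  /* very small stand-in for a buffer pool with a reserve */
  static int buffers_available = 8;   /* tracked by the real pool code */

  /* allocate one buffer only if at least <margin> buffers remain available
   * afterwards; responses use margin = 0 so they can always drain servers,
   * requests use a positive margin so they never exhaust the pool */
  void *buffer_alloc_margin(size_t size, int margin)
  {
      if (buffers_available - 1 < margin)
          return NULL;                 /* let a response allocation have it */
      buffers_available--;
      return malloc(size);
  }

  void buffer_release(void *buf)
  {
      free(buf);
      buffers_available++;             /* and wake up queued waiters */
  }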
2014-12-24 23:47:33 +01:00
Willy Tarreau
10fc09e872 MAJOR: session: only allocate buffers when needed
A session doesn't need buffers all the time, especially when they're
empty. With this patch, we don't allocate buffers anymore when the
session is initialized, we only allocate them in two cases :

  - during process_session()
  - during I/O operations

During process_session(), we try hard to allocate both buffers at once
so that we know for sure that a started operation can complete. Indeed,
a previous version of this patch used to allocate one buffer at a time,
but it can result in a deadlock when all buffers are allocated for
requests for example, and there's no buffer left to emit error responses.
Here, if any of the buffers cannot be allocated, the whole operation is
cancelled and the session is added at the tail of the buffer wait queue.
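
A minimal sketch of this all-or-nothing allocation, using assumed helper
names rather than the real ones :

    /* Hypothetical sketch: allocate both channel buffers or none at all,
     * and queue the session for buffers when either allocation fails. */
    struct buffer;
    struct session;

    struct buffer *b_try_alloc(void);                   /* assumed allocator  */
    void b_release(struct buffer *buf);                 /* assumed release    */
    void session_queue_for_buffers(struct session *s);  /* assumed wait queue */

    static int session_alloc_both(struct session *s,
                                  struct buffer **req, struct buffer **rsp)
    {
        *req = b_try_alloc();
        *rsp = *req ? b_try_alloc() : NULL;
        if (*req && *rsp)
            return 1;                    /* both buffers secured */
        if (*req)
            b_release(*req);             /* roll back the partial allocation */
        *req = *rsp = NULL;
        session_queue_for_buffers(s);    /* retried once buffers are released */
        return 0;
    }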

At the end of process_session(), a call to session_release_buffers() is
done so that we can offer unused buffers to other sessions waiting for
them.

For I/O operations, we only need to allocate a buffer on the Rx path.
For this, we only allocate a single buffer but ensure that at least two
are available to avoid the deadlock situation. In case buffers are not
available, SI_FL_WAIT_ROOM is set on the stream interface and the session
is queued. Unused buffers resulting either from a successful send() or
from an unused read buffer are offered to pending sessions during the
->wake() callback.
2014-12-24 23:47:33 +01:00
Willy Tarreau
bf883e0aa7 MAJOR: session: implement a wait-queue for sessions who need a buffer
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WOKEN_RES, and can attempt its allocation again.

We decide to try to wake as many waiters as we release buffers, so that
if we release two and two waiters each need only one, they both get their
chance. We must never end up in a situation where we don't wake enough
tasks up.
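
A minimal sketch of this wake-up policy (types, names and the constant's
value are assumed, not the real ones) :

    /* Hypothetical sketch: wake one queued waiter per buffer just released. */
    struct task;
    void task_wakeup(struct task *t, unsigned int reason);  /* assumed */
    #define TASK_WOKEN_RES 0x0100                           /* assumed value */

    struct bwait { struct bwait *next; struct task *task; };

    static void offer_buffers(struct bwait **wait_queue, int released)
    {
        while (released-- > 0 && *wait_queue) {
            struct bwait *w = *wait_queue;

            *wait_queue = w->next;       /* dequeue one waiter */
            w->next = NULL;
            task_wakeup(w->task, TASK_WOKEN_RES);
        }
    }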

It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed because one of the
memory allocations failed. In this situation, we don't want to move the
session that was just added out of the wait queue, otherwise it will never
get any buffer. Thus, we only force ourselves out of the queue when freeing
the session.

Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
2014-12-24 23:47:33 +01:00
Willy Tarreau
656859d478 MEDIUM: session: implement a basic atomic buffer allocator
This patch introduces session_alloc_recv_buffer(), session_alloc_buffers()
and session_release_buffers() whose purpose will be to allocate missing
buffers and release unneeded ones around the process_session() and during
I/O operations.

I/O callbacks only need a single buffer for recv operations and none
for send. However we still want to ensure that we don't pick the last
buffer. That's what session_alloc_recv_buffer() is for.
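
A minimal sketch of that check, with assumed pool accessors :

    /* Hypothetical sketch: the Rx path takes a buffer only if it would not
     * be the last one left in the pool. */
    struct buffer;
    struct pool_head;
    int pool_free_count(const struct pool_head *p);        /* assumed accessor */
    struct buffer *pool_get_buffer(struct pool_head *p);   /* assumed alloc    */

    static struct buffer *alloc_recv_buffer(struct pool_head *pool2_buffers)
    {
        if (pool_free_count(pool2_buffers) <= 1)
            return NULL;                 /* never grab the last buffer */
        return pool_get_buffer(pool2_buffers);
    }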

This allocator is atomic in that it either ensures both buffers can be
obtained or fails. Here, if any of the buffers is not ready and cannot be
allocated, the operation is cancelled. The purpose is to guarantee that
we don't enter the deadlock where all buffers are allocated for the
same side of all sessions.

A queue will have to be implemented for failed allocations. For now
they're just reported as failures.
2014-12-24 23:47:32 +01:00
Willy Tarreau
909e267be0 MINOR: session: group buffer allocations together
We'll soon want to release buffers together upon failure so we need to
allocate them after the channels. Let's change this now. There's no
impact on the behaviour, only the error path is unrolled slightly
differently. The same was done in peers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
7dfca9daec MINOR: buffer: only use b_free to release buffers
We don't call pool_free2(pool2_buffers) anymore, we only call b_free()
to do the job. This ensures that we can start to centralize the releasing
of buffers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
696a2910a0 MINOR: buffer: move buffer initialization after channel initialization
It's not clean to initialize the buffer before the channel since it
dereferences one pointer in the channel. Also we'll want to let the
channel pre-initialize the buffer, so let's ensure that the channel
is always initialized prior to the buffers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
e583ea583a MEDIUM: buffer: use b_alloc() to allocate and initialize a buffer
b_alloc() now allocates a buffer and initializes it to the size specified
in the pool minus the size of the struct buffer itself. This ensures that
callers do not need to care about buffer details anymore. Also this never
applies memory poisoning, which is slow and useless on buffers.
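
A self-contained sketch of that sizing rule (malloc stands in for the pool
allocator, and the struct layout is only illustrative) :

    #include <stdlib.h>

    /* Hypothetical sketch: the usable size is the pool entry size minus the
     * struct buffer header itself; the memory is not poisoned. */
    struct buffer { size_t size; size_t i; size_t o; char *p; char data[]; };

    static struct buffer *b_alloc_sketch(size_t pool_entry_size)
    {
        struct buffer *b = malloc(pool_entry_size);

        if (!b)
            return NULL;
        b->size = pool_entry_size - sizeof(*b);
        b->i = b->o = 0;
        b->p = b->data;
        return b;
    }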
2014-12-24 23:47:32 +01:00
Willy Tarreau
474cf54a97 MINOR: buffer: reset a buffer in b_reset() and not channel_init()
We'll soon need to be able to switch buffers without touching the
channel, so let's move buffer initialization out of channel_init().
We had the same in compression.c.
2014-12-24 23:47:31 +01:00
Willy Tarreau
3b24641745 BUG/MAJOR: sessions: unlink session from list on out of memory
Since embryonic sessions were introduced in 1.5-dev12 with commit
2542b53 ("MAJOR: session: introduce embryonic sessions"), a major
bug remained present. If haproxy cannot allocate memory during
session_complete() (for example, no more buffers), it will not
unlink the new session from the sessions list. This will cause
memory corruptions if the memory area from the session is reused
for anything else, and may also cause bogus output on "show sess"
on the CLI.

This fix must be backported to 1.5.
2014-11-25 22:09:05 +01:00
KOVACS Krisztian
b3e54fe387 MAJOR: namespace: add Linux network namespace support
This patch makes it possible to create binds and servers in separate
namespaces.  This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).

The setup is something like this:

net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/

The proxy is configured to make server connections through haproxy and to
send the expected source/target addresses to haproxy using the proxy protocol.

The network namespace setup on the haproxy node is something like this:

= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 name eth1.1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.1 up
...
= 8< =

= 8< =
$ cat haproxy.cfg
frontend clients
  bind 127.0.0.1:50022 namespace 1 transparent
  default_backend scb

backend scb
  mode tcp
  server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =

A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.

A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.

For more details, please read the documentation included in the patch
itself.

Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
2014-11-21 07:51:57 +01:00
Willy Tarreau
3a5e060bf6 MINOR: session: release a few other pools when stopping
We currently release all pools when a proxy is stopped, except the
connection, pendconn, and pipe pools. Also releasing these can further
reduce the memory usage of old processes: even though the connection struct
is quite small, there are a lot of them and they can contribute to memory
fragmentation. The pipe pool is very small, limited, and not exported, so
it's not done here.
2014-11-13 16:56:12 +01:00
Willy Tarreau
e12704bfc7 MINOR: session: export the function 'smp_fetch_sc_stkctr'
This one is sometimes useful outside of this file.
2014-07-15 19:09:56 +02:00
Willy Tarreau
b5975defba MINOR: stick-table: make stktable_fetch_key() indicate why it failed
stktable_fetch_key() does not indicate whether it returns NULL because
the input sample was not found or because it's unstable. It causes trouble
with track-sc* rules. Just like with sample_fetch_string(), we want it to
be able to give more information to the caller about what it found. Thus,
now we use the pointer to a sample passed by the caller, and fill it with
the information we have about the sample. That way, even if we return NULL,
the caller has the ability to check whether a sample was found and if it is
still changing or not.
2014-06-25 17:17:53 +02:00
Willy Tarreau
6f0a7bac28 BUG/MAJOR: session: revert all the crappy client-side timeout changes
This is the 3rd regression caused by the changes below. The latest to
date was reported by Finn Arne Gangstad. If a server responds with no
content-length and the client's FIN is never received, either we leak
the client-side FD or we spin at 100% CPU if timeout client-fin is set.

Enough is enough. The amount of tricks needed to cover these side-effects
starts to look like used toilet paper stacked over a chocolate cake. I
don't want to eat that cake anymore!

All this to avoid reporting a server-side timeout when a client stops
uploading data and haproxy expires faster than the server... A lot of
"ifs" resulting in a technically valid log that doesn't always please
users, and whose alternative causes that many issues for all other
users.

So let's revert this crap merged since 1.5-dev25 :
  Revert "CLEANUP: http: don't clear CF_READ_NOEXP twice"
    This reverts commit 1592d1e72a.
  Revert "BUG/MEDIUM: http: clear CF_READ_NOEXP when preparing a new transaction"
    This reverts commit 77d29029af.
  Revert "BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called"
    This reverts commit 0943757a21.
  Revert "BUG/MEDIUM: http: disable server-side expiration until client has sent the body"
    This reverts commit 3bed5e9337.
  Revert "BUG/MEDIUM: http: correctly report request body timeouts"
    This reverts commit b9edf8fbec.
  Revert "BUG/MEDIUM: http/session: disable client-side expiration only after body"
    This reverts commit b1982e27aa.

If a cleaner AND SAFER way to do something equivalent is found in 1.6-dev,
we *might* consider backporting it to 1.5, but given the vicious bugs that
have surfaced since, I doubt it will happen any time soon.

Fortunately, that crap never made it into 1.4 so no backport is needed.
2014-06-23 15:47:00 +02:00
Willy Tarreau
4bfc580dd3 MEDIUM: session: maintain per-backend and per-server time statistics
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
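
A minimal sketch of such a cheap rolling measurement (not the exact code) :

    /* Hypothetical sketch: a running sum that forgets roughly 1/N of its
     * history per sample, so sum/N approximates the mean of the last N
     * measurements at a constant cost per sample. */
    static inline unsigned int rolling_avg_add(unsigned int *sum,
                                               unsigned int n, unsigned int v)
    {
        *sum = *sum - *sum / n + v;
        return *sum / n;
    }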
2014-06-17 17:15:56 +02:00
Willy Tarreau
33a14e515b MEDIUM: session: redispatch earlier when possible
As discussed with Dmitry Sivachenko, if a server farm has more than one
active server, uses a guaranteed non-deterministic algorithm (round robin),
and the connection was initiated from a non-persistent connection, there's
no point insisting on reconnecting to the same server after a connect
failure; better to redispatch upon the very first retry instead of insisting
on the same server multiple times.
2014-06-13 17:53:55 +02:00
Willy Tarreau
db6d012270 MEDIUM: session: don't apply the retry delay when redispatching
The retry delay is only useful when sticking to a same server. During
a redispatch, it's useless and counter-productive if we're sure to
switch to another server, which is almost guaranteed when there's
more than one server and the balancing algorithm is round robin, so
better not pass via the turn-around state in this case. It could be
done as well for leastconn, but there's a risk of always killing the
delay after the recovery of a server in a farm where it's almost
guaranteed to take most incoming traffic. So better only kill the
delay when using round robin.
2014-06-13 17:48:45 +02:00
Willy Tarreau
b02906659b MEDIUM: session: allow shorter retry delay if timeout connect is small
As discussed with Dmitry Sivachenko, the default 1-second connect retry
delay can be large for situations where the connect timeout is much smaller,
because it means that an active connection reject will take more time to be
retried than a silent drop, and that does not make sense.

This patch changes this so that the retry delay is the minimum of 1 second
and the connect timeout. That way people running with sub-second connect
timeout will benefit from the shorter reconnect.
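
The rule itself is a one-liner; a sketch with illustrative names only :

    /* Hypothetical sketch: cap the retry delay by the connect timeout. */
    static inline int retry_delay_ms(int connect_timeout_ms)
    {
        const int default_delay_ms = 1000;    /* historical 1-second delay */

        if (connect_timeout_ms > 0 && connect_timeout_ms < default_delay_ms)
            return connect_timeout_ms;
        return default_delay_ms;
    }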
2014-06-13 17:04:44 +02:00
Willy Tarreau
892337c8e1 MAJOR: server: use states instead of flags to store the server state
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.

The encoding from the previous to the new states can be represented
this way :

  SRV_STF_RUNNING
   |  SRV_STF_GOINGDOWN
   |   |  SRV_STF_WARMINGUP
   |   |   |
   0   x   x     SRV_ST_STOPPED
   1   0   0     SRV_ST_RUNNING
   1   0   1     SRV_ST_STARTING
   1   1   x     SRV_ST_STOPPING

Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.

It's worth noting that the internal function names are still quite confusing.
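
A sketch of that mapping in C (the flag bit values are made up; only the
logic follows the table above) :

    #define SRV_STF_RUNNING    0x01    /* assumed bit values */
    #define SRV_STF_GOINGDOWN  0x02
    #define SRV_STF_WARMINGUP  0x04

    enum srv_state { SRV_ST_STOPPED, SRV_ST_STARTING, SRV_ST_RUNNING, SRV_ST_STOPPING };

    static enum srv_state srv_state_from_flags(unsigned int flags)
    {
        if (!(flags & SRV_STF_RUNNING))
            return SRV_ST_STOPPED;
        if (flags & SRV_STF_GOINGDOWN)
            return SRV_ST_STOPPING;
        if (flags & SRV_STF_WARMINGUP)
            return SRV_ST_STARTING;
        return SRV_ST_RUNNING;
    }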
2014-05-22 11:27:00 +02:00
Willy Tarreau
c93cd16b6c REORG/MEDIUM: server: split server state and flags in two different variables
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.

This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).

The flags now lie in their own field (srv->flags).

The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.

Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
2014-05-22 11:27:00 +02:00
Willy Tarreau
0943757a21 BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called
As more or less suspected, commit b1982e2 ("BUG/MEDIUM: http/session:
disable client-side expiration only after body") was hazardous. It
introduced a regression causing client side timeout to expire during
connection retries if it's lower than the time needed to cover the
amount of retries, so clients get a 408 when the connection to the
server fails to establish fast enough.

The reason is that the CF_READ_NOEXP flag is set after the MSG_DONE state
is reached, which protects the timeout from being re-armed, then during
the retries, process_session() clears the flag without calling the analyser
(since there's no activity for it), so the timeouts are rearmed.

Ideally, these one-shot flags should be per-analyser, and the analyser
which sets them would be responsible for clearing them, or they would
automatically be cleared when switching to another analyser. Unfortunately
this is not really possible currently.

What can be done however is to only clear them in the following situations :
  - we're going to call analysers
  - analysers have all been unsubscribed

This method seems reliable enough and approaches the ideal case well enough.
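
Expressed as a small sketch (the mask and parameter names are illustrative) :

    /* Hypothetical sketch: one-shot flags such as CF_READ_NOEXP survive a
     * pass unless analysers are about to run or none remain subscribed. */
    static void maybe_clear_oneshot(unsigned int *chn_flags,
                                    unsigned int oneshot_mask,
                                    unsigned int analysers,
                                    int will_run_analysers)
    {
        if (will_run_analysers || !analysers)
            *chn_flags &= ~oneshot_mask;
    }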

No backport is needed, this bug was introduced in 1.5-dev25.
2014-05-21 16:58:17 +02:00
Willy Tarreau
05cdd9655d MEDIUM: session: implement half-closed timeouts (client-fin and server-fin)
Long-lived sessions are often subject to half-closed connections, resulting
in a lot of connections appearing in FIN_WAIT state in the system tables, and
no way for haproxy to get rid of them. This typically happens because clients
suddenly disconnect without sending any packet (eg: FIN or RST was lost in
the path), and while the server detects this using an application-level
heartbeat, haproxy does not close the connection.

This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
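
The timeout selection can be sketched as follows (names are assumed) :

    /* Hypothetical sketch: once a FIN was sent or received on a side, prefer
     * the dedicated half-closed timeout over the regular one when it is set. */
    static int effective_timeout(int regular_timeout, int fin_timeout, int half_closed)
    {
        if (half_closed && fin_timeout > 0)
            return fin_timeout;   /* "timeout client-fin" / "timeout server-fin" */
        return regular_timeout;
    }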
2014-05-10 15:14:05 +02:00
Willy Tarreau
b4f98098aa BUG/MAJOR: session: recover the correct connection pointer in half-initialized sessions
John-Paul Bader reported a nasty segv which happens after a few hours
when SSL is enabled under a high load. Fortunately he could catch a
stack trace, systematically looking like this one :

(gdb) bt full
        level = 6
        conn = (struct connection *) 0x0
        err_msg = <value optimized out>
        s = (struct session *) 0x80337f800
        conn = <value optimized out>
        flags = 41997063
        new_updt = <value optimized out>
        old_updt = 1
        e = <value optimized out>
        status = 0
        fd = 53999616
        nbfd = 279
        wait_time = <value optimized out>
        updt_idx = <value optimized out>
        en = <value optimized out>
        eo = <value optimized out>
        count = 78
        sr = <value optimized out>
        sw = <value optimized out>
        rn = <value optimized out>
        wn = <value optimized out>

The variable "flags" in conn_fd_handler() holds a copy of connection->flags
when entering the function. These flags indicate 41997063 = 0x0280d307 :
  - {SOCK,DATA,CURR}_RD_ENA=1       => it's a handshake, waiting for reading
  - {SOCK,DATA,CURR}_WR_ENA=0       => no need for writing
  - CTRL_READY=1                    => FD is still allocated
  - XPRT_READY=1                    => transport layer is initialized
  - ADDR_FROM_SET=1, ADDR_TO_SET=0  => clearly it's a frontend connection
  - INIT_DATA=1, WAKE_DATA=1        => processing a handshake (ssl I guess)
  - {DATA,SOCK}_{RD,WR}_SH=0        => no shutdown
  - ERROR=0, CONNECTED=0            => handshake not completed yet
  - WAIT_L4_CONN=0                  => normal
  - WAIT_L6_CONN=1                  => waiting for an L6 handshake to complete
  - SSL_WAIT_HS=1                   => the pending handshake is an SSL handshake

So a handshake is in progress. And the only way to reach line 88
is for the handshake to complete without error. So we know for sure that
ssl_sock_handshake() was called and completed the handshake then removed
the CO_FL_SSL_WAIT_HS flag from the connection. With these flags,
ssl_sock_handshake() only calls SSL_do_handshake() and returns. So
that means that the problem is necessarily in data->init().

The fd is wrong as reported but is simply mis-decoded as it's the lower
half of the last function pointer.

What happens in practice is that there's an issue with the way we deal
with embryonic sessions during their conversion to regular sessions.
Since they have no stream interface at the beginning, the pointer to
the connection is temporarily stored into s->target. Then during their
conversion, the first stream interface is properly initialized and the
connection is attached to it, then s->target is set to NULL.

The problem is that if anything fails in session_complete(), the
session is left in this intermediate state where s->target is NULL,
and kill_mini_session() is called afterwards to perform the cleanup.
It needs the connection, which it looks up in s->target; since that is
NULL, it dereferences it and dies. The only reasons for dying here are a problem
on the TCP connection when doing the setsockopt(TCP_NODELAY) or a
memory allocation issue.

This patch implements a solution consisting in restoring s->target in
session_complete() on the error path. That way embryonic sessions that
were valid before calling it are still valid after.

The bug was introduced in 1.5-dev20 by commit f8a49ea ("MEDIUM: session:
attach incoming connection to target on embryonic sessions"). No backport
is needed.

Special thanks to John for his numerous tests and traces.
2014-05-08 22:46:32 +02:00
Willy Tarreau
b1982e27aa BUG/MEDIUM: http/session: disable client-side expiration only after body
For a very long time, back in the v1.3 days, we used to rely on a trick
to avoid expiring the client side while transferring a payload to the
server. The problem was that if a client was able to quickly fill the
buffers, and these buffers took some time to reach the server, the
client should not expire while not sending anything.

In order to cover this situation, the client-side timeout was disabled
once the connection to the server was OK, since it implied that we would
at least expire on the server if required.

But there is a drawback to this : if a client stops uploading data before
the end, its timeout is not enforced and we only expire on the server's
timeout, so the logs report a 504.

Since 1.4, we have message body analysers which ensure that we know whether
all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE).
So we can fix this problem by disabling the client-side or server-side
timeout at the end of the transfer for the respective side instead of
having it unconditionally in session.c during all the transfer.

With this, the logs now report the correct side for the timeout. Note that
this patch is not enough, because another issue remains : the HTTP body
forwarders do not abort upon timeout, they simply rely on the generic
handling from session.c. So for now, the session is still aborted when
reaching the server timeout, but the culprit is properly reported. A
subsequent patch will address this specific point.

This bug was tagged MEDIUM because of the changes performed. The issue
it fixes is minor however. After some cooling down, it may be backported
to 1.4.

It was reported by and discussed with Rachel Chavez and Patrick Hemmer
on the mailing list.
2014-05-07 14:21:47 +02:00
Willy Tarreau
644c101e2d BUG/MAJOR: http: connection setup may stall on balance url_param
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.

Cyril Bonté managed to reproduce it with the configuration below :

  listen test :80
    mode http
    balance url_param q
    hash-type consistent
    server s demo.1wt.eu:80

and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").

The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.

The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that this is only
detected after the analysers have run, and that the newly assigned
CF_READ_ATTACHED is not used to detect that the request analysers need to
be called again, so the flag is lost.

The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.

Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.

For future versions, some better options will have to be considered :
  - let all analysers subscribe to both request and response events ;
  - let analysers subscribe to stream interface events (reduces number
    of useless calls)
  - change CF_WAKE_WRITE's semantics to persist across calls to
    process_session(), but that is different from validating a
    connection establishment (eg: no data sent, or no data to send)

The bug was introduced in 1.5-dev23, no backport is needed.
2014-04-30 20:02:02 +02:00
Willy Tarreau
f51658dac4 MEDIUM: config: relax use_backend check to make the condition optional
Since it became possible to use log-format expressions in use_backend,
having a mandatory condition becomes annoying because configurations
are full of "if TRUE". Let's relax the check to accept no condition
like many other keywords (eg: redirect).
2014-04-23 01:21:56 +02:00
Willy Tarreau
b9a551e6aa BUG/MINOR: stats: last session was not always set
Cyril Bonté reported that the "lastsess" field of a stats-only backend
was never updated. In fact the same is true for any applet and anything
not a server. Also, lastsess was not updated for a server reusing its
connection for a new request.

Since the goal of this field is to report recent activity, it's better
to ensure that all accesses are reported. The call has been moved to
the code validating the session establishment instead, since everything
passes there.
2014-04-23 00:35:17 +02:00
Willy Tarreau
5a8f947f4f CLEANUP: http: rename http_process_request_body()
This function does not process anything, it just waits for the beginning
of the request body. Let's rename it http_wait_for_request_body().
2014-04-22 23:15:27 +02:00
Thierry FOURNIER
d988f21589 BUG/MAJOR: session: fix a possible crash with src_tracked
Since commit 4d4149c ("MEDIUM: counters: support passing the counter
number as a fetch argument"), the sample fetch sc_tracked(num) became
equivalent to sc[0-9]_tracked, by using the same smp_fetch_sc_tracked()
function.

This was theoretically made possible after the series of changes starting
with commit a65536ca ("MINOR: counters: provide a generic function to
retrieve a stkctr for sc* and src."). Unfortunately, while all other
functions were changed to use the generic primitive smp_fetch_sc_stkctr(),
smp_fetch_sc_tracked() was forgotten and is not able to differentiate
between sc_tracked, src_tracked and sc[0-9]_tracked. The resulting mess is
that if sc_tracked is used, the counter number is assumed to be 47 because
that's what remains after subtracting "0" from char "_".

Fix this by simply relying on the generic function as should have been
done. The bug was introduced in 1.5-dev20. No backport is needed.
2014-04-15 11:09:49 +02:00
Thierry FOURNIER
74c219dc04 BUG/MEDIUM: stick-table: fix IPv4-to-IPv6 conversion in src_* fetches
The function addr_to_stktable_key doesn't consider the expected
type of the key. If the stick table key is based on IPv6 addresses
and the input is IPv4, the returned key is an IPv4 address whose
length is 4 bytes, while a 16-byte key is expected.

This patch considers the expected key type and tries to convert IPv4 to
IPv6 and IPv6 to IPv4 according to the expected key.
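
For the IPv4-to-IPv6 direction, a common approach is to emit an IPv4-mapped
IPv6 key (::ffff:a.b.c.d); a small self-contained sketch, not the exact code :

    #include <string.h>
    #include <netinet/in.h>

    /* Hypothetical sketch: build a 16-byte IPv6 key from an IPv4 input when
     * the stick-table expects IPv6 addresses. */
    static void ipv4_to_v6_key(const struct in_addr *in4, struct in6_addr *out6)
    {
        static const unsigned char prefix[12] =
            { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

        memcpy(out6->s6_addr, prefix, 12);
        memcpy(out6->s6_addr + 12, &in4->s_addr, 4);
    }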

This fixes the bug reported by Apollon Oikonomopoulos.

This bug was introduced somewhere in the 1.5-dev process.
2014-04-14 18:22:57 +02:00
Willy Tarreau
6a0b6bd648 BUG/MAJOR: counters: check for null-deref when looking up an alternate table
Constructions such as sc0_get_gpc0(foo) allow looking up the same key as
the current key but in an alternate table. A check was missing to ensure
we already have a key, resulting in a crash if this lookup is performed
before the associated track-sc rule.

This bug was reported on the mailing list by Neil@iamafreeman and
narrowed down further by Lukas Tribus and Thierry Fournier.

This bug was introduced in 1.5-dev20 by commit "0f791d4 MEDIUM: counters:
support looking up a key in an alternate table".
2014-04-09 13:32:11 +02:00
Bertrand Jacquin
702d44f2ff MEDIUM: proxy: support use_backend with dynamic names
We have a use case where we look up a customer ID in an HTTP header
and direct it to the corresponding server. This can easily be done
using ACLs and use_backend rules, but the configuration becomes
painful to maintain when the number of customers grows to a few
tens or even a several hundreds.

We realized it would be nice if we could make the use_backend
resolve its name at run time instead of config parsing time, and
use a similar expression as http-request add-header to decide on
the proper backend to use. This permits the use of prefixes or
even complex names in backend expressions. If no name matches,
then the default backend is used. Doing so allowed us to get rid
of all the use_backend rules.

Since there are some config checks on the use_backend rules to see
if the referenced backend exists, we want to keep them to detect
config errors in normal config. So this patch does not modify the
default behaviour and proceeds this way :

  - if the backend name in the use_backend directive parses as a log
    format rule, it's used as-is and is resolved at run time ;

  - otherwise it's a static name which must be valid at config time.

There was the possibility of doing this with the use-server directive
instead of use_backend, but it seems like use_backend is more suited
to this task, as it can be used for other purposes. For example, it
becomes easy to serve a customer-specific proxy.pac file based on the
customer ID by abusing the errorfile primitive :

     use_backend bk_cust_%[hdr(X-Cust-Id)] if { hdr(X-Cust-Id) -m found }
     default_backend bk_err_404

     backend bk_cust_1
         errorfile 200 /etc/haproxy/static/proxy.pac.cust1

Signed-off-by: Bertrand Jacquin <bjacquin@exosec.fr>
2014-03-31 10:18:30 +02:00
Thierry FOURNIER
a47a94fb13 MINOR: session: don't always assume there's a listener
For outgoing connections initiated from an applet, there might not be
any listener. It's the case with peers, which resort to a hack consisting
in making the session's listener point to the peer. This listener is now
only used for statistics, so it's much easier to simply check for its presence.
2014-03-28 13:16:32 +01:00
Willy Tarreau
7519560767 MINOR: http: release compression context only in http_end_txn()
Currently there are two places where the compression context is released,
one in session_free() and another one in http_end_txn_clean_session().
Both of them call http_end_txn(), either directly or via http_reset_txn(),
and this function is made for this exact purpose. So let's centralize the
call there instead.
2014-03-14 19:26:20 +01:00
Bhaskar Maddala
a20cb85eba MINOR: stats: Enhancement to stats page to provide information of last session time.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.

This attempts to address the requirement in the ROADMAP

  - add a last activity date for each server (req/resp) that will be
    displayed in the stats. It will be useful with soft stop.

The stats page reports this as time elapsed since the last session. This
change does not adequately address the requirement for long-running
sessions (websocket, RDP, etc).
2014-02-08 01:19:58 +01:00
Willy Tarreau
a23ee3a2ea MINOR: session: clean up the connection free code
Use conn_free() instead of pool_free2(conn...). This makes the code more
auditable.
2014-02-05 00:18:47 +01:00
Willy Tarreau
818dca5098 BUG/MEDIUM: listener: improve detection of non-working accept4()
On ARM, glibc does not implement accept4() and simply returns ENOSYS,
which was not caught as a reason to fall back to accept(), resulting
in a spinning process since poll() would keep reporting the pending connection.

Let's change the error detection mechanism to save the broken status
of the syscall into a local variable that is used to fall back to the
legacy accept().
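
A self-contained sketch of this fallback pattern (using a static flag here
instead of the variable actually used in the patch) :

    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <fcntl.h>
    #include <errno.h>

    static int accept4_broken;   /* latched once the syscall proves unusable */

    static int accept_nonblock(int fd, struct sockaddr *addr, socklen_t *len)
    {
        int cfd = -1;

        if (!accept4_broken) {
            cfd = accept4(fd, addr, len, SOCK_NONBLOCK);
            if (cfd < 0 && (errno == ENOSYS || errno == EINVAL || errno == EPERM))
                accept4_broken = 1;   /* eg: not implemented by this libc/kernel */
        }
        if (cfd < 0 && accept4_broken) {
            cfd = accept(fd, addr, len);
            if (cfd >= 0)
                fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL) | O_NONBLOCK);
        }
        return cfd;
    }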

In addition to this, since the code was becoming a bit messy, the
accept4() was removed, so now the fallback code and the legacy code
are the same. This will also increase bug report accuracy if needed.

This is 1.5-specific, no backport is needed.
2014-01-31 19:40:19 +01:00
Willy Tarreau
cc08d2c9ff MEDIUM: counters: stop relying on session flags at all
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.

The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.

Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.

A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
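
A self-contained sketch of that pointer-tagging trick (flag names and bit
values are illustrative only) :

    #include <stdint.h>

    /* Hypothetical sketch: keep the two tracking flags in the unused low
     * bits of the stick-table entry pointer. */
    #define STKCTR_TRACK_BACKEND   0x1UL
    #define STKCTR_TRACK_CONTENTS  0x2UL

    struct stkctr_sketch { void *entry; /* tagged pointer */ };

    static inline void *stkctr_entry(const struct stkctr_sketch *c)
    {
        return (void *)((uintptr_t)c->entry & ~(uintptr_t)3);
    }

    static inline unsigned long stkctr_flags(const struct stkctr_sketch *c)
    {
        return (uintptr_t)c->entry & 3;
    }

    static inline void stkctr_set(struct stkctr_sketch *c, void *e, unsigned long flags)
    {
        c->entry = (void *)((uintptr_t)e | (flags & 3));
    }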
2014-01-28 23:34:45 +01:00
Willy Tarreau
e9101695ef BUG/MEDIUM: counters: fix stick-table entry leak when using track-sc2 in connection
In 1.5-dev19, commit e25c917 ("MEDIUM: counters: add support for tracking
a third counter") introduced the third track counter. However, there was
a hard-coded test in the accept() error path to release only sc0 and sc1.
So it seems that if tracking sc2 at the connection level and deciding to
reject once the track-sc2 has been done, there could be some leaking of
stick-table entries which remain marked used forever, thus which can never
be purged nor expired. There's no memory leak though, it's just that
entries are unexpirable forever.

The simple solution consists in removing the test and always calling
the inline function which iterates over all entries.
2014-01-28 23:32:50 +01:00
Willy Tarreau
1f0da2485e BUG/MEDIUM: unique_id: HTTP request counter is not stable
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.

A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.

It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.

So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instantiated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").

This feature is 1.5-specific, no backport to 1.4 is needed.
2014-01-25 11:07:06 +01:00
Willy Tarreau
2b028dd828 OPTIM: session: put unlikely() around the freewheeling code
The code which enables tunnel mode or TCP transfers is rarely used
and at most once per session. Putting it in an unlikely() clause
reduces the length of the hot path of process_session() which is
already quite long, and also slightly reduces its overall size.
Some measurements show a steady gain of about 0.2% thanks to this.
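
The hint itself is a one-liner; a minimal sketch (haproxy defines its own
unlikely() macro, this one is only illustrative) :

    /* Hypothetical sketch: tell the compiler the branch is rarely taken so
     * the hot path stays short. */
    #define unlikely(x) __builtin_expect(!!(x), 0)

    static int handle_mode_switch(unsigned int flags, unsigned int rare_mask)
    {
        if (unlikely(flags & rare_mask)) {
            /* cold path: tunnel / TCP transfer setup would go here */
            return 1;
        }
        return 0;   /* hot path */
    }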
2013-12-31 23:56:46 +01:00
Willy Tarreau
9e5a3aacf4 MEDIUM: stream-int: make si_connect() return an established state when possible
si_connect() used to only return SI_ST_CON. But it already detects
connection reuse and is the function which avoids calling connect().
So it already knows when the connection is valid and reused. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.

Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.

Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close() since
we don't use this faster path.
2013-12-31 23:32:12 +01:00
Willy Tarreau
b44c873d61 MEDIUM: session: prepare to support earlier transitions to the established state
At the moment it is possible in sess_prepare_conn_req() to switch to the
established state when the target is an applet. But sess_update_stream_int()
will soon also have the ability to set the established state via
connect_server() when a connection is reused, leading to a synchronous
connect.

So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which
really matches what's done in the lower layers.
2013-12-31 23:16:50 +01:00
Willy Tarreau
0e37f1c40e MINOR: session: factor out the connect time measurement
Currently there are 3 places in the code where t_connect is set after
switching to state SI_ST_EST, and a fourth one will soon come. Since
all these places lead to an immediate call to sess_establish() to
complete the session establishment, better move that measurement
there.
2013-12-31 23:06:46 +01:00
Willy Tarreau
d81ca04051 OPTIM: session: set the READ_DONTWAIT flag when connecting
As soon as we connect to the server, we want to limit the number of
recvfrom() on the response path because most of the time a single
call will retrieve enough information.

At the moment this is only done in the HTTP response parser, after
some reads have already failed, which is too late. We need to do
that at the earliest possible instant. It was already done for the
request side by frontend_accept() for the first request, and by
http_reset_txn() for the next requests.

Thanks to this change, there are no more failed recvfrom() calls in
keep-alive mode.
2013-12-31 22:39:26 +01:00
Willy Tarreau
d7ad9f5b0d MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.

The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.

That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.

In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.

The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).

That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
2013-12-31 18:37:36 +01:00
Willy Tarreau
068621e4ad MINOR: http: try to stick to same server after status 401/407
In HTTP keep-alive mode, if we receive a 401, we still have a chance
of being able to send the visitor again to the same server over the
same connection. This is required by some broken protocols such as
NTLM, and anyway whenever there is an opportunity for sending the
challenge to the proper place, it's better to do it (at least it
helps with debugging).
2013-12-23 15:12:44 +01:00
Willy Tarreau
2cff2f7bb8 MINOR: session: remove debugging code
The memset() was put here to corrupt memory for a debugging test; it's
not needed anymore and was unfortunately committed. It does no harm
anyway, it probably just slightly affects performance.
2013-12-16 10:12:54 +01:00
Willy Tarreau
59e3ff4549 BUG/MAJOR: session: repair tcp-request connection rules
Since recent commit f79c817 (MAJOR: connection: add two new flags to
indicate readiness of control/transport) and the surrounding commits,
the session initialization has been slightly delayed and the control
layer of the connection is not yet initialized when processing the
rules.

We need to move that minimal initialization a bit above.

The bug was introduced with latest changes, no backport is needed.
2013-12-16 02:23:50 +01:00
Willy Tarreau
89efaed6b6 BUILD: definitely silence some stupid GCC warnings
It's becoming increasingly difficult to ignore unwanted function returns in
debug code with gcc. Now even when you try to work around it, it suggests a
way to write your code differently. For example :

    src/frontend.c:187:65: warning: if statement has empty body [-Wempty-body]
                if (write(1, trash.str, trash.len) < 0) /* shut gcc warning */;
                                                                              ^
    src/frontend.c:187:65: note: put the semicolon on a separate line to silence this warning
    1 warning generated.

This is totally unacceptable, this code already had to be written this way
to shut it up in earlier versions. And now it comments on the form ? What's
the purpose of the C language if you can't write the code that does what
you want anymore ?

Emeric proposed to just keep a global variable to drain such useless results
so that gcc stops complaining all the time it believes people who write code
are monkeys. The solution is acceptable because the useless assignment is done
only in debug code so it will not impact performance. This patch implements
this, until gcc becomes even "smarter" to detect that we tried to cheat.
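
A minimal sketch of that workaround (the variable name is made up) :

    #include <unistd.h>

    /* Hypothetical sketch: sink the return value into a global so gcc
     * considers it consumed; only used in debug code. */
    static ssize_t gcc_result_sink;

    static void debug_write(const char *msg, size_t len)
    {
        gcc_result_sink = write(1, msg, len);
    }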
2013-12-13 15:21:36 +01:00
Willy Tarreau
6bbb2f68cd MINOR: session: report lack of resources using the new stream-interface's error code
Let's now use SI_ET_CONN_RES to report lack of resources instead of
SI_ET_CONN_OTHER with a handcrafted code.
2013-12-09 17:14:23 +01:00
Willy Tarreau
2d400bb931 MINOR: stream_interface: add reporting of resource allocation errors
SSL and keep-alive will need to be able to fail on allocation errors,
and the stream interface did not allow reporting such a cause. The flag
will then be "RC" as already documented.
2013-12-09 17:12:18 +01:00
Willy Tarreau
4384ddfc84 MEDIUM: session: automatically register the applet designated by the target
Some applet users don't need to initialize their applet; they just want
to route the traffic there as if it were a server. Since applets
are now connected to from session.c, let's simply ensure that when
connecting, the applet in si->end matches the target, and allocate
one there if it's not already done. In case of error, we force the
status code to resource and connection so that it's clear that it
happens because of a memory shortage.
2013-12-09 15:40:23 +01:00
Willy Tarreau
32e3c6a607 MAJOR: stream interface: dynamically allocate the outgoing connection
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.

As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.

The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.

The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.

The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.

A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).
2013-12-09 15:40:23 +01:00
Willy Tarreau
2a6e8802c0 MEDIUM: stream-interface: introduce si_attach_conn to replace si_prepare_conn
si_prepare_conn() is not appropriate in our case as it both initializes and
attaches the connection to the stream interface. Due to the asymmetry between
accept() and connect(), it causes some fields such as the control and transport
layers to be reinitialized.

Now that we can separately initialize these fields using conn_prepare(), let's
break this function to only attach the connection to the stream interface.

Also, by analogy, si_prepare_none() was renamed si_detach(), and
si_prepare_applet() was renamed si_attach_applet().
2013-12-09 15:40:23 +01:00
Willy Tarreau
7abddb5c67 MINOR: connection: replace conn_assign with conn_attach
We don't want to assign the control nor transport layers anymore
at the same time as the data layer, because it prevents one from
keeping existing settings when reattaching a connection to an
existing stream interface.

Let's have conn_attach() replace conn_assign() for this purpose.

Thus, conn_prepare() + conn_attach() do exactly the same as the
previous conn_assign().
2013-12-09 15:40:23 +01:00
Willy Tarreau
910c6aa5b7 MINOR: connection: reintroduce conn_prepare to set the protocol and transport
Now that we can assign conn->xprt regardless of the initialization state,
we can reintroduce conn_prepare() to set only the protocol, the transport
layer and initialize the transport layer's state.
2013-12-09 15:40:23 +01:00
Willy Tarreau
3ed35ef05b MINOR: stream-interface: introduce si_reset() and si_set_state()
The first function is used to (re)initialize a stream interface and
the second to force it into a known state. These are intended for
cleaning up the stream interface initialization code in session.c
and peers.c and avoiding future issues with missing initializations.
2013-12-09 15:40:23 +01:00
Willy Tarreau
f79c8171b2 MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :

  - on accept() side, the fd is set first, then the ctrl layer then the
    transport layer ; upon error, they must be undone in the reverse order,
    then the FD must be closed. The FD must not be deleted if the control
    layer was not yet initialized ;

  - on the connect() side, the fd is set last and there is no reliable way
    to know if it has been initialized or not. In practice it's initialized
    to -1 first but this is hackish and supposes that local FDs only will
    be used forever. Also, there are even less solutions for keeping trace
    of the transport layer's state.

Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.

So the proposed solution is to add two flags to the connection :

  - CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
    and cleared after it's released (fd_delete).

  - CO_FL_XPRT_READY is set when the transport layer is initialized (xprt->init)
    and cleared after it's released (xprt->close).

The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.

The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independently. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.

In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.

Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless of xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.

conn_full_close() now simply calls conn_xprt_close() then conn_ctrl_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.

In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turns. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
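
A reduced sketch of the flag-guarded pairing (flag values and the inlined
calls are illustrative only) :

    /* Hypothetical sketch: each layer is initialized and released at most
     * once, guarded by its readiness flag. */
    #define CO_FL_CTRL_READY  0x1    /* assumed bit values */
    #define CO_FL_XPRT_READY  0x2

    struct conn_sketch { unsigned int flags; };

    static void ctrl_init(struct conn_sketch *c)
    {
        if (!(c->flags & CO_FL_CTRL_READY)) {
            /* fd_insert() would go here */
            c->flags |= CO_FL_CTRL_READY;
        }
    }

    static void xprt_close(struct conn_sketch *c)
    {
        if (c->flags & CO_FL_XPRT_READY) {
            /* xprt->close() would go here */
            c->flags &= ~CO_FL_XPRT_READY;
        }
    }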
2013-12-09 15:40:23 +01:00
Willy Tarreau
c10aec299f MINOR: get rid of si_takeover_conn()
Since last commit, this function is an exact copy of si_prepare_conn().
2013-12-09 15:40:23 +01:00