Commit 24db47e0 tried to improve support for delayed ACK upon connect
but it was incomplete, because checks with the proxy protocol would
always enable polling for data receive and there was no way of
distinguishing data polling and delayed ack.
So we add a distinct delack flag to the connect() function so that
the caller decides whether or not to use a delayed ack regardless
of pending data (eg: when send-proxy is in use). Doing so covers all
combinations of { (check with data), (sendproxy), (smart-connect) }.
Pure TCP checks only use the SYN/ACK in return to a SYN. By forcing
the system to use delayed ACKs, it is possible to send an RST instead
of the ACK and thus ensure that the application will never be needlessly
woken up. This avoids error logs or counters on checked components since
the application is never made aware of this connection which dies in the
network stack.
Health checks currently still use the connection's fd to know whether
a check is running (this needs to change). When a health check
immediately fails during connect() because of a lack of local resource
(eg: port), we failed to unset the fd, so each time the process_chk
woken up after such an error, it believed a check was still running
and used to close the fd again instead of starting a new check. This
could result in other connections being closed because they were
assigned the same fd value.
The bug is only marked medium because when this happens, the system
is already in a bad state.
A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.
This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.
In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.
A lot of code was touched by massive replaces, but the changes are not
that important.
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
Thomas Heil reported that health checks did not work anymore when a backend
or server has "usesrc clientip". This is because the source address is not
set and tcp_bind_socket() tries to bind to that address anyway.
The solution consists in explicitly clearing the source address in the checks
and to make tcp_bind_socket() avoid binding when the address is not set. This
also has an indirect benefit that a useless bind() syscall will be avoided
when using "source 0.0.0.0 usesrc clientip" in health checks.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.
In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.
The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.
A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
- frontends have a dual-linked list of bind_conf
- frontends have a dual-linked list of listeners
- bind_conf have a dual-linked list of listeners
- listeners have a pointer to their bind_conf
This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
Now proto_tcp.c is responsible for the 4 settings it handles :
- defer-accept
- interface
- mss
- transparent
These ones do not need to be handled in cfgparse anymore. If support for a
setting is disabled by a missing build option, then cfgparse correctly
reports :
[ALERT] 255/232700 (2701) : parsing [echo.cfg:114] : 'bind' : 'transparent' option is not implemented in this version (check build options).
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.
By removing its useless includes, a long dependency chain broke
but could easily be fixed.
These flags were added for TCP_CORK. They were only set at various places
but never checked by any user since TCP_CORK was replaced with MSG_MORE.
Simply get rid of this now.
SSL need to initialize the data layer before proceeding with data. At
the moment, this data layer is automatically initialized from itself,
which will not be possible once we extract connection from sessions
since we'll only create the data layer once the handshake is finished.
So let's have the application layer initialize the data layer before
using it.
Make it more obvious that this function does not depend on any knowledge
of the session. This is important to plan for TCP rules that can run on
connection without any initialized session yet.
The last uses of the stream interfaces were in tcp_connect_server() and
could easily and more appropriately be moved to its callers, si_connect()
and connect_server(), making a lot more sense.
Now the function should theorically be usable for health checks.
It also appears more obvious that the file is split into two distinct
parts :
- the protocol layer used at the connection level
- the tcp analysers executing tcp-* rules and their samples/acls.
These ones are implicitly handled by the connection's data layer, no need
to rely on them anymore and reaching them maintains undesired dependences
on stream-interface.
We need to have the source and destination addresses in the connection.
They were lying in the stream interface so let's move them. The flags
SI_FL_FROM_SET and SI_FL_TO_SET have been moved as well.
It's worth noting that tcp_connect_server() almost does not use the
stream interface anymore except for a few flags.
It has been identified that once we detach the connection from the SI,
it will probably be needed to keep a copy of the server-side addresses
in the SI just for logging purposes. This has not been implemented right
now though.
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.
In concerns the following ones (new names) :
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)
Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.
The "buffer" memory pool was also renamed "channel".
Get rid of these confusing BF_* flags. Now channel naming should clearly
be used everywhere appropriate.
No code was changed, only a renaming was performed. The comments about
channel operations was updated.
This flag is quite complex to get right and updating it everywhere is a
major pain, especially since the buffer/channel split. This is the first
step of getting rid of it. Instead now it's dynamically computed whenever
needed.
This flag was very problematic because it was composite in that both changes
to the pipe or to the buffer had to cause this flag to be updated, which is
not always simple (eg: there may not even be a channel attached to a buffer
at all).
There were not that many users of this flags, mostly setters. So the flag got
replaced with a macro which reports whether the channel is empty or not, by
checking both the pipe and the buffer.
One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
output size is more useful than checking the flag's replacement.
The splicing is now provided by the data-layer rcv_pipe/snd_pipe functions
which in turn are called by the stream interface's recv and send callbacks.
The presence of the rcv_pipe/snd_pipe functions is used to attest support
for splicing at the data layer. It looks like the stream-interface's
SI_FL_CAP_SPLICE flag does not make sense anymore as it's used as a proxy
for the pointers above.
It also appears that we call chk_snd() from the recv callback and then
try to call it again in update_conn(). It is very likely that this last
function will progressively slip into the recv/send callbacks in order
to avoid duplicate check code.
The code works right now with and without splicing. Only raw_sock provides
support for it and it is automatically selected when the various splice
options are set. However it looks like splice-auto doesn't enable it, which
possibly means that the streamer detection code does not work anymore, or
that it's only called at a time where it's too late to enable splicing (in
process_session).
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.
The #include directive was also removed from a number of files which do
not need it anymore.
At the moment, the struct is still embedded into the struct channel, but
all the functions have been updated to use struct buffer only when possible,
otherwise struct channel. Some functions would likely need to be splitted
between a buffer-layer primitive and a channel-layer function.
Later the buffer should become a pointer in the struct buffer, but doing so
requires a few changes to the buffer allocation calls.
This is a massive rename. We'll then split channel and buffer.
This change needs a lot of cleanups. At many locations, the parameter
or variable is still called "buf" which will become ambiguous. Also,
the "struct channel" is still defined in buffers.h.
This is a second attempt at getting rid of FD_WAIT_*. Now the situation is
much better since native I/O handlers can directly manipulate the FD using
fd_{poll|want|stop}_* and the connection handlers manipulate connection-level
flags using the conn_{data|sock}_* equivalent.
Proceeding this way ensures that the connection flags always reflect the
reality even after data<->handshake switches.
Now the connection handler, the handshake callbacks and the I/O callbacks
make use of the connection-layer polling functions to enable or disable
polling on a file descriptor.
Some changes still need to be done to avoid using the FD_WAIT_* constants.
These functions have a more explicity meaning and will offer provisions
for explicit polling.
EV_FD_ISSET() has been left for now as it is still in use in checks.
This new flag is used to indicate that the connection was already
connected. It can be used by I/O handlers to know that a connection
has just completed. It is used by stream_sock_update_conn(), allowing
the sock_opt handlers not to manipulate the SI timeout nor the
BF_WRITE_NULL flag anymore.
It's better to have only stream_sock_update_conn() handle the conversion
of the CO_FL_ERROR flag to SI_FL_ERR than having it in each and every I/O
callback.
The sock_ops I/O callbacks made use of an FD till now. This has become
inappropriate and the struct connection is much more useful. It also
fixes the race condition introduced by previous change.
The socket data layer code must only focus on moving data between a
socket and a buffer. We need a special stream interface handler to
update the stream interface and the file descriptor status.
At the moment the code works but suffers from a race condition caused
by its API : the read/write callbacks still make use of the fd instead
of using the connection. And when a double shutdown is performed, a call
to ->write() after ->read() processed an error results in dereferencing
a NULL fdtab[]->owner. This is only a temporary issue which doesn't need
to be fixed now since this will automatically go away when the functions
change to use the connection instead.
Use a single tcp_connect_probe() instead of tcp_connect_write() and
tcp_connect_read(). We call this one only when no data layer function
have been processed, so this is a fallback to test for completion of
a connection attempt.
With this done, we don't have the need for any direct I/O callback
anymore.
The function still relies on ->write() to wake the stream interface up,
so it's not finished.
This handshake handler must be independant, so move it away from
proto_tcp. It has a dedicated connection flag. It is tested before
I/O handlers and automatically removes the CO_FL_WAIT_L4_CONN flag
upon success.
It also sets the BF_WRITE_NULL flag on the stream interface and
stops the SI timeout. However it does not perform the task_wakeup(),
and relies on the data handler to do so for now. The SI wakeup will
have to be moved elsewhere anyway.
fdtab[].state was only used to know whether a connection was in progress
or an error was encountered. Instead we now use connection->flags to store
a flag for both. This way, connection management will be able to update the
connection status on I/O.
The destination address is purely a connection thing and not an fd thing.
It's also likely that later the address will be stored into the connection
and linked to by the SI.
struct fdinfo only keeps the pointer to the port range and the local port
for now. All of this also needs to move to the connection but before this
the release of the port range must move from fd_delete() to a new function
dedicated to the connection.
We start to move everything needed to manage a connection to a special
entity "struct connection". We have the data layer operations and the
control operations there. We'll also have more info in the future such
as file descriptors and applet contexts, so that in the end it becomes
detachable from the stream interface, which will allow connections to
be reused between sessions.
For now on, we start with minimal changes.
It is much better and more efficient to consider that the send-proxy
feature is part of the protocol layer than part of the data layer.
Now the connection is considered established once the send-proxy line
has been sent.
This way the data layer doesn't have to care anymore about this specific
part.
The tcp_connect_write() function now automatically calls the data layer
write() function once the connection is established, which saves calls
to epoll_ctl/epoll_wait/process_session.
It's starting to look more and more obvious that tcp_connect_read() and
tcp_connect_write() are not TCP-specific but only socket-specific and as
such should probably move, along with some functions from protocol.c, to
a socket-specific file (eg: stream_sock).
It would be nice to be able to support autonomous listeners to parse the
proxy protocol before accepting a connection, so that we get rid of it
at the session layer and to support using these informations in the
tcp-request connection rules.
If the connect succeeds exactly at the same millisecond as the connect
timeout is supposed to strike, the timeout is still considered while
data may have already be sent. This results in a new connection attempt
with no data and with the response being lost.
Note that in practice the only real-world situation where this is observed
is when connect timeouts are extremely low, too low for safe operations.
This bug was encountered with a 1ms connect timeout.
It is also present on 1.4 and needs to be fixed there too.
Calling the init() function in sess_establish was a bad idea, it is
too late to allow it to fail on lack of resource and does not help at
all. Remove it for now before it's used.
These pointers were used to hold pointers to buffers in the past, but
since we introduced the stream interface, they're no longer used but
they were still sometimes set.
Removing them shrink the struct fdtab from 32 to 24 bytes on 32-bit machines,
and from 52 to 36 bytes on 64-bit machines, which is a significant saving. A
quick tests shows a steady 0.5% performance gain, probably due to the better
cache efficiency.
Up to now, if an outgoing connection had no data to send, the socket layer
had to perform a connect() again to check for establishment. This is not
acceptable for SSL, and will cause problems with socketpair(). Some socket
layers will also need an initializer before sending data (eg: SSL).
The solution consists in moving the connect() test to the protocol layer
(eg: TCP) and to make it hold the fd->write callback until the connection
is validated. At this point, it will switch the write callback to the
socket layer's write function. In fact we need to hold both read and write
callbacks to ensure the socket layer is never called before being initialized.
This intermediate callback is used only if there is a socket init function
or if there are no data to send.
The socket layer does not have any code to check for connection establishment
anymore, which makes sense.
Commit e164e7a removed get_src/get_dst setting in the stream interfaces but
forgot to set it in proto_tcp. Get the feature back because we need it for
logging, transparent mode, ACLs etc... We now rely on the stream interface
direction to know what syscall to use.
One benefit of doing it this way is that we don't use getsockopt() anymore
on outgoing stream interfaces nor on UNIX sockets.
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
These methods have been superseded by src and dst which support
multiple families. There is no point keeping them since they appeared
in a development version anyway.
For configurations using "src6", please use "src" instead. For "dst6",
use "dst" instead.
The previous sockstream_accept() function uses nothing from sockstream, and
is totally irrelevant to stream interfaces. Move this to the protocols.c
file which handles listeners and protocols, and call it listener_accept().
It now makes much more sense that the code dealing with listen() also handles
accept() and passes it to upper layers.
This is mainly a massive renaming in the code to get it in line with the
calling convention. Next patch will rename a few files to complete this
operation.
All parsing errors were known but impossible to return. Now by making use
of memprintf(), we're able to build meaningful error messages that the
caller can display.
pattern_fetch_rdp_cookie() is useless now since it only used to add controls
on top of smp_fetch_rdp_cookie() which have now been integrated into the
pattern subsystem. Let's remove it.
A test was already performed which worked by pure luck due to integer types,
otherwise it would have been possible to start checking for an offset out of
the buffer's bounds if the buffer size was large enough to allow an integer
wrap. Let's perform explicit checks and use unsigned ints for offsets instead
of risking being hit later.
These ones were easy to adapt to ACL usage and may really be useful,
so let's make them available right now. It's likely that some extension
such as regex, string-to-IP and raw IP matching will be implemented in
the near future.
Since pattern_process() is able to automatically cast returned types
into expected types, we can safely use the sample functions to fetch
addresses whatever their family. The lowest castable type must be
declared with the keyword so that config checks pass.
Right now this means that src/dst use the same fetch function for ACLs
and patterns. src6/dst6 have been kept so that configs which explicitly
rely on v6 are properly checked.
src_port, dst_port and url_param have converged between ACLs and patterns.
This means that src_port is now available in patterns and that urlp_* has
been added to ACLs. Some code has moved to accommodate for static function
definitions, but there were little changes.
Patterns were using a bitmask to indicate if request or response was desired
in fetch functions and keywords. ACLs were using a bitmask in fetch keywords
and a single bit in fetch functions. ACLs were also using an ACL_PARTIAL bit
in fetch functions indicating that a non-final fetch was performed, which was
an abuse of the existing direction flag.
The change now consists in using :
- a capabilities field for fetch keywords => SMP_CAP_REQ/RES to indicate
if a keyword supports requests, responses, both, etc...
- an option field for fetch functions to indicate what the caller expects
(request/response, final/non-final)
The ACL_PARTIAL bit was reversed to get SMP_OPT_FINAL as it's more explicit
to know we're working on a final buffer than on a non-final one.
ACL_DIR_* were removed, as well as PATTERN_FETCH_*. L4 fetches were improved
to support being called on responses too since they're still available.
The <dir> field of all fetch functions was changed to <opt> which is now
unsigned.
The patch is large but mostly made of cosmetic changes to accomodate this, as
almost no logic change happened.
The former was only a wrapper to the second, let's remove it now that
the calling convention is exactly the same. This is the first function
to be unified between ACLs and samples.
Having the args everywhere will make it easier to share fetch functions
between patterns and ACLs. The only place where we could have needed
the expr was in the http_prefetch function which can do well without.
Previously, both pattern, backend and persist_rdp_cookie would build fake
ACL expressions to fetch an RDP cookie by calling acl_fetch_rdp_cookie().
Now we switch roles. The RDP cookie fetch function is provided as a sample
fetch function that all others rely on, including ACL. The code is exactly
the same, only the args handling moved from expr->args to args. The code
was moved to proto_tcp.c, but probably that a dedicated file would be more
suited to content handling.
We need the pattern fetchers and converters to correctly set the output type
so that they can be used by ACL fetchers. By using the sample type instead of
the keyword type, we also open the possibility to create some multi-type
pattern fetch methods later (eg: "src" being v4/v6). Right now the type in
the keyword is used to validate the configuration.
Now there is no more reference to union pattern_data. All pattern fetch and
conversion functions now make use of the common sample type. Note: none of
them adjust the type right now so it's important to do it next otherwise
we would risk sharing such functions with ACLs and seeing them fail.
This one is not needed anymore as we can return the data and its type in the
sample provided by the caller. ACLs now always return the proper type. BOOL
is already returned when the result is expected to be processed as a boolean.
temp_pattern has been unexported now.
The new sample types are necessary for the acl-pattern convergence.
These types are boolean and signed int. Some types were renamed for
less ambiguity (ip->ipv4, integer->uint).
This is used to validate that arguments are coherent. For instance,
payload_lv expects that the last arg (if any) is not more negative
than the sum of the first two. The error is reported if any.
We don't need the pattern-specific args parsers anymore, make use of the
common parser instead. We still need to improve this by adding a validation
function to report abnormal argument values or combinations. We don't report
precise parsing errors yet but this was not previously done either.
arg_i was almost unused, and since we migrated to use struct arg everywhere,
the rare cases where arg_i was needed could be replaced by switching to
arg->type = ARGT_STOP.
The types and minimal number of ACL keyword arguments are now stored in
their declaration. This will allow many more fantasies if some ACL use
several arguments or types.
Doing so required to rework all ACL keyword declarations to add two
parameters. So this was a good opportunity for a general cleanup and
to sort all entries in alphabetical order.
We still have two pending issues :
- parse_acl_expr() checks for errors but has no way to report them to
the user ;
- the types of some arguments are still not resolved and kept as strings
(eg: ARGT_FE/BE/TAB) for compatibility reasons, which must be resolved
in acl_find_targets()
The ACL parser now uses the argument parser to build a typed argument list.
Right now arguments are all strings and only one argument is supported since
this is what ACLs currently support.
This change introduces the buffer's base pointer, which is the limit between
incoming and outgoing data. It's the point where the parsing should start
from. A number of computations have already been greatly simplified, but
more simplifications are expected to come from the removal of buf->r.
The changes appear good and have revealed occasional improper use of some
pointers. It is possible that this patch has introduced bugs or revealed
some, although preliminary testings tend to indicate that everything still
works as it should.
We don't have buf->l anymore. We have buf->i for pending data and
the total length is retrieved by adding buf->o. Some computation
already become simpler.
Despite extreme care, bugs are not excluded.
It's worth noting that msg->err_pos as set by HTTP request/response
analysers becomes relative to pending data and not to the beginning
of the buffer. This has not been completed yet so differences might
occur when outgoing data are left in the buffer.
These callbacks are used to retrieve the source and destination address
of a socket. The address flags are not hold on the stream interface and
not on the session anymore. The addresses are collected when needed.
This still needs to be improved to store the IP and port separately so
that it is not needed to perform a getsockname() when only the IP address
is desired for outgoing traffic.
%Bi return the backend source IP
%Bp return the backend source port
Add a function pointer in logformat_type to do additional configuration
during the log-format variable parsing.