All the HTX definition is self-contained and doesn't really depend on
anything external since it's a mostly protocol. In addition, some
external similar files (like h2) also placed in common used to rely
on it, making it a bit awkward.
This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
Now, the function htx_from_buf() will set the buffer's length to its size
automatically. In return, the caller should call htx_to_buf() at the end to be
sure to leave the buffer hosting the HTX message in the right state. When the
caller can use the function htxbuf() to get the HTX message without any update
on the underlying buffer.
First, to be called on HTX streams, a filter must explicitly be declared as
compatible by setting the flag STRM_FLT_FL_HAS_FILTERS on the filter's config at
HAProxy startup. This flag is checked when a filter implementation is attached
to a stream.
Then, some changes have been made on HTTP callbacks. The callback http_payload
has been added to filter HTX data. It will be called on HTX streams only. It
replaces the callbacks http_data, http_chunk_trailers and http_forward_data,
called on legacy HTTP streams only and marked as deprecated. The documention
(once updated)) will give all information to implement this new callback. Other
HTTP callbacks will be called for HTX and HTTP legacy streams. So it is the
filter's responsibility to known which kind of data it handles. The macro
IS_HTX_STRM should be used in such cases.
There is at least a noticeable changes in the way data are forwarded. In HTX,
after the call to the callback http_headers, all the headers are considered as
forwarded. So, in http_payload, only the body and eventually the trailers will
be filtered.
Instead of exporting a number of pools and having to manually delete
them in deinit() or to have dedicated destructors to remove them, let's
simply kill all pools on deinit().
For this a new function pool_destroy_all() was introduced. As its name
implies, it destroys and frees all pools (provided they don't have any
user anymore of course).
This allowed to remove 4 implicit destructors, 2 explicit ones, and 11
individual calls to pool_destroy(). In addition it properly removes
the mux_pt_ctx pool which was not cleared on exit (no backport needed
here since it's 1.9 only). The sig_handler pool doesn't need to be
exported anymore and became static now.
This commit replaces the explicit pool creation that are made in
constructors with a pool registration. Not only this simplifies the
pools declaration (it can be done on a single line after the head is
declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there
is no more user of potentially uninitialized pool now.
It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
Most calls to hap_register_post_check(), hap_register_post_deinit(),
hap_register_per_thread_init(), hap_register_per_thread_deinit() can
be done using initcalls and will not require a constructor anymore.
Let's create a set of simplified macros for this, called respectively
REGISTER_POST_CHECK, REGISTER_POST_DEINIT, REGISTER_PER_THREAD_INIT,
and REGISTER_PER_THREAD_DEINIT.
Some files were not modified because they wouldn't benefit from this
or because they conditionally register (e.g. the pollers).
This switches explicit calls to various trivial registration methods for
keywords, muxes or protocols from constructors to INITCALL1 at stage
STG_REGISTER. All these calls have in common to consume a single pointer
and return void. Doing this removes 26 constructors. The following calls
were addressed :
- acl_register_keywords
- bind_register_keywords
- cfg_register_keywords
- cli_register_kw
- flt_register_keywords
- http_req_keywords_register
- http_res_keywords_register
- protocol_register
- register_mux_proto
- sample_register_convs
- sample_register_fetches
- srv_register_keywords
- tcp_req_conn_keywords_register
- tcp_req_cont_keywords_register
- tcp_req_sess_keywords_register
- tcp_res_cont_keywords_register
- flt_register_keywords
For now, the filters are not compatible with the new HTX internal representation
of HTTP messages. Thus, for a given proxy, when the option "http-use-htx" is
enabled, an error is triggered if any filter is also configured.
Now all the code used to manipulate chunks uses a struct buffer instead.
The functions are still called "chunk*", and some of them will progressively
move to the generic buffer handling code as they are cleaned up.
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.
The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.
The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.
The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.
[wt: some of these are definitely fixes that are worth backporting]
Now, it is possible to define init_per_thread and deinit_per_thread callbacks to
deal with ressources allocation for each thread.
This is the filter responsibility to deal with concurrency. This is also the
filter responsibility to know if HAProxy is started with some threads. A good
way to do so is to check "global.nbthread" value. If it is greater than 1, then
_per_thread callbacks will be called.
In the commit 2b553de5 ("BUG/MINOR: filters: Don't force the stream's wakeup
when we wait in flt_end_analyze"), we removed a task_wakeup in flt_end_analyze
to no consume too much CPU by looping in certain circumstances.
But this fix was too drastic. For Keep-Alive transactions, flt_end_analyze is
often called only for the response. Then the stream is paused until a timeout is
hitted or the next request is received. We need first let a chance to both
channels to call flt_end_analyze function. Then if a filter need to wait here,
it is its responsibility to wake up the stream when needed. To fix the bug, and
thanks to previous commits, we set the flag CF_WAKE_ONCE on channels to pretend
there is an activity. On the current channel, the flag will be removed without
any effect, but for the other side the analyzer will be called immediatly.
Thanks for Lukas Tribus for his detailed analysis of the bug.
This patch must be backported in 1.7 with the 2 previous ones:
* a94fda3 ("BUG/MINOR: http: Don't reset the transaction if there are still data to send")
* cdaea89 ("BUG/MINOR: stream: Don't forget to remove CF_WAKE_ONCE flag on response channel")
In flt_end_analyze, we wait that the anlayze is finished for both the request
and the response. In this case, because of a task_wakeup, some streams can
consume too much CPU to do nothing. So now, this is the filter's responsibility
to know if this wakeup is needed.
This fix should be backported in 1.7.
When a filter is used, there are 2 channel's analyzers to surround all the
others, flt_start_analyze and flt_end_analyze. This is the good place to acquire
and release resources used by filters, when needed. In addition, the last one is
used to synchronize the both channels, especially for HTTP streams. We must wait
that the analyze is finished for the both channels for an HTTP transaction
before restarting it for the next one.
But this part was buggy, leading to unexpected behaviours. First, depending on
which channel ends first, the request or the response can be switch in a
"forward forever" mode. Then, the HTTP transaction can be cleaned up too early,
while a processing is still in progress on a channel.
To fix the bug, the flag CF_FLT_ANALYZE has been added. It is set on channels in
flt_start_analyze and is kept if at least one filter is still analyzing the
channel. So, we can trigger the channel syncrhonization if this flag was removed
on the both channels. In addition, the flag TX_WAIT_CLEANUP has been added on
the transaction to know if the transaction must be cleaned up or not during
channels syncrhonization. This way, we are sure to reset everything once all the
processings are finished.
This patch should be backported in 1.7.
This commit removes second argument(msgnum) from http_error_message and
changes http_error_message to use s->txn->status/http_get_status_idx for
mapping status code from 200..504 to HTTP_ERR_200..HTTP_ERR_504(enum).
This is needed for http-request tarpit deny_status commit.
It is important to defined analyzers (AN_REQ_* and AN_RES_*) in the same order
they are evaluated in process_stream. This order is really important because
during analyzers evaluation, we run them in the order of the lower bit to the
higher one. This way, when an analyzer adds/removes another one during its
evaluation, we know if it is located before or after it. So, when it adds an
analyzer which is located before it, we can switch to it immediately, even if it
has already been called once but removed since.
With the time, and introduction of new analyzers, this order was broken up. the
main problems come from the filter analyzers. We used values not related with
their evaluation order. Furthermore, we used same values for request and response
analyzers.
So, to fix the bug, filter analyzers have been splitted in 2 distinct lists to
have different analyzers for the request channel than those for the response
channel. And of course, we have moved them to the right place.
Some other analyzers have been reordered to respect the evaluation order:
* AN_REQ_HTTP_TARPIT has been moved just before AN_REQ_SRV_RULES
* AN_REQ_PRST_RDP_COOKIE has been moved just before AN_REQ_STICKING_RULES
* AN_RES_STORE_RULES has been moved just after AN_RES_WAIT_HTTP
Note today we have 29 analyzers, all stored into a 32 bits bitfield. So we can
still add 4 more analyzers before having a problem. A good way to fend off the
problem for a while could be to have a different bitfield for request and
response analyzers.
[wt: all of this must be backported to 1.7, and part of it must be backported
to 1.6 and 1.5]
Now, for TCP streams, backend filters are released when the stream is
destroyed. But, for HTTP streams, these filters are released when the
transaction analyze ends, in flt_end_analyze callback.
New callbacks have been added to handle creation and destruction of filter
instances:
* 'attach' callback is called after a filter instance creation, when it is
attached to a stream. This happens when the stream is started for filters
defined on the stream's frontend and when the backend is set for filters
declared on the stream's backend. It is possible to ignore the filter, if
needed, by returning 0. This could be useful to have conditional filtering.
* 'detach' callback is called when a filter instance is detached from a stream,
before its destruction. This happens when the stream is stopped for filters
defined on the stream's frontend and when the analyze ends for filters defined
on the stream's backend.
In addition, the callback 'stream_set_backend' has been added to know when a
backend is set for a stream. It is only called when the frontend and the backend
are not the same. And it is called for all filters attached to a stream
(frontend and backend).
Finally, the TRACE filter has been updated.
Filters can alter data during the parsing, i.e when http_data or tcp_data
callbacks are called. For now, the update must be done by hand. So we must
handle changes in the channel buffers, especially on the number of input bytes
pending (buf->i).
In addition, a filter can choose to switch channel buffers to do its
updates. So, during data filtering, we must always use the right buffer and not
use variable to reference them.
Without this patch, filters cannot safely alter data during the data parsing.
'channel_analyze' callback has been removed. Now, there are 2 callbacks to
surround calls to analyzers:
* channel_pre_analyze: Called BEFORE all filterable analyzers. it can be
called many times for the same analyzer, once at each loop until the
analyzer finishes its processing. This callback is resumable, it returns a
negative value if an error occurs, 0 if it needs to wait, any other value
otherwise.
* channel_post_analyze: Called AFTER all filterable analyzers. Here, AFTER
means when an analyzer finishes its processing. This callback is NOT
resumable, it returns a negative value if an error occurs, any other value
otherwise.
Pre and post analyzer callbacks are not automatically called. 'pre_analyzers'
and 'post_analyzers' bit fields in the filter structure must be set to the right
value using AN_* flags (see include/types/channel.h).
The flag AN_RES_ALL has been added (AN_REQ_ALL already exists) to ease the life
of filter developers. AN_REQ_ALL and AN_RES_ALL include all filterable
analyzers.
Instead of calling 'channel_analyze' callback with the flag AN_FLT_HTTP_HDRS,
now we use the new callback 'http_headers'. This change is done because
'channel_analyze' callback will be removed in a next commit.
Add opaque data between the filter keyword registrering and the parsing
function. This opaque data allow to use the same parser with differents
registered keywords. The opaque data is used for giving data which mainly
makes difference between the two keywords.
It will be used with Lua keywords registering.
Now, filter's configuration (.id, .conf and .ops fields) is stored in the
structure 'flt_conf'. So proxies own a flt_conf list instead of a filter
list. When a filter is attached to a stream, it gets a pointer on its
configuration. This avoids mixing the filter's context (owns by a stream) and
its configuration (owns by a proxy). It also saves 2 pointers per filter
instance.
Before, functions to filter HTTP body (and TCP data) were called from the moment
at least one filter was attached to the stream. If no filter is interested by
these data, this uselessly slows data parsing.
A good example is the HTTP compression filter. Depending of request and response
headers, the response compression can be enabled or not. So it could be really
nice to call it only when enabled.
So, now, to filter HTTP/TCP data, a filter must use the function
register_data_filter. For TCP streams, this function can be called only
once. But for HTTP streams, when needed, it must be called for each HTTP request
or HTTP response.
Only registered filters will be called during data parsing. At any time, a
filter can be unregistered by calling the function unregister_data_filter.
From the stream point of view, this new structure is opaque. it hides filters
implementation details. So, impact for future optimizations will be reduced
(well, we hope so...).
Some small improvements has been made in filters.c to avoid useless checks.
This new analyzer will be called for each HTTP request/response, before the
parsing of the body. It is identified by AN_FLT_HTTP_HDRS.
Special care was taken about the following condition :
* the frontend is a TCP proxy
* filters are defined in the frontend section
* the selected backend is a HTTP proxy
So, this patch explicitly add AN_FLT_HTTP_HDRS analyzer on the request and the
response channels when the backend is a HTTP proxy and when there are filters
attatched on the stream.
This patch simplifies http_request_forward_body and http_response_forward_body
functions.
For Chunked HTTP request/response, the body filtering can be really
expensive. In the worse case (many chunks of 1 bytes), the filters overhead is
of 3 calls per chunk. If http_data callback is useful, others are just
informative.
So these callbacks has been removed. Of course, existing filters (trace and
compression) has beeen updated accordingly. For the HTTP compression filter, the
update is quite huge. Its implementation is closer to the old one.
When no filter is attached to the stream, the CPU footprint due to the calls to
filters_* functions is huge, especially for chunk-encoded messages. Using macros
to check if we have some filters or not is a great improvement.
Furthermore, instead of checking the filter list emptiness, we introduce a flag
to know if filters are attached or not to a stream.
HTTP compression has been rewritten to use the filter API. This is more a PoC
than other thing for now. It allocates memory to work. So, if only for that, it
should be rewritten.
In the mean time, the implementation has been refactored to allow its use with
other filters. However, there are limitations that should be respected:
- No filter placed after the compression one is allowed to change input data
(in 'http_data' callback).
- No filter placed before the compression one is allowed to change forwarded
data (in 'http_forward_data' callback).
For now, these limitations are informal, so you should be careful when you use
several filters.
About the configuration, 'compression' keywords are still supported and must be
used to configure the HTTP compression behavior. In absence of a 'filter' line
for the compression filter, it is added in the filter chain when the first
compression' line is parsed. This is an easy way to do when you do not use other
filters. But another filter exists, an error is reported so that the user must
explicitly declare the filter.
For example:
listen tst
...
compression algo gzip
compression offload
...
filter flt_1
filter compression
filter flt_2
...
When all callbacks have been called for all filters registered on a stream, if
we are waiting for the next HTTP request, we must reset stream analyzers. But it
is useless to do so if the client has already closed the connection.
This patch adds the support of filters in HAProxy. The main idea is to have a
way to "easely" extend HAProxy by adding some "modules", called filters, that
will be able to change HAProxy behavior in a programmatic way.
To do so, many entry points has been added in code to let filters to hook up to
different steps of the processing. A filter must define a flt_ops sutrctures
(see include/types/filters.h for details). This structure contains all available
callbacks that a filter can define:
struct flt_ops {
/*
* Callbacks to manage the filter lifecycle
*/
int (*init) (struct proxy *p);
void (*deinit)(struct proxy *p);
int (*check) (struct proxy *p);
/*
* Stream callbacks
*/
void (*stream_start) (struct stream *s);
void (*stream_accept) (struct stream *s);
void (*session_establish)(struct stream *s);
void (*stream_stop) (struct stream *s);
/*
* HTTP callbacks
*/
int (*http_start) (struct stream *s, struct http_msg *msg);
int (*http_start_body) (struct stream *s, struct http_msg *msg);
int (*http_start_chunk) (struct stream *s, struct http_msg *msg);
int (*http_data) (struct stream *s, struct http_msg *msg);
int (*http_last_chunk) (struct stream *s, struct http_msg *msg);
int (*http_end_chunk) (struct stream *s, struct http_msg *msg);
int (*http_chunk_trailers)(struct stream *s, struct http_msg *msg);
int (*http_end_body) (struct stream *s, struct http_msg *msg);
void (*http_end) (struct stream *s, struct http_msg *msg);
void (*http_reset) (struct stream *s, struct http_msg *msg);
int (*http_pre_process) (struct stream *s, struct http_msg *msg);
int (*http_post_process) (struct stream *s, struct http_msg *msg);
void (*http_reply) (struct stream *s, short status,
const struct chunk *msg);
};
To declare and use a filter, in the configuration, the "filter" keyword must be
used in a listener/frontend section:
frontend test
...
filter <FILTER-NAME> [OPTIONS...]
The filter referenced by the <FILTER-NAME> must declare a configuration parser
on its own name to fill flt_ops and filter_conf field in the proxy's
structure. An exemple will be provided later to make it perfectly clear.
For now, filters cannot be used in backend section. But this is only a matter of
time. Documentation will also be added later. This is the first commit of a long
list about filters.
It is possible to have several filters on the same listener/frontend. These
filters are stored in an array of at most MAX_FILTERS elements (define in
include/types/filters.h). Again, this will be replaced later by a list of
filters.
The filter API has been highly refactored. Main changes are:
* Now, HA supports an infinite number of filters per proxy. To do so, filters
are stored in list.
* Because filters are stored in list, filters state has been moved from the
channel structure to the filter structure. This is cleaner because there is no
more info about filters in channel structure.
* It is possible to defined filters on backends only. For such filters,
stream_start/stream_stop callbacks are not called. Of course, it is possible
to mix frontend and backend filters.
* Now, TCP streams are also filtered. All callbacks without the 'http_' prefix
are called for all kind of streams. In addition, 2 new callbacks were added to
filter data exchanged through a TCP stream:
- tcp_data: it is called when new data are available or when old unprocessed
data are still waiting.
- tcp_forward_data: it is called when some data can be consumed.
* New callbacks attached to channel were added:
- channel_start_analyze: it is called when a filter is ready to process data
exchanged through a channel. 2 new analyzers (a frontend and a backend)
are attached to channels to call this callback. For a frontend filter, it
is called before any other analyzer. For a backend filter, it is called
when a backend is attached to a stream. So some processing cannot be
filtered in that case.
- channel_analyze: it is called before each analyzer attached to a channel,
expects analyzers responsible for data sending.
- channel_end_analyze: it is called when all other analyzers have finished
their processing. A new analyzers is attached to channels to call this
callback. For a TCP stream, this is always the last one called. For a HTTP
one, the callback is called when a request/response ends, so it is called
one time for each request/response.
* 'session_established' callback has been removed. Everything that is done in
this callback can be handled by 'channel_start_analyze' on the response
channel.
* 'http_pre_process' and 'http_post_process' callbacks have been replaced by
'channel_analyze'.
* 'http_start' callback has been replaced by 'http_headers'. This new one is
called just before headers sending and parsing of the body.
* 'http_end' callback has been replaced by 'channel_end_analyze'.
* It is possible to set a forwarder for TCP channels. It was already possible to
do it for HTTP ones.
* Forwarders can partially consumed forwardable data. For this reason a new
HTTP message state was added before HTTP_MSG_DONE : HTTP_MSG_ENDING.
Now all filters can define corresponding callbacks (http_forward_data
and tcp_forward_data). Each filter owns 2 offsets relative to buf->p, next and
forward, to track, respectively, input data already parsed but not forwarded yet
by the filter and parsed data considered as forwarded by the filter. A any time,
we have the warranty that a filter cannot parse or forward more input than
previous ones. And, of course, it cannot forward more input than it has
parsed. 2 macros has been added to retrieve these offets: FLT_NXT and FLT_FWD.
In addition, 2 functions has been added to change the 'next size' and the
'forward size' of a filter. When a filter parses input data, it can alter these
data, so the size of these data can vary. This action has an effet on all
previous filters that must be handled. To do so, the function
'filter_change_next_size' must be called, passing the size variation. In the
same spirit, if a filter alter forwarded data, it must call the function
'filter_change_forward_size'. 'filter_change_next_size' can be called in
'http_data' and 'tcp_data' callbacks and only these ones. And
'filter_change_forward_size' can be called in 'http_forward_data' and
'tcp_forward_data' callbacks and only these ones. The data changes are the
filter responsability, but with some limitation. It must not change already
parsed/forwarded data or data that previous filters have not parsed/forwarded
yet.
Because filters can be used on backends, when we the backend is set for a
stream, we add filters defined for this backend in the filter list of the
stream. But we must only do that when the backend and the frontend of the stream
are not the same. Else same filters are added a second time leading to undefined
behavior.
The HTTP compression code had to be moved.
So it simplifies http_response_forward_body function. To do so, the way the data
are forwarded has changed. Now, a filter (and only one) can forward data. In a
commit to come, this limitation will be removed to let all filters take part to
data forwarding. There are 2 new functions that filters should use to deal with
this feature:
* flt_set_http_data_forwarder: This function sets the filter (using its id)
that will forward data for the specified HTTP message. It is possible if it
was not already set by another filter _AND_ if no data was yet forwarded
(msg->msg_state <= HTTP_MSG_BODY). It returns -1 if an error occurs.
* flt_http_data_forwarder: This function returns the filter id that will
forward data for the specified HTTP message. If there is no forwarder set, it
returns -1.
When an HTTP data forwarder is set for the response, the HTTP compression is
disabled. Of course, this is not definitive.