A problem was recently reported by some users of programs compiled
with Go 1.5, which by default blocks all signals before executing
processes. This resulted in haproxy not receiving SIGUSR1 or even
SIGTERM, causing lots of zombie processes.
This problem was apparently observed by users of consul and kubernetes
(at least).
This patch is a workaround for this issue. It consists in unblocking
all signals on startup. Since they're normally not blocked in a regular
shell, it ensures haproxy always starts under the same conditions.
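Below is a minimal sketch of this kind of startup unblocking, using only
standard POSIX calls; it is illustrative and not the exact patch:

    #include <signal.h>

    /* Reset the inherited signal mask so that haproxy starts with the
     * same conditions as under a regular shell, even when the parent
     * process (e.g. a Go 1.5 program) blocked everything before exec().
     */
    static void unblock_all_signals(void)
    {
        sigset_t set;

        sigemptyset(&set);                    /* empty set: block nothing */
        sigprocmask(SIG_SETMASK, &set, NULL); /* replace inherited mask */
    }

Calling such a function at the very beginning of main() is enough to
restore the usual signal delivery.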
Useful information reported by both Matti Savolainen and REN
Xiaolei helped find the root cause of this problem and this
workaround. Thanks to them for this.
This patch must be backported to 1.6 and 1.5 where the problem is
observed.
Commit 7c0ffd23 only considers the explicit use of the "process" keyword
on the listeners. But at this step, if it's not defined in the configuration,
the listener's bind_proc mask is set to 0. As a result, the code computes
the maxaccept value based on only 1 process, which is not always true.
For example :
    global
        nbproc 4

    frontend test
        bind-process 1-2
        bind :80
Here, the maxaccept value for the "test" frontend was set to the global
tune.maxaccept value (64 by default), whereas it should take into account
that 2 processes will accept connections. According to the documentation,
the value should be divided by twice the number of processes the listener
is bound to.
To fix this, we can consider that if no mask is set on the listener, we take
the frontend's mask.
This is not critical but it can introduce an unfair distribution of the
incoming connections across the processes.
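A hedged sketch of the intended computation follows; the names are
illustrative and do not match the exact haproxy structures:

    /* Pick the listener's own process mask when set, otherwise fall
     * back to the frontend's, then divide the global maxaccept by
     * twice the number of processes accepting on this listener.
     */
    static int compute_maxaccept(unsigned long li_mask,
                                 unsigned long fe_mask,
                                 int global_maxaccept)
    {
        unsigned long mask = li_mask ? li_mask : fe_mask;
        int nbproc = __builtin_popcountl(mask);  /* processes in mask */

        return nbproc > 1 ? global_maxaccept / (2 * nbproc)
                          : global_maxaccept;
    }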
It should be backported to the same branches as commit 7c0ffd23 (1.6 and 1.5
were in the scope).
When a listener is not bound to a process its frontend belongs to, it
is only paused and not stopped. This creates confusion from the outside
as "netstat -ltnp" for example will report only the parent process as
the listener instead of the effective one. "ss -lnp" will report that
all processes are listening to all sockets.
This is confusing enough to suggest a fix. Now we simply stop the unused
listeners. Example with this simple config :
    global
        nbproc 4

    frontend haproxy_test
        bind-process 1-40
        bind :12345 process 1
        bind :12345 process 2
        bind :12345 process 3
        bind :12345 process 4
Before the patch :
    $ netstat -ltnp
    Proto Recv-Q Send-Q Local Address   Foreign Address   State    PID/Program name
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30457/./haproxy
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30457/./haproxy
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30457/./haproxy
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30457/./haproxy
After the patch :
    $ netstat -ltnp
    Proto Recv-Q Send-Q Local Address   Foreign Address   State    PID/Program name
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30504/./haproxy
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30503/./haproxy
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30502/./haproxy
    tcp        0      0 0.0.0.0:12345   0.0.0.0:*         LISTEN   30501/./haproxy
This patch may be backported to 1.6 and 1.5, but it relies on commit
7a798e5 ("CLEANUP: fix inconsistency between fd->iocb, proto->accept
and accept()") since it will expose an API inconsistency by including
listener.h in the .c.
Christian Ruppert reported a performance degradation when binding a
single frontend to many processes while only one bind line was being
used, bound to a single process.
The reason comes from the fact that whenever a listener is bound to
multiple processes, it is assigned a maxaccept value which equals
half the global maxaccept value divided by the number of processes the
frontend is bound to. The purpose is to ensure that no single process
will drain all the incoming requests at once and to ensure a fair share
between all listeners. Usually this works pretty well when a listener
is bound to all the processes of its frontend. But here we're in a
situation where the maxaccept of a listener which is bound to a single
process is still divided by a large value.
The fix consists in taking into account the number of processes the
listener is bound to and not only those of the frontend. This way it
is perfectly possible to benefit from nbproc and SO_REUSEPORT without
performance degradation.
1.6 and 1.5 normally suffer from the same issue.
There's quite some inconsistency in the internal API. listener_accept()
which is the main accept() function returns void but is declared as int
in the include file. It's assigned to proto->accept() for all stream
protocols where an int is expected but the result is never checked (nor
is it documented by the way). This proto->accept() is in turn assigned
to fd->iocb() which is supposed to return an int composed of FD_WAIT_*
flags, but which is never checked either.
So let's fix all this mess :
- nobody checks accept()'s return
- nobody checks iocb()'s return
- nobody sets a return value
=> let's mark all these functions void and keep the current ones intact.
Additionally we now include listener.h from listener.c to ensure we won't
silently hide this incoherency in the future.
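A hedged sketch of the resulting declarations (simplified, not the full
haproxy headers):

    /* The whole accept path becomes void since nobody ever checked
     * these return values.
     */
    void listener_accept(int fd);    /* was declared int in the header */

    struct protocol {
        void (*accept)(int fd);      /* was int (*accept)(int fd) */
        /* ... other members unchanged ... */
    };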
Note that this patch could/should be backported to 1.6 and even 1.5 to
simplify debugging sessions.
parse_binary, line 2025, checks the binstr parameter for NULL.
Other callers of parse_binary properly zero this parameter.
[wt: this could result in random failures of the const parser]
The dns_option struct's pref_net field is an array of 5 elements. The issue
here shows that pref_net_nb can go up to 5 as well, which might lead
to reads outside of this array.
Depending on the path that led to sess_update_stream_int(), it's
possible that we had a read error on the frontend, but that we haven't
checked if we may abort the connection.
This was seen in particular with the following setup: tcp mode, with
abortonclose set, frontend using ssl. If the ssl connection had a first
successful read, but the second read failed, we would still try to open a
connection to the backend, although we had enough information to close
the connection early.
sess_update_stream_int() had some logic to handle that case in the
SI_ST_QUE and SI_ST_TAR, but that was missing in the SI_ST_ASS case.
This patch addresses the issue by verifying the state of the req
channel (and the abortonclose option) right before opening the
connection to the backend, so we have the opportunity to close the
connection there, and it factorizes the shared SI_ST_{QUE,TAR,ASS} code.
Emeric found that some certificate files that were valid with the old method
(the one with the explicit name involving SSL_CTX_use_PrivateKey_file()) do
not work anymore with the new one (the one trying to load multiple cert types
using PEM_read_bio_PrivateKey()). With the last one, the private key couldn't
be loaded.
The difference was related to the ordering of sections in the PEM file. The
old method would always work. The new method only works if the private key is
at the top, or if it appears as an "EC" private key. The cause in fact is that
we never rewind the BIO between the various calls. So this patch moves the
loading of the private key to the first step, then it rewinds the BIO, and
then it loads the cert and the chain. With this everything works.
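A minimal sketch of the resulting loading order using the public OpenSSL
calls (error handling omitted; not the exact haproxy code):

    #include <openssl/pem.h>

    /* Load the private key first, rewind the BIO, then read the
     * certificate, so the key is found regardless of PEM ordering.
     */
    static int load_key_then_cert(BIO *bio, EVP_PKEY **key, X509 **cert)
    {
        *key = PEM_read_bio_PrivateKey(bio, NULL, NULL, NULL);
        BIO_reset(bio);               /* rewind before re-reading */
        *cert = PEM_read_bio_X509_AUX(bio, NULL, NULL, NULL);
        return *key != NULL && *cert != NULL;
    }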
No backport is needed, this issue came with the recent addition of the
multi-cert support.
The following patch allows logging the cookie for tarpitted and denied
requests.
This minor bug affects at least the 1.5, 1.6 and 1.7 branches.
The solution is not perfect: maybe the cookie processing
(manage_client_side_cookies) could be moved into http_process_req_common.
060e57301d introduced a bug related to a
dns option structure change and an improper rebase.
Thanks to Lukas Tribus for reporting it.
backport: 1.7 and above
First, we modify the signatures of http_msg_forward_body and
http_msg_forward_chunked_body as they are declared as inline
below. Secondly, we just verify the returns of the chunk initialization
which holds the Authorization Method (although it is unlikely to fail ...).
Both from gcc warnings.
After Cedric Jeanneret reported an issue with HAProxy and DNS resolution
when multiple servers are in use, I saw that the DNS query type update
on resolution timeout was not implemented, even though it is documented.
backport: 1.6 and above
A bug led HAProxy to stop DNS resolution when multiple servers were
configured and one was in timeout: the request was not resent.
The current code fixes this issue.
backport status: 1.6 and above
This just happens to work as it is the correct chunk, but it should be
whatever gets passed in as the argument.
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
Instead of repeating the type of the LHS argument (sizeof(struct ...))
in calls to malloc/calloc, we directly use the pointer
name (sizeof(*...)). The following Coccinelle patch was used:
@@
type T;
T *x;
@@
x = malloc(
- sizeof(T)
+ sizeof(*x)
)
@@
type T;
T *x;
@@
x = calloc(1,
- sizeof(T)
+ sizeof(*x)
)
When the LHS is not just a variable name, no change is made. Moreover,
the following patch was used to ensure that "1" is consistently used as
a first argument of calloc, not the last one:
@@
@@
calloc(
+ 1,
...
- ,1
)
In C89, "void *" is automatically converted to any pointer type. Casting
the result of malloc/calloc to the type of the LHS variable is therefore
unneeded.
Most of this patch was built using this Coccinelle patch:
@@
type T;
@@
- (T *)
(\(lua_touserdata\|malloc\|calloc\|SSL_get_app_data\|hlua_checkudata\|lua_newuserdata\)(...))
@@
type T;
T *x;
void *data;
@@
x =
- (T *)
data
@@
type T;
T *x;
T *data;
@@
x =
- (T *)
data
Unfortunately, either Coccinelle or I is too limited to detect situation
where a complex RHS expression is of type "void *" and therefore casting
is not needed. Those cases were manually examined and corrected.
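For illustration, here is the kind of before/after change these rules
produce (the struct name is hypothetical):

    #include <stdlib.h>

    struct foo { int x; };

    void example(void)
    {
        struct foo *f;

        /* before: repeats the type and casts the result needlessly */
        f = (struct foo *)malloc(sizeof(struct foo));
        free(f);

        /* after: no cast, and the size follows the variable's type */
        f = malloc(sizeof(*f));
        free(f);
    }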
There are two issues with error captures. The first one is that the
capture size is still hard-coded to BUFSIZE regardless of any possible
tune.bufsize setting and of the fact that frontends only capture request
errors and that backends only capture response errors. The second is that
captures are allocated in both directions for all proxies, which starts
to add up in configs using thousands of proxies.
This patch changes this so that error captures are allocated only when
needed, and of the proper size. It also refrains from dumping a buffer
that was not allocated, while still emitting all relevant info
such as flags and HTTP states. This way it is possible to save up to
32 kB of RAM per proxy in the default configuration.
The sc_* sample fetches can work without the struct strm, because the
tracked counters are also stored in the session. So, this patch
removes the check for strm existence.
This bug is recent and was introduced in 1.7-dev2 by commit 6204cd9
("BUG/MAJOR: vars: always retrieve the stream and session from the sample")
This bugfix must be backported in 1.6.
This patch splits the function stats_dump_be_stats() in two parts. The
extracted part is called stats_fill_be_stats(), and just fills the stats
buffer.
This split allows the usage of preformatted stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_sv_stats() in two parts. The
extracted part is called stats_fill_sv_stats(), and just fills the stats
buffer.
This split allows the usage of preformatted stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_li_stats() in two parts. The
extracted part is called stats_fill_li_stats(), and just fills the stats
buffer.
This split allows the usage of preformatted stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_fe_stats() in two parts. The
extracted part is called stats_fill_fe_stats(), and just fills the stats
buffer.
This split allows the usage of preformatted stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_info_to_buffer() in two parts. The
extracted part is called stats_fill_info(), and just fills the stats buffer.
This split allows the usage of preformatted stats in other parts of HAProxy
like the Lua.
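A hedged sketch of the resulting split, taking the frontend case as an
example (signatures approximate, based on the function names above):

    /* The fill function only stores raw values into the stats fields
     * and does no formatting; the dump function then renders them as
     * CSV or HTML.
     */
    int stats_fill_fe_stats(struct proxy *px, struct field *stats, int len);

    static int stats_dump_fe_stats(struct chunk *out, struct proxy *px)
    {
        struct field stats[ST_F_TOTAL_FIELDS];

        if (!stats_fill_fe_stats(px, stats, ST_F_TOTAL_FIELDS))
            return 0;
        return stats_dump_fields_csv(out, stats); /* or the HTML dumper */
    }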
This patch adds a sample fetch which returns the unique-id if it is
configured. If the unique-id is not yet generated, it builds it. If
the unique-id is not configured, it returns nothing.
Some internal HAProxy error messages are provided with a final '\n',
typically for integration with the CLI. Sometimes, these messages
are returned as Lua strings. These strings must be without a final "\n"
or final spaces.
This patch adds a function which removes these unrequired characters.
This patch adds a Lua post-initialisation wrapper. It already existed for
pure Lua functions; now it also executes C functions. It is useful for
doing things when the configuration is ready to use. For example we can
browse and register all the proxies.
All the HAProxy Lua objects are declared with the same pattern:
- Add the function __tostring which dumps the object name
- Register the name in the Lua REGISTRY
- Register the reference ID
These actions are refactored in one function. This removes some
lines of code.
The modified functions are declared in the safe environment, so
they must be called from a safe environment. As the environment is
safe, it is useless to check the stack size.
The functions
- hlua_class_const_int()
- hlua_class_const_str()
- hlua_class_function()
are used for common class registration actions.
The function 'hlua_dump_object()' is a generic object-name dump function.
These functions can be used by all the HAProxy objects, so they are moved
into the safe functions file.
A similar issue was fixed in 67dad27, but the fix is incomplete. A crash
still happened when utilizing req.fhdr() and sending exactly
MAX_HDR_HISTORY headers.
This fix needs to be backported to 1.5 and 1.6.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
Olivier Doucet reported the issue on the ML and tested that when using
more than TLS_TICKETS_NO keys in the file, the CPU usage is much higher
than expected.
Lukas Tribus then provided a test case which showed that resumption doesn't
work at all in that case.
This fix needs to be backported to 1.6.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
The frequency counter's window start is sent as "now - freq.date",
which is a positive age compared to the current date. But on receipt,
this age was added to the current date instead of subtracted. So
since the date was always in the future, the counters were always
expired if the activity changed side in less than the counter's
measuring period (eg: 10s).
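In pseudo-C the mistake boils down to this (illustrative, not the
actual code):

    /* The peer sends age = now - freq.date; the receiver must rebuild
     * the window start by subtracting that age from its own clock:
     */
    freq.date = now - age;    /* correct: the start lies in the past */
    /* The bug did the opposite, leaving the start in the future so
     * the counter always looked expired:
     *     freq.date = now + age;
     */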
This bug was reported by Christian Ruppert who also provided an easy
reproducer.
It needs to be backported to 1.6.
Regarding the minor update introduced in the
cd6c3c7cb4 commit, the DeviceAtlas
module is now able to use up to 12 device properties via the
new ARG12 macro.
I just met this warning today, making me realize that haproxy's
headers were included prior to the system ones, so all #ifndefs
are taken first, then the system redefines them. Simply move
haproxy includes after the system's. This should be backported
to 1.6 as well.
In file included from /usr/include/bits/fcntl.h:61:0,
from /usr/include/fcntl.h:35,
from src/namespace.c:13:
/usr/include/bits/fcntl-linux.h:203:0: warning: "F_SETPIPE_SZ" redefined [enabled by default]
In file included from include/common/config.h:26:0,
from include/proto/log.h:29,
from src/namespace.c:7:
include/common/compat.h:81:0: note: this is the location of the previous definition
The strftime() function can call tzset() internally on some platforms.
When haproxy is chrooted, the /etc/localtime file is not found, and some
implementations will clobber the content of the current timezone.
The GMT offset is computed by diffing the times returned by gmtime_r() and
localtime_r(). These variants are guaranteed to not call tzset() and were
already used in haproxy while chrooted, so they should be safe.
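A hedged sketch of computing the offset this way (simplified: it
ignores day wrap-around for brevity):

    #include <time.h>

    /* Diff the broken-down times from gmtime_r() and localtime_r();
     * neither is allowed to call tzset(), so this is safe in a chroot
     * without /etc/localtime.
     */
    static int gmt_offset_seconds(time_t t)
    {
        struct tm gmt, loc;

        gmtime_r(&t, &gmt);
        localtime_r(&t, &loc);
        return (loc.tm_hour - gmt.tm_hour) * 3600
             + (loc.tm_min  - gmt.tm_min)  * 60;
    }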
This patch must be backported to 1.6 and 1.5.
Released version 1.7-dev2 with the following main changes :
- DOC: lua: fix lua API
- DOC: mailers: typo in 'hostname' description
- DOC: compression: missing mention of libslz for compression algorithm
- BUILD/MINOR: regex: missing header
- BUG/MINOR: stream: bad return code
- DOC: lua: fix somme errors and add implicit types
- MINOR: lua: add set/get priv for applets
- BUG/MINOR: http: fix several off-by-one errors in the url_param parser
- BUG/MINOR: http: Be sure to process all the data received from a server
- MINOR: filters/http: Use a wrapper function instead of stream_int_retnclose
- BUG/MINOR: chunk: make chunk_dup() always check and set dst->size
- DOC: ssl: fixed some formatting errors in crt tag
- MINOR: chunks: ensure that chunk_strcpy() adds a trailing zero
- MINOR: chunks: add chunk_strcat() and chunk_newstr()
- MINOR: chunk: make chunk_initstr() take a const string
- MEDIUM: tools: add csv_enc_append() to preserve the original chunk
- MINOR: tools: make csv_enc_append() always start at the first byte of the chunk
- MINOR: lru: new function to delete <nb> least recently used keys
- DOC: add Ben Shillito as the maintainer of 51d
- BUG/MINOR: 51d: Ensures a unique domain for each configuration
- BUG/MINOR: 51d: Aligns Pattern cache implementation with HAProxy best practices.
- BUG/MINOR: 51d: Releases workset back to pool.
- BUG/MINOR: 51d: Aligned const pointers to changes in 51Degrees.
- CLEANUP: 51d: Aligned if statements with HAProxy best practices and removed casts from malloc.
- MINOR: rename master process name in -Ds (systemd mode)
- DOC: fix a few spelling mistakes
- DOC: fix "workaround" spelling
- BUG/MINOR: examples: Fixing haproxy.spec to remove references to .cfg files
- MINOR: fix the return type for dns_response_get_query_id() function
- MINOR: server state: missing LF (\n) on error message printed when parsing server state file
- BUG/MEDIUM: dns: no DNS resolution happens if no ports provided to the nameserver
- BUG/MAJOR: servers state: server port is erased when dns resolution is enabled on a server
- BUG/MEDIUM: servers state: server port is used uninitialized
- BUG/MEDIUM: config: Adding validation to stick-table expire value.
- BUG/MEDIUM: sample: http_date() doesn't provide the right day of the week
- BUG/MEDIUM: channel: fix miscalculation of available buffer space.
- MEDIUM: pools: add a new flag to avoid rounding pool size up
- BUG/MEDIUM: buffers: do not round up buffer size during allocation
- BUG/MINOR: stream: don't force retries if the server is DOWN
- BUG/MINOR: counters: make the sc-inc-gpc0 and sc-set-gpt0 touch the table
- MINOR: unix: don't mention free ports on EAGAIN
- BUG/CLEANUP: CLI: report the proper field states in "show sess"
- MINOR: stats: send content-length with the redirect to allow keep-alive
- BUG: stream_interface: Reuse connection even if the output channel is empty
- DOC: remove old tunnel mode assumptions
- BUG/MAJOR: http-reuse: fix risk of orphaned connections
- BUG/MEDIUM: http-reuse: do not share private connections across backends
- BUG/MINOR: ssl: Be sure to use unique serial for regenerated certificates
- BUG/MINOR: stats: fix missing comma in stats on agent drain
- MAJOR: filters: Add filters support
- MINOR: filters: Do not reset stream analyzers if the client is gone
- REORG: filters: Prepare creation of the HTTP compression filter
- MAJOR: filters/http: Rewrite the HTTP compression as a filter
- MEDIUM: filters: Use macros to call filters callbacks to speed-up processing
- MEDIUM: filters: remove http_start_chunk, http_last_chunk and http_chunk_end
- MEDIUM: filters: Replace filter_http_headers callback by an analyzer
- MEDIUM: filters/http: Move body parsing of HTTP messages in dedicated functions
- MINOR: filters: Add stream_filters structure to hide filters info
- MAJOR: filters: Require explicit registration to filter HTTP body and TCP data
- MINOR: filters: Remove unused or useless stuff and do small optimizations
- MEDIUM: filters: Optimize the HTTP compression for chunk encoded response
- MINOR: filters/http: Slightly update the parsing of chunks
- MINOR: filters/http: Forward remaining data when a channel has no "data" filters
- MINOR: filters: Add an filter example
- MINOR: filters: Extract proxy stuff from the struct filter
- MINOR: map: Add regex matching replacement
- BUG/MINOR: lua: unsafe initialization
- DOC: lua: fix somme errors
- MINOR: lua: file dedicated to unsafe functions
- MINOR: lua: add "now" time function
- MINOR: standard: add RFC HTTP date parser
- MINOR: lua: Add date functions
- MINOR: lua: move common function
- MINOR: lua: merge function
- MINOR: lua: Add concat class
- MINOR: standard: add function "escape_chunk"
- MEDIUM: log: add a new log format flag "E"
- DOC: add server name at rate-limit sessions example
- BUG/MEDIUM: ssl: fix off-by-one in ALPN list allocation
- BUG/MEDIUM: ssl: fix off-by-one in NPN list allocation
- DOC: LUA: fix some typos and syntax errors
- MINOR: cli: add a new "show env" command
- MEDIUM: config: allow to manipulate environment variables in the global section
- MEDIUM: cfgparse: reject incorrect 'timeout retry' keyword spelling in resolvers
- MINOR: mailers: increase default timeout to 10 seconds
- MINOR: mailers: use <CRLF> for all line endings
- BUG/MAJOR: lua: segfault using Concat object
- DOC: lua: copyrights
- MINOR: common: mask conversion
- MEDIUM: dns: extract options
- MEDIUM: dns: add a "resolve-net" option which allow to prefer an ip in a network
- MINOR: mailers: make it possible to configure the connection timeout
- BUG/MAJOR: lua: applets can't sleep.
- BUG/MINOR: server: some prototypes are renamed
- BUG/MINOR: lua: Useless copy
- BUG/MEDIUM: stats: stats bind-process doesn't propagate the process mask correctly
- BUG/MINOR: server: fix the format of the warning on address change
- CLEANUP: server: add "const" to some message strings
- MINOR: server: generalize the "updater" source
- BUG/MEDIUM: chunks: always reject negative-length chunks
- BUG/MINOR: systemd: ensure we don't miss signals
- BUG/MINOR: systemd: report the correct signal in debug message output
- BUG/MINOR: systemd: propagate the correct signal to haproxy
- MINOR: systemd: ensure a reload doesn't mask a stop
- BUG/MEDIUM: cfgparse: wrong argument offset after parsing server "sni" keyword
- CLEANUP: stats: Avoid computation with uninitialized bits.
- CLEANUP: pattern: Ignore unknown samples in pat_match_ip().
- CLEANUP: map: Avoid memory leak in out-of-memory condition.
- BUG/MINOR: tcpcheck: fix incorrect list usage resulting in failure to load certain configs
- BUG/MAJOR: samples: check smp->strm before using it
- MINOR: sample: add a new helper to initialize the owner of a sample
- MINOR: sample: always set a new sample's owner before evaluating it
- BUG/MAJOR: vars: always retrieve the stream and session from the sample
- CLEANUP: payload: remove useless and confusing nullity checks for channel buffer
- BUG/MINOR: ssl: fix usage of the various sample fetch functions
- MINOR: stats: create fields types suitable for all CSV output data
- MINOR: stats: add all the "show info" fields in a table
- MEDIUM: stats: fill all the show info elements prior to displaying them
- MINOR: stats: add a function to emit fields into a chunk
- MINOR: stats: add stats_dump_info_fields() to dump one field per line
- MEDIUM: stats: make use of stats_dump_info_fields() for "show info"
- MINOR: stats: add a declaration of all stats fields
- MINOR: stats: don't hard-code the CSV fields list anymore
- MINOR: stats: create stats fields storage and CSV dump function
- MEDIUM: stats: convert stats_dump_fe_stats() to use stats_dump_fields_csv()
- MEDIUM: stats: make stats_dump_fe_stats() use stats fields for HTML dump
- MEDIUM: stats: convert stats_dump_li_stats() to use stats_dump_fields_csv()
- MEDIUM: stats: make stats_dump_li_stats() use stats fields for HTML dump
- MEDIUM: stats: convert stats_dump_be_stats() to use stats_dump_fields_csv()
- MEDIUM: stats: make stats_dump_be_stats() use stats fields for HTML dump
- MEDIUM: stats: convert stats_dump_sv_stats() to use stats_dump_fields_csv()
- MEDIUM: stats: make stats_dump_sv_stats() use the stats field for HTML
- MEDIUM: stats: move the server state coloring logic to the server dump function
- MINOR: stats: do not use srv->admin & STATS_ADMF_MAINT in HTML dumps
- MINOR: stats: do not check srv->state for SRV_ST_STOPPED in HTML dumps
- MINOR: stats: make CSV report server check status only when enabled
- MINOR: stats: only report backend's down time if it has servers
- MINOR: stats: prepend '*' in front of the check status when in progress
- MINOR: stats: make HTML stats dump rely on the table for the check status
- MINOR: stats: add agent_status, agent_code, agent_duration to output
- MINOR: stats: add check_desc and agent_desc to the output fields
- MINOR: stats: add check and agent's health values in the output
- MEDIUM: stats: make the HTML server state dump use the CSV states
- MEDIUM: stats: only report observe errors when observe is set
- MEDIUM: stats: expose the same flags for CLI and HTTP accesses
- MEDIUM: stats: report server's address in the CSV output
- MEDIUM: stats: report the cookie value in the server & backend CSV dumps
- MEDIUM: stats: compute the color code only in the HTML form
- MEDIUM: stats: report the listeners' address in the CSV output
- MEDIUM: stats: make it possible to report the WAITING state for listeners
- REORG: stats: dump the frontend's HTML stats via a generic function
- REORG: stats: dump the socket stats via the generic function
- REORG: stats: dump the server stats via the generic function
- REORG: stats: dump the backend stats via the generic function
- MEDIUM: stats: add a new "mode" column to report the proxy mode
- MINOR: stats: report the load balancing algorithm in CSV output
- MINOR: stats: add 3 fields to report the frontend-specific connection stats
- MINOR: stats: report number of intercepted requests for frontend and backends
- MINOR: stats: introduce stats_dump_one_line() to dump one stats line
- CLEANUP: stats: make stats_dump_fields_html() not rely on proxy anymore
- MINOR: stats: add ST_SHOWADMIN to pass the admin info in the regular flags
- MINOR: stats: make stats_dump_fields_html() not use &trash by default
- MINOR: stats: add functions to emit typed fields into a chunk
- MEDIUM: stats: support "show info typed" on the CLI
- MEDIUM: stats: implement a typed output format for stats
- DOC: document the "show info typed" and "show stat typed" output formats
- MINOR: cfgparse: warn when uid parameter is not a number
- MINOR: cfgparse: warn when gid parameter is not a number
- BUG/MINOR: standard: Avoid free of non-allocated pointer
- BUG/MINOR: pattern: Avoid memory leak on out-of-memory condition
- CLEANUP: http: fix a build warning introduced by a recent fix
- BUG/MINOR: log: GMT offset not updated when entering/leaving DST
GMT offset used in local time formats was computed at startup, but was not
updated when DST status changed while running.
For example these two RFC5424 syslog traces were emitted 5 seconds apart,
just before and after DST changed:
    <14>1 2016-03-27T01:59:58+01:00 bunch-VirtualBox haproxy 2098 - - Connect ...
    <14>1 2016-03-27T03:00:03+01:00 bunch-VirtualBox haproxy 2098 - - Connect ...
It looked like they were emitted more than 1 hour apart, unlike with the fix:
    <14>1 2016-03-27T01:59:58+01:00 bunch-VirtualBox haproxy 3381 - - Connect ...
    <14>1 2016-03-27T03:00:03+02:00 bunch-VirtualBox haproxy 3381 - - Connect ...
This patch should be backported to 1.6 and partially to 1.5 (no fix needed in log.c).
Cyril reported that recent commit 320ec2a ("BUG/MEDIUM: chunks: always
reject negative-length chunks") introduced a build warning because gcc
cannot guess that we can't fall into the case where the auth_method
chunk is not initialized.
This patch addresses it, though for the long term it would be best
if chunk_initlen() would always initialize the result.
This fix must be backported to 1.6 and 1.5 where the aforementioned
fix was already backported.
pattern_new_expr() failed to free the allocated list element when an
out-of-memory error occurs during initialization of the element. As
this only happens when loading the configuration file or evaluating
commands via the CLI, it is unlikely for this leak to be relevant
unless the user makes automated, heavy use of the CLI.
Found in HAProxy 1.5.14.
The original author forgot to dereference the argument to free in
parse_binary. This may result in a crash on reading bad input from
the configuration file instead of a proper error message.
Found in HAProxy 1.5.14.
Currently, no warning is emitted when the gid is not a number.
The purpose of this warning is to let admins know that their
configuration won't be applied as expected.
Currently, no warning is emitted when the uid is not a number.
The purpose of this warning is to let admins know that their
configuration won't be applied as expected.
The output for each field is :
    field:<origin><nature><scope>:type:value
where the field part recalls the type of the object being dumped as well
as its position (pid, iid, sid), field number and field name. This way a
monitoring utility may very well report all available information without
knowing new fields in advance.
This format is also supported in the HTTP version of the stats by adding
";typed" after the URI, instead of ";csv" for the CSV format.
The doc was not updated yet.
This emits the field positions, names and types. It is more convenient
than the default output for a parser that doesn't know all the fields. It
simply relies on stats_emit_typed_data_field() and stats_emit_field_tags()
added by previous patch for the output. A new stats format flag was added,
STAT_FMT_TYPED, which is set when the "typed" keyword is specified on the
CLI.
The new function stats_emit_typed_data_field() does exactly the same as
stats_emit_raw_data_field() except that it also prints the data
type after a colon. This will be used to print using the typed
format.
And function stats_emit_field_tags() appends a 3-letter code
describing the origin, nature, and scope, followed by an optional
delimiter. This will be particularly convenient to dump typed
data.
This function must dump into the buffer it receives as an argument, and
should not assume it is always trash. This was the last part of the rework;
now the CSV and HTML functions are compatible and the output format may
easily be extended.
It's easier to have a new flag in <flags> to indicate whether or not we
want to display the admin column in HTML dumps. We already have similar
flags to show the version or the legends.
This new function dumps the current stats line according to the
specified format (CSV or HTML for now), and returns these functions'
output code, which will serve later to indicate a failure (eg: buffer
full).
This further simplifies the code since all dumpers now just call this
function.
This was reported in HTML dumps already but not CSV. It reports the
number of monitor and stats requests. Ideally use-service and redirs
should be accounted for as well.
Frontends have extra information compared to other entities, they can
report some statistics at the connection level while the other ones
are limited to the session level. This patch adds 3 more fields for
this :
- conn_rate
- conn_rate_max
- conn_tot
It's worth noting that listeners theoretically have such statistics, except
that the distinction between connections and sessions is not clearly made
in the code, so that will have to be improved later.
This new function stats_dump_fields_html() checks the type of the object
being dumped from the stats table, and emits it in HTML format. It uses
an argument indicating if the HTML page is also used as an admin page,
and for now still takes the proxy in argument as a few entries still
need it.
The code was simply moved as-is to the new function. There's no
functional change.
HTML output used to have it but not the CSV output. It indicates that the
listener is not full but was forced to wait because the max connection
rate was reached.
The color code requires a complex logic, and we use it only in the
HTML part. So let's compute it there based on the server state, its
health and its weight. The thing is tricky but OK. There's a 1-to-1
mapping of down servers, but not of up servers, hence the need for
the weight and health.
The server's cookie value is now reported in the "cookie" column and
used as-is from the HTML dump. It was the last reference to the sv
pointer from this place.
The same was done for the backend's dump.
This new field "addr" presents the server's address:port if either
"stats show legends" is enabled in case of HTTP dumps, or the client
has at least the operator level on the CLI. The address formats might be :
- ipv4:port
- [ipv6]:port
- unix
- (error message)
The HTML dump over HTTP request may have several flags including
ST_SHLGNDS (to show legends), ST_SHNODE (to show node name),
ST_SHDESC (to show some descriptions).
There's no such thing over the CLI so we need to have an equivalent.
Let's compute the flags earlier so that we can make use of these flags
regardless of the call point.
Now instead of recomputing the state based on the health, rise etc,
we reuse the same state as in the CSV file, and optionally complete
it with a down or an up arrow if a change is occurring. We could
have parsed the strings to detect a '/' indicating a state change,
but it was easier to check the health against rise and fall.
This adds the following fields :
- check_rise [...S]: server's "rise" parameter used by checks
- check_fall [...S]: server's "fall" parameter used by checks
- check_health [...S]: server's health check value between 0 and rise+fall-1
- agent_rise [...S]: agent's "rise" parameter, normally 1
- agent_fall [...S]: agent's "fall" parameter, normally 1
- agent_health [...S]: agent's health parameter, between 0 and rise+fall-1
Added these two new fields to the CSV output :
- check_desc : short human-readable description of check_status
- agent_desc : short human-readable description of agent_status
This also factors out two tests for enabled checks.
The agent check status is now reported :
- agent_status : status of last agent check
- agent_code : numeric code reported by agent if any (unused for now)
- agent_duration : time in ms taken to finish last check
There's no point in reporting a backend's up/down time if it has no
servers. The CSV output used to report "0" for a serverless backend
while the HTML version already removed the field. For servers, this
field is already omitted if checks are disabled. Let's uniformize
all of this and remove the field in CSV as well when irrelevant.
The HTML version doesn't report a check status when the server is in
maintenance since it can be quite old and irrelevant. The CSV forgot
to care about that, so let's do it here as well.
We don't want the HTML dump to rely on the server state. We
already have this piece of information in the status field by
checking that it starts with "DOWN".
It currently is really not convenient to have a state and a color detection
outside of the function and to use these ones inside. It makes it harder to
adjust the stats output based on the server state exactly. Let's move the
logic into the dump function itself.
Here we still have a huge amount of stuff to extract from the HTML code
and even from the caller. Indeed, the calling function computes the
server state and prepares a color code that will be used to determine
what style to use. The operations needed to decide what field to present
or not depend a lot on the server's state, admin state, health value,
rise and fall etc... all of which are not easily present in the table.
We also have to check the reference's values for all of the above.
There are also a number of differences between the CSV and HTML outputs :
- CSV always reports the check duration, HTML only if not zero
- regarding last_change, CSV always reports the server's while the HTML
considers either the server's or the reference's based on the admin state.
- agent and health are separate in the CSV but mixed in the HTML.
- too little info on agent anyway.
After careful code inspection it happens that both sv->last_change and
ref->last_change are identical and can both derive from [LASTCHG].
Also, the following info are missing from the array to complete the HTML
code :
- cookie, address, status description, check-in-progress, srv->admin
At least for now it still works but a lot of info now need to be added.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
Some limits are only emitted if set, so the HTML part will have to
check for these being set.
A number of fields had to be built using printf-format strings, so
instead of allocating strings that risk not being freed, we use
chunk_newstr() and chunk_appendf().
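A hedged illustration of that pattern (simplified use of the chunk API):

    /* Carve several independent strings out of a single "out" chunk:
     * chunk_newstr() terminates the previous string and returns the
     * start of the next one, chunk_appendf() extends it in place.
     */
    static void fill_addr_field(struct chunk *out, const char *addr, int port)
    {
        const char *addr_str;

        chunk_reset(out);
        addr_str = chunk_newstr(out);
        chunk_appendf(out, "%s:%d", addr, port);
        /* addr_str now points at "addr:port" with no extra malloc */
    }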
Text strings are now copied verbatim in the stats fields so that only
the CSV dump encodes them as CSV. A single "out" chunk is used and cut
into multiple substrings using chunk_newstr() so that we don't need
distinct chunks for each stats field holding a string. The total amount
of data being emitted at once will never overflow the "out" chunk since
only a small part of it goes into the trash buffer which is the same size
and will thus overflow first.
One point of care is that failed_checks and down_trans were logged as
64-bit quantities on servers but they're 32-bit on proxies. This may
have to be changed later to unify them.
Some fields are still needed to complete the conversion :
- px->srv : used to take decisions when backend has no server (eg: print down or not)
- algo string (useful for CSV as well) // only if SHLGNDS
- cookie_name (useful for CSV as well) // only if SHLGNDS
- px->mode == HTTP (or px->mode as a string) // same for frontend
- px->be_counters.intercepted_req (stats and redirects ?)
The following field already has a place but was not presented in the
CSV output, so it should simply be added afterwards :
- px->be_counters.http.cum_req (was in HTML and missing from CSV)
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now. It's worth noting that there are some
ambiguities between connections and sessions, for example cum_conn
is dumped into cum_sess. Additionally, there is a naming ambiguity
in that the internal "d_time" (time where the beginning of data
appeared) is called "rtime" in the output (response time) and they
actually are indeed the same.
The conversion still requires some elements which are not present in the
current fields :
- the HTML status may emit "WAITING"/"OPEN"/"FULL" while the CSV format
doesn't propose "WAITING", so this last one will have to be added.
- the HTML output emits the listening addresses when the ST_SHLGNDS flag
is set but this address field doesn't exist in the CSV format
- it's interesting to note that when the ST_SHLGNDS flag is not set, the
HTML output doesn't provide the listener's ID while it's present in the
CSV output accessible from the same interface.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
It is worth mentioning that l->cum_conn is being dumped into a cum_sess
field and that once we introduce an official cum_conn field we may have
to dump the same value at both places to maintain compatibility with the
existing stats.
Now we avoid directly accessing the proxy and instead we pick the values
from the stats fields. This unveils that only a few fields are missing to
complete the job :
- know whether or not the checkbox column needs to be displayed. This
is not directly relevant to the stats but rather to the fact that the
HTML dump is also a control interface. This doesn't need a field, just
a function argument.
- px->mode == HTTP (or px->mode as a string)
- px->fe_counters.intercepted_req (stats and redirects ?)
- px->fe_counters.cum_conn
- px->fe_counters.cps_max
- px->fe_conn_per_sec
All the last ones make sense in the CSV, so they'll have to be added as well.
This function now only fills the relevant fields with raw values and
calls stats_dump_fields_csv() for the CSV part. The output remains
exactly the same for now.
The new function stats_dump_fields_csv() currently walks over all CSV
fields and prints all non-empty ones as one line. Strings are csv-encoded
on the fly.
This is in preparation for a unifying of the stats output between the
multiple formats. The long-term goal will be that HTML stats are built
from the array used to produce the CSV output in order to ensure that
no information is missing in any format.
This function dumps non-empty fields, one per line with their name and
values, in the same format as is currently used by "show info". It relies
on previously added stats_emit_raw_data_field().
The table is completely filled with all relevant information. Only the
fields that should appear are presented. The description field is now
properly omitted if not set, instead of being reported as empty.
Technically speaking, many SSL sample fetch functions act on the
connection and depend on USE_L5CLI on the client side, which means
they're usable as soon as a handshake is completed on a connection.
This means that the test consisting in refusing to call them when
the stream is NULL will prevent them from working when we implement
the tcp-request session ruleset. Better fix this now. The fix consists
in using smp->sess->origin when they're called for the front connection,
and smp->strm->si[1].end when called for the back connection.
There is currently no known side effect for this issue, though it would
better be backported into 1.6 so that the code base remains consistent.
The buffer could not be null by definition since we moved the stream out
of the session. It's the stream which gained the ability to be NULL (hence
the recent fix for this case). The checks in place are useless and make one
think this situation can happen.
This is the continuation of previous patch called "BUG/MAJOR: samples:
check smp->strm before using it".
It happens that variables may have a session-wide scope, and that their
session is retrieved by dereferencing the stream. But nothing prevents them
from being used from a streamless context such as tcp-request connection,
thus crashing the process. Example :
    tcp-request connection accept if { src,set-var(sess.foo) -m found }
In order to fix this, we have to always ensure that variable manipulation
only happens via the sample, which contains the correct owner and context,
and that we never use one from a different source. This results in quite a
large change since a lot of functions are indirectly involved in the call
chain, but the change is easy to follow.
This fix must be backported to 1.6, and requires the last two patches.
Some functions like sample_conv_var2smp(), var_get_byname(), and
var_set_byname() directly or indirectly need to access the current
stream and/or session and must find it in the sample itself and not
as a distinct argument. Thus we first need to call smp_set_owner()
prior to each such call.
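A hedged sketch of that calling convention (the helper's signature is
inferred from the description):

    static int example(struct proxy *px, struct session *sess,
                       struct stream *strm, int opt)
    {
        struct sample smp;

        /* Set the owner and context before any variable access, so
         * that var_get_byname()/var_set_byname() can derive the
         * session and stream from the sample itself.
         */
        smp_set_owner(&smp, px, sess, strm, opt);
        return 1;
    }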
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily. This requires from a lot of call places
to initialize 4 fields, and it was even forgotten at a few places.
This patch provides a convenient helper to initialize all these fields
at once, making it easy to prepare a new sample from a previous one for
example.
A few call places were cleaned up to make use of it. It will be needed
by further fixes.
At one place in the Lua code, it was moved earlier because we used to
call sample casts with an incompletely initialized sample, which is not
clean even though at the moment there are no consequences.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily.
The problem is that earlier commit 87b0966 ("REORG/MAJOR: session:
rename the "session" entity to "stream"") had split the session and
stream resulting in the possibility for smp->strm to be NULL before
the stream was initialized. This is what happens in tcp-request
connection rulesets, as discovered by Baptiste.
The sample fetch functions must now check that smp->strm is valid
before using it. An alternative could consist in using a dummy stream
with nothing in it to avoid some checks but it would only result in
deferring them to the next step anyway, and making it harder to detect
that a stream is valid or the dummy one.
There is still an issue with variables which requires a completely
independent fix. They use strm->sess to find the session with strm
possibly NULL and passed as an argument. All call places indirectly
use smp->strm to build strm. So the problem is there but the API needs
to be changed to remove this duplicate argument that makes it much
harder to know what pointer to use.
This fix must be backported to 1.6, as well as the next one fixing
variables.
Commit baf9794 ("BUG/MINOR: tcpcheck: conf parsing error when no port
configured on server and first rule(s) is (are) COMMENT") was wrong, it
incorrectly implemented a list access by dereferencing a pointer of an
incorrect type resulting in checking the next element in the list. The
consequence is that it stops before the last comment instead of at the
last one and skips the first rule. In the end, rules starting with
comments are not affected, but if a sequence of checks directly starts
with connect, it is then skipped and this is visible when no port is
configured on the server line as the config refuses to load.
There was another occurrence of the same bug a few lines below; both
of them were fixed. Tests were made on different configs and confirm
the new fix is OK.
This fix must be backported to 1.6.
This memory leak of about 100 bytes occurs only if there is an error
condition during evaluation of a "map" directive in the configuration
file. This evaluation only happens once on startup because haproxy
does not have a mechanism for re-loading the configuration file during
run-time. The startup will be aborted anyway due to error conditions
raised.
Nevertheless fix it to silence warnings of static code analysis tools
and be safe against future revisions of the code.
Found in haproxy 1.5.14.
Ignore samples that are neither SMP_T_IPV4 nor SMP_T_IPV6 instead of
matching with an uninitialized value in this case.
This situation should not occur in the current codebase but triggers
warnings in static code analysis tools.
Found in haproxy 1.5.
stats_map_lookup() sets bit SMP_F_CONST in the uninitialized member
flags of a stack-allocated sample, leaving the other bits
uninitialized. All code paths that can access the struct only ever
check for this specific flag, so there is no risk of unintended
behavior.
Nevertheless fix it as it triggers warnings in static code analysis
tools and might become a problem on future revisions of the code.
Problem found in version 1.5.
Owen Marshall reported an issue depending on the server keywords order in the
configuration.
Working line :
    server dev1 <ip>:<port> check inter 5000 ssl verify none sni req.hdr(Host)
Non-working line :
    server dev1 <ip>:<port> check inter 5000 ssl sni req.hdr(Host) verify none
Indeed, both parse_server() and srv_parse_sni() modified the current argument
offset at the same time. To fix the issue, srv_parse_sni() can work on a local
copy of the offset, leaving parse_server() responsible for the actual value.
This fix must be backported to 1.6.
If a SIGHUP/SIGUSR2 is sent immediately after a SIGTERM/SIGINT and
before wait() is notified, it will mask it since there's no queue,
only a copy of the last received signal. Let's add a special check
before overwriting the signal so that SIGTERM/SIGINT are not masked.
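A hedged sketch of that check in the wrapper's signal handler (the
variable name is hypothetical):

    #include <signal.h>

    static volatile sig_atomic_t caught_signal;

    /* Keep a pending termination request: only record a new signal if
     * the stored one is not already SIGTERM or SIGINT.
     */
    static void sig_handler(int sig)
    {
        if (caught_signal != SIGTERM && caught_signal != SIGINT)
            caught_signal = sig;
    }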
Some people report that sometimes there's a collection of old processes
after a restart of the systemd wrapper. It's not surprising when reading
the code: the SIGTERM is propagated as a SIGINT, which asks for a graceful
stop instead. If people ask for termination, we should terminate.
Commit c54bdd2 ("MINOR: Also accept SIGHUP/SIGTERM in systemd-wrapper")
added support for extra signals but did not adapt the debug message,
which continues to report incorrect signal types. Let's pass the signal
number to the do_restart() and do_shutdown() functions so that the output
matches the signal that was received.
There's a race condition in the systemd wrapper. People who restart it
too fast report old processes remaining there. Here if the signal is
caught before entering the "while" loop we block indefinitely on the
wait() call. Let's check the caught signal before calling wait(). It
doesn't completely close the race, a window still exists between the
test of the flag and the call to wait(), but it's much smaller. A
safer solution could involve pselect() to wait for the signal
delivery.
The recent addition of "show env" on the CLI has revealed an interesting
design bug. Chunks are supposed to support a negative length to indicate
that they carry no data. chunk_printf() sets this size to -1 if the string
is too large for the buffer. At a few places in the http engine we may end
up with trash.len = -1. But bi_putchk(), chunk_appendf() and a few other
chunks consumers don't consider this case as possible and will use such a
chunk, possibly restoring an invalid string or trying to copy -1 bytes.
This fix takes care of clarifying the situation in a backportable way
where such sizes are used, so that a negative length indicating an error
remains present until the chunk is reinitialized or overwritten. But a
cleaner design adjustment needs to be done so that there's a clear contract
on how to use these chunks. At first glance it doesn't seem *that* useful
to support negative sizes, so probably this is what should change.
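A minimal sketch of the defensive pattern this implies at the consumer
side (the chunk layout is simplified):

    #include <string.h>

    struct chunk { char *str; int size; int len; };

    /* A negative len means "no usable data" (e.g. a chunk_printf()
     * overflow); skip the chunk rather than copying -1 bytes.
     */
    static int copy_chunk(char *dst, const struct chunk *chk)
    {
        if (chk->len < 0)
            return 0;       /* the error stays visible, nothing copied */
        memcpy(dst, chk->str, chk->len);
        return chk->len;
    }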
This fix must be backported to 1.6 and 1.5.
The function server_parse_addr_change_request() contains a hardcoded
updater source "stats command". This function can be called from sources
other than the "stats command", so this patch makes this argument
generic.
When the server address is changed, a message with an unrequired '\n' or
'.' is displayed, like this:
    [WARNING] 054/101137 (3229) : zzzz/s3 changed its IP from 127.0.0.1 to ::55 by stats command
    .
This patch removes the '\n' which is sent before the '.'.
This patch must be backported in 1.6
With nbproc > 1, it is possible to specify on which process the stats socket
will be bound using "stats bind-process", but the behaviour was not correct,
ignoring the value in some configurations.
Example :
    global
        nbproc 4
        stats bind-process 1
        stats socket /var/run/haproxy.sock
With such a configuration, all the processes will listen on the stats socket.
As a workaround, it is also possible to declare a "process" keyword on
the "stats socket" line.
The patch must be applied to 1.7, 1.6 and 1.5
A value is copied twice onto the stack, but only one copy is useful. The
second copy stays unused on the stack and takes room for nothing. This
patch removes the second copy.
Must be backported in 1.6
This patch must be backported in 1.6
The hlua_yield() function returns the required sleep time. The Lua core
must resume the execution after the required time. The core dedicated to
the http and tcp applets doesn't implement the wake-up function; it was
missing.
This patch fixes this.
This patch introduces a configurable connection timeout for mailers
with a new "timeout mail <time>" directive.
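A hedged configuration example of the new directive (the mailers section
content is illustrative):

    mailers mymailers
        mailer smtp1 192.0.2.25:587
        timeout mail 20s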
Acked-by: Simon Horman <horms@verge.net.au>
This option prioritizes the choice of an IP address matching a network. This
is useful with clouds to prefer a local IP. In some cases, a cloud high
availability service can be announced with many IP addresses on many
different datacenters. The latency between datacenters is not negligible, so
this patch permits preferring a local datacenter. If no address matches the
configured network, another address is selected.
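A hedged usage example; the server line below is illustrative, only the
"resolve-net" keyword comes from this patch:

    server app1 app.example.com:80 check resolvers mydns resolve-net 10.0.0.0/8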
DNS selection preferences are currently declared inline in the
struct server. They are copied from the server struct to the
dns_resolution struct for each resolution.
The next patches add new preference options, and it is not a good
way to copy all the configuration information before each dns
resolution.
This patch extracts the configuration preferences from the struct
server and declares a new dedicated struct. Only a pointer to this
new struct will be copied before each dns resolution.
Concat object is based on "luaL_Buffer". The luaL_Buffer documentation says:
During its normal operation, a string buffer uses a variable number of stack
slots. So, while using a buffer, you cannot assume that you know where the
top of the stack is. You can use the stack between successive calls to buffer
operations as long as that use is balanced; that is, when you call a buffer
operation, the stack is at the same level it was immediately after the
previous buffer operation. (The only exception to this rule is
luaL_addvalue.) After calling luaL_pushresult the stack is back to its level
when the buffer was initialized, plus the final string on its top.
So, the stack cannot be manipulated between the first call to the function
"luaL_buffinit()" and the last call to the function "luaL_pushresult()"
because we cannot know the stack status.
Moreover, the memory used by these functions seems to be collected by the
GC, so if the GC is triggered while the Concat object is in use, some
released memory can be used.
This patch rewrites the Concat class without the "luaL_Buffer" system. It
uses "userdata()" for the memory allocation of the buffer strings.
Not doing so causes issues with Exchange2013 not processing the message
body of the email. The specification seems to require this as correct
behavior : https://www.ietf.org/rfc/rfc2821.txt # 2.3.7 Lines
> SMTP client implementations MUST NOT transmit "bare" "CR" or "LF" characters.
This patch should be backported to 1.6.
Acked-by: Simon Horman <horms@verge.net.au>
This allows the tcp connection to send multiple SYN packets, so 1 lost
packet does not cause the mail to be lost. It changes the socket timeout
from 2 to 10 seconds; this allows 3 SYN packets to be sent, plus a little
time to wait for their reply.
This patch should be backported to 1.6.
Acked-by: Simon Horman <horms@verge.net.au>
If for example it was written as 'timeout retri 1s' or 'timeout wrong 1s',
this would be used for the retry timeout value. The resolvers section's
only timeout setting currently is 'retry'; the others are still parsed as
before this patch so as not to break existing configurations.
A less strict version will be backported to 1.6.
With new init systems such as systemd, environment variables became a
real mess because they're only considered on startup but not on reload
since the init script's variables cannot be passed to the process that
is signaled to reload.
This commit introduces an alternative method consisting in making it
possible to modify the environment from the global section with directives
like "setenv", "unsetenv", "presetenv" and "resetenv".
Since haproxy supports loading multiple config files, it now becomes
possible to put the host-dependent variables in one file and to
distribute the rest of the configuration to all nodes, without having
to deal with the init system's deficiencies.
Environment changes take effect immediately when the directives are
processed, so it's possible to perform the same operations as are
usually performed in regular service config files.
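A hedged example of these directives in the global section (the variable
names are made up):

    global
        setenv    NODE_IP   192.0.2.10   # always set/override the value
        presetenv NODE_ROLE backup       # set only if not already defined
        unsetenv  http_proxy             # drop an inherited variable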
Using environment variables in configuration files can make troubleshooting
complicated because there's no easy way to verify that the variables are
correct. This patch introduces a new "show env" command which displays the
whole environment on the CLI, one variable per line.
The socket must have at least the operator level to display the environment.
After seeing previous ALPN fix, I suspected that NPN code was wrong
as well, and indeed it was since ALPN was copied from it. This fix
must be backported into 1.6 and 1.5.
The first time I tried it (1.6.3) I got a segmentation fault :(
After some investigation with gdb and valgrind I found the
problem. memcpy() copies past an allocated buffer in
"bind_parse_alpn". This patch fixes it.
[wt: this fix must be backported into 1.6 and 1.5]
The +E mode escapes characters '"', '\' and ']' with '\' as prefix. It
mostly makes sense to use it in the RFC5424 structured-data log formats.
Example:
    log-format-sd %{+Q,+E}o\ [exampleSDID@1234\ header=%[capture.req.hdr(0)]]
This patch merges the last imported functions into one, because the
function hlua_metatype is only used by hlua_checkudata. This patch also
fixes the compilation error introduced by the copy of the code.
This patch moves the function hlua_checkudata, which checks that
an object contains the expected class_reference as its metatable.
This function is commonly used by all the Lua functions.
The function hlua_metatype is also moved.
This parser takes a string containing an HTTP date. It returns
a broken-down time struct. This time must be considered as GMT.
Maybe later the timezone will be taken into account.
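As an illustration (a minimal standalone sketch using the portable strptime(),
not HAProxy's own parser), an RFC 1123 HTTP date can be broken down like this:
#define _XOPEN_SOURCE /* needed for strptime() on glibc */
#include <time.h>
#include <stdio.h>
int main(void)
{
        struct tm tm = { 0 };
        /* the result is to be interpreted as GMT, not local time */
        if (strptime("Fri, 22 Jan 2016 17:43:38 GMT",
                     "%a, %d %b %Y %H:%M:%S GMT", &tm) != NULL)
                printf("day=%d month=%d year=%d\n",
                       tm.tm_mday, tm.tm_mon + 1, tm.tm_year + 1900);
        return 0;
}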
When Lua executes functions from its API, these can throw errors.
These functions must be executed in a special environment which catches
these errors, otherwise a critical error (like a segfault) can be raised.
This patch adds a C file called "hlua_fcn.c" which collects all the
Lua/C functions needing a safe environment for their execution.
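The usual way to obtain such a safe environment (a generic sketch of the Lua C
API pattern, not HAProxy's exact wrapper) is to run the risky code through
lua_pcall() so that lua_error()/longjmp is caught:
#include <lua.h>
/* body performing the throwing calls; runs in protected mode */
static int risky_body(lua_State *L)
{
        lua_getglobal(L, "some_function"); /* may throw on OOM */
        lua_call(L, 0, 0);                 /* may throw a Lua error */
        return 0;
}
static int run_safely(lua_State *L)
{
        lua_pushcfunction(L, risky_body);
        /* errors raised inside risky_body land here instead of
         * aborting the process */
        if (lua_pcall(L, 0, 0, 0) != 0) {
                /* the error message is on top of the stack */
                return -1;
        }
        return 0;
}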
Now, filter's configuration (.id, .conf and .ops fields) is stored in the
structure 'flt_conf'. So proxies own a flt_conf list instead of a filter
list. When a filter is attached to a stream, it gets a pointer on its
configuration. This avoids mixing the filter's context (owned by a stream) and
its configuration (owned by a proxy). It also saves 2 pointers per filter
instance.
The "trace" filter has been added. It defines all available callbacks and for
each one it prints a trace message. To enable it:
listener test
...
filter trace
...
This is an improvement, especially when the message body is big. Before this
patch, remaining data were forwarded only when there was no filter on the
stream. Now, the forwarding is triggered when there is no "data" filter on the
channel. When no filter is used, there is no difference, but when at least one
filter is used, the difference can be really significant.
Now, http_parse_chunk_size and http_skip_chunk_crlf return the number of bytes
parsed on success. http_skip_chunk_crlf does not use msg->sol anymore.
On the other hand, http_forward_trailers is unchanged. It returns >0 if the end
of trailers is reached and 0 if not. In all cases (except if an error is
encountered), msg->sol contains the length of the last parsed part of the
trailer headers.
Internal doc and comments about msg->sol have been updated accordingly.
Instead of compressing all chunks as they come, we store them in a temporary
buffer. The compression happens during the forwarding phase. This change speeds
up the compression of chunked responses.
Before, the functions filtering the HTTP body (and TCP data) were called from
the moment at least one filter was attached to the stream. If no filter is
interested in these data, this uselessly slows down data parsing.
A good example is the HTTP compression filter. Depending on the request and
response headers, the response compression can be enabled or not. So it would
be really nice to call it only when enabled.
So, now, to filter HTTP/TCP data, a filter must use the function
register_data_filter. For TCP streams, this function can be called only
once. But for HTTP streams, when needed, it must be called for each HTTP request
or HTTP response.
Only registered filters will be called during data parsing. At any time, a
filter can be unregistered by calling the function unregister_data_filter.
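For instance, a filter would typically register itself from its HTTP headers
callback once it has decided the body is worth filtering (a hedged sketch: the
callback signature and the helper response_is_compressible() are illustrative
assumptions, not necessarily the exact API):
static int my_flt_http_headers(struct stream *s, struct filter *filter,
                               struct http_msg *msg)
{
        /* only ask for data callbacks when we really want the body */
        if (response_is_compressible(msg))       /* hypothetical check */
                register_data_filter(s, msg->chn, filter);
        return 1; /* continue processing */
}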
From the stream's point of view, this new structure is opaque. It hides the
filters' implementation details. So, the impact on future optimizations will be
reduced (well, we hope so...).
Some small improvements have been made in filters.c to avoid useless checks.
Now body parsing is done in http_msg_forward_body and
http_msg_forward_chunked_body functions, regardless of whether we parse a
request or a response.
Parsing result is still handled in http_request_forward_body and
http_response_forward_body functions.
This patch will ease future optimizations, mainly on filters.
This new analyzer will be called for each HTTP request/response, before the
parsing of the body. It is identified by AN_FLT_HTTP_HDRS.
Special care was taken about the following condition :
* the frontend is a TCP proxy
* filters are defined in the frontend section
* the selected backend is an HTTP proxy
So, this patch explicitly adds the AN_FLT_HTTP_HDRS analyzer on the request and
the response channels when the backend is an HTTP proxy and when there are
filters attached to the stream.
This patch simplifies http_request_forward_body and http_response_forward_body
functions.
For chunked HTTP requests/responses, the body filtering can be really
expensive. In the worst case (many chunks of 1 byte), the filter overhead is
3 calls per chunk. While the http_data callback is useful, the others are just
informative.
So these callbacks have been removed. Of course, the existing filters (trace and
compression) have been updated accordingly. For the HTTP compression filter, the
update is quite large. Its implementation is now closer to the old one.
When no filter is attached to the stream, the CPU footprint due to the calls to
filters_* functions is huge, especially for chunk-encoded messages. Using macros
to check if we have some filters or not is a great improvement.
Furthermore, instead of checking the filter list emptiness, we introduce a flag
to know if filters are attached or not to a stream.
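A minimal sketch of the idea (all identifiers here are illustrative, not
HAProxy's actual ones):
/* test a stream flag instead of walking the filter list on each call */
#define STRM_FLT_ATTACHED  0x0001                 /* hypothetical flag */
#define HAS_FILTERS(strm)  ((strm)->flt_flags & STRM_FLT_ATTACHED)
#define FLT_HTTP_DATA(strm, msg) \
        (HAS_FILTERS(strm) ? flt_http_data((strm), (msg)) : 0)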
HTTP compression has been rewritten to use the filter API. This is more a PoC
than anything else for now. It allocates memory to work. So, if only for that,
it should be rewritten.
In the meantime, the implementation has been refactored to allow its use with
other filters. However, there are limitations that should be respected:
- No filter placed after the compression one is allowed to change input data
(in 'http_data' callback).
- No filter placed before the compression one is allowed to change forwarded
data (in 'http_forward_data' callback).
For now, these limitations are informal, so you should be careful when you use
several filters.
About the configuration, 'compression' keywords are still supported and must be
used to configure the HTTP compression behavior. In the absence of a 'filter'
line for the compression filter, it is added to the filter chain when the first
'compression' line is parsed. This is an easy way to proceed when no other
filter is used. But if another filter exists, an error is reported and the user
must explicitly declare the filter.
For example:
listen tst
...
compression algo gzip
compression offload
...
filter flt_1
filter compression
filter flt_2
...
HTTP compression will be moved into a true filter. To prepare the ground, some
functions have been moved into a dedicated file. The idea is to keep everything
about compression algos in compression.c and everything related to the filtering
in flt_http_comp.c.
For now, a header has been added to help during the transition. It will be
removed later.
Unused empty ACL keyword list was removed. The "compression" keyword
parser was moved from cfgparse.c to flt_http_comp.c.
When all callbacks have been called for all filters registered on a stream, if
we are waiting for the next HTTP request, we must reset stream analyzers. But it
is useless to do so if the client has already closed the connection.
This patch adds the support of filters in HAProxy. The main idea is to have a
way to "easily" extend HAProxy by adding some "modules", called filters, that
will be able to change HAProxy's behavior in a programmatic way.
To do so, many entry points have been added in the code to let filters hook into
different steps of the processing. A filter must define a flt_ops structure
(see include/types/filters.h for details). This structure contains all available
callbacks that a filter can define:
struct flt_ops {
        /*
         * Callbacks to manage the filter lifecycle
         */
        int  (*init)  (struct proxy *p);
        void (*deinit)(struct proxy *p);
        int  (*check) (struct proxy *p);
        /*
         * Stream callbacks
         */
        void (*stream_start)     (struct stream *s);
        void (*stream_accept)    (struct stream *s);
        void (*session_establish)(struct stream *s);
        void (*stream_stop)      (struct stream *s);
        /*
         * HTTP callbacks
         */
        int  (*http_start)         (struct stream *s, struct http_msg *msg);
        int  (*http_start_body)    (struct stream *s, struct http_msg *msg);
        int  (*http_start_chunk)   (struct stream *s, struct http_msg *msg);
        int  (*http_data)          (struct stream *s, struct http_msg *msg);
        int  (*http_last_chunk)    (struct stream *s, struct http_msg *msg);
        int  (*http_end_chunk)     (struct stream *s, struct http_msg *msg);
        int  (*http_chunk_trailers)(struct stream *s, struct http_msg *msg);
        int  (*http_end_body)      (struct stream *s, struct http_msg *msg);
        void (*http_end)           (struct stream *s, struct http_msg *msg);
        void (*http_reset)         (struct stream *s, struct http_msg *msg);
        int  (*http_pre_process)   (struct stream *s, struct http_msg *msg);
        int  (*http_post_process)  (struct stream *s, struct http_msg *msg);
        void (*http_reply)         (struct stream *s, short status,
                                    const struct chunk *msg);
};
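For instance, a minimal filter interested only in stream boundaries could fill
just the callbacks it needs (an illustrative sketch meant to live inside
HAProxy's source, where struct stream is available; not an actual filter):
#include <stdio.h>
static void my_stream_start(struct stream *s)
{
        fprintf(stderr, "stream %p started\n", (void *)s); /* placeholder */
}
static void my_stream_stop(struct stream *s)
{
        fprintf(stderr, "stream %p stopped\n", (void *)s); /* placeholder */
}
static struct flt_ops my_flt_ops = {
        .stream_start = my_stream_start,
        .stream_stop  = my_stream_stop,
        /* unused callbacks are left NULL and never called */
};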
To declare and use a filter, in the configuration, the "filter" keyword must be
used in a listener/frontend section:
frontend test
...
filter <FILTER-NAME> [OPTIONS...]
The filter referenced by the <FILTER-NAME> must declare a configuration parser
on its own name to fill the flt_ops and filter_conf fields in the proxy's
structure. An example will be provided later to make it perfectly clear.
For now, filters cannot be used in a backend section. But this is only a matter
of time. Documentation will also be added later. This is the first commit of a
long list about filters.
It is possible to have several filters on the same listener/frontend. These
filters are stored in an array of at most MAX_FILTERS elements (defined in
include/types/filters.h). Again, this will be replaced later by a list of
filters.
The filter API has been highly refactored. Main changes are:
* Now, HA supports an infinite number of filters per proxy. To do so, filters
are stored in a list.
* Because filters are stored in a list, the filter state has been moved from the
channel structure to the filter structure. This is cleaner because there is no
more filter information in the channel structure.
* It is possible to define filters on backends only. For such filters,
stream_start/stream_stop callbacks are not called. Of course, it is possible
to mix frontend and backend filters.
* Now, TCP streams are also filtered. All callbacks without the 'http_' prefix
are called for all kinds of streams. In addition, 2 new callbacks were added to
filter data exchanged through a TCP stream:
- tcp_data: it is called when new data are available or when old unprocessed
data are still waiting.
- tcp_forward_data: it is called when some data can be consumed.
* New callbacks attached to channel were added:
- channel_start_analyze: it is called when a filter is ready to process data
exchanged through a channel. 2 new analyzers (a frontend and a backend)
are attached to channels to call this callback. For a frontend filter, it
is called before any other analyzer. For a backend filter, it is called
when a backend is attached to a stream. So some processing cannot be
filtered in that case.
- channel_analyze: it is called before each analyzer attached to a channel,
except the analyzers responsible for data sending.
- channel_end_analyze: it is called when all other analyzers have finished
their processing. A new analyzer is attached to channels to call this
callback. For a TCP stream, this is always the last one called. For an HTTP
one, the callback is called when a request/response ends, so it is called
once for each request/response.
* 'session_established' callback has been removed. Everything that is done in
this callback can be handled by 'channel_start_analyze' on the response
channel.
* 'http_pre_process' and 'http_post_process' callbacks have been replaced by
'channel_analyze'.
* 'http_start' callback has been replaced by 'http_headers'. This new one is
called just before headers sending and parsing of the body.
* 'http_end' callback has been replaced by 'channel_end_analyze'.
* It is possible to set a forwarder for TCP channels. It was already possible to
do it for HTTP ones.
* Forwarders can partially consume forwardable data. For this reason a new
HTTP message state was added before HTTP_MSG_DONE : HTTP_MSG_ENDING.
Now all filters can define corresponding callbacks (http_forward_data
and tcp_forward_data). Each filter owns 2 offsets relative to buf->p, next and
forward, to track, respectively, input data already parsed but not yet forwarded
by the filter, and parsed data considered as forwarded by the filter. At any
time, we have the guarantee that a filter cannot parse or forward more input
than the previous ones. And, of course, it cannot forward more input than it has
parsed. 2 macros have been added to retrieve these offsets: FLT_NXT and FLT_FWD.
In addition, 2 functions have been added to change the 'next size' and the
'forward size' of a filter. When a filter parses input data, it can alter these
data, so the size of these data can vary. This action has an effect on all
previous filters that must be handled. To do so, the function
'filter_change_next_size' must be called, passing the size variation. In the
same spirit, if a filter alters forwarded data, it must call the function
'filter_change_forward_size'. 'filter_change_next_size' can be called in
'http_data' and 'tcp_data' callbacks and only these ones. And
'filter_change_forward_size' can be called in 'http_forward_data' and
'tcp_forward_data' callbacks and only these ones. The data changes are the
filter's responsibility, but with some limitations. It must not change already
parsed/forwarded data or data that previous filters have not parsed/forwarded
yet.
Because filters can be used on backends, when the backend is set for a
stream, we add the filters defined for this backend to the filter list of the
stream. But we must only do that when the backend and the frontend of the stream
are not the same. Otherwise the same filters are added a second time, leading to
undefined behavior.
The HTTP compression code had to be moved.
This simplifies the http_response_forward_body function. To do so, the way the
data are forwarded has changed. Now, a filter (and only one) can forward data. In
a commit to come, this limitation will be removed to let all filters take part in
data forwarding. There are 2 new functions that filters should use to deal with
this feature:
* flt_set_http_data_forwarder: This function sets the filter (using its id)
that will forward data for the specified HTTP message. This is possible only if
it was not already set by another filter _AND_ if no data was forwarded yet
(msg->msg_state <= HTTP_MSG_BODY). It returns -1 if an error occurs.
* flt_http_data_forwarder: This function returns the filter id that will
forward data for the specified HTTP message. If there is no forwarder set, it
returns -1.
When an HTTP data forwarder is set for the response, the HTTP compression is
disabled. Of course, this is not definitive.
The csv stats format breaks when an agent changes the server state to drain.
Tools like hatop, metric or check agents will fail due to this. This
should be backported to 1.6.
The serial number for a generated certificate was computed using the requested
servername, without any variable/random part. This is not a problem as long as
the certificate is not regenerated.
But if the cache is disabled or when the certificate is evicted from the cache,
we may need to regenerate it. It is important not to reuse the same serial
number for the new certificate. Otherwise clients (especially browsers) trigger
a warning because 2 certificates issued by the same CA have the same serial
number.
So now, the serial is a static variable initialized with now_ms (internal date
in milliseconds) and incremented at each new certificate generation.
(Ref MPS-2031)
When working on the previous bug, it appeared that the case that was
triggering the bug would also work between two backends, one of which
doesn't support http-reuse. The reason is that while the idle connection
is moved to the private pool, upon reuse we only check if it holds the
CO_FL_PRIVATE flag. And we don't set this flag when there's no reuse.
So let's always set it in this case, it will guarantee that no undesired
connection sharing may happen.
This fix must be backported to 1.6.
There is a bug in connect_server() : we use si_attach_conn() to offer
the current session's connection to the session we're stealing the
connection from. Unfortunately, si_attach_conn() uses the standard data
connection operations while here we need to use the idle connection
operations.
This results in a situation where when the server's idle timeout strikes,
the read0 is silently ignored, causes the response channel to be shut down
for reads, and the connection remains attached. Next attempt to send a
request when using this connection simply results in nothing being done
because we try to send over an already closed connection. Worse, if the
client aborts, then no timeout remains at all and the session waits
forever and remains assigned to the server.
A more-or-less easy way to reproduce this bug is to have two concurrent
streams each connecting to a different server with "http-reuse aggressive",
typically a cache farm using a URL hash :
stream1: GET /1 HTTP/1.1
stream2: GET /2 HTTP/1.1
stream1: GET /2 HTTP/1.1
wait for the server 1's connection to timeout
stream2: GET /1 HTTP/1.1
The connection hangs here, and "show sess all" shows a closed connection
with a SHUTR on the response channel.
The fix is very simple though not optimal. It consists in calling
si_idle_conn() again after attaching the connection. But in practice
it should not be done like this. The real issue is that there's no way
to cleanly attach a connection to a stream interface without changing
the connection's operations. So the API clearly needs to be revisited
to make such operations easier.
Many thanks to Yves Lafon from W3C for providing lots of useful dumps
and testing patches to help figure out the root cause!
This fix must be backported to 1.6.
After a POST on the stats admin page, a 303 is emitted. Unfortunately
this 303 doesn't contain a content-length, which forces the connection
to be closed and reopened. Let's simply add a content-length: 0 to solve
this.
Commit 16f649c ("REORG: polling: rename "fd_spec" to "fd_cache"")
missed the server-facing connection during the rename, so the old
names are still in use and add a bit of confusion during
debugging.
This should be backported to 1.6 and 1.5.
When a connect() to a unix socket returns EAGAIN we talk about
"no free ports" in the error/debug message, which only makes
sense when using TCP.
Explain connect() failure and suggest troubleshooting server
backlog size.
These two actions don't touch the table so the entry will expire and
the values will not be pushed to other peers. Also in the case of gpc0,
the gpc0_rate counter must be updated. The issue was reported by
Ruoshan Huang.
This fix needs to be backported to 1.6.
Arkadiy Kulev noticed that if a server is marked down while a connection
is still trying to establish, we still insist on performing retries on
the same server, which is absurd. Better perform the redispatch if we
already know the server is down. Because of this, it's likely that the
observe-l4 and sudden-death mechanisms are not optimal and cannot do much
to help the connection which was used to detect the problem.
The fix should be backported to 1.6 and 1.5 at least.
When users request 16384 bytes for a buffer, they get 16392 after
rounding up. This is problematic for SSL as it systematically
causes a small 8-bytes message to be appended after the first 16kB
message and costs about 15% of performance.
Let's add MEM_F_EXACT to use exactly the size we need. This requires
previous patch (MEDIUM: pools: add a new flag to avoid rounding pool
size up).
This issue was introduced in 1.6 and causes trouble there, so this
fix must be backported.
This issue was reported by Gary Barrueto and diagnosed by Cyril Bonté.
Usually it's desirable to merge similarly sized pools, which is the
reason why their size is rounded up to the next multiple of 16. But
for the buffers this is problematic because we add the size of
struct buffer to the user-requested size, and the rounding results
in 8 extra bytes that are usable in the end. So the user gets more
bytes than asked for, and in case of SSL it results in short writes
for the extra bytes that are sent above multiples of 16 kB.
So we add a new flag MEM_F_EXACT to request that the size is not
rounded up when creating the entry. Thus it doesn't disable merging.
Gregor Kovač reported that http_date() did not return the right day of the
week. For example "Sat, 22 Jan 2016 17:43:38 GMT" instead of "Fri, 22 Jan
2016 17:43:38 GMT". Indeed, gmtime() returns a 'struct tm' result, where
tm_wday begins on Sunday, whereas the code assumed it began on Monday.
This patch must be backported to haproxy 1.5 and 1.6.
The server state functions save and apply the server IP when DNS resolution is
enabled on a server.
The purpose is to prevent switching traffic from one server to another one
when multiple IPs are returned by the DNS server for the A or AAAA
record.
That said, a bug in the current code led to erasing the service port while
copying the IP found in the file into the server structure in HAProxy's
memory.
This patch fixes this bug.
The bug was reported on the ML by Robert Samuel Newson and the fix was
proposed by Nenad Merdanovic.
Thank you both!!!
backport: can be backported to 1.6
Erez reported a bug on discourse.haproxy.org about DNS resolution not
occurring when no port is specified on the nameserver directive.
This patch prevents this behavior by returning an error explaining this
issue when parsing the configuration file.
That said, later, we may want to force port 53 when the client did not
provide one.
backport: 1.6
This function should return a 16-bit type as that is the type of the
DNS header id.
Also because it is doing a big-endian uint16 unpack operation.
Backport: can be backported to 1.6
Signed-off-by: Thiago Farina <tfarina@chromium.org>
Signed-off-by: Baptiste Assmann <bedis9@gmail.com>
Changes to if statements do not affect code operation, just layout of
the code. Type casts from malloc returns have been removed as this cast
happens automatically from the void* type.
This may be backported to 1.6.
Parameters provided to 51Degrees methods that have changed to require
const pointers are now cast to avoid compiler warnings.
This should be backported to 1.6.
Malloc continues to be used for the creation of cache entries. The
implementation has been enhanced to be ready for production deployment. A new
method to free cache entries created in 51d.c has been added to ensure
memory is released correctly.
This should be backported to 1.6.
Args pointer is now used as the LRU cache domain to ensure the cache
distinguishes between multiple fetch and conv configurations.
This should be backported to 1.6.
Introduction of a new function in the LRU cache source file.
The purpose of this function is to delete a number of entries in
the cache. 'number' is defined by the caller and the keys removed are
taken from the tail of the tree.
csv_enc_append() returns a pointer to the beginning of the encoded
string, which makes it convenient to use in printf(). However it's not
convenient for use in chunks as it may leave an unused byte at the
beginning depending on the automatic quoting. Let's modify it to work
in two passes. First it looks for a character that requires escaping
using strpbrk(), and second it encodes the string. This way it
guarantees to always start at the first available byte of the chunk.
Additionally it makes the code quite a bit simpler.
We have csv_enc() but there's no way to append some CSV-encoded data
to an existing chunk, so here we modify the existing function for this
and create an inlined version of csv_enc() which first resets the output
chunk. It will be handy to append data to an existing chunk without
having to use an extra temporary chunk, or to encode multiple strings
into a single chunk with chunk_newstr().
The patch is quite small, in fact most changes are typo fixes in the
comments.
The function http_reply_and_close has been added in proto_http.c to wrap calls
to stream_int_retnclose. This function will be modified when the filters are
added.
When the response body is forwarded, if the server closes the input before the
end, an error is thrown. But if the data processing is too slow, all data could
already be received and pending in the input buffer. So it is a bug to stop
processing in this context: the server didn't really close the input before
the end.
As an example, this could happen when HAProxy is configured to do compression
offloading. If the server closes the connection explicitly after the response
(keep-alive disabled by the server) and if HAProxy receives the data faster than
they are compressed, then the response could be truncated.
This patch fixes the bug by checking if some pending data remain in the input
buffer before returning an error. If yes, the processing continues.
Several cases of "<=" instead of "<" were found in the url_param parser,
mostly affecting the case where the parameter is wrapping. They shouldn't
affect header operations, just body parsing in a wrapped pipelined request.
The code is a bit complicated with certain operations done multiple times
in multiple functions, so it is not certain that no others remain. This code
must be re-audited.
It should only be backported to 1.6 once carefully tested, because it is
possible that other bugs relied on these ones.
The applet can't have access to the session's private data. This patch
fixes this problem. Now an applet can use private data stored by actions
and fetches.
The INNER and XFERBODY analyzers were set in order to support HTTP applets
from TCP rulesets, but this does not work (cf. the previous patch).
Other cases already provide these analyzers, so their addition is
not needed. Furthermore if INNER was set it could cause some headers
to be rewritten (ex: connection) after headers were already forwarded,
resulting in a crash in buffer_insert_line2().
Special thanks to Bernd Helm for providing very detailed information,
captures and stack traces making it possible to spot the root cause
here.
This fix must be backported to 1.6.
HTTP applet requests require everything initialized by
"http_process_request" (analyzer flag AN_REQ_HTTP_INNER).
The applet is initialized immediately, but this happens before
the call to this analyzer.
Due to this problem HTTP applets could be called with an incompletely
initialized http_txn.
This fix must be backported to 1.6.
In certain circumstances (eg: Lua HTTP applet called from a
TCP ruleset before http_process_request()), the HTTP TXN is not
yet fully initialized so some information it contains cannot be
relied on. Such information includes the HTTP version, the state
of the expect: 100-continue header, the connection header and
the transfer-encoding header.
Here the bug only turns something which already doesn't work
into something wrong, but better avoid any references to the
http_txn from the Lua code to avoid future mistakes.
This patch should be backported into 1.6 for code consistency.
If a sample fetch needing http_txn is called from an HTTP Lua applet,
the result will be invalid and may even cause a crash because some HTTP
data can be forwarded and the HTTP txn is no longer valid.
Here the solution is to ensure that a fetch called from Lua never
needs http_txn. This is done thanks to a new flag HLUA_F_MAY_USE_HTTP
which indicates whether or not it is safe to call a fetch which needs
HTTP.
This fix needs to be backported to 1.6.
This patch converts a boolean "int" to a bitfield. The main
reason is to save space in the struct in case another flag is
required later.
Note that this patch is required for next fix and will need to be
backported to 1.6.
When a POST is processed by a Lua service, the HTTP headers are
potentially gone. So, we cannot retrieve their content using
the standard "hdr" sample fetches (which will soon become invalid
anyway) from an applet.
This patch adds an entry "headers" to the applet_http object. This
entry is an array containing all the headers. It makes it possible to
use the HTTP headers during the processing of the service.
Many thanks to Jan Bruder for reporting this issue with enough
details to reproduce it.
This patch will have to be backported to 1.6 since it will be the
only way to access headers from Lua applets.
New sticktable entries learned from a remote peer can be pushed to others after
a random delay because they are not inserted at the right position in the updates
tree.
When memmax is forced using "-m", the per-process memory limit is enforced
using setrlimit(), but this value is not used to compute the automatic
maxconn limit. In addition, the per-process memory limit didn't consider
the fact that the shared SSL cache only needs to be accounted once.
The doc was also fixed to clearly state that "-m" is global and not per
process. It makes sense because people who use -m want to protect the
system's resources regardless of whatever appears in the configuration.
This setting used to be assigned to a variable tunable from a constant
and for an unknown reason never made its way into the config parser.
tune.recv_enough <number>
Haproxy uses some hints to detect that a short read indicates the end of the
socket buffers. One of them is that a read returns more than <recv_enough>
bytes, which defaults to 10136 (7 segments of 1448 each). This default value
may be changed by this setting to better deal with workloads involving lots
of short messages such as telnet or SSH sessions.
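For example (an illustrative value for an interactive-heavy workload):
global
    tune.recv_enough 4096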
Added support for loading multiple certs into shared contexts when they
are specified in a crt-list.
Note that it's not practical to support SNI filters with multicerts, so
any SNI filter provided in the crt-list is ignored if a
multi-cert operation is used.
Added the ability for users to specify multiple certificates that all relate
to a single server. Users do this by specifying the certificate "cert_name.pem"
but having "cert_name.pem.rsa", "cert_name.pem.dsa" and/or
"cert_name.pem.ecdsa" in the directory.
HAProxy will now intelligently search for those 3 files and try to combine
them into as few SSL_CTXs as possible based on CN/SAN. This will allow
HAProxy to support multiple ciphersuite key algorithms off a single
SSL_CTX.
This change integrates into the existing architecture of SNI lookup and
multiple SNIs can point to the same SSL_CTX, which can support multiple
key_types.
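For example (paths are illustrative):
frontend www
    bind :443 ssl crt /etc/haproxy/certs/cert_name.pem
    # with cert_name.pem.rsa and cert_name.pem.ecdsa present in the same
    # directory, this single bind line can serve both key types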
Added cert_key_and_chain struct to ssl. This struct will store the
contents of a crt path (from the config file) into memory. This will
allow us to use the data stored in memory instead of reading the file
multiple times.
This will be used to support a later commit to load multiple pkeys/certs
into a single SSL_CTX
In order to properly enable sched_setaffinity(), on some versions of Linux it
is _GNU_SOURCE rather than __USE_GNU that must be defined (spotted on Alpine
Linux for instance). This is also more consistent, since __USE_GNU does not
seem to be used anywhere else in the code, and it seems to be the proper way
to enable non-portable code on Linux.
On glibc-based Linux versions, _GNU_SOURCE defines __USE_GNU, so it should be
safe enough.
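For reference, the pattern looks like this (a minimal standalone sketch,
binding the current process to CPU 0; error handling omitted):
#define _GNU_SOURCE  /* must come before any #include */
#include <sched.h>
int bind_to_cpu0(void)
{
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        /* pid 0 means "the calling process" */
        return sched_setaffinity(0, sizeof(mask), &mask);
}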
Krishna Kumar reported that the following configuration doesn't permit
HTTP reuse between two clients :
frontend private-frontend
mode http
bind :8001
default_backend private-backend
backend private-backend
mode http
http-reuse always
server bck 127.0.0.1:8888
The reason for this is that in http_end_txn_clean_session() we check the
stream's backend's http-reuse option before deciding whether the
backend connection should be moved back to the server's pool or not. But
since we're doing this after the call to http_reset_txn(), the backend is
reset to match the frontend, which doesn't have the option. However it
will work fine in a setup involving a "listen" section.
We just need to keep a pointer to the current backend before calling
http_reset_txn(). The code does that and replaces the few remaining
references to s->be inside the same function so that if any part of
code were to be moved later, this trap doesn't happen again.
This fix must be backported to 1.6.
A small configuration parsing error exists when no port is configured on the
server IP:port statement, the server's parameter 'port' is not set,
and the first tcp-check rule is a comment, like in the example below:
backend b
option tcp-check
tcp-check comment blah
tcp-check connect 8444
server s 127.0.0.1 check
In such a case, an ALERT is improperly reported, even though this
configuration is valid and works.
The new code moves the pointer to the first tcp-check rule which isn't a
comment before checking the presence of the port.
backport status: 1.6 and above
The current configuration parsing is permissive in the following situation:
a server in a backend with no port configured on the IP address
statement, no 'port' parameter configured, and the last rule of a tcp-check
being a CONNECT with no port.
The current code parses all the rules to validate that a port is
available, but it misses the last one, which means such a
configuration is considered valid:
backend b
option tcp-check
tcp-check connect port 8444
tcp-check connect
server s 127.0.0.1 check
the second connect attempt is sent to port '0'...
The current patch fixes this by parsing the list the right way, including
the last rule.
backport status: 1.6 and above
A segfault can occur during the initialization phase, when an unknown
"mailers" name is configured. This happens when "email-alert myhostname" is not
set, where a direct pointer to an array is used instead of copying the string,
causing the segfault when haproxy tries to free the memory.
This is a minor issue because the configuration is invalid and a fatal error
will remain, but it should be fixed to prevent reload issues.
Example of minimal configuration to reproduce the bug :
backend example
email-alert mailers NOT_FOUND
email-alert from foo@localhost
email-alert to bar@localhost
This fix must be backported to 1.6.
Tommy Atkinson and Sylvain Faivre reported that email alerts didn't work when
they were declared in the defaults section. This is due to the use of an
internal attribute which is set once an email-alert is at least partially
configured. But this attribute was not propagated to the current proxy during
the configuration parsing.
Note that the issue doesn't occur if "email-alert myhostname" is configured in
the defaults section.
This fix must be backported to 1.6.
Until now, the urlp sample fetches were able to find parameters with an empty
value, but the return code depended on the value length. The final result was
that acls using urlp couldn't match empty values.
Example of an acl which always returned "false":
acl MATCH_EMPTY urlp(foo) -m len 0
The fix consists in unconditionally returning 1 when the parameter is found.
This fix must be backported to 1.6 and 1.5.
Right now it's possible to change the global compression rate limiting
without the CLI being at the admin level.
This fix must be backported to 1.6 and 1.5.
It's pointless to reserve this amount of memory when zlib is not used.
Adding the condition will make build scripts easier to manage. This may
be backported to 1.6.
client-fin and server-fin are bogus. They are applied on the write
side after a SHUTR was seen. The immediate effect is that sometimes
if a SHUTR was seen after a SHUTW on the same side, the timeout is
enabled again regardless of the fact that the output is already
closed. This results in the timeout event not being processed and
a busy poll loop happening until another timeout on the stream gets
rid of it. Note that haproxy continues its job during this, it's just
that it eats all the CPU trying to handle an event that it ignores.
A reproducible case consists in having a client stop reading data from
a server to ensure data remain in the response buffer, then the client
sends a shutdown(write). If abortonclose is enabled on haproxy, the
shutdown is passed to the server side and the server responds with a
SHUTR that cannot immediately be forwarded to the client since the
buffer is full. During this time the event is ignored and the task is
woken again in loops.
It is worth noting that the timeout handling since 1.5 is a bit fragile
and that it might be possible that other similar conditions still exist,
so the timeout handling should be audited regarding this issue.
Many thanks to BaiYang for providing detailed information showing the
problem in action.
This bug also affects 1.5 thus the fix must be backported.
There is a bug where "option http-keep-alive" doesn't force a response
to stay in keep-alive if the server sends the FIN along with the response
on the second or subsequent response. The reason is that the auto-close
was forced enabled when recycling the HTTP transaction and it's never
disabled along the response processing chain before the SHUTR gets a
chance to be forwarded to the client side. The MSG_DONE state of the
HTTP response properly disables it but too late.
There's no more reason for enabling auto-close here, because either it
doesn't matter in non-keep-alive modes because the connection is closed,
or it is automatically enabled by process_stream() when it sees there's
no analyser on the stream.
This bug also affects 1.5 so a backport is desired.
Sander Klein reported an error message about SSLv3 not
being supported on Debian 8, although he didn't use force-sslv3.
Vincent Bernat tracked this down to the LUA initialization, which
actually does force-sslv3.
This patch removes force-sslv3 from the LUA initialization, so
the LUA SSL socket can actually use TLS and doesn't trigger
warnings when SSLv3 is not supported by libssl (such as in
Debian 8).
This should be backported to 1.6.
When a server timeout is detected on the second or nth request of a keep-alive
connection, HAProxy closes the connection without writing a response.
Some clients would fail with a remote disconnected exception and some
others would retry potentially unsafe requests.
This patch removes the special case and makes sure a 504 timeout is
written back whenever a server timeout is handled.
Signed-off-by: lsenta <laurent.senta@gmail.com>
When the txn.done() function is called, the output buffer is cleaned,
but the associated relative pointers on the HTTP request elements
are not reset.
This patch removes this cleanup, because the output buffer may contain
data to forward.
The initial record layer version in a SSL handshake may be set to TLSv1.0
or similar for compatibility reasons, this is allowed as per RFC5246
Appendix E.1 [1]. Some implementations are Openssl [2] and NSS [3].
A related issue has been fixed some time ago in commit 57d229747
("BUG/MINOR: acl: req_ssl_sni fails with SSLv3 record version").
Fix this by using the real client hello version instead of the record
layer version.
This was reported by Julien Vehent and analyzed by Cyril Bonté.
The initial patch is from Julien Vehent as well.
This should be backported to stable series, the req_ssl_ver keyword was
first introduced in 1.3.16.
[1] https://tools.ietf.org/html/rfc5246#appendix-E.1
[2] 4a1cf50187
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=774547
fgets() can return NULL on error or when EOF occurs. This patch adds a
check of fgets() return value and displays a warning if the first line of
the server state file can not be read. Additionally, we make sure to close
the previously opened file descriptor.
It is possible to create an http capture rule which points to a capture slot
id which does not exist.
The current patch prevents this when parsing the configuration and prevents
running a configuration which contains such rules.
This configuration is now invalid:
frontend f
bind :8080
http-request capture req.hdr(User-Agent) id 0
default_backend b
this one as well:
frontend f
bind :8080
declare capture request len 32 # implicit id is 0 here
http-request capture req.hdr(User-Agent) id 1
default_backend b
It applies of course to both http-request and http-response rules.
Causes HAProxy to emit a static string to the agent on every check,
so that you can independently control multiple services running
behind a single agent port.
The direction (request or response) is not propagated in the
sample fetches called through Lua. This patch adds the direction
status in some structs (hlua_txn and hlua_smp) to make sure that
the sample fetches will be called with all the information.
The converters cannot access a TXN object, so they are not
impacted by the direction. However, the samples used as input of the
Lua converter wrapper are initialized with the direction. Thereby,
the struct smp stays consistent.
[wt: needs to be backported to 1.6]
This patch cleans up the direction names. It replaces numeric values
with the associated defines. It ensures compliance with the values found
elsewhere in HAProxy.
It is required by the bugfix patch which is following.
[wt: needs to be backported to 1.6]
The current resolvers section parsing function is permissive on the nameserver
id, and two nameservers may have the same id.
It's a shame, since we don't know, for example, which statistics belong
to which nameserver...
From now on, configurations with a duplicated nameserver id in a resolvers
section are considered broken and return a fatal error when parsing.
Cyril Bonté reported a reproducible sequence which can lead to a crash
when using backend connection reuse. The problem comes from the fact that
we systematically add the server connection to an idle pool at the end of
the HTTP transaction regardless of the fact that it might already be there.
This is possible for example when processing a request which doesn't use
a server connection (typically a redirect) after a request which used a
connection. Then after the first request, the connection was already in
the idle queue and we're putting it a second time at the end of the second
request, causing a corruption of the idle pool.
Interestingly, the memory debugger in 1.7 immediately detected a suspicious
double free on the connection, leading to a very early detection of the
cause instead of its consequences.
Thanks to Cyril for quickly providing a working reproducer.
This fix must be backported to 1.6 since connection reuse was introduced
there.
A bug lay in the parsing of DNS CNAME responses, leading HAProxy to
think the CNAME was improperly resolved in the response.
This should be backported into the 1.6 branch.
The status DNS_UPD_NAME_ERROR returned by dns_get_ip_from_response, which
means the queried name can't be found in the response, was
improperly processed (it fell into the default case).
This led to a loop where HAProxy simply resent a new query as soon as
it got a response with this status, in the only case where such a type
of response is the very first one received by the process.
This should be backported into the 1.6 branch.
It was accidentally discovered that limiting haproxy to 5000 MB leads to
an effective limit of 904 MB. This is because the computation for the
size limit is performed by multiplying rlimit_memmax by 1048576, and
doing so causes the operation to be performed on an int instead of a
long or long long. Just switch to 1048576ULL as is done at other places
to fix this.
This bug affects all supported versions, the backport is desired, though
it rarely affects users since few people apply memory limits.
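The arithmetic is easy to demonstrate in isolation (a standalone sketch, not
HAProxy code):
#include <stdio.h>
int main(void)
{
        int rlimit_memmax = 5000;   /* -m 5000, in megabytes */
        /* int * int overflows (undefined behavior in theory, wrap-around
         * in practice), yielding 947912704, i.e. about 904 MB */
        long long bad  = rlimit_memmax * 1048576;
        /* the ULL constant promotes the multiply to 64 bits */
        long long good = rlimit_memmax * 1048576ULL;
        printf("bad=%lld good=%lld\n", bad, good);
        return 0;
}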
When DEBUG_MEMORY_POOLS is used, we now use the link pointer at the end
of the pool to store a pointer to the pool, and to control it during
pool_free2() in order to serve four purposes :
- at any instant we can know what pool an object was allocated from
when examining memory, hence how we should possibly decode it ;
- it serves to detect double free when they happen, as the pointer
cannot be valid after the element is linked into the pool ;
- it serves to detect if an element is released in the wrong pool ;
- it serves as a canary, to detect if some buffers experienced an
overflow before being released.
All these elements will definitely help better troubleshoot strange
situations, or at least confirm that certain conditions did not happen.
When debugging a core file, it's sometimes convenient to be able to
visit the released entries in the pools (typically last released
session). Unfortunately the first bytes of these entries are destroyed
by the link elements of the pool. And of course, most structures have
their most accessed elements at the beginning of the structure (typically
flags). Let's add a build-time option DEBUG_MEMORY_POOLS which allocates
an extra pointer in each pool to put the link at the end of each pool
item instead of the beginning.
This commit adds support for setting a per-server maxconn from the stats
socket. The only really notable part of this commit is that we need to
check if maxconn == minconn before changing things, as this indicates
that we are NOT using dynamic maxconn. When we are not using dynamic
maxconn, we should update maxconn/minconn in lockstep.
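For example (backend and server names are illustrative, and assuming the
command follows the existing "set maxconn" family on the stats socket):
$ echo "set maxconn server bk_web/srv1 50" | socat stdio /var/run/haproxy.sock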
The function 'EVP_PKEY_get_default_digest_nid()' was introduced in OpenSSL
1.0.0. So for older version of OpenSSL, compiled with the SNI support, the
HAProxy compilation fails with the following error:
src/ssl_sock.c: In function 'ssl_sock_do_create_cert':
src/ssl_sock.c:1096:7: warning: implicit declaration of function 'EVP_PKEY_get_default_digest_nid'
if (EVP_PKEY_get_default_digest_nid(capkey, &nid) <= 0)
[...]
src/ssl_sock.c:1096: undefined reference to `EVP_PKEY_get_default_digest_nid'
collect2: error: ld returned 1 exit status
Makefile:760: recipe for target 'haproxy' failed
make: *** [haproxy] Error 1
So we must add an #ifdef to check the OpenSSL version (>= 1.0.0) to use this
function. It is used to get the default signature digest associated with the
private key used to sign generated X509 certificates. It is called when the
private key differs from EVP_PKEY_RSA, EVP_PKEY_DSA and EVP_PKEY_EC. It should
be enough for most cases.
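The guard looks roughly like this (a sketch; the fixed fallback digest chosen
here is an assumption, not necessarily what the patch does):
#include <openssl/opensslv.h>
#include <openssl/evp.h>
#include <openssl/objects.h>
static int pick_digest_nid(EVP_PKEY *capkey, int *nid)
{
#if OPENSSL_VERSION_NUMBER >= 0x1000000fL
        /* available since OpenSSL 1.0.0 */
        if (EVP_PKEY_get_default_digest_nid(capkey, nid) <= 0)
                return -1;
#else
        (void)capkey;
        *nid = NID_sha256; /* assumed fallback for older versions */
#endif
        return 0;
}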
Basically, it's ill-defined and shouldn't really be used going forward.
We can't guarantee that resolvers will do the 'legwork' for us and
actually resolve CNAMEs when we request the ANY query-type. Case in point
(obfuscated, clearly):
PRODUCTION! ahayworth@secret-hostname.com:~$
dig @10.11.12.53 ANY api.somestartup.io
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
;; QUESTION SECTION:
;api.somestartup.io. IN ANY
;; ANSWER SECTION:
api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
;; AUTHORITY SECTION:
somestartup.io. 166687 IN NS ns-1254.awsdns-28.org.
somestartup.io. 166687 IN NS ns-1884.awsdns-43.co.uk.
somestartup.io. 166687 IN NS ns-440.awsdns-55.com.
somestartup.io. 166687 IN NS ns-577.awsdns-08.net.
;; Query time: 1 msec
;; SERVER: 10.11.12.53#53(10.11.12.53)
;; WHEN: Mon Oct 19 22:02:29 2015
;; MSG SIZE rcvd: 242
HAProxy can't handle that response correctly.
Rather than try to build in support for resolving CNAMEs presented
without an A record in an answer section (which may be a valid
improvement further on), this change just skips ANY record types
altogether. A and AAAA are much more well-defined and predictable.
Notably, this commit preserves the implicit "Prefer IPV6 behavior."
Furthermore, using the ANY query type by default is a bad idea (from Robin on
HAProxy's ML):
Using ANY queries for this kind of stuff is considered by most people
to be a bad practice since besides all the things you named it can
lead to incomplete responses. Basically a resolver is allowed to just
return whatever it has in cache when it receives an ANY query instead
of actually doing an ANY query at the authoritative nameserver. Thus
if it only received queries for an A record before you do an ANY query
you will not get an AAAA record even if it is actually available since
the resolver doesn't have it in its cache. Even worse if before it
only got MX queries, you won't get either A or AAAA
Kim Seri reported that haproxy 1.6.0 crashes after a few requests
when a bind line has SSL enabled with more than one certificate. This
was caused by an insufficient condition to free generated certs during
ssl_sock_close() which can also catch other certs.
Christopher Faulet analysed the situation like this :
-------
First the LRU tree is only initialized when the SSL certs generation is
configured on a bind line. So, in the most of cases, it is NULL (it is
not the same thing than empty).
When the SSL certs generation is used, if the cache is not NULL, a such
certificate is pushed in the cache and there is no need to release it
when the connection is closed.
But it can be disabled in the configuration. So in that case, we must
free the generated certificate when the connection is closed.
Then here, we have really a bug. Here is the buggy part:
3125) if (conn->xprt_ctx) {
3126) #ifdef SSL_CTRL_SET_TLSEXT_HOSTNAME
3127)         if (!ssl_ctx_lru_tree && objt_listener(conn->target)) {
3128)                 SSL_CTX *ctx = SSL_get_SSL_CTX(conn->xprt_ctx);
3129)                 if (ctx != objt_listener(conn->target)->bind_conf->default_ctx)
3130)                         SSL_CTX_free(ctx);
3131)         }
3132) #endif
3133)         SSL_free(conn->xprt_ctx);
3134)         conn->xprt_ctx = NULL;
3135)         sslconns--;
3136) }
The check on line 3127 is not enough to determine if this is a
generated certificate or not. Because ssl_ctx_lru_tree is NULL,
generated certificates, if any, must be freed. But here ctx should also
be compared to all SNI certificates and not only to default_ctx. Because
of this bug, when an SNI certificate is used for a connection, it is
erroneously freed when this connection is closed.
-------
Christopher provided this reliable reproducer :
----------
global
tune.ssl.default-dh-param 2048
daemon
listen ssl_server
mode tcp
bind 127.0.0.1:4443 ssl crt srv1.test.com.pem crt srv2.test.com.pem
timeout connect 5000
timeout client 30000
timeout server 30000
server srv A.B.C.D:80
You just need to generate 2 SSL certificates with 2 CN (here
srv1.test.com and srv2.test.com).
Then, by doing SSL requests with the first CN, there is no problem. But
with the second CN, it should segfault on the 2nd request.
openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK
openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK
But,
openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // OK
openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // KO
-----------
A long discussion led to the following proposal which this patch implements :
- the cert is generated. It gets a refcount = 1.
- we assign it to the SSL. Its refcount becomes two.
- we try to insert it into the tree. The tree will handle its freeing
using SSL_CTX_free() during eviction.
- if we can't insert into the tree because the tree is disabled, then
we have to call SSL_CTX_free() ourselves, so we'd rather do it
immediately. It will more closely mimic the case where the cert
is added to the tree and immediately evicted by concurrent activity
on the cache.
- we never have to call SSL_CTX_free() during ssl_sock_close() because
the SSL session only relies on openssl doing the right thing based on
the refcount only.
- thus we never need to know how the cert was created since the
SSL_CTX_free() is either guaranteed or already done for generated
certs, and this protects other ones against any accidental call to
SSL_CTX_free() without having to track where the cert comes from.
This patch also reduces the inter-dependence between the LRU tree and
the SSL stack, so it should cause less sweating to migrate to threads
later.
This bug is specific to 1.6.0, as it was introduced after dev7 by
this fix :
d2cab92 ("BUG/MINOR: ssl: fix management of the cache where forged certificates are stored")
Thus a backport to 1.6 is required, but not to 1.5.
Susheel Jalali reported a confusing bug in namespaces implementation.
If namespaces are enabled at build time (USE_NS=1) and *no* namespace
is used at all in the whole config file, my_socketat() returns -1 and
all socket bindings fail. This is because of a wrong condition in this
function. A possible workaround consists in creating some namespaces.
The function which parses a DNS response buffer did not properly move a
pointer when reading a packet whose records do not use DNS "message
compression" techniques.
Thanks to Øyvind Johnsen for the help provided during the troubleshooting
session.
This is equivalent to commit 2af207a ("MEDIUM: tcp: implement tcp-ut
bind option to set TCP_USER_TIMEOUT") except that this time it works
on the server side. The purpose is to detect dead server connections
even when checks are rare, disabled, or after a soft reload (since
checks are disabled there as well), and to ensure client connections
will get killed faster.
Baptiste reported a segfault when the "id" keyword was passed on the
"stats socket" line. The problem is related to the fact that the stats
parser stats_parse_global() passes curpx instead of global.stats_fe to
the keyword parser. Indeed, curpx being a pointer to the proxy in the
current section, it is not correct here since the global section does
not describe a proxy. It's just by pure luck that only bind_parse_id()
uses the proxy since any other keyword parser could use it as well.
The bug has no impact since the id specified here is not usable at all
and can be discarded from a faulty configuration.
This fix must be backported to 1.5.
Lua needs to know the direction of the http data processed (request or
response). It checks the flag SMP_OPT_DIR_REQ, but this flag is 0. This patch
correctly checks the flags after applying the SMP_OPT_DIR mask.
A previous commit broke the interactive stats cli prompt. Specifically,
it was not clear that we could be in STAT_CLI_PROMPT when we get to
the output functions for the cli handler, and the switch statement did
not handle this case. We would then fall through to the default
statement, which was recently changed to set error flags on the socket.
This in turn causes the socket to be closed, which is not what we wanted
in this specific case.
To fix, we add a case for STAT_CLI_PROMPT, and simply break out of the
switch statement.
Testing:
- Connected to unix stats socket, issued 'prompt', observed that I
could issue multiple consecutive commands.
- Connected to unix stats socket, issued 'prompt', observed that socket
timed out after inactivity expired.
- Connected to unix stats socket, issued 'prompt' then 'set timeout cli
5', observed that socket timed out after 5 seconds expired.
- Connected to unix stats socket, issued invalid commands, received
usage output.
- Connected to unix stats socket, issued 'show info', received info
output and socket disconnected.
- Connected to unix stats socket, issued 'show stat', received stats
output and socket disconnected.
- Repeated above tests with TCP stats socket.
[wt: no backport needed, this was introduced during the applet rework in 1.6]
Now, a callback is defined for generated certificates to set DH parameters for
ephemeral key exchange when required.
In the same way, when possible, we also define Elliptic Curve DH (ECDH)
parameters. This is done by adding the EVP_PKEY_EC type to the supported types
for the CA private key when we get the message digest used to sign a generated
X509 certificate.
So now, we support DSA, RSA and EC private keys.
And to be sure, when the type of the private key is not directly supported, we
get its default message digest using the function
'EVP_PKEY_get_default_digest_nid'.
We also use the key of the default certificate instead of generating one. So we
are sure to use the same key type instead of always using an RSA key.
The file specified by the SSL option 'ca-sign-file' can now contain the CA
certificate used to dynamically generate certificates, and its private key, in
any order.
Commit d2cab92 ("BUG/MINOR: ssl: fix management of the cache where forged
certificates are stored") removed some needed #ifdefs resulting in ssl not
building on older openssl versions where SSL_CTRL_SET_TLSEXT_HOSTNAME is
not defined :
src/ssl_sock.c: In function 'ssl_sock_load_ca':
src/ssl_sock.c:2504: error: 'ssl_ctx_lru_tree' undeclared (first use in this function)
src/ssl_sock.c:2504: error: (Each undeclared identifier is reported only once
src/ssl_sock.c:2504: error: for each function it appears in.)
src/ssl_sock.c:2505: error: 'ssl_ctx_lru_seed' undeclared (first use in this function)
src/ssl_sock.c: In function 'ssl_sock_close':
src/ssl_sock.c:3095: error: 'ssl_ctx_lru_tree' undeclared (first use in this function)
src/ssl_sock.c: In function '__ssl_sock_deinit':
src/ssl_sock.c:5367: error: 'ssl_ctx_lru_tree' undeclared (first use in this function)
make: *** [src/ssl_sock.o] Error 1
Reintroduce the ifdefs around the faulty areas.
First, the LRU cache must be initialized after the configuration parsing to
correctly set its size.
Next, the function 'ssl_sock_set_generated_cert' returns -1 when an error occurs
(0 on success). In that case, the caller is responsible for freeing the memory
allocated for the certificate.
Finally, when a SSL certificate is generated by HAProxy but cannot be inserted
in the cache, it must be freed when the SSL connection is closed. This happens
when 'tune.ssl.ssl-ctx-cache-size' is set to 0.
The 'OPTIONS' method was not in the list of supported HTTP methods, and
find_http_meth returned HTTP_METH_OTHER instead of HTTP_METH_OPTIONS.
[wt: this fix needs to be backported at least to 1.5, 1.4 and 1.3]
When debugging an issue, sometimes it can be useful to be able to use
byte 0 to poison memory areas, resulting in the same effect as a calloc().
This patch changes the default mem_poison_byte to -1 to disable it so that
all non-negative values, including 0, are usable.
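As a minimal sketch of the idea (hypothetical allocator name, not the
actual pool code):

#include <stdlib.h>
#include <string.h>

static int mem_poison_byte = -1;  /* negative means disabled; 0 is now valid */

void *pool_alloc_area(size_t size)
{
    void *ptr = malloc(size);

    /* poison freshly allocated areas only when poisoning is enabled */
    if (ptr && mem_poison_byte >= 0)
        memset(ptr, mem_poison_byte, size);
    return ptr;
}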
HAProxy could already support being passed a file list on the command
line, by passing multiple times "-f" followed by a file name. People
have been complaining that it made it hard to pass file lists from init
scripts.
This patch introduces an end of arguments using the common "--" tag,
after which only file names may appear. These files are then added to
the existing list of other files specified using -f and are loaded in
their declaration order. Thus it becomes possible to do something like
this :
haproxy -sf $(pidof haproxy) -- /etc/haproxy/global.cfg /etc/haproxy/customers/*.cfg
Given that all command line arguments start with a '-' and that
no pid number can start with this character, there's no constraint
to make the pid list the last argument. Let's relax this rule.
Thierry reported that keep-alive still didn't cope well with Lua
services. The reason is that for now applets have to be closed at
the end of a transaction so we want to work in server-close mode,
which isn't noticeable by the client since it still sees keep-alive.
Additionally we want to enable the request body transfer analyser
which will be needed to synchronize with the response analyser to
indicate the end of the transfer.
The release handler used to be called twice for some time and just by
pure luck we never ended up double-freeing the data there. Set the pointer
to NULL to ensure this can never happen should a future change permit this
situation again.
This commit adds support for dumping all resolver stats. Specifically
if a command 'show stats resolvers' is issued without a resolver section
id, we dump all known resolver sections. If none are configured, a
message is displayed indicating that.
When using the external-check command option HAProxy was failing to
start with a fatal error "'external-check' cannot handle unexpected
argument". When looking at the code it was looking for an incorrect
argument. Also correcting an Alert message text as spotted by
PiBa-NL.
When the first parameter to getaddrinfo() is not NULL (it is never NULL
in str2ip()), on Linux the AI_PASSIVE value for ai_flags is ignored. On
FreeBSD, when AI_PASSIVE is specified and hostname parameter is not NULL,
getaddrinfo() ignores local address selection policy, always returning
AAAA record. Pass zero ai_flags to behave correctly on FreeBSD, this
change should be no-op for Linux.
This fix should be backported to 1.5 as well, after some observation
period.
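A minimal sketch of the corrected lookup, outside of the actual str2ip()
code (names are illustrative):

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Resolve <host> with zero ai_flags: AI_PASSIVE is ignored on Linux when
 * the host is non-NULL, and on FreeBSD it would bypass the local address
 * selection policy and always return the AAAA record. */
int resolve_host(const char *host, struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = 0;   /* instead of AI_PASSIVE */
    return getaddrinfo(host, NULL, &hints, res);
}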
Michael Ezzell reported a bug causing haproxy to segfault during startup
when trying to send syslog message from Lua. The function __send_log() can
be called with *p that is NULL and/or when the configuration is not fully
parsed, as is the case with Lua.
This patch fixes this problem by using individual vectors instead of the
pre-generated strings log_htp and log_htp_rfc5424.
Also, this patch fixes a problem causing haproxy to write the wrong pid in
the logs -- the log_htp(_rfc5424) strings were generated at the haproxy
start, but "pid" value would be changed after haproxy is started in
daemon/systemd mode.
Only the main execution function can set the run flag, because it is
the last function called before the execution.
This patch removes the flag setting from another function. It will be used
by the new Lua timeout counter.
Commit e11cfcd ("MINOR: config: new backend directives:
load-server-state-from-file and server-state-file-name") introduced a bug
which can cause haproxy to crash upon startup by sending user-controlled
data in a format string when emitting a warning. Fix the way the warning
message is built to avoid this.
No backport is needed, this was introduced in 1.6-dev6 only.
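The bug class is the usual format-string injection; a generic
illustration (not the actual warning code):

#include <stdio.h>

void emit_warning(const char *user_data)
{
    /* BROKEN: '%' sequences inside user_data are interpreted as
     * conversion specifiers and can crash the process */
    fprintf(stderr, user_data);

    /* FIXED: user-controlled data is passed as an argument instead */
    fprintf(stderr, "%s", user_data);
}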
Commit e11cfcd ("MINOR: config: new backend directives:
load-server-state-from-file and server-state-file-name") caused these
warnings when building with Clang :
src/server.c:1972:21: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
(srv_uweight < 0) || (srv_uweight > SRV_UWGHT_MAX))
~~~~~~~~~~~ ^ ~
src/server.c:1980:21: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
(srv_iweight < 0) || (srv_iweight > SRV_UWGHT_MAX))
~~~~~~~~~~~ ^ ~
Indeed, srv_iweight and srv_uweight are unsigned. Just drop the offending test.
The conn_sock_drain() call is only there to force the system to ACK
pending data in case of TCP_QUICKACK so that the client doesn't retransmit,
otherwise it leads to a real RST making the feature useless. There's no
point in draining the connection when quick ack cannot be disabled, so
let's move the call inside the ifdef part.
The silent-drop action is supposed to close with a TCP reset that is
either not sent or not too far. But since it's on the client-facing
side, the socket's lingering is enabled by default and the RST only
occurs if some pending unread data remain in the queue when closing.
This causes some clean shutdowns to occur with retransmits, which is
not good at all. Force linger_risk on the socket to flush all data
and destroy the socket.
No backport is needed, this was introduced in 1.6-dev6.
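The effect of forcing lingering off can be sketched with the standard
socket option (simplified, not the actual linger_risk plumbing):

#include <sys/socket.h>

/* With l_onoff=1 and l_linger=0, close() discards any pending data and
 * emits a TCP RST instead of performing a clean FIN handshake. */
static void set_nolinger(int fd)
{
    struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

    setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
}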
req.ssl_st_ext : integer
Returns 0 if the client didn't send a SessionTicket TLS Extension (RFC5077).
Returns 1 if the client sent a SessionTicket TLS Extension.
Returns 2 if the client also sent a non-zero length TLS SessionTicket.
This stops the evaluation of the rules and makes the client-facing
connection suddenly disappear, using a system-dependent way that tries
to prevent the client from being notified. The effect is that the
client still sees an established connection while there's none on
HAProxy. The purpose is to achieve a comparable effect to "tarpit"
except that it doesn't use any local resource at all on the machine
running HAProxy. It can resist much higher loads than "tarpit", and
slow down stronger attackers. It is important to understand the impact
of using this mechanism. All stateful equipment placed between the
client and HAProxy (firewalls, proxies, load balancers) will also keep
the established connection for a long time and may suffer from this
action. On modern Linux systems running with enough privileges, the
TCP_REPAIR socket option is used to block the emission of a TCP
reset. On other systems, the socket's TTL is reduced to 1 so that the
TCP reset doesn't pass the first router, though it's still delivered to
local networks.
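The two fallbacks described above can be sketched as follows (simplified;
privilege checks and error handling omitted):

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

#ifndef TCP_REPAIR
#define TCP_REPAIR 19   /* from linux/tcp.h */
#endif

static void silent_drop(int fd)
{
    int one = 1, ttl = 1;

    /* in repair mode, close() emits neither FIN nor RST; when repair
     * mode is not permitted, lower the TTL so the RST dies at the
     * first router */
    if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one)) != 0)
        setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
    close(fd);
}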
tcp-request connection had an inverted condition on action_ptr, resulting
in no registered actions being usable since commit 4214873 ("MEDIUM: actions:
remove ACTION_STOP") merged in 1.6-dev5. Very few new actions were impacted.
No backport is needed.
When peers are stopped due to not being running on the appropriate
process, we want to completely release them and unregister their signals
and task in order to ensure there's no way they may be called in the
future.
Note: ideally we should have a list of all tables attached to a peers
section being disabled in order to unregister them and void their
sync_task. It doesn't appear to be *that* easy for now.
When performing a soft stop, we used to wake up every proxy's task and
each of their table's task. The problem is that since we're able to stop
proxies and peers not bound to a specific process, we may end up calling
random junk by doing so if the proxy we're waking up is already stopped.
This causes a segfault to appear during soft reloads for old processes
not bound to a peers section if such a section exists in other processes.
Let's only consider proxies that are not stopped when doing this.
This fix must be backported to 1.5 which also has the same issue.
Since commit f83d3fe ("MEDIUM: init: stop any peers section not bound
to the correct process"), it is possible to stop unused peers on certain
processes. The problem is that the pause/resume/stop functions are not
aware of this and will pass a NULL proxy pointer to the respective
functions, resulting in segfaults in unbound processes during soft
restarts.
Properly check that the peers' frontend is still valid before calling
them.
This bug also affects 1.5 so the fix must be backported. Note that this
fix is not enough to completely get rid of the segfault, the next one
is needed as well.
Introduce a new keyword for the client's cookie name
via a new config entry.
Split the work between the original converter and the new fetch type.
The latter iterates through the whole list of HTTP headers, but first
makes sure the request data are ready to process and correctly sets
the input to the string type.
The API is then able to filter by itself which headers are genuinely useful,
with special care for the client's cookie.
[WT: the immediate impact for the end user is that configs making use of
"da-csv" won't load anymore and will need to be modified to use either
"da-csv-conv" or "da-csv-fetch"]
This patch adds a new RFC5424-specific log-format for the structured-data
that is automatically sent by __send_log() when the sender is in RFC5424
mode.
A new statement "log-format-sd" should be used in order to set log-format
for the structured-data part in RFC5424 formatted syslog messages.
Example:
log-format-sd [exampleSDID@1234\ bytes=\"%B\"\ status=\"%ST\"]
The function __send_log() iterates over senders and passes the header as
the first vector to sendmsg(), thus it can send a logger-specific header
in each message.
A new logger argument "format rfc5424" should be used in order to enable
RFC5424 header format. For example:
log 10.2.3.4:1234 len 2048 format rfc5424 local2 info
At the moment we have to call snprintf() for every log line just to
rebuild a constant. Thanks to sendmsg(), we send the message in 3 parts:
time-based header, proxy-specific hostname+log-tag+pid, session-specific
message.
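A sketch of the three-part write using standard POSIX sendmsg()
(simplified; the destination address handling is omitted):

#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

static ssize_t send_log_parts(int fd, const char *hdr,
                              const char *tag, const char *msg)
{
    struct iovec iov[3] = {
        { .iov_base = (void *)hdr, .iov_len = strlen(hdr) }, /* time-based header */
        { .iov_base = (void *)tag, .iov_len = strlen(tag) }, /* hostname+log-tag+pid */
        { .iov_base = (void *)msg, .iov_len = strlen(msg) }, /* session message */
    };
    struct msghdr mh;

    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = iov;
    mh.msg_iovlen = 3;
    return sendmsg(fd, &mh, 0);
}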
The static_table_key, get_http_auth_buff and swap_buffer static variables
are now freed during deinit, and the two functions introduced previously
are called as well. In addition, the 'trash' string buffer is cleared.
A new function is introduced, meant to be called during the general deinit phase.
During the configuration parsing, the section entries are all allocated.
This new function frees them.
The tune.maxrewrite parameter used to be pre-initialized to half of
the buffer size since the very early days when buffers were very small.
It has grown to absurdly large values over the years to reach 8kB for a
16kB buffer. This prevents large requests from being accepted, which is
the opposite of the initial goal.
Many users fix it to 1024 which is already quite large for header
addition.
So let's change the default setting policy :
- pre-initialize it to 1024
- let the user tweak it
- in any case, limit it to tune.bufsize / 2
This results in 15kB usable to buffer HTTP messages instead of 8kB, and
doesn't affect existing configurations which already force it.
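For example, configurations needing more room can still force it in the
global section :
global
tune.bufsize 16384
tune.maxrewrite 2048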
This new target can be called from the frontend or the backend. It
is evaluated just before the backend choice and just before the server
choice. So, the input stream or HTTP request can be forwarded to a
server or to an internal service.
This flag is used by custom actions to know that they're called for the
first time. The only case where it's not set is when they're resuming
from a yield. It will be needed to let them know when they have to
allocate some resources.
The garbage collector is a little bit heavy to run, and it was added
only for cosockets. This patch prevents useless executions when no
cosockets are used.
When the channel is down, the applet is woken up. Before this patch,
the applet closed the stream and the unread data were discarded.
After this patch, the stream is not closed except if the buffers are
empty.
It will be closed later by the close function, or by the garbage collector.
The HAProxy Lua socket follows the Lua Socket tcp specs. These specs
are a little bit limited: they do not permit connecting to a unix socket.
This patch extends the specs a little. It permits accepting a
syntax that begins with "unix@", "ipv4@", "ipv6@", "fd@" or "abns@".
Actions may yield but must not do it during the final call from a ruleset
because it indicates there will be no more opportunity to complete or
clean up. This is indicated by ACT_FLAG_FINAL in the action's flags,
which must be passed to hlua_resume().
Thanks to this, an action called from a TCP ruleset is properly woken
up and possibly finished when the client disconnects.
In HTTP it's more difficult to know when to pass the flag or not
because all actions are supposed to be final and there's no inspection
delay. Also, the input channel may very well be closed without this
being an error. So we only set the flag when option abortonclose is
set and the input channel is closed, which is the only case where the
user explicitly wants to forward a close down the chain.
This new flag indicates to a custom action that it must not yield because
it will not be called anymore. This addresses an issue introduced by commit
bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom actions"),
which made it possible to yield even after the last call and causes Lua
actions not to be stopped when the session closes. Note that the Lua issue
is not fixed yet at this point. Also only TCP rules were handled, for now
HTTP rules continue to let the action yield since we don't know whether or
not it is a final call.
Since commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp
custom actions"), some actions may yield and be called back when new
information is available. Unfortunately some of them may continue to
yield because they simply don't know that it's the last call from the
rule set. For this reason we'll need to pass a flag to the custom
action to pass such information and possibly others at the same time.
When a Lua thread stops, this patch forces a garbage collection which
cleans up and frees the memory.
The HAProxy Lua Socket class contains an HAProxy session, so it
uses a large amount of memory; moreover, this Socket class uses a
file descriptor and maintains an open connection.
If the Socket class is stored in a global variable, the socket stays
alive for the whole life of the process (except if it is closed by the
other side, or if a timeout is reached). If the Socket class is stored
in a local variable, it must die with this variable.
The socket is closed by a call from the garbage collector, and the
reachability and use of a variable are known by the same garbage collector.
So, running the GC just after the end of all Lua code ensures that
heavy resources like the Socket class are freed quickly.
Not having the target set on the connection causes it to be released
at the last moment, and the destination address to randomly be valid
depending on the data found in the memory at this moment. In practice
it works as long as memory poisoning is disabled. The deep reason is
that connect_server() doesn't expect to be called with SF_ADDR_SET and
an existing connection with !reuse. This causes the release of the
connection, its reallocation (!reuse), and taking the address from the
newly allocated connection. This should certainly be improved.
Commit d75cb0f ("BUG/MAJOR: lua: segfault after the channel data is
modified by some Lua action.") introduced a regression causing an
action run from a TCP rule in an HTTP proxy to end in HTTP error
if it terminated cleanly, because it didn't parse the HTTP request!
Relax the test so that it takes into account the opportunity for the
analysers to parse the message.
The Lua code must use the appropriate wakeup mechanism for cosockets.
It wakes them up from outside the stream and outside the applet, so it
shouldn't use the applet's wakeup callback which is designed only for
use from within the applet itself. For now it didn't cause any trouble
(yet).
When an action or a fetch modifies the channel data, the http request parser
pointers become inconsistent. This patch detects the modification and calls
the parser again.
The function lua_settable uses the metatable for accessing data,
so in the C part, we want to index the value in the real table and
not through the meta-methods. It's much faster this way.
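In Lua C API terms this means preferring the raw accessors; a small
sketch (illustrative, not the actual hlua code):

#include <lua.h>

/* Set t["key"] = 42 without triggering __newindex metamethods.
 * <t_idx> must be an absolute stack index of the table. */
static void set_field_raw(lua_State *L, int t_idx)
{
    lua_pushstring(L, "key");
    lua_pushinteger(L, 42);
    lua_rawset(L, t_idx);   /* raw write: no metamethod dispatch */
}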
When we call the function smp_prefetch_http(), if the txn is not initialized,
it doesn't work. This patch fixes this. Now, smp_prefetch_http() permits using
http with any proxy mode.
While the SI_ST_DIS state is set *after* doing the close on a connection,
it was set *before* calling release on an applet. Applets have no internal
flags contrary to connections, so they have no way to detect they were
already released. Because of this it happened that applets were closed
twice, once via si_applet_release() and once via si_release_endpoint() at
the end of a transaction. The CLI applet could perform a double free in
this case, though the situation to cause it is quite hard because it
requires that the applet is stuck on output in states that produce very
few data.
In order to solve this, we now assign the SI_ST_DIS state *after* calling
->release, and we refrain from doing so if the state is already assigned.
This makes applets work much more like connections and definitely avoids
this double release.
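A sketch of the new ordering (hypothetical field names):

/* release an applet endpoint exactly once: assigning SI_ST_DIS only
 * after ->release() lets the state double as an "already released"
 * flag, the same way it already works for connections */
static void si_release_applet_once(struct stream_interface *si)
{
    if (si->state == SI_ST_DIS)
        return;                  /* already released earlier */
    if (si->applet_release)
        si->applet_release(si);
    si->state = SI_ST_DIS;       /* assigned *after* the release call */
}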
In the future it might be worth making applets have their own flags like
connections to carry their own state regardless of the stream interface's
state, especially when dealing with connection reuse.
No backport is needed since this issue was caused by the rearchitecture
in 1.6.
It's dangerous to call this on internal state error, because it risks
to perform a double-free. This can only happen when a state is not
handled. Note that the switch/case currently doesn't offer any option
for missed states since they're all declared. Better fix this anyway.
The fix was tested by commenting out some entries in the switch/case.
Due to the code inherited from the early CLI mode with the non-interactive
code, the SHUTR status was only considered while waiting for a request,
which prevents the connection from properly being closed during a dump,
and the connection used to remain established. This issue didn't happen
in 1.5 because while this part was missed, the resynchronization performed
in process_session() would detect the situation and handle the cleanup.
No backport is needed.
If an error happens during a dump on the CLI, an explicit call to
cli_release_handler() is performed. This is not needed anymore since
we introduced ->release() in the applet which is called upon error.
Let's remove this confusing call which can even be risky in some
situations.
If an applet tries to write to a closed connection, it hangs forever.
This results in some "get map" commands on the CLI to leave orphaned
connections alive.
Now the applet wakeup function detects that the applet still wants to
write while the channel is closed for reads, which is the equivalent
to the common "broken pipe" situation. In this case, an error is
reported on the stream interface, just as it happens with connections
trying to perform a send() in a similar situation.
With this fix the stats socket is properly released.
This function is a callback made only for calls from the applet handler.
Rename it to remove confusion. It's currently called from the Lua code
but that's not correct, we should call the notify and update functions
instead otherwise it will not enable the applet again.
This one is not needed anymore as what it used to do is either
completely covered by the new stream_int_notify() function, or undesired
and inherited from the past as a side effect of introducing the
connections.
This update is theoretically never called since it's assigned only when
nothing is connected to the stream interface. However a test has been
added to si_update() to stay safe if some foreign code decides to call
si_update() in unsafe situations.
The code to report completion after a connection update or an applet update
was almost the same since applets stole it from the connection. But the
differences made them hard to maintain and prevented the creation of new
functions doing only one part of the work.
This patch replaces the common code from the si_conn_wake_cb() and
si_applet_wake_cb() with a single call to stream_int_notify() which only
notifies the stream (si+channels+task) from the outside.
No functional change was made beyond this.
stream_int_notify() was taken from the common part between si_conn_wake_cb()
and si_applet_done(). It is designed to report activity to a stream from
outside its handler. It'll generally be used by lower layers to report I/O
completion but may also be used by remote streams if the buffer processing
is shared.
The condition to release the SI_FL_WAIT_ROOM flag was abnormally
complicated because it was inherited from 6 years ago before we used
to check for the buffer's emptiness. The CF_READ_PARTIAL flag had to be
removed, and the complex test was replaced with a simpler one checking
if *some* data were moved out or not.
The reason behind this change is to have a condition compatible with
both connections and applets, as applets currently don't work very
well in this area. Specifically, some optimizations on the applet
side cause them not to release the flag above until the buffer is
empty, which may prevent applets from talking to each other (eg: peers
over large haproxy buffers and small kernel buffers).
Now the call to stream_int_update() is moved to si_update(), which
is exclusively called from the stream, so that the socket layer may
be updated without updating the stream layer. This will later permit
to call it individually from other places (other tasks or applets for
example).
Now that we have a generic stream_int_update() function, we can
replace the equivalent part in stream_int_update_conn() and
stream_int_update_applet() to avoid code duplication.
There is no functional change, as the code is the same but split
in two functions for each call.
This function is designed to be called from within the stream handler to
update the channels' expiration timers and the stream interface's flags
based on the channels' flags. It needs to be called only once after the
channels' flags have settled down, and before they are cleared, though it
doesn't harm to call it as often as desired (it just slightly hurts
performance). It must not be called from outside of the stream handler,
as what it does will be used to compute the stream task's expiration.
The code was taken directly from stream_int_update_applet() and
stream_int_update_conn() which had exactly the same one except for
applet-specific or connection-specific status update.
The purpose is to separate the connection-specific parts so that the
stream-int specific one can be factored out. There's no functional
change here, only code displacement.
If an applet wakes up and causes the next one to sleep, the active list
is corrupted and cannot be scanned anymore, as the process then loops
over the next element.
In order to avoid this problem, we move the active applet list to a run
queue and reinit the active list. Only the first element of this queue
is processed at a time, and if that element did not remove itself, it is
removed from the run queue and requeued into the active list.
Since we're using a distinct list, if an applet wants to requeue another
applet into the active list, it properly gets added to the active list
and not to the run queue.
This stops the infinite loop issue that could be caused with Lua applets,
and in any future configuration where two applets could be attached
together.
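The loop can be sketched with kernel-style list primitives (illustrative,
not haproxy's actual LIST_* macros):

/* applets to run were moved to <run_queue> and the active list was
 * reinitialized; callbacks may requeue applets to the active list but
 * can no longer corrupt this iteration, since only the head of the
 * run queue is ever examined */
while (!list_empty(&run_queue)) {
    struct appctx *app = list_first_entry(&run_queue, struct appctx, list);

    list_del(&app->list);                    /* detach before running */
    list_add_tail(&app->list, &active_list);
    app->io_handler(app);                    /* may requeue or unregister itself */
}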
This is not a real run queue and we're facing ugly bugs because
of this: if an applet removes another applet from the queue,
typically the next one after itself, the list iterator loops
forever because the list's backup pointer is not valid anymore.
Before creating a real run queue, let's rename this list.
The pattern match "found" fails to parse on binary type samples. The
reason is that it presents itself as an integer type which bin cannot
be cast into. We must always accept this match since it validates
anything on input regardless of the type. Let's just relax the parser
to accept it.
This problem might also exist in 1.5.
(cherry picked from commit 91cc2368a73198bddc3e140d267cce4ee08cf20e)
Due to a check between offset+len and buf->size, an empty buffer returns
"will never match". Check against tune.bufsize instead.
(cherry picked from commit 43e4039fd5d208fd9d32157d20de93d3ddf9bc0d)
The current Lua actions are not registered. The executed function is
selected according to a function name written in the HAProxy configuration.
This patch adds an action registration function. The configuration mode
described above disappears.
This change introduces some incompatibilities with existing configuration
files for HAProxy 1.6-dev.
The function 'peer_prepare_ackmsg' is designed to use the argument 'msg'
instead of 'trash.str'.
There is currently no bug because the caller passes 'trash.str' in
the 'msg' argument.
Some updates are pushed using an incremental update message after a
re-connection whereas the origin is forgotten by the peer.
These updates are never correctly acknowledged, so they are regularly
re-pushed after an idle timeout and a re-connect.
The fix consists in using an absolute update message in some cases.
If an entry is still not present in the update tree, we could fail to schedule
it for a push depending on an uninitialized value (upd.key remains uninitialized
for new sessions and isn't re-initialized for reused ones).
In the same way, if an entry is present in the tree but its update's tick
is far in the past (> 2^31), we could consider it still scheduled even
though it is not the case.
The fix consists in forcing the re-scheduling of an update if it was not present
in the updates tree or if the update is not in the scheduling window of every peer.
PiBa-NL reported that HAProxy's DNS resolver can't "connect" its socket
on FreeBSD.
Remi Gacogne reported that we should use the function 'get_addr_len' to
get the addr structure size instead of sizeof.
Mailing list participant "mlist" reported negative conn_cur values in
stick tables as the result of "tcp-request connection track-sc". The
reason is that after the stick entry is copied from the session to the
stream, both the session and the stream grab a reference to the entry
and when the stream ends, it decrements one reference and one connection,
then the same is done for the session.
In fact this problem was already encountered slightly differently in the
past and addressed by Thierry using the patch below as it was believed by
then to be only a refcount issue since it was the observable symptom :
827752e "BUG/MEDIUM: stick-tables: refcount error after copying SC..."
In reality the problem is that the stream must touch neither the refcount
nor the connection count for entries it inherits from the session. While
we have no way to tell whether a track entry was inherited from the session
(since they're simply memcpy'd), it is possible to prevent the stream from
touching an entry that already exists in the session because that's a
guarantee that it was inherited from it.
Note that it may be a temporary fix. Maybe in the future when a session
gives birth to multiple streams we'll face a situation where a session may
be updated to add more tracked entries after instantiating some streams.
The correct long-term fix is to mark some tracked entries as shared or
private (or RO/RW). That will allow the session to track more entries
even after the same trackers are being used by early streams.
No backport is needed, this is only caused by the session/stream split in 1.6.
Pradeep Jindal reported and troubleshooted a bug causing haproxy to die
during startup on all processes not making use of a peers section. It
only happens with nbproc > 1 when peers are declared. Technically it's
when the peers task is stopped on processes that don't use it that the
crash occurred (a task_free() called on a NULL task pointer).
This only affects peers v2 in the dev branch, no backport is needed.
Trie device detection doesn't benefit from caching compared to Pattern.
As such the LRU cache has been removed from the Trie method.
A new fetch method has been added named 51d.all which uses all the
available HTTP headers for device detection. The previous 51d
conv method has been changed to 51d.single where one HTTP header,
typically User-Agent, is used for detection. This method is marginally
faster but less accurate.
Three new properties are available with the Pattern method called
Method, Difference and Rank which provide insight into the validity of
the results returned.
A pool of worksets is used to avoid needing to create a new workset for
every request. The workset pool is thread safe ready to support a future
multi threaded version of HAProxy.
Added the definition of CHECK_HTTP_MESSAGE_FIRST and the declaration of
smp_prefetch_http to the header.
Changed smp_prefetch_http implementation to remove the static qualifier.
This patch uses the start up of the health check task to also start
the warmup task when required.
This is executed only once: when HAProxy has just started up and can
be started only if the load-server-state-from-file feature is enabled
and the server was in the warmup state before a reload occurs.
This directive gives HAProxy the ability to use either the global
server-state-file directive or a local one using server-state-file-name to
load server states.
The state can be saved right before the reload by the init script, using
the "show servers state" command on the stats socket redirecting output into
a file.
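A possible combination of the new directives (sketch) :
global
server-state-file /var/lib/haproxy/server-states
backend app
load-server-state-from-file global
server srv1 192.168.0.10:80 check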
This new global section directive is used to store the path to the file
where HAProxy will be able to retrieve server states across reloads.
The file pointed to by this path is used to store the state of all
servers from all backends.
This new global directive can be used to provide a base directory where
all the server state files could be loaded.
If a server state file name starts with a slash '/', then this directive
is not applied.
New command 'show servers state' which dumps all variable parameters
of a server during the life of an HAProxy process.
The purpose is to dump the current server states at run time in order to
read them back right after a reload.
The format of the output is versioned and we support version 1 for now.
When a server is disabled in the configuration using the "disabled"
keyword, a single flag is positioned: SRV_ADMF_CMAINT (it used to be
SRV_ADMF_FMAINT).
That said, when providing the first version of this code, we also
changed the SRV_ADMF_MAINT mask to match any of the possible MAINT
cases: SRV_ADMF_FMAINT, SRV_ADMF_IMAINT, SRV_ADMF_CMAINT
Since SRV_ADMF_CMAINT is never (and is not supposed to be) altered at
run time, once a server has this flag set up, it can never ever be
enabled again using the stats socket.
In order to fix this, we should:
- consider SRV_ADMF_CMAINT as a simple flag reporting the state in the
old configuration file (it will be used after a reload to deduce the
state of the server in the new running process)
- enable both SRV_ADMF_CMAINT and SRV_ADMF_FMAINT when the keyword
"disabled" is used in the configuration
- update the mask SRV_ADMF_MAINT as it was before, to only match
SRV_ADMF_FMAINT and SRV_ADMF_IMAINT.
The following patch performs the changes above.
It allows fixing the regression without breaking the way the upcoming
feature (seamless server state across reloads) is going to work.
Note: this is 1.6-only, no backport needed.
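As a sketch, with illustrative bit values (the real definitions live in
the server header):

/* administrative maintenance causes, one bit each */
#define SRV_ADMF_FMAINT  0x01  /* forced from the stats socket */
#define SRV_ADMF_IMAINT  0x02  /* inherited from a tracked server */
#define SRV_ADMF_CMAINT  0x04  /* "disabled" in the old configuration */

/* only F/I actually put a server in maintenance at run time; CMAINT
 * merely records the configuration state across reloads */
#define SRV_ADMF_MAINT   (SRV_ADMF_FMAINT | SRV_ADMF_IMAINT)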
This couple of functions executes some Lua calls securely, outside of
the Lua runtime environment. Each Lua call can trigger a longjmp
if it encounters a memory error.
Lua documentation extract:
If an error happens outside any protected environment, Lua calls
a panic function (see lua_atpanic) and then calls abort, thus
exiting the host application. Your panic function can avoid this
exit by never returning (e.g., doing a long jump to your own
recovery point outside Lua).
The panic function runs as if it were a message handler (see
§2.3); in particular, the error message is at the top of the
stack. However, there is no guarantee about stack space. To push
anything on the stack, the panic function must first check the
available space (see §4.2).
We must check all the Lua entry points. This includes:
- The include/proto/hlua.h exported functions
- the task wrapper function
- The action wrapper function
- The converters wrapper function
- The sample-fetch wrapper functions
It is tolerated that the initialisation function returns an abort.
Before each Lua abort, an error message is written on stderr.
The macro SET_SAFE_LJMP initialises the longjmp. The macro
RESET_SAFE_LJMP resets the longjmp. These functions must be macros
because the setjmp() frame must still exist in the program stack
when the longjmp is called.
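A minimal sketch of the pattern (simplified: here the setjmp() and the
Lua calls live in the same function, which is why a plain function works
where haproxy needs macros):

#include <setjmp.h>
#include <stdio.h>
#include <lua.h>

static jmp_buf safe_ljmp_env;

/* panic handler: instead of letting Lua call abort(), jump back to the
 * recovery point installed just before the unprotected Lua calls */
static int hlua_panic_ljmp(lua_State *L)
{
    (void)L;
    longjmp(safe_ljmp_env, 1);
}

static int call_lua_safely(lua_State *L, const char *fname)
{
    lua_atpanic(L, hlua_panic_ljmp);
    if (setjmp(safe_ljmp_env) != 0) {
        /* a raw Lua call panicked (e.g. memory error): report, recover */
        fprintf(stderr, "Lua error outside of a protected environment\n");
        return 0;
    }
    lua_getglobal(L, fname);   /* raw API calls may longjmp on error */
    lua_call(L, 0, 0);
    return 1;
}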
All the code which emits error logs has the same pattern:
send the log via the syslog system, and if it is allowed, display the error
log on screen.
This patch replaces this pattern with a macro. This reduces the number
of lines.
There are two types of retries when performing a DNS resolution:
1. retry because of a timeout
2. retry of the full sequence of requests (query types failover)
Before this patch, the 'resolution->try' counter was incremented
after each send of a DNS request, which does not cover the 2 cases
above.
This patch fixes this behavior.
Some DNS responses may be valid from a protocol point of view but may not
contain any IP addresses.
This patch gives a new flag to the function dns_get_ip_from_response to
report such a case.
It's up to the upper layer to decide what to do with this information.
Some DNS responses may be valid from a protocol point of view, but may
not contain any information considered interesting by the requester.
The purpose of the flag DNS_RESP_NO_EXPECTED_RECORD introduced by this patch
is to allow reporting such a situation.
When this happens, a new DNS query is sent with a new query type.
For now, the function only expects A and AAAA query types, which is enough
to cover current cases.
In the near future, it will be up to the caller to tell the function which
query types are expected.
The send_log function needs a final \n.
This bug was reported by Michael Ezzell.
Minor bug: when writing to syslog from Lua scripts, the last character from
each log entry is truncated.
core.Alert("this is truncated");
Sep 7 15:07:56 localhost haproxy[7055]: this is truncate
This issue appears to be related to the fact that send_log() (in src/log.c)
is expecting a newline at the end of the message's format string:
/*
* This function adds a header to the message and sends the syslog message
* using a printf format string. It expects an LF-terminated message.
*/
void send_log(struct proxy *p, int level, const char *format, ...)
I believe the fix would be in src/hlua.c at line 760
<http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/hlua.c;h=1e4d47c31e66c16c837ff2aa5ef577f6cafdc7e7;hb=316e3196285b89a917c7d84794ced59a6a5b4eba#l760>,
where this...
send_log(px, level, "%s", trash.str);
...should be adding a newline into the format string to accommodate what
the code expects.
send_log(px, level, "%s\n", trash.str);
This change provides what seems to be the correct behavior:
Sep 7 15:08:30 localhost haproxy[7150]: this is truncated
All other uses of send_log() in hlua.c have a trailing dot "." in the
message that is masking the truncation issue because the output message
stops on a clean word boundary. I suspect these would also benefit from
"\n" appended to their format strings as well, since this appears to be the
pattern seen throughout the rest of the code base.
Reported-by: Michael Ezzell <michael@ezzell.net>
The server's host name picked for resolution was incorrect: it did not
skip the address family specifier, did not resolve environment variables,
and messed up the optional trailing colon.
Instead, let's get the fqdn returned by str2sa_range() and use that
exclusively.
If an environment variable is used in an address, and is not set, it's
silently considered as ":" or "0.0.0.0:0" which is not correct as it
can hide environment issues and lead to unexpected behaviours. Let's
report this case when it happens.
This fix should be backported to 1.5.
The function does a bunch of things among which resolving environment
variables, skipping address family specifiers and trimming port ranges.
It is the only one which sees the complete host name before trying to
resolve it. The DNS resolving code needs to know the original hostname,
so we modify this function to optionally provide it to the caller.
Note that the function itself doesn't know if the host part was a host
or an address, but str2ip() knows that and can be asked not to try to
resolve. So we first try to parse the address without resolving and
try again with resolving enabled. This way we know if the address is
explicit or needs some kind of resolution.
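Schematically, using haproxy's str2ip2(str, sa, resolve) helper (a sketch
of the two-pass logic described above):

/* returns the fqdn to hand over to the DNS resolver, or NULL when the
 * address was explicit (or invalid) and needs no runtime resolution */
static const char *addr_needs_resolution(const char *host,
                                         struct sockaddr_storage *sa)
{
    if (str2ip2(host, sa, 0))   /* parses without resolving: explicit */
        return NULL;
    if (str2ip2(host, sa, 1))   /* only works with resolving: host name */
        return host;
    return NULL;                /* not a valid address at all */
}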
In the first version of the DNS resolver, HAProxy sends an ANY query
type and in case of issue fails over to the type pointed by the
directive in 'resolve-prefer'.
This patch allows the following new failover management:
1. default query type is still ANY
2. if the response is truncated or in error because ANY is not supported by
the server, then a failover to a new query type is performed. The
new query type is the one pointed to by the directive 'resolve-prefer'.
3. if there is no response or errors still occur, then a query type failover
is performed to the remaining IP address family.
The first DNS client implementation simply ignored most of the DNS response
flags.
This patch changes the way the flags are parsed, using bit masks, and
also takes care of truncated responses.
Such responses are reported to the layer above, which can handle them
properly.
This patch introduces a new internal response state about the analysis
of a DNS response received from a server.
It is dedicated to reporting to the layer above that the response is
'truncated'.
This patch updates the dns_nameserver structure to integrate a counter
dedicated to 'truncated' response sent by servers.
Such responses are important to track, since HAProxy is supposed to
replay its request.
Under certain circumstances (a configuration with many servers relying on
DNS resolution and one of them triggering the replay of a request
because of a timeout or invalid response to an ANY query), HAProxy could
end up in an infinite loop over the currently supposed running DNS
queries.
This was caused by the FIFO list of running queries being improperly
updated in snr_resolution_error_cb: the head of the list was removed
instead of the resolution in error, when moving the resolution to the
end of the list.
In the meantime, a LIST_DEL statement is removed since it is useless: this
action is already performed by the dns_reset_resolution function.
Patch f046f11561 introduced a regression:
DNS resolution doesn't start anymore, while it was supposed to
start with the first health check.
The current patch fixes this issue by triggering a new DNS resolution if the
last_resolution time is not set.
A crash was reported when using the "famous" http-send-name-header
directive. This time it's a bit tricky, it requires a certain number of
conditions to be met including maxconn on a server, queuing, timeout in
the queue and cookie-based persistence.
The problem is that in stream.c, before calling http_send_name_header(),
we check a number of conditions to know if we have to replace the header
name. But prior to reaching this place, it's possible for
sess_update_stream_int() to fail and change the stream-int's state to
SI_ST_CLO, send an error 503 to the client, and flush all buffers. But
http_send_name_header() can only be called with valid buffer contents
matching the http_msg's description. So when it rewinds the stream to
modify the header, buf->o becomes negative by the size of the incoming
request and is used as the argument to memmove() which basically
displaces 4GB of memory off a few bytes to write the new name, resulting
in a core and a core file that's really not fun to play with.
The solution obviously consists in refraining from calling this nasty
function when the stream interface is already closed.
This bug also affects 1.5 and possibly 1.4, so the fix must be backported
there.
See commit id bdc97a8795
Michael Ezzell reported that the following Lua code fails in
dev4 when the TCP is not established immediately (due to a little
bit of latency):
function tricky_socket()
local sock = core.tcp();
sock:settimeout(3);
core.log(core.alert,"calling connect()\n");
local connected, con_err = sock:connect("x.x.x.x",80);
core.log(core.alert,"returned from connect()\n");
if con_err ~= nil then
core.log(core.alert,"connect() failed with error: '" .. con_err .. "'\n");
end
The problem is that the flags which want to wake up the applet are
reset before each applet call, so the applet must set the flags again
if the connection is not established.
When converting the "method" fetch to a string, we used to get an empty
string if the first character was not an upper case. This was caused by
the lookup function which returns HTTP_METH_NONE when a lookup is not
possible, and this method being mapped to an empty string in the array.
This is a totally stupid mechanism, there's no reason for having the
result depend on the first char. In fact the message parser already
checks that the syntax matches an HTTP token so we can only land there
with a valid token, hence only HTTP_METH_OTHER should be returned.
This fix should be backported to all actively supported branches.
Before this patch, two types of custom actions existed: ACT_ACTION_CONT and
ACT_ACTION_STOP. ACT_ACTION_CONT is a non-terminal action and ACT_ACTION_STOP is
a terminal action.
Note that ACT_ACTION_STOP is not used in HAProxy.
This patch removes this behavior. Only one type of custom action exists, and it
is called ACT_CUSTOM. Now, the custom action can return a code indicating the
required behavior: ACT_RET_CONT means HAProxy should continue the current rule
list evaluation, and ACT_RET_STOP means HAProxy should stop the current rule
list evaluation.
Jesse Hathaway reported a crash that Cyril Bonté diagnosed as being
caused by the manipulation of srv_conn after setting it to NULL. This
happens in http-server-close mode when the server returns either a 401
or a 407, because the connection was previously closed then it's being
assigned the CO_FL_PRIVATE flag.
This bug only affects 1.6-dev as it was introduced by connection reuse code
with commit 387ebf8 ("MINOR: connection: add a new flag CO_FL_PRIVATE").
First DNS resolution is supposed to be triggered by first health check,
which is not the case with current code.
This patch fixes this behavior by setting the
resolution->last_resolution time to 0 instead of now_ms when parsing
server's configuration at startup.
When called from an http ruleset, txn:done() can still crash the process
because it closes the stream without consuming pending data resulting in
the transaction's buffer representation to differ from the real buffer.
This patch also adjusts the transaction's state to indicate that it's
closed to be consistent with what's already done in redirect rules.
When the Lua execution flow ends with the command done (core.done or txn.done()),
an error is diverted internally and the stack is no longer usable. This patch just
reinitializes the stack when this case is detected.
The function txn:close() must be terminal because it requires the session
destruction. This patch renames this function to "done()" to make it much
clearer that it is a final operation.
This patch is inspired by Bowen Ni's proposal and it is based on his first
implementation:
With Lua integration in HAProxy 1.6, one can change the request method,
path, uri, header, response header etc except response line.
I'd like to contribute the following methods to allow modification of the
response line.
[...]
There are two new keywords in 'http-response' that allows you to rewrite
them in the native HAProxy config. There are also two new APIs in Lua that
allows you to do the same rewriting in your Lua script.
Example:
Use it in HAProxy config:
*http-response set-code 404*
Or use it in Lua script:
*txn.http:res_set_reason("Redirect")*
I don't take the full patch because the manipulation of the "reason" is useless:
a standard reason is associated with each returned code, and unknown codes can
take a generic reason.
So, this patch can set the status code, and the reason is automatically adapted.
The function didn't remove the remaining analysers and didn't update the response
channel timeout.
The fix is a copy of the behavior of the functions http_apply_redirect_rule()
and stream_int_retnclose().
Tsvetan Tsvetanov reported that the following Lua code fails in
dev2 and dev3 :
function hello(txn)
local request_msg = txn.req:dup()
local tsm_sock = core.tcp()
tsm_sock:connect("127.0.0.1", 7777)
local res = tsm_sock:send(request_msg)
local response = tsm_sock:receive('*l')
txn.res:send(response)
txn:close()
end
Thierry diagnosed that it was caused by commit 563cc37 ("MAJOR: stream:
use a regular ->update for all stream interfaces"). It broke lua's
ability to establish outgoing connections.
The reason is that the applet used to be notified about established
connections just after the stream analyser loop, and that's not the
case anymore. In peers, this issue didn't happen because peers use
a handshake so after sending data, the response is received and wakes
the applet up again. Here we have to indicate that we want to send or
receive data, this will cause the notification to happen as soon as
the connection is ready. This is similar to pretending that we're
working on a full buffer after all. In theory subscribing for reads
is not needed, but it's added here for completeness.
Reported-By: Tsvetan Tsvetanov <cpi.cecko@gmail.com>
The table definition message id was used instead of the update acknowledgement id.
This bug causes a malformed message and a protocol error, and breaks the
connection.
After that, the updates remain unacknowledged.
This was the first transparent proxy technology supported by haproxy
circa 2005 but it was obsoleted in 2007 by Tproxy 4.0 which removed a
lot of the earlier versions' shortcomings and was finally merged into
the kernel. Since nobody has been using cttproxy for many years now
and nobody has even just tried to compile the files, it's time to
remove it. The doc was updated as well.
This patch removes the special stick-table type names and
uses the standard sample type names. This avoids the maintenance
of two sets of types and removes the switch/case for matching a sample
type for each stick-table type.
This patch is the first step for sample integration. Currently
the stick tables use their own data types, and some converters
must be called to convert sample types to stick-table types.
This patch removes the stick-table types and replaces them with
the sample types. This:
- avoids the maintenance of two sets of converters
- reduces the code using the sample converters
Now the prototypes of the actions of all sections are the same, and
a discriminant was added to determine from which section we are called.
So, this patch removes the wrappers for the action functions called from
more than one section.
This patch removes 132 lines of useless code.
This patch normalizes the return codes of the configuration parsers. Before
these changes, the tcp action parser returned -1 on failure and 0 on
success, while the http action parser returned 0 on failure and 1 on success.
The normalisation uses:
- ACT_RET_PRS_OK for success
- ACT_RET_PRS_ERR for failure
Each (http|tcp)-(request|response) action uses the same method
for looking up the action keyword during the configuration parsing.
This patch mutualizes the code.
This patch merges the configuration keyword structs. Each declared configuration
keyword struct is similar to the others. This patch simplifies the code.
An action function can return 3 statuses:
- error, if the action encounters a fatal error (like out of memory)
- yield, if the action must finish its work later
- continue, in other cases
For performance considerations, some actions are not processed by a remote
function; they are directly processed inline. Some of these actions
do the same thing but for different processing parts (request / response).
This patch gives the same name to the same actions, and normalizes
the names of the other actions.
This patch is ONLY a rename; it doesn't modify the code.
This patch groups the action names in one file. Some actions are called
many times and need an action embedded in the action caller. The main
goal is to have only one header file grouping all definitions.
This mark permits detecting whether the action tag is beyond the allowed range.
- Normally, this case doesn't happen
- If it happens, it is processed by the default case of the switch
This patch removes the generic opaque type for storing the configuration of the
action "set-src" (HTTP_REQ_ACT_SET_SRC), and uses the dedicated type "struct expr".
The (http|tcp)-(request|response) action rules use a common
opaque type. For the HAProxy embedded features, the types are known;
it is better to add these types in the action union and use them.
This patch is the first of a series which merges all the action structs. The
functions "tcp-request content", "tcp-response content", "http-request" and
"http-response" have the same values and the same processing for some defined
actions, but the structs and the prototypes of the declared functions are
different.
This patch tries to unify all of these entries.
A map can store and return various types as output. The only example is the
IPv4 and IPv6 types. The previous patch removed the type from the sample storage
struct and uses the converter output type, expecting that all entries of the
map have the same type.
This will be wrong when maps support both IPv4 and IPv6 as output.
Because of the difference between the "struct sample" data and the
"struct sample_storage" data, it was not possible to write
data = data, so we did a memcpy. This patch removes some of
these memcpys.
I can't test this patch because the available 51degrees library is
"51Degrees-C-3.1.5.2" and HAProxy obviously builds with another version
(some defines and symbols disappear).
This patch is provided as-is in best effort.
The union name "data" is a little bit heavy while we read the source
code because we can read "data.data.sint". The rename from "data" to "u"
makes the read easiest like "data.u.sint".
This patch removes the data storage duplicated in both the struct
sample_data and the struct sample. Now, only the struct sample_data
contains data, and the struct sample uses the struct sample_data for storing
its own data.
During 1.5-dev20 there was some code refactoring to make the src_* fetch
function use the same code as sc_*. Unfortunately this introduced a
regression where src_* doesn't create an entry anymore if it does not
exist in the table. The reason is that smp_fetch_sc_stkctr() only calls
stktable_lookup_key() while src_inc_*/src_clr_* used to make use of
stktable_update_key() which additionally create the entry if it does
not exist.
There's no point modifying the common function for these two exceptions,
so instead we now have a function dedicated to the creation of this entry
for src_* only. It is called when the entry didn't exist, so that requires
minimal modifications to existing code.
Thanks to Thierry Fournier for helping diagnose the issue.
This fix must be backported to 1.5.
The newly created server flag SRV_ADMF_CMAINT means that the server is
in 'disabled' mode because of the configuration statement 'disabled'.
The flag SRV_ADMF_FMAINT should not be set anymore in such a case and is
reserved for when the server is forced into maintenance mode from the
stats socket.
During the processing of tcp-request connection rules, the stream doesn't exist
yet, so the stick counters are stored in the session. When the stream is created
it must inherit the session's stick counters.
This patch fixes this behavior.
[WT: this is specific to 1.6, no backport needed]
Next patch will remove sample_storage->type, and the only user is the
"show map" feature on the CLI which can use the map's output type instead.
Let's do that first.
RFC 4291 says that when an IPv6 address has the following form:
0000::ffff:a.b.c.d, it can be converted to an IPv4 address. This patch
enables this conversion in casts.
As sint can be cast to ipv4, and ipv4 can be cast to ipv6, we
can directly cast sint to ipv6 using RFC 4291.
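The mapping itself uses standard definitions; a small self-contained
sketch:

#include <string.h>
#include <netinet/in.h>

/* build ::ffff:a.b.c.d from an IPv4 address (RFC 4291 v4-mapped) */
static void ipv4_to_v4mapped(const struct in_addr *v4, struct in6_addr *v6)
{
    memset(v6, 0, sizeof(*v6));
    v6->s6_addr[10] = 0xff;
    v6->s6_addr[11] = 0xff;
    memcpy(&v6->s6_addr[12], &v4->s_addr, 4);
}

/* extract the IPv4 address back when the v4-mapped prefix matches */
static int v4mapped_to_ipv4(const struct in6_addr *v6, struct in_addr *v4)
{
    if (!IN6_IS_ADDR_V4MAPPED(v6))
        return 0;
    memcpy(&v4->s_addr, &v6->s6_addr[12], 4);
    return 1;
}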
Some functions are just wrappers. This patch reduces the size of
these wrappers to improve readability. One check is moved
from the wrapper to the main function, and some intermediate variables are
removed.
Commit c6678e2 ("MEDIUM: config: authorize frontend and listen without bind")
completely removed the test for bind lines in frontends in order to make it
easier for automated tools to generate configs (eg: replacing a bind with
another one passing via a temporary config without any bind line). The
problem is that some common mistakes are totally hidden now. For example,
this apparently valid entry is silently ignored :
listen 1.2.3.4:8000
server s1 127.0.0.1:8000
Hint: 1.2.3.4:8000 is mistakenly the proxy name here.
Thus instead we now emit a warning to indicate that a frontend was found
with no listener. This should be backported to 1.5 to help spot abnormal
configurations.
appsessions started to be deprecated with the introduction of stick
tables, and the latter are much more powerful and flexible, and in
addition they are replicated between nodes and maintained across
reloads. Let's now remove appsession completely.
test conf:
global
tune.lua.session-timeout 0
lua-load lol.lua
debug
maxconn 4096
listen test
bind 0.0.0.0:10010
mode tcp
tcp-request content lua act_test
balance roundrobin
server test 127.0.0.1:3304
lua test:
function act_test(txn)
while true do
core.Alert("TEST")
end
end
The function "act_test()" is not executed because a zero timeout is not
considered as TICK_ETERNITY, but is considered as 0.
This path fix this behavior. This is the same problem than the bugfix
685c014e99.
I've been trying out 1.6 dev3 with lua support, and trying to start
lua tasks seems to not be working.
Using this configuration
global
lua-load /lua/lol.lua
debug
maxconn 4096
backend shard_b
server db01 mysql_shard_b:3306
backend shard_a
server db01 mysql_shard_a:3306
listen mysql-cluster
bind 0.0.0.0:8001
mode tcp
balance roundrobin
use_backend shard_b
And this lua function
core.register_task(function()
while true do
core.Alert("LOLOLOLOLOL")
end
end)
I'd always get a timeout error starting the registered function.
The problem lies, as far as I can tell, in the fact that it is possible for
now_ms to not change (is this maybe a problem on my config/system?)
until the expiration check happens in the resume function that
actually kickstarts the lua task, making HAProxy think that the expiration
time for the task is up. If I understand correctly, tasks are meant to
never really time out.
Since sample fetches are not always available in the response phase,
this patch implements %HQ such that:
GET /foo?bar=baz HTTP/1.0
...would be logged as:
?bar=baz
In some cases, parsing of the DNS response is broken and the response is
considered as invalid, despite being valid.
The current patch fixes this issue. It's a temporary solution until I
rework the response parsing to store the response buffer into a real DNS
packet structure.
This patch adds a few checks on "global._51degrees.data_file_path" and allows
haproxy to start even when the pattern or trie data file is not specified.
If the "51d" converter is used, a new function "_51d_conv_check" will check
"global._51degrees.data_file_path" and displays a warning if necessary.
In src/haproxy.c, the global 51Degrees "cache_size" has moved outside of the
FIFTYONEDEGREES_H_PATTERN_INCLUDED ifdef block.
This strategy is less extreme than "always": it only dispatches first
requests to validated reused connections, and moves a connection from
the idle list to the safe list once it has seen a second request, thus
proving that it could be reused.
These ones are considered safe as they have already been reused.
They will be useful in "aggressive" and "always" http-reuse modes
in order to place the first request of a connection with the least
risk.
The "safe" mode consists in picking existing connections only when
processing a request that's not the first one from a connection. This
ensures that in case where the server finally times out and closes, the
client can decide to replay idempotent requests.
Now instead of closing the existing connection attached to the
stream interface, we first check if the one we pick was attached to
another stream interface, in which case the connections are swapped
if possible (eg: if the current connection is not private). That way
the previous connection remains attached to an existing session and
significantly increases the chances of being reused.
In connect_server(), if we don't have a connection attached to the
stream-int, we first look into the server's idle_conns list and we
pick the first one there and detach it from its owner if it had one.
If we used to have a connection, we close it.
This mechanism works well but doesn't scale : as servers increase,
the likelihood that the connection attached to the stream interface
doesn't match the server and gets closed increases.
For now it only supports "never", meaning that we never want to reuse a
shared connection, and "always", meaning that we can use any connection
that was not marked private. When "never" is set, this also implies that
no idle connection may become a shared one.
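For illustration, the strategy is selected per backend with the
"http-reuse" directive; a minimal sketch with hypothetical names and
addresses :
backend app
# sketch only: server name and address are examples
http-reuse safe
server srv1 192.168.1.10:80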
This flag is set on an outgoing connection when this connection gets
some properties that must not be shared with other connections, such
as dynamic transparent source binding, SNI or a proxy protocol header,
or an authentication challenge from the server. This will be needed
later to implement connection reuse.
This function is now dedicated to idle connections only, which means
that it must not be used with no endpoint nor with an endpoint which is
not a connection. The connection remains attached to the stream interface.
For now it's not populated but we have the list entry. It will carry
all idle connections that sessions don't want to share. They may be
used later to reclaim connections upon socket shortage for example.
Since we now always call this function with the reuse parameter cleared,
let's simplify the function's logic as it cannot return the existing
connection anymore. The savings on this inline function are appreciable
(240 bytes) :
$ size haproxy.old haproxy.new
text data bss dec hex filename
1020383 40816 36928 1098127 10c18f haproxy.old
1020143 40816 36928 1097887 10c09f haproxy.new
connect_server() already does most of the checks that are done again in
si_alloc_conn(), so let's simply reuse the existing connection instead
of calling the function again. It will also simplify the connection
reuse.
Indeed, for reuse to be set, it also requires srv_conn to be valid. In the
end, the only situation where we have to release the existing connection
and allocate a new one is when reuse == 0.
objt_server() is called multiple times at various places while some
places already make use of srv for this. Let's move the call to the
top of the function and use it all over the place.
Currently it is possible for the current_rule field to be evaluated before
being set, leading to valgrind complaining:
==16783== Conditional jump or move depends on uninitialised value(s)
==16783== at 0x44E662: http_res_get_intercept_rule (proto_http.c:3730)
==16783== by 0x44E662: http_process_res_common (proto_http.c:6528)
==16783== by 0x4797B7: process_stream (stream.c:1851)
==16783== by 0x414634: process_runnable_tasks (task.c:238)
==16783== by 0x40B02F: run_poll_loop (haproxy.c:1528)
==16783== by 0x407F25: main (haproxy.c:1887)
This was introduced by commit 152b81e7b2.
Jan A. Bruder reported that some very specific hostnames on server
lines were causing haproxy to crash on startup. Given that his
backtrace showed some heap corruption, it was obvious there was an
overflow somewhere. The bug in fact is a typo in dns_str_to_dn_label()
which mistakenly copies one extra byte from the host name into the
output value, thus effectively corrupting the structure.
The bug triggers while parsing the next server of similar length
after the corruption, which generally triggers at config time but
could theoretically crash at any moment during runtime depending on
what malloc sizes are needed next. This is why it's tagged major.
No backport is needed, this bug was introduced in 1.6-dev2.
This patch allows the existing operators to take a variable as parameter.
This is useful to add the contents of two variables. This patch modifies
the behavior of the operators.
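For illustration, a minimal sketch adding the contents of two variables
(the variable names are hypothetical) :
# sketch only: txn.a, txn.b and txn.sum are example variable names
http-request set-var(txn.a) int(2)
http-request set-var(txn.b) int(3)
http-request set-var(txn.sum) var(txn.a),add(txn.b)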
This patch checks calculations for overflow and returns capped values.
This permits protecting against integer overflows in certain operations
involving ratios, percentages, limits or anything else. That can sometimes
be critically important with some operations (eg: content-length < X).
This patch removes the 32-bit unsigned integer and the 32-bit signed
integer. It replaces these types with a unique 64-bit signed type.
This makes the usage of integers easier and clarifies signed and unsigned use.
With the previous version, signed and unsigned were used one in place of
the other, and sometimes the converters lost the sign. For example, divisions
were processed as "unsigned", so if one entry was negative, the result was
wrong.
Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.
There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new more natural, unified "int()" function. These functions were only
introduced in the latest 1.6-dev2 so there's no impact on regular
deployments.
This patch adds 3 functions for 64-bit integer conversion.
* lltoa_r : converts a signed 64-bit integer to a string
* read_uint64 : converts a string to an unsigned 64-bit integer with capping
* read_int64 : converts a string to a signed 64-bit integer with capping
This patch introduces three new functions which can be used to find a
server in a farm using different server information:
- server unique id (srv->puid)
- server name
- find best match using either name or unique id
When performing best matching, the following applies:
- use the server name first (if provided)
- use the server id if provided
In any case, the function can update the caller about mismatches
encountered.
This flag aims at reporting whether the server unique id (srv->puid) has
been forced by the administrator in HAProxy's configuration.
If not set, it means HAProxy has automatically generated the server's
unique id.
The function proxy_find_best_match() can update the caller through an
int provided as argument.
For now, proxy_find_best_match() hardcodes the bit values 0x01, 0x02 and
0x04, which is not understandable when reading code exploiting them.
This patch defines 3 macros with a more explicit wording, so further
reading of code exploiting the magic bit values will be understandable
more easily.
The man page says that gmtime() and localtime() can return a NULL value.
This was not tested. It appears that all the values of a 32-bit integer
are valid, but it is better to check the return value of these functions.
However, if the integer moves from 32 bits to 64 bits, some values
may be unsupported.
Madison May reported that the timeouts applied by the default
configuration are improperly set up.
This patch fixes this:
- hold valid defaults to 10s
- timeout retry defaults to 1s
The new "sni" server directive takes a sample fetch expression and
uses its return value as a hostname sent as the TLS SNI extension.
A typical use case consists in forwarding the front connection's SNI
value to the server in a bridged HTTPS forwarder :
sni ssl_fc_sni
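For illustration, a complete server line using this directive might
look like this (the name and address are hypothetical) :
# sketch only: server name and address are examples
server srv1 192.168.1.20:443 ssl verify none sni ssl_fc_sni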
ssl_sock_set_servername() is used to set the SNI hostname on an
outgoing connection. This function comes from code originally
provided by Christopher Faulet of Qualys.
When the HTTP forwarder is used, it resets msg->sov so that we know that
the parsing pointer has advanced by exactly (msg->eoh + msg->eol - msg->sov)
bytes which may have to be rewound in case we want to perform an HTTP fetch
after forwarding has started (eg: upon connect).
But when the backend is in TCP mode, there may be no HTTP forwarding
analyser installed, still we may want to perform these HTTP fetches in
case we have already ensured at the TCP layer that we have a properly
parsed HTTP transaction.
In order to solve this, we reset msg->sov before doing a channel_forward()
so that we can still compute http_rewind() on the pending data. That ensures
the buffer is always rewindable even in mixed TCP+HTTP mode.
ARGC_CAP was not added to fmt_directives() which is used to format
error messages when failing to parse log format expressions. The
whole switch/case has been reorganized to match the declaration
order making it easier to spot missing values. The default is not
the "log" directive anymore but "undefined" asking to report the
bug.
Backport to 1.5 is not strictly needed but is desirable at least
for code sanity.
Clients that support ECC cipher suites SHOULD send the specified extension
within the SSL ClientHello message according to RFC4492, section 5.1. We
can use this extension to chain-proxy requests so that, on the same IP
address, an ECC-compatible client gets an EC certificate and a non-ECC
compatible client gets a regular RSA certificate. The main advantage of this
approach compared to the one presented by Dave Zhu on the mailing list
is that we can make it work with OpenSSL versions before 1.0.2.
Example:
frontend ssl-relay
mode tcp
bind 0.0.0.0:443
use_backend ssl-ecc if { req.ssl_ec_ext 1 }
default_backend ssl-rsa
backend ssl-ecc
mode tcp
server ecc unix@/var/run/haproxy_ssl_ecc.sock send-proxy-v2 check
backend ssl-rsa
mode tcp
server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2 check
listen all-ssl
bind unix@/var/run/haproxy_ssl_ecc.sock accept-proxy ssl crt /usr/local/haproxy/ecc.foo.com.pem user nobody
bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt /usr/local/haproxy/www.foo.com.pem user nobody
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
The current method of retrieving the incoming connection's destination
address to hash it is not compatible with IPv6 nor the proxy protocol
because it directly tries to get an IPv4 address from the socket. Instead
we must ask the connection. This is only used when no SNI is provided.
In src/51d.c, the function _51d_conv(), a final '\0' is added into
smp->data.str.str, which can cause a problem if the SMP_F_CONST flag is
set in smp->flags or if smp->data.str.size is not available.
This patch adds a check on smp->flags and smp->data.str.size, and copies
the smp->data.str.str to another buffer by using smp_dup(). If necessary,
the "const" flag is set after device detection. Also, this patch removes
the unnecessary call to chunk_reset() on temp argument.
This option enables overriding the source IP address in an HTTP request. It is
useful when we want to set a custom source IP (e.g. a front proxy rewrites the
address, but provides the correct one in headers) or when we want to mask the
source IP address for privacy or compliance reasons.
It acts on any expression which produces a valid IP address.
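For illustration, a minimal sketch using a header provided by a front
proxy (the header name is only an example) :
# sketch only: the header carrying the real address is an example
http-request set-src hdr(x-forwarded-for)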
This modification makes it possible to use sample_fetch_string() in more
places, where we might need to fetch sample values which are not plain
strings. This way we don't need to fetch a string and convert it into
another type afterwards.
When using aliased types, the caller should explicitly check which exact type
was returned (e.g. SMP_T_IPV4 or SMP_T_IPV6 for SMP_T_ADDR).
All usages of sample_fetch_string() are converted to use the new function.
Compression stats were not easy to read and could be confusing because
the saving ratio could be taken for global savings while it was only
relative to compressible input. Let's make that a bit clearer using
the new tooltips with a bit more details and also report the effective
ratio over all output bytes.
Commit cc87a11 ("MEDIUM: tcp: add register keyword system.") broke the
TCP ruleset by merging custom rules and accept. It was fixed a first time
by commit e91ffd0 ("BUG/MAJOR: tcp: only call registered actions when
they're registered") but the accept action still didn't work anymore
and was causing the matching rule to simply be ignored.
Since the code introduced a very fragile behaviour by not even mentioning
that accept and custom were silently merged, let's fix this once for all by
adding an explicit check for the accept action. Nevertheless, as previously
mentioned, the action should be changed so that custom is the only action
and the continue vs break indication directly comes from the callee.
No backport is needed, this bug only affects 1.6-dev.
Until now, the code assumed that it can get the offset to the first TLV
header just by subtracting the length of the TLV part from the length of
the complete buffer. However, if the buffer contains actual data after
the header, this computation is flawed and leads to haproxy trying to
parse TLV headers from the proxied data.
This change fixes this by making sure that the offset to the first TLV
header is calculated from the start of the buffer -- simply by
adding the size of the proxy protocol v2 header plus the address
family-dependent size of the address information block.
The function buffer_slow_realign() was initially designed for requests
only and did not consider pending outgoing data. This causes a problem
when called on responses where data remain in the buffer, which may
happen with pipelined requests when the client is slow to read data.
The user-visible effect is that if less than <maxrewrite> bytes are
present in the buffer from a previous response and these bytes cross
the <maxrewrite> boundary close to the end of the buffer, then a new
response will cause a realign and will destroy these pending data and
move the pointer to what's believed to contain pending output data.
Thus the client receives the crap that lies in the buffer instead of
the original output bytes.
This new implementation now properly realigns everything including the
outgoing data which are moved to the end of the buffer while the input
data are moved to the beginning.
This implementation still uses a buffer-to-buffer copy which is not
optimal in terms of performance and which should be replaced by a
buffer switch later.
Prior to this patch, the following script would return different hashes
on each round when run from a 100 Mbps-connected machine :
i=0
while usleep 100000; do
echo round $((i++))
set -- $(nc6 0 8001 < 1kreq5k.txt | grep -v '^[0-9A-Z]' | md5sum)
if [ "$1" != "3861afbb6566cd48740ce01edc426020" ]; then echo $1;break;fi
done
The file contains 1000 times this request with "Connection: close" on the
last one :
GET /?s=5k&R=1 HTTP/1.1
The config is very simple :
global
tune.bufsize 16384
tune.maxrewrite 8192
defaults
mode http
timeout client 10s
timeout server 5s
timeout connect 3s
listen px
bind :8001
option http-server-close
server s1 127.0.0.1:8000
And httpterm-1.7.2 is used as the server on port 8000.
After the fix, 1 million requests were sent and all returned the same
contents.
Many thanks to Charlie Smurthwaite of atechmedia.com for his precious
help on this issue, which would not have been diagnosed without his
very detailed traces and numerous tests.
The patch must be backported to 1.5 which is where the bug was introduced.
This is in order to avoid conflicting with NetBSD's popcount* functions
present since the 6.x release; the final "l" indicates that the argument
is a long, as NetBSD does.
This patch could be backported to 1.5 to fix the build issue there as well.
This cache is used by the 51d converter. The input User-Agent string, the
converter args and a random seed are used as a hashing key. The cached
entries contain a pointer to the resulting string for specific
User-Agent string detection.
The cache size can be tuned using 51degrees-cache-size parameter.
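For illustration, a minimal sketch of the related global settings (the
data file path is hypothetical) :
global
# sketch only: the data file path is an example
51degrees-data-file /etc/haproxy/51Degrees.dat
51degrees-cache-size 256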
Moved 51Degrees code from src/haproxy.c, src/sample.c and src/cfgparse.c
into separate files: src/51d.c and include/import/51d.h.
Added two new functions init_51degrees() and deinit_51degrees(), updated
Makefile and other code reorganizations related to 51Degrees.
Commit 4834bc7 ("MEDIUM: vars: adds support of variables") brought a bug.
Setting a variable from an expression that doesn't resolve blocks the
processing indefinitely.
The internal actions API must be changed to let the caller pass the various
flags regarding the state of the analysis (SMP_OPT_FINAL).
For now we only fix the issue by making the action_store() function always
return 1 to prevent any blocking.
No backport is needed.
We'll need to move the session variables to the session. For this, the
accounting must not depend on the stream. Instead we pass the pointers
to the different lists.
Commit 7810ad7 ("BUG/MAJOR: lru: fix unconditional call to free due to
unexpected semi-colon") was not enough, it happens that the free() is
not performed at the right place because if the evicted node is recycled,
we must also release its data before it gets overwritten.
No backport is needed.
Dmitry Sivachenko reported the following build warning using Clang, which
is a real bug :
src/log.c:1538:22: warning: use of logical '&&' with constant operand
[-Wconstant-logical-operand]
if (tmp->options && LOG_OPT_QUOTE)
^ ~~~~~~~~~~~~~
The effect is that recent log tags related to HTTP method, path, uri,
query have a bug making them always use quotes.
This bug was introduced in 1.6-dev2 with commit 0ebc55f ("MEDIUM: logs:
Add HTTP request-line log format directives"), so no backport is needed.
Dmitry Sivachenko reported the following build warning using Clang, which
is a real bug :
src/lru.c:133:32: warning: if statement has empty body [-Wempty-body]
if (old->data && old->free);
^
It results in calling old->free(old->data) even when old->free is NULL,
hence crashing on cached patterns.
The same bug appears a few lines below in lru64_destroy() :
src/lru.c:195:33: warning: if statement has empty body [-Wempty-body]
if (elem->data && elem->free);
^
Both were introduced in 1.6-dev2 with commit f90ac55 ("MINOR: lru: Add the
possibility to free data when an item is removed"), so no backport is needed.
Dmitry Sivachenko reported the following harmless build warning using Clang :
src/dumpstats.c:5196:48: warning: address of array 'strm_li(sess)->proto->name'
will always evaluate to 'true' [-Wpointer-bool-conversion]
...strm_li(sess) && strm_li(sess)->proto->name ? strm_li(sess)->proto->nam...
~~ ~~~~~~~~~~~~~~~~~~~~~~^~~~
proto->name cannot be null here as it's the protocol name which is stored
directly in the structure.
The same case is present in 1.5 though the code changed.
Dmitry Sivachenko reported the following build warning using Clang,
though it's harmless :
src/hlua.c:1911:13: warning: variable '_socket_info_expanded_form' is not needed
and will not be emitted [-Wunneeded-internal-declaration]
static char _socket_info_expanded_form[] = SOCKET_INFO_EXPANDED_FORM;
^
Indeed, the variable is not used except to compute a sizeof which is
taken from the string it is initialized from. It probably is a leftover
after various code refactorings. Let's get rid of it now since it's not
used anymore.
No backport is needed.
Dmitry Sivachenko reported the following build warning using Clang
which is a real bug :
src/ssl_sock.c:4104:44: warning: address of 'smp->data.str.len' will always
evaluate to 'true' [-Wpointer-bool-conversion]
if (!smp->data.str.str || !&smp->data.str.len)
The impact is very low however, it will return an empty session_id
instead of no session id when none is found.
The fix should be backported to 1.5.
Released version 1.6-dev2 with the following main changes :
- BUG/MINOR: ssl: Display correct filename in error message
- MEDIUM: logs: Add HTTP request-line log format directives
- BUG/MEDIUM: check: tcpcheck regression introduced by e16c1b3f
- BUG/MINOR: check: fix tcpcheck error message
- MINOR: use an int instead of calling tcpcheck_get_step_id
- MINOR: tcpcheck_rule structure update
- MINOR: include comment in tcpcheck error log
- DOC: tcpcheck comment documentation
- MEDIUM: server: add support for changing a server's address
- MEDIUM: server: change server ip address from stats socket
- MEDIUM: protocol: add minimalist UDP protocol client
- MEDIUM: dns: implement a DNS resolver
- MAJOR: server: add DNS-based server name resolution
- DOC: server name resolution + proto DNS
- MINOR: dns: add DNS statistics
- MEDIUM: http: configurable http result codes for http-request deny
- BUILD: Compile clean when debug options defined
- MINOR: lru: Add the possibility to free data when an item is removed
- MINOR: lru: Add lru64_lookup function
- MEDIUM: ssl: Add options to forge SSL certificates
- MINOR: ssl: Export functions to manipulate generated certificates
- MEDIUM: config: add DeviceAtlas global keywords
- MEDIUM: global: add the DeviceAtlas required elements to struct global
- MEDIUM: sample: add the da-csv converter
- MEDIUM: init: DeviceAtlas initialization
- BUILD: Makefile: add options to build with DeviceAtlas
- DOC: README: explain how to build with DeviceAtlas
- BUG/MEDIUM: http: fix the url_param fetch
- BUG/MEDIUM: init: segfault if global._51d_property_names is not initialized
- MAJOR: peers: peers protocol version 2.0
- MINOR: peers: avoid re-scheduling of pending stick-table's updates still not pushed.
- MEDIUM: peers: re-schedule stick-table's entry for sync when data is modified.
- MEDIUM: peers: support of any stick-table data-types for sync
- BUG/MAJOR: sample: regression on sample cast to stick table types.
- CLEANUP: deinit: remove codes for cleaning p->block_rules
- DOC: Fix L4TOUT typo in documentation
- DOC: set-log-level in Logging section preamble
- BUG/MEDIUM: compat: fix segfault on FreeBSD
- MEDIUM: check: include server address and port in the send-state header
- MEDIUM: backend: Allow redispatch on retry intervals
- MINOR: Add TLS ticket keys reference and use it in the listener struct
- MEDIUM: Add support for updating TLS ticket keys via socket
- DOC: Document new socket commands "show tls-keys" and "set ssl tls-key"
- MINOR: Add sample fetch which identifies if the SSL session has been resumed
- DOC: Update doc about weight, act and bck fields in the statistics
- BUG/MEDIUM: ssl: fix tune.ssl.default-dh-param value being overwritten
- MINOR: ssl: add a destructor to free allocated SSL resources
- MEDIUM: ssl: add the possibility to use a global DH parameters file
- MEDIUM: ssl: replace standards DH groups with custom ones
- MEDIUM: stats: Add enum srv_stats_state
- MEDIUM: stats: Separate server state and colour in stats
- MEDIUM: stats: Only report drain state in stats if server has SRV_ADMF_DRAIN set
- MEDIUM: stats: Differentiate between DRAIN and DRAIN (agent)
- MEDIUM: Lower priority of email alerts for log-health-checks messages
- MEDIUM: Send email alerts when servers are marked as UP or enter the drain state
- MEDIUM: Document when email-alerts are sent
- BUG/MEDIUM: lua: bad argument number in analyser and in error message
- MEDIUM: lua: automatically converts strings in proxy, tables, server and ip
- BUG/MINOR: utf8: remove compilator warning
- MEDIUM: map: uses HAProxy facilities to store default value
- BUG/MINOR: lua: error in detection of mandatory arguments
- BUG/MINOR: lua: set current proxy as default value if it is possible
- BUG/MEDIUM: http: the action set-{method|path|query|uri} doesn't run.
- BUG/MEDIUM: lua: undetected infinite loop
- BUG/MAJOR: http: don't read past buffer's end in http_replace_value
- BUG/MEDIUM: http: the function "(req|res)-replace-value" doesn't respect the HTTP syntax
- MEDIUM/CLEANUP: http: rewrite and lighten http_transform_header() prototype
- BUILD: lua: it miss the '-ldl' directive
- MEDIUM: http: allows 'R' and 'S' in the protocol alphabet
- MINOR: http: split the function http_action_set_req_line() in two parts
- MINOR: http: split http_transform_header() function in two parts.
- MINOR: http: export function inet_set_tos()
- MINOR: lua: txn: add function set_(loglevel|tos|mark)
- MINOR: lua: create and register HTTP class
- DOC: lua: fix some typos
- MINOR: lua: add log functions
- BUG/MINOR: lua: Fix SSL initialisation
- DOC: lua: some fixes
- MINOR: lua: (req|res)_get_headers return more than one header value
- MINOR: lua: map system integration in Lua
- BUG/MEDIUM: http: functions set-{path,query,method,uri} breaks the HTTP parser
- MINOR: sample: add url_dec converter
- MEDIUM: sample: fill the struct sample with the session, proxy and stream pointers
- MEDIUM: sample change the prototype of sample-fetches and converters functions
- MINOR: sample: fill the struct sample with the options.
- MEDIUM: sample: change the prototype of sample-fetches functions
- MINOR: http: split the url_param in two parts
- CLEANUP: http: bad indentation
- MINOR: http: add body_param fetch
- MEDIUM: http: url-encoded parsing function can run through wrapped buffer
- DOC: http: req.body_param documentation
- MINOR: proxy: custom capture declaration
- MINOR: capture: add two "capture" converters
- MEDIUM: capture: Allow capture with slot identifier
- MINOR: http: add array of generic pointers in http_res_rules
- MEDIUM: capture: adds http-response capture
- MINOR: common: escape CSV strings
- MEDIUM: stats: escape some strings in the CSV dump
- MINOR: tcp: add custom actions that can continue tcp-(request|response) processing
- MINOR: lua: Lua tcp action are not final action
- DOC: lua: schematics about lua socket organization
- BUG/MINOR: debug: display (null) in place of "meth"
- DOC: mention the "lua action" in documentation
- MINOR: standard: add function that converts signed int to a string
- BUG/MINOR: sample: wrong conversion of signed values
- MEDIUM: sample: Add type any
- MINOR: debug: add a special converter which display its input sample content.
- MINOR: tcp: increase the opaque data array
- MINOR: tcp/http/conf: extends the keyword registration options
- MINOR: build: fix build dependency
- MEDIUM: vars: adds support of variables
- MINOR: vars: adds get and set functions
- MINOR: lua: Variable access
- MINOR: samples: add samples which returns constants
- BUG/MINOR: vars/compil: fix some warnings
- BUILD: add 51degrees options to makefile.
- MINOR: global: add several 51Degrees members to global
- MINOR: config: add 51Degrees config parsing.
- MINOR: init: add 51Degrees initialisation code
- MEDIUM: sample: add fiftyone_degrees converter.
- MEDIUM: deinit: add cleanup for 51Degrees to deinit
- MEDIUM: sample: add trie support to 51Degrees
- DOC: add 51Degrees notes to configuration.txt.
- DOC: add build indications for 51Degrees to README.
- MEDIUM: cfgparse: introduce weak and strong quoting
- BUG/MEDIUM: cfgparse: incorrect memmove in quotes management
- MINOR: cfgparse: remove line size limitation
- MEDIUM: cfgparse: expand environment variables
- BUG/MINOR: cfgparse: fix typo in 'option httplog' error message
- BUG/MEDIUM: cfgparse: segfault when userlist is misused
- CLEANUP: cfgparse: remove reference to 'ruleset' section
- MEDIUM: cfgparse: check section maximum number of arguments
- MEDIUM: cfgparse: max arguments check in the global section
- MEDIUM: cfgparse: check max arguments in the proxies sections
- CLEANUP: stream-int: remove a redundant clearing of the linger_risk flag
- MINOR: connection: make conn_sock_shutw() actually perform the shutdown() call
- MINOR: stream-int: use conn_sock_shutw() to shutdown a connection
- MINOR: connection: perform the call to xprt->shutw() in conn_data_shutw()
- MEDIUM: stream-int: replace xprt->shutw calls with conn_data_shutw()
- MINOR: checks: use conn_data_shutw_hard() instead of call via xprt
- MINOR: connection: implement conn_sock_send()
- MEDIUM: stream-int: make conn_si_send_proxy() use conn_sock_send()
- MEDIUM: connection: make conn_drain() perform more controls
- REORG: connection: move conn_drain() to connection.c and rename it
- CLEANUP: stream-int: remove inclusion of fd.h that is not used anymore
- MEDIUM: channel: don't always set CF_WAKE_WRITE on bi_put*
- CLEANUP: lua: don't use si_ic/si_oc on known stream-ints
- BUG/MEDIUM: peers: correctly configure the client timeout
- MINOR: peers: centralize configuration of the peers frontend
- MINOR: proxy: store the default target into the frontend's configuration
- MEDIUM: stats: use frontend_accept() as the accept function
- MEDIUM: peers: use frontend_accept() instead of peer_accept()
- CLEANUP: listeners: remove unused timeout
- MEDIUM: listener: store the default target per listener
- BUILD: fix automatic inclusion of libdl.
- MEDIUM: lua: implement a simple memory allocator
- MEDIUM: compression: postpone buffer adjustments after compression
- MEDIUM: compression: don't send leading zeroes with chunk size
- BUG/MINOR: compression: consider the expansion factor in init
- MINOR: http: check the algo name "identity" instead of the function pointer
- CLEANUP: compression: statify all algo-specific functions
- MEDIUM: compression: add a distinction between UA- and config- algorithms
- MEDIUM: compression: add new "raw-deflate" compression algorithm
- MEDIUM: compression: split deflate_flush() into flush and finish
- CLEANUP: compression: remove unused reset functions
- MAJOR: compression: integrate support for libslz
- BUG/MEDIUM: http: hdr_cnt would not count any header when called without name
- BUG/MAJOR: http: null-terminate the http actions keywords list
- CLEANUP: lua: remove the unused hlua_sleep memory pool
- BUG/MAJOR: lua: use correct object size when initializing a new converter
- CLEANUP: lua: remove hard-coded sizeof() in object creations and mallocs
- CLEANUP: lua: fix confusing local variable naming in hlua_txn_new()
- CLEANUP: hlua: stop using variable name "s" alternately for hlua_txn and hlua_smp
- CLEANUP: lua: get rid of the last "*ht" for struct hlua_txn.
- CLEANUP: lua: rename last occurrences of "*s" to "*htxn" for hlua_txn
- CLEANUP: lua: rename variable "sc" for struct hlua_smp
- CLEANUP: lua: get rid of the last two "*hs" for hlua_smp
- REORG/MAJOR: session: rename the "session" entity to "stream"
- REORG/MEDIUM: stream: rename stream flags from SN_* to SF_*
- MINOR: session: start to reintroduce struct session
- MEDIUM: stream: allocate the session when a stream is created
- MEDIUM: stream: move the listener's pointer to the session
- MEDIUM: stream: move the frontend's pointer to the session
- MINOR: session: add a pointer to the session's origin
- MEDIUM: session: use the pointer to the origin instead of s->si[0].end
- CLEANUP: sample: remove useless tests in fetch functions for l4 != NULL
- MEDIUM: http: move header captures from http_txn to struct stream
- MINOR: http: create a dedicated pool for http_txn
- MAJOR: http: move http_txn out of struct stream
- MAJOR: sample: don't pass l7 anymore to sample fetch functions
- CLEANUP: lua: remove unused hlua_smp->l7 and hlua_txn->l7
- MEDIUM: http: remove the now useless http_txn from {req/res} rules
- CLEANUP: lua: don't pass http_txn anymore to hlua_request_act_wrapper()
- MAJOR: sample: pass a pointer to the session to each sample fetch function
- MINOR: stream: provide a few helpers to retrieve frontend, listener and origin
- CLEANUP: stream: don't set ->target to the incoming connection anymore
- MINOR: stream: move session initialization before the stream's
- MINOR: session: store the session's accept date
- MINOR: session: don't rely on s->logs.logwait in embryonic sessions
- MINOR: session: implement session_free() and use it everywhere
- MINOR: session: add stick counters to the struct session
- REORG: stktable: move the stkctr_* functions from stream to sticktable
- MEDIUM: streams: support looking up stkctr in the session
- MEDIUM: session: update the session's stick counters upon session_free()
- MEDIUM: proto_tcp: track the session's counters in the connection ruleset
- MAJOR: tcp: make tcp_exec_req_rules() only rely on the session
- MEDIUM: stream: don't call stream_store_counters() in kill_mini_session() nor session_accept()
- MEDIUM: stream: move all the session-specific stuff of stream_accept() earlier
- MAJOR: stream: don't initialize the stream anymore in stream_accept
- MEDIUM: session: remove the task pointer from the session
- REORG: session: move the session parts out of stream.c
- MINOR: stream-int: make appctx_new() take the applet in argument
- MEDIUM: peers: move the appctx initialization earlier
- MINOR: session: introduce session_new()
- MINOR: session: make use of session_new() when creating a new session
- MINOR: peers: make use of session_new() when creating a new session
- MEDIUM: peers: initialize the task before the stream
- MINOR: session: set the CO_FL_CONNECTED flag on the connection once ready
- CLEANUP: stream.c: do not re-attach the connection to the stream
- MEDIUM: stream: isolate connection-specific initialization code
- MEDIUM: stream: also accept appctx as origin in stream_accept_session()
- MEDIUM: peers: make use of stream_accept_session()
- MEDIUM: frontend: make ->accept only return +/-1
- MEDIUM: stream: return the stream upon accept()
- MEDIUM: frontend: move some stream initialisation to stream_new()
- MEDIUM: frontend: move the fd-specific settings to session_accept_fd()
- MEDIUM: frontend: don't restrict frontend_accept() to connections anymore
- MEDIUM: frontend: move some remaining stream settings to stream_new()
- CLEANUP: frontend: remove one useless local variable
- MEDIUM: stream: don't rely on the session's listener anymore in stream_new()
- MEDIUM: lua: make use of stream_new() to create an outgoing connection
- MINOR: lua: minor cleanup in hlua_socket_new()
- MINOR: lua: no need for setting timeouts / conn_retries in hlua_socket_new()
- MINOR: peers: no need for setting timeouts / conn_retries in peer_session_create()
- CLEANUP: stream-int: swap stream-int and appctx declarations
- CLEANUP: namespaces: fix protection against multiple inclusions
- MINOR: session: maintain the session count stats in the session, not the stream
- MEDIUM: session: adjust the connection flags before stream_new()
- MINOR: stream: pass the pointer to the origin explicitly to stream_new()
- CLEANUP: poll: move the conditions for waiting out of the poll functions
- BUG/MEDIUM: listener: don't report an error when resuming unbound listeners
- BUG/MEDIUM: init: don't limit cpu-map to the first 32 processes only
- BUG/MAJOR: tcp/http: fix current_rule assignment when restarting over a ruleset
- BUG/MEDIUM: stream-int: always reset si->ops when si->end is nullified
- DOC: update the entities diagrams
- BUG/MEDIUM: http: properly retrieve the front connection
- MINOR: applet: add a new "owner" pointer in the appctx
- MEDIUM: applet: make the applet not depend on a stream interface anymore
- REORG: applet: move the applet definitions out of stream_interface
- CLEANUP: applet: rename struct si_applet to applet
- REORG: stream-int: create si_applet_ops dedicated to applets
- MEDIUM: applet: add basic support for an applet run queue
- MEDIUM: applet: implement a run queue for active appctx
- MEDIUM: stream-int: add a new function si_applet_done()
- MAJOR: applet: now call si_applet_done() instead of si_update() in I/O handlers
- MAJOR: stream: use a regular ->update for all stream interfaces
- MEDIUM: dumpstats: don't unregister the applet anymore
- MEDIUM: applet: centralize the call to si_applet_done() in the I/O handler
- MAJOR: stream: do not allocate request buffers anymore when the left side is an applet
- MINOR: stream-int: add two flags to indicate an applet's wishes regarding I/O
- MEDIUM: applet: make the applets only use si_applet_{cant|want|stop}_{get|put}
- MEDIUM: stream-int: pause the appctx if the task is woken up
- BUG/MAJOR: tcp: only call registered actions when they're registered
- BUG/MEDIUM: peers: fix applet scheduling
- BUG/MEDIUM: peers: recent applet changes broke peers updates scheduling
- MINOR: tools: provide an rdtsc() function for time comparisons
- IMPORT: lru: import simple ebtree-based LRU functions
- IMPORT: hash: import xxhash-r39
- MEDIUM: pattern: add a revision to all pattern expressions
- MAJOR: pattern: add LRU-based cache on pattern matching
- BUG/MEDIUM: http: remove content-length from chunked messages
- DOC: http: update the comments about the rules for determining transfer-length
- BUG/MEDIUM: http: do not restrict parsing of transfer-encoding to HTTP/1.1
- BUG/MEDIUM: http: incorrect transfer-coding in the request is a bad request
- BUG/MEDIUM: http: remove content-length form responses with bad transfer-encoding
- MEDIUM: http: restrict the HTTP version token to 1 digit as per RFC7230
- MEDIUM: http: disable support for HTTP/0.9 by default
- MEDIUM: http: add option-ignore-probes to get rid of the floods of 408
- BUG/MINOR: config: clear proxy->table.peers.p for disabled proxies
- MEDIUM: init: don't stop proxies in parent process when exiting
- MINOR: stick-table: don't attach to peers in stopped state
- MEDIUM: config: initialize stick-tables after peers, not before
- MEDIUM: peers: add the ability to disable a peers section
- MINOR: peers: store the pointer to the signal handler
- MEDIUM: peers: unregister peers that were never started
- MEDIUM: config: propagate the table's process list to the peers sections
- MEDIUM: init: stop any peers section not bound to the correct process
- MEDIUM: config: validate that peers sections are bound to exactly one process
- MAJOR: peers: allow peers section to be used with nbproc > 1
- DOC: relax the peers restriction to single-process
- DOC: document option http-ignore-probes
- DOC: fix the comments about the meaning of msg->sol in HTTP
- BUG/MEDIUM: http: wait for the exact amount of body bytes in wait_for_request_body
- BUG/MAJOR: http: prevent risk of reading past end with balance url_param
- MEDIUM: stream: move HTTP request body analyser before process_common
- MEDIUM: http: add a new option http-buffer-request
- MEDIUM: http: provide 3 fetches for the body
- DOC: update the doc on the proxy protocol
- BUILD: pattern: fix build warnings introduced in the LRU cache
- BUG/MEDIUM: stats: properly initialize the scope before dumping stats
- CLEANUP: config: fix misleading information in error message.
- MINOR: config: report the number of processes using a peers section in the error case
- BUG/MEDIUM: config: properly compute the default number of processes for a proxy
- MEDIUM: http: add new "capture" action for http-request
- BUG/MEDIUM: http: fix the http-request capture parser
- BUG/MEDIUM: http: don't forward client shutdown without NOLINGER except for tunnels
- BUILD/MINOR: ssl: fix build failure introduced by recent patch
- BUG/MAJOR: check: fix breakage of inverted tcp-check rules
- CLEANUP: checks: fix double usage of cur / current_step in tcp-checks
- BUG/MEDIUM: checks: do not dereference head of a tcp-check at the end
- CLEANUP: checks: simplify the loop processing of tcp-checks
- BUG/MAJOR: checks: always check for end of list before proceeding
- BUG/MEDIUM: checks: do not dereference a list as a tcpcheck struct
- BUG/MAJOR: checks: break infinite loops when tcp-checks starts with comment
- MEDIUM: http: make url_param iterate over multiple occurrences
- BUG/MEDIUM: peers: apply a random reconnection timeout
- MEDIUM: config: reject invalid config with name duplicates
- MEDIUM: config: reject conflicts in table names
- CLEANUP: proxy: make the proxy lookup functions more user-friendly
- MINOR: proxy: simply ignore duplicates in proxy name lookups
- MINOR: config: don't open-code proxy name lookups
- MEDIUM: config: clarify the conflicting modes detection for backend rules
- CLEANUP: proxy: remove now unused function findproxy_mode()
- MEDIUM: stick-table: remove the now duplicate find_stktable() function
- MAJOR: config: remove the deprecated reqsetbe / reqisetbe actions
- MINOR: proxy: add a new function proxy_find_by_id()
- MINOR: proxy: add a flag to memorize that the proxy's ID was forced
- MEDIUM: proxy: add a new proxy_find_best_match() function
- CLEANUP: http: explicitly reference request in http_apply_redirect_rules()
- MINOR: http: prepare support for parsing redirect actions on responses
- MEDIUM: http: implement http-response redirect rules
- MEDIUM: http: no need to close the request on redirect if data was parsed
- BUG/MEDIUM: http: fix body processing for the stats applet
- BUG/MINOR: da: fix log-level comparison to remove annoying warning
- CLEANUP: global: remove one ifdef USE_DEVICEATLAS
- CLEANUP: da: move the converter registration to da.c
- CLEANUP: da: register the config keywords in da.c
- CLEANUP: adjust the envelope name in da.h to reflect the file name
- CLEANUP: da: remove ifdef USE_DEVICEATLAS from da.c
- BUILD: make 51D easier to build by defaulting to 51DEGREES_SRC
- BUILD: fix build warning when not using 51degrees
- BUILD: make DeviceAtlas easier to build by defaulting to DEVICEATLAS_SRC
- BUILD: ssl: fix recent build breakage on older SSL libs
Commit 31af49d ("MEDIUM: ssl: Add options to forge SSL certificates")
introduced some dependencies on SSL_CTRL_SET_TLSEXT_HOSTNAME for which
a few checks were missing, breaking the build on openssl 0.9.8.
A switch statement doesn't have a default entry, and the compiler emits
a warning about an uninitialized variable:
warning: 'vars' may be used uninitialized in this function [-Wmaybe-uninitialized]
This regression was introduced by commit
9c627e84b2 (MEDIUM: sample: Add type any)
The new sample type 'any' was not handled in the matrix used to cast
to stick-table types.
It is possible to propagate entries of any data-types in stick-tables between
several haproxy instances over TCP connections in a multi-master fashion. Each
instance pushes its local updates and insertions to remote peers. The pushed
values overwrite remote ones without aggregation. Interrupted exchanges are
automatically detected and recovered from the last known point.
This patch adds two functions used for variable access using the
variable's full name. If the variable doesn't exist in the variable
name pool, it is created.
This patch adds support of variables during the processing of each stream. The
variables scope can be set as 'session', 'transaction', 'request' or 'response'.
The variable type is the type returned by the assignment expression. The type
can change during processing.
The allocated memory can be controlled for each scope and each request, and for
the global process.
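For illustration, a minimal sketch setting and reusing a
transaction-scoped variable (all names are hypothetical) :
frontend f
# sketch only: variable and header names are examples
bind :8080
http-request set-var(txn.host) req.hdr(host)
http-request set-header X-Orig-Host %[var(txn.host)]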
This patch permits registering a new keyword with the keywords "tcp-request content",
"tcp-request connection", "tcp-response content", "http-request" and "http-response",
which is identified only by matching the start of the keyword.
For example, we register the keyword "set-var" with the option "match_pfx"
and the configuration keyword "set-var(var_name)" matches this entry.
This type is used to accept any type of sample as input, and prevents
any automatic "cast". It works like the type "ADDR" which accepts the
types "IPV4" and "IPV6".
The signed values are cast to unsigned before conversion. This patch
uses the proper converters according to the sample type.
Note: it depends on previous patch to parse signed ints.
Implementation of a DNS client in HAProxy to perform name resolution to
IP addresses.
It relies on the freshly created UDP client to perform the DNS
resolution. For now, all UDP socket calls are performed in the
DNS layer, but this might change later when the protocols are
extended to be more suited to datagram mode.
A new section called 'resolvers' is introduced by this patch. It
is used to describe DNS servers' IP addresses as well as many parameters.
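For illustration, a minimal sketch of such a section and a server using
it (all names and addresses are hypothetical) :
resolvers mydns
# sketch only: nameserver address and server name are examples
nameserver dns1 10.0.0.1:53
hold valid 10s
timeout retry 1s
backend app
server srv1 app.example.com:80 check resolvers mydns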
Basic introduction of a UDP layer in HAProxy. It can be used as a
client only and manages UDP exchanges with servers.
It can't be used to load-balance UDP protocols, but only used by
internal features such as DNS resolution.
Ability to change a server IP address during HAProxy run time.
For now this is provided via function update_server_addr() which
currently is not called.
A log is emitted on each change. For now we do it unconditionally,
but later we'll want to do it only under certain circumstances, which
explains why the logging block is enclosed in if(1).
Following functions are now available in the SSL public API:
* ssl_sock_create_cert
* ssl_sock_get_generated_cert
* ssl_sock_set_generated_cert
* ssl_sock_generated_cert_serial
These functions could be used to create a certificate by hand, set it in the
cache used to store generated certificates and retrieve it. Here is an example
(pseudo code):
X509 *cacert = ...;
EVP_PKEY *capkey = ...;
char *servername = ...;
unsigned int serial;
serial = ssl_sock_generated_cert_serial(servername, strlen(servername));
if (!ssl_sock_get_generated_cert(serial, cacert)) {
        SSL_CTX *ctx = ssl_sock_create_cert(servername, serial, cacert, capkey);
        ssl_sock_set_generated_cert(ctx, serial, cacert);
}
With this patch, it is possible to configure HAProxy to forge the SSL
certificate sent to a client using the SNI servername. We do it in the SNI
callback.
To enable this feature, you must pass following BIND options:
* ca-sign-file <FILE> : This is the PEM file containing the CA certificate and
the CA private key used to create and sign the server's certificates.
* (optionally) ca-sign-pass <PASS>: This is the CA private key passphrase, if
any.
* generate-certificates: Enable the dynamic generation of certificates for a
listener.
Because generating certificates is expensive, there is an LRU cache to store
them. Its size can be customized by setting the global parameter
'tune.ssl.ssl-ctx-cache-size'.
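For illustration, a minimal sketch enabling the feature (all file paths
are hypothetical) :
global
# sketch only: cache size and file paths are examples
tune.ssl.ssl-ctx-cache-size 256
frontend fe_ssl
bind :443 ssl crt /etc/haproxy/default.pem ca-sign-file /etc/haproxy/ca.pem generate-certificates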
It looks up a key in an LRU cache for use with the specified domain and
revision. It differs from lru64_get() as it does not create missing keys.
The function returns NULL if an error or a cache miss occurs.
Now, when an item is committed into an LRU tree, you can define a function
to free the data owned by this item. This function will be called when the
item is removed from the LRU tree or when the tree is destroyed.
When using the "51d" converter without specifying the list of 51Degrees
properties to detect (see parameter "51degrees-property-name-list"), the
"global._51d_property_names" could be left uninitialized which will lead to
segfault during init.
Since all rules listed in p->block_rules have been moved to the beginning of
the http-request rules in check_config_validity(), there is no need to clean
p->block_rules in deinit().
Signed-off-by: Godbach <nylzhaowei@gmail.com>
An ifdef was missing to avoid declaring these variables :
src/haproxy.c: In function 'deinit':
src/haproxy.c:1253:47: warning: unused variable '_51d_prop_nameb' [-Wunused-variable]
src/haproxy.c:1253:30: warning: unused variable '_51d_prop_name' [-Wunused-variable]
It takes up to 5 string arguments that are to be 51Degrees property names.
It will then create a chunk with the values detected based on the supplied
request header (this should be the User-Agent).
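For illustration, a minimal sketch (the property names depend on the
data file and are only examples) :
# sketch only: DeviceType and IsMobile are example property names
http-request set-header X-51D-Device %[req.fhdr(User-Agent),51d(DeviceType,IsMobile)]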
When haproxy is run in the foreground with DeviceAtlas enabled, one
line of warning is seen for every test because the comparison is always
true even when the loglevel is zero :
willy@wtap:haproxy$ ./haproxy -db -f test-da.cfg
[WARNING] 151/150831 (25506) : deviceatlas : final memory image 7148029 bytes.
Deviceatlas module loaded.
[WARNING] 151/150832 (25506) : deviceatlas : .
[WARNING] 151/150833 (25506) : deviceatlas : .
[WARNING] 151/150833 (25506) : deviceatlas : .
^C
Don't emit a warning when loglevel is null.
This diff initialises a few DeviceAtlas struct field members with
their default values.
Furthermore, the specific DeviceAtlas configuration keywords are
registered, the module is initialised, and all necessary
resources are freed during the deinit phase.
This diff declares the deviceatlas module and can accept up to 5
property names for the API lookup.
[wt: this should probably be moved to its own file using the keyword
registration mechanism]
This diff is for the DeviceAtlas converter.
This patch adds the following configuration keywords :
deviceatlas-json-file
deviceatlas-log-level
deviceatlas-property-separator
First, the configuration keywords handling (only the log
level configuration part does not end the haproxy process
if it is wrongly set; it falls back to the default level).
Furthermore, the init and deinit phases, and the API lookup
phase: the da_haproxy function is fed with the provided input
and sets all necessary properties chosen via the configuration
in the output, separated by the separator.
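For illustration, a minimal sketch combining the global keywords with
the da-csv converter added in this series (the file path and property
names are hypothetical) :
global
# sketch only: JSON file path and property names are examples
deviceatlas-json-file /etc/haproxy/deviceatlas.json
frontend f
http-request set-header X-DeviceAtlas %[req.fhdr(User-Agent),da-csv(primaryHardwareType,osName)]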
It is likely that powerful adversaries have been pre-computing the
standardized DH groups, because their wide use has made them
valuable targets. While users are advised to generate their own
DH parameters, replace the ones we ship with values that have been
randomly generated for this product only.
[wt: replaced dh1024_p, dh2048_p, and dh4096_p with locally-generated
ones as recommended by Rémi]
This patch adds the ssl-dh-param-file global setting. It sets the
default DH parameters that will be used during the SSL/TLS handshake when
ephemeral Diffie-Hellman (DHE) key exchange is used, for all "bind" lines
which do not explicitly define theirs.
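For illustration, a minimal sketch (the path is hypothetical) :
global
# sketch only: the DH parameters file path is an example
ssl-dh-param-file /etc/haproxy/dhparams.pem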
Actually, the lua actions registered with "tcp-request lua" and
"tcp-response lua" are final actions. This patch changes the action
type and permits continuing the evaluation of tcp-* processing
after the evaluation of the lua actions.
Actually, the tcp-request and tcp-response custom actions are always final
actions. This patch creates a new type of action which permits continuing
the evaluation of tcp-request and tcp-response processing.
This patch doesn't add any new feature: the functional behavior
is the same as version 1.0.
Technical differences:
In this version all updates on different stick tables are
multiplexed on the same tcp session. There is only one established
tcp session per peer whereas in first version there was one established
tcp session per peer and per stick table.
The message format was reviewed to be more evolutive and to support
further types of data exchange such as SSL sessions or other stick-table
data types (currently only the stick-table's server id is supported).
Commit 9fbe18e ("MEDIUM: http: add a new option http-buffer-request")
introduced a regression due to a misplaced check causing the admin
mode of the HTTP stats not to work anymore.
This patch tried to ensure that when we need a request body for the
stats applet, and we have already waited for this body, we don't wait
for it again, but the condition was applied too early, causing the
entire processing of the body to be disabled, and it was based on the
wrong HTTP state (MSG_BODY), resulting in the test never matching.
Thanks to Chad Lavoie for reporting the problem.
This bug is 1.6-only, no backport is needed.
Most of the keywords in the global section do not check the maximum
number of arguments. This sometimes leads to unused and wrong arguments
in the configuration file. This patch adds a maximum argument test to
many keywords of this section.
This patch checks the number of arguments of the keywords:
'global', 'defaults', 'listen', 'backend', 'frontend', 'peers' and
'userlist'
The 'global' section does not take any arguments.
Proxy sections do not support a bind address as argument anymore. Those
sections support only an <id> argument.
The 'defaults' section didn't have any check on its arguments. It takes
an optional <name> argument.
'peers' section takes a <peersect> argument.
'userlist' section takes a <listname> argument.
If the 'userlist' keyword parsing returns an error and no userlist was
previously created, the parsing of 'user' and 'group' leads to a NULL
dereference.
The userlist pointer is now tested to prevent this issue.
Hervé Commowick reported that the logic used to avoid complaining about
ssl-default-dh-param not being set when static DH params are present
in the certificate file was clearly wrong when more than one sni_ctx
is used.
This patch stores whether static DH params are being used for each
SSL_CTX individually, and does not overwrite the value of
tune.ssl.default-dh-param.
Some strings which must be dumped in the CSV output can contain one of
the following chars : <,>, <">, or CR/LF. This patch escapes these
strings if the case is encountered.
This function checks a string before using it in a CSV output format. If
the string contains one of the following four chars <">, <,>, CR or LF,
the string is encapsulated between <"> and the <"> are escaped by a <"">
sequence.
The quoting with <"> is optional. It can be disabled, forced, or the
function can automatically choose the right way.
There are two reasons for not keeping the client connection alive upon a
redirect :
- save the client from uploading all data
- avoid keeping a connection alive if the redirect goes to another domain
The first case should consider an exception when all the data from the
client have been read already. This specifically happens on response
redirects after a POST to a server. This is an easy situation to detect.
It could later be improved to cover the cases where option
http-buffer-request is used.
Sometimes it's problematic not to have "http-response redirect" rules,
for example to perform a browser-based redirect based on certain server
conditions (eg: match of a header).
This patch adds "http-response redirect location <fmt>" which gives
enough flexibility for most imaginable operations. The connection to
the server is closed when this is performed so that we don't risk to
forward any pending data from the server.
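For illustration, a minimal sketch (the condition and target are
hypothetical) :
# sketch only: condition and target URL are examples
http-response redirect location /maintenance.html if { status 503 }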
Any pending response data are trimmed so that we don't risk to
forward anything pending to the client. It's harmless to also do that
for requests so we don't need to consider the direction.
In order to support http-response redirect, the parsing needs to be
adapted a little bit to only support the "location" type, and to
adjust the log-format parser so that it knows the direction of the
sample fetch calls.
This function was made to perform a redirect on requests only, it was
using a message or txn->req in an inconsistent way and did not consider
the possibility that it could be used for the other direction. Let's
clean it up to have both a request and a response message.
This patch adds an http response capture keyword with the same behavior
as the previous patch called "MEDIUM: capture: Allow capture with slot
identifier".
This patch modifies the current http-request capture function
and adds a new keyword "id" that permits identifying a capture slot.
If the identifier doesn't exist, the action fails silently.
Note that this patch removes an unused list initialisation, which seems
to be inherited from a copy/paste. It's harmless and does not need to
be backported.
LIST_INIT((struct list *)&rule->arg.act.p[0]);
This patch adds "capture-req" and "capture-res". These two converters
capture their entry in the allocated slot given in argument and pass
the input on the output.
This patch adds a new keyword called "declare". This keyword
allows declaring some capture slots in requests and responses.
It is useful for sharing captures between frontends and backends.
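For illustration, a minimal sketch combining "declare" with an id-based
capture (all names are hypothetical) :
frontend f
# sketch only: slot size, header and log format are examples
declare capture request len 32
http-request capture req.hdr(host) id 0
log-format "%ci captured_host=%[capture.req.hdr(0)]"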
This function tries to spot a proxy by its name, ID and type, and
in case some elements don't match, it tries to determine which ones
could be ignored and reports which ones were ignored so that the
caller can decide whether or not it wants to pick this proxy. This
will be used for maintaining the status across reloads where the
config might have changed a bit.
It does the same as the other one except that it only focuses on the
numeric ID and the capabilities. It's used by proxy_find_by_name()
for numeric names.
These ones were already obsoleted in 1.4, marked for removal in 1.5,
and not documented anymore. They used to emit warnings, and do still
require quite some code to stay in place. Let's remove them now.
We don't use findproxy_mode() anymore so we can check the conflicting
modes and report the anomalies accordingly with line numbers and more
explicit details.
Now that we can't have duplicate proxies with similar capabilities, we
can remove some painful checks. The first one is the check that made the
lookup function return NULL when a duplicate is found, as it prevented
it from being used in the config parser to detect duplicates.
First, findproxy() was renamed proxy_find_by_name() so that it's explicit
that a name is required for the lookup. Second, we give this function
the ability to search for tables if needed. Third we now provide inline
wrappers to pass the appropriate PR_CAP_* flags and to explicitly look
up a frontend, backend or table.
A nasty situation happens when two tables have the same name. Since it
is possible to declare a table in a frontend and another one in a backend,
this situation may happen and result in a random behaviour each time a
table is designated in a "stick" or "track" rule. Let's make sure this
is properly detected and stopped. Such a config will now report :
[ALERT] 145/104933 (31571) : parsing [prx.cfg:36] : stick-table name 't' conflicts with table declared in frontend 't' at prx.cfg:30.
[ALERT] 145/104933 (31571) : Error(s) found in configuration file : prx.cfg
[ALERT] 145/104933 (31571) : Fatal errors found in configuration.
Since 1.4 we used to emit a warning when two frontends or two backends
had the same name. In 1.5 we added the same warning for two peers sections.
In 1.6 we added the same warning for two mailers sections. It's about time
to reject such invalid configurations, the impact they have on the code
complexity is huge and it is becoming a real obstacle to some improvements
such as restoring servers check status across reloads.
Now these errors are reported as fatal errors and will need to be fixed.
Anyway, till now there was no guarantee that what was written was working
as expected since the behaviour is not defined (eg: use_backend with a
name used by two backends leads to undefined behaviour).
Example of output :
[ALERT] 145/104759 (31564) : Parsing [prx.cfg:12]: mailers section 'm' has the same name as another mailers section declared at prx.cfg:10.
[ALERT] 145/104759 (31564) : Parsing [prx.cfg:16]: peers section 'p' has the same name as another peers section declared at prx.cfg:14.
[ALERT] 145/104759 (31564) : Parsing [prx.cfg:21]: frontend 'f' has the same name as another frontend declared at prx.cfg:18.
[ALERT] 145/104759 (31564) : Parsing [prx.cfg:27]: backend 'b' has the same name as another backend declared at prx.cfg:24.
[ALERT] 145/104759 (31564) : Error(s) found in configuration file : prx.cfg
[ALERT] 145/104759 (31564) : Fatal errors found in configuration.
The "name" and "name_len" arguments in function "smp_fetch_url_param"
could be left uninitialized for subsequent calls.
[wt: no backport needed, this is an 1.6 regression introduced by
commit 4fdc74c ("MINOR: http: split the url_param in two parts") ]
For backend load balancing it sometimes makes sense to redispatch rather
than retrying against the same server. For example, when machines or routers
fail you may not want to waste time retrying against a dead server and
would instead prefer to immediately redispatch against other servers.
This patch allows backend sections to specify that they want to
redispatch on a particular interval. If the interval N is positive the
redispatch occurs on every Nth retry, and if the interval N is negative then
the redispatch occurs on the Nth retry prior to the last retry (-1 is the
default and maintains backwards compatibility). In low latency environments
tuning this setting can save a few hundred milliseconds when backends fail.
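For illustration, assuming the extended "option redispatch <interval>"
syntax described above, a backend retrying 3 times could redispatch on
every 2nd retry like this (addresses are placeholders) :
backend app
    retries 3
    option redispatch 2
    server s1 10.0.0.1:80 check
    server s2 10.0.0.2:80 check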
This patch is part of the body_param fetch. The goal is to have a
generic url-encoded parser which can be used for parsing the query string
and the body.
Commit 9ff95bb ("BUG/MEDIUM: peers: correctly configure the client timeout")
uncovered an old bug in the peers : upon disconnect, we reconnect immediately.
This sometimes results in both ends doing the same thing in parallel causing
a loop of connect/accept/close/close that can last several seconds. The risk
of occurrence of the trouble increases with latency, and is emphasized by the
fact that idle connections are now frequently recycled (after 5s of idle).
In order to avoid this we must apply a random delay before reconnecting.
Fortunately the mechanism already supports a reconnect delay, so here we
compute the random timeout when killing a session. The delay is 50ms plus
a random delay between 0 and 2 seconds. Ideally an exponential back-off would
be used but it's preferable to keep the fix simple.
This bug was reported by Marco Corte.
This fix must be backported to 1.5 since the fix above was backported into
1.5.12.
There are some situations where it's desirable to scan multiple occurrences
of the same parameter name in the query string. This change ensures this can
work, even with an empty name which will then iterate over all parameters.
Until now, HAProxy needed to be restarted to change the TLS ticket
keys. With this patch, the TLS keys can be updated on a per-file
basis using the admin socket. Two new socket commands have been
introduced: "show tls-keys" and "set ssl tls-keys".
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
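For illustration, assuming the admin socket is bound at
/var/run/haproxy.sock, the new commands could be exercised like this (the
key file name and key payload are placeholders) :
$ echo "show tls-keys" | socat stdio /var/run/haproxy.sock
$ echo "set ssl tls-keys /etc/haproxy/ticket.keys <base64-key>" | socat stdio /var/run/haproxy.sock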
Within the listener struct we need to use a reference to the TLS
ticket keys which binds the actual keys with the filename. This will
make it possible to update the keys through the socket.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
If a tcp-check sequence starts with "comment", then the action is not
matched in the while() loop and the pointer doesn't advance so we face
an endless loop. It is normally detected early except in the case where
very slow checks are performed causing it to trigger after the admin stops
watching.
This bug is 1.6-only and very recent so it didn't have the time to affect
anyone.
The method used to skip to next rule in the list is wrong, it assumes
that the list element starts at the same offset as the rule. It happens
to be true on most architectures since the list is the first element for
now but it's definitely wrong. Now the code doesn't crash anymore when
the struct list is moved anywhere else in the struct tcpcheck_rule.
This fix must be backported to 1.5.
This is the most important fix of this series. There's a risk of endless
loop and crashes caused by the fact that we go past the head of the list
when skipping to next rule, without checking if it's still a valid element.
Most of the time, the ->action field is checked, which points to the proxy's
check_req pointer (generally NULL), meaning the element is confused with a
TCPCHK_ACT_SEND action.
The situation was accidentally made worse with the addition of tcp-check
comment since it also skips list elements. However, since the action that
makes it go forward is TCPCHK_ACT_COMMENT (3), there's little chance to
see this as a valid pointer, except on 64-bit machines where it can match
the end of a check_req string pointer.
This fix heavily depends on previous cleanup and both must be backported
to 1.5 where the bug is present.
There is some unobvious redundancy between the various ways we can leave
the loop. Some of them can be factored out. So now we leave the loop when
we can't go further, whether it's caused by reaching the end of the rules
or by a blocking I/O.
When the end of the list is reached, the current step's action is checked
to know if we must poll or not. Unfortunately, the main reason for going
there is that we walked past the end of the list and current_step points to
the head. We cannot dereference ->action since it does not belong to this
structure and can definitely crash if the address is not mapped.
This bug is unlikely to cause a crash since the action appears just after
the list, and corresponds to the "char *check_req" pointer in the proxy
struct, and it seems that we can't go there with current_step being null.
At worst it can cause the check to register for recv events.
This fix needs to be backported to 1.5 since the code is incorrect there
as well.
This cleanup is a preliminary requirement to the upcoming fixes for
the bug that affect tcp-check's improper use of lists. It will have
to be backported to 1.5 though it will not easily apply.
There are two variables pointing to the current rule within the loop,
and either one or the other is used depending on the code blocks,
making it much harder to apply checks to fix the list walking bug.
So first get rid of "cur" and only focus on current_step.
Environment variables could previously be expanded only in addresses.
Now they can be expanded anywhere in the configuration file within
double quotes.
This patch breaks compatibility with the previous behavior of
environment variables in addresses: you must now enclose addresses in
double quotes to make the expansion work.
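For example, an address can now be taken from the environment as long as
it is enclosed in double quotes (the variable names are arbitrary) :
frontend www
    bind "${BIND_ADDR}:${BIND_PORT}"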
Recent commit 22b09d2 ("MINOR: include comment in tcpcheck error log")
accidentally left a double-step to the next rule in case of an inverted
rule. The effect is that an inverted rule is necessarily skipped and
that we can crash if it was the last rule since we'd use as a rule the
head of the list, thus dereference random memory contents.
No backport is needed.
tcpcheck error messages include the step id where the error occurs.
In some cases, this is not enough. Now, HAProxy also uses the comment
field of the latest tcpcheck rule which has been run.
This commit allows HAProxy to parse a new directive in the tcpcheck
ruleset: 'comment'.
It is used to set comments on the current tcpcheck rules.
In src/checks.c, the function tcpcheck_get_step_id() is called many times.
In order to save some CPU cycles, I save the result of this function in
an integer.
Baptiste reported that commit 0a9a2b8 ("MEDIUM: sample change the
prototype of sample-fetches and converters functions") broke the
build of ssl_sock.c when using openssl-1.0.2 because one missed
replacement of sess with smp->sess. No backport is needed.
Options are relative to the sample. Each sample fetched is associated with
fetch options or fetch flags.
This patch adds the 'opt' value to the sample struct. This makes it
possible to reduce the sample-fetch function prototype. In addition, the
converters will have more detail about the origin of the sample.
This patch removes the structs "session", "stream" and "proxy" from
the sample-fetches and converters function prototypes.
This removes some weight from the prototype calls.
Some sample analyzers (sample fetches or converters) need to know the proxy,
session and stream attached to the sample. The sample-fetch and converter
function pointers cannot be called without these 3 pointers filled.
This patch makes it possible to reduce the sample-fetch and converter
prototypes, and provides a new means of adding information for this type
of function.
There's an issue related with shutting down POST transfers or closing the
connection after the end of the upload : the shutdown is forwarded to the
server regardless of the abortonclose option. The problem it causes is that
during a scan, brute force or whatever, it becomes possible that all source
ports are exhausted with all sockets in TIME_WAIT state.
There are multiple issues at once in fact :
- no action is done for the close, it automatically happens at the lower
layers thanks for channel_auto_close(), so we cannot act on NOLINGER ;
- we *do* want to continue to send a clean shutdown in tunnel mode because
some protocols transported over HTTP may need this, regardless of option
abortonclose, thus we can't set the option unconditionally
- for all other modes, we do want to close the dirty way because we're
certain whether we've sent everything or not, and we don't want to eat
all source ports.
The solution is a bit complex and applies to DONE/TUNNEL states :
1) disable automatic close for everything not a tunnel and not just
keep-alive / server-close. Force-close is now covered, as is HTTP/1.0
which implicitly works in force-close mode ;
2) when processing option abortonclose, we know we can disable lingering
if the client has closed and the connection is not in tunnel mode.
Since the last case above leads to a situation where the client side reports
an error, we know the connection will not be reused, so leaving the flag on
the stream-interface is safe. A client closing in the middle of the data
transmission already aborts the transaction so this case is not a problem.
This fix must be backported to 1.5 where the problem was detected.
Due to the code being mostly inspired from the tcp-request parser, it
does some crap because both don't work the same way. The "len" argument
could be mismatched and then the length could be used uninitialized.
This is only possible in frontends of course, but it will finally
make it possible to capture arbitrary http parts, including URL
parameters or parts of the message body.
It's worth noting that an ugly (char **) cast had to be done to
call sample_fetch_string() which is caused by a 5- or 6- levels
of inheritance of this type in the API. Here it's harmless since
the function uses it as a const, but this API madness must be
fixed, starting with the one or two rare functions that modify
the args and inflict this on each and every keyword parser.
(cherry picked from commit 484a4f38460593919a1c1d9a047a043198d69f45)
This patch introduces quoting which allows to write configuration string
including spaces without escaping them.
Strong (with single quotes) and weak (with double quotes) quoting are
supported. Weak quoting supports escaping and special characters when
strong quoting does not interpret anything.
This patch could break configuration files where ' and " were used.
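A short sketch of both styles (the header names and values are made up) :
http-request set-header X-Weak "hello $HOSTNAME"   # weak: escapes and "$VAR" are interpreted
http-request set-header X-Raw  'hello $HOSTNAME'   # strong: taken literally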
Chad Lavoie reported an interesting regression caused by the latest
updates to automatically detect the processes a peers section runs on.
It turns out that if a config has neither nbproc nor a bind-process
statement and depending on the frontend->backend chaining, it is possible
to evade all bind_proc propagations, resulting in assigning only ~0UL (all
processes, which is 32 or 64) without ever restricting it to nbproc. It
was not visible in backends until they started to reference peers sections
which saw themselves with 64 processes at once.
This patch addresses this by replacing all those ~0UL with nbits(nbproc).
That way all "bind-process" settings *default* to the number of processes
defined in nbproc instead of 32 or 64.
This fix could possibly be backported into 1.5, though there is no indication
that this bug could have any effect there.
Issuing a "show sess all" prior to a "show stat" on the CLI results in no
proxy being dumped because the scope_len union member was not properly
reinitialized.
This fix must be backported into 1.5.
They're caused by the cast to long long from ptr in 32-bit.
src/pattern.c: In function 'pat_match_str':
src/pattern.c:479:44: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Body processing is still fairly limited, but this is a start. It becomes
possible to apply regex to find contents in order to decide where to route
a request for example. Only the first chunk is parsed for now, and the
response is not yet available (the parsing function must be duplicated for
this).
req.body : binary
This returns the HTTP request's available body as a block of data. It
requires that the request body has been buffered and made available using
"option http-buffer-request". In case of chunked-encoded body, currently only
the first chunk is analyzed.
req.body_len : integer
This returns the length of the HTTP request's available body in bytes. It may
be lower than the advertised length if the body is larger than the buffer. It
requires that the request body has been buffered and made available using
"option http-buffer-request".
req.body_size : integer
This returns the advertised length of the HTTP request's body in bytes. It
will represent the advertised Content-Length header, or the size of the first
chunk in case of chunked encoding. In order to parse the chunks, it requires
that the request body has been buffered and made available using
"option http-buffer-request".
It is sometimes desirable to wait for the body of an HTTP request before
taking a decision. This is what is being done by "balance url_param" for
example. The first use case is to buffer requests from slow clients before
connecting to the server. Another use case consists in taking the routing
decision based on the request body's contents. This option placed in a
frontend or backend forces the HTTP processing to wait until either the whole
body is received, or the request buffer is full, or the first chunk is
complete in case of chunked encoding. It can have undesired side effects with
some applications abusing HTTP by expecting unbuffered transmissions between
the frontend and the backend, so this should definitely not be used by
default.
Note that it would not work for the response because we don't reset the
message state before starting to forward. For the response we need to
1) reset the message state to MSG_100_SENT or BODY, and 2) reset
body_len in case of chunked encoding to avoid counting it twice.
Since 1.5, the request body analyser has become independent from any
other element and does not even disturb the message forwarder anymore.
And since it's disabled by default, we can place it before most
analysers so that it can preempt any other one if an intermediary
one enables it.
The get_server_ph_post() function assumes that the buffer is contiguous.
While this is true for all the header part, it is not necessarily true
for the end of data that fit in the reserve. In this case there's a risk
to read past the end of the buffer for a few hundred bytes, and possibly
to crash the process if what follows is not mapped.
The fix consists in truncating the analyzed length to the length of the
contiguous block that follows the headers.
A config workaround for this bug would be to disable balance url_param.
This fix must be backported to 1.5. It seems 1.4 did have the check.
Due to the fact that we were still considering only msg->sov for the
first byte of data after calling http_parse_chunk_size(), we used to
miscompute the input data size and to count the CRLF and the chunk size
as part of the input data. The effect is that it was possible to release
the processing with 3 or 4 missing bytes, especially if they're typed by
hand during debugging sessions. This can cause the stats page to return
some errors in admin mode, and the url_param balance algorithm to fail
to properly hash a body input.
This fix must be backported to 1.5.
If a peers section is bound to no process, it's silently discarded. If it's
bound to multiple processes, an error is emitted and the process will not
start.
This will prevent the peers section from remaining in listen state on
the incorrect process. The peers_fe pointer is set to NULL, which will
tell the peers task to commit suicide if it was already scheduled.
The peers initialization sequence is a bit complex, they're attached
to stick-tables and initialized very early in the boot process. When
we fork, if some must not start, it's too late to find them. Instead,
simply add a guard in their respective tasks to stop them once they
want to start.
Sometimes it's very hard to disable the use of peers because an empty
section is not valid, so it is necessary to comment out all references
to the section, and not to forget to restore them in the same state
after the operation.
Let's add a "disabled" keyword just like for proxies. A ->state member
in the peers struct is even present for this purpose but was never used
at all.
Maybe it would make sense to backport this to 1.5 as it's really cumbersome
there.
It's dangerous to initialize stick-tables before peers because they
start a task that cannot be stopped before we know if the peers need
to be disabled and destroyed. Move this after.
If a table in a disabled proxy references a peers section, the peers
name is not resolved to a pointer to a table, but since it belongs to
a union, it can later be dereferenced. Right now it seems it cannot
happen, but it definitely will after the pending changes.
It doesn't cost anything to backport this into 1.5, it will make gdb
sessions less head-scratching.
Recently some browsers started to implement a "pre-connect" feature
consisting in speculatively connecting to some recently visited web sites
just in case the user would like to visit them. This results in many
connections being established to web sites, which end up in 408 Request
Timeout if the timeout strikes first, or 400 Bad Request when the browser
decides to close them first. These ones pollute the log and feed the error
counters. There was already "option dontlognull" but it's insufficient in
this case. Instead, this option does the following things :
- prevent any 400/408 message from being sent to the client if nothing
was received over a connection before it was closed ;
- prevent any log from being emitted in this situation ;
- prevent any error counter from being incremented
That way the empty connection is silently ignored. Note that it is better
not to use this unless it is clear that it is needed, because it will hide
real problems. The most common reason for not receiving a request and seeing
a 408 is due to an MTU inconsistency between the client and an intermediary
element such as a VPN, which blocks too large packets. These issues are
generally seen with POST requests as well as GET with large cookies. The logs
are often the only way to detect them.
This patch should be backported to 1.5 since it avoids false alerts and
makes it easier to monitor haproxy's status.
There's not much reason for continuing to accept HTTP/0.9 requests
nowadays except for manual testing. Now we disable support for these
by default, unless option accept-invalid-http-request is specified,
in which case they continue to be upgraded to 1.0.
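If HTTP/0.9 requests must still be accepted somewhere (eg: for manual
testing), support can be re-enabled explicitly :
frontend legacy_test
    bind :8080
    option accept-invalid-http-request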
While RFC2616 used to allow an indeterminate number of digits for the
major and minor components of the HTTP version, RFC7230 has reduced
that to a single digit for each.
If a server can't properly parse the version string and falls back to 0.9,
it could then send a head-less response whose payload would be taken for
headers, which could confuse downstream agents.
Since there's no more reason for supporting a version scheme that was
never used, let's upgrade to the updated version of the standard. It is
still possible to enforce support for the old behaviour using options
accept-invalid-http-request and accept-invalid-http-response.
It would be wise to backport this to 1.5 as well just in case.
The spec mandates that content-length must be removed from messages if
Transfer-Encoding is present, not just for valid ones.
This must be backported to 1.5 and 1.4.
The rules related to how to handle a bad transfer-encoding header (one
where "chunked" is not at the final place) have evolved to mandate an
abort when this happens in the request. Previously it was only a close
(which is still valid for the server side).
This must be backported to 1.5 and 1.4.
While Transfer-Encoding is HTTP/1.1, we must still parse it in HTTP/1.0
in case an agent sends it, because it's likely that the other side might
use it as well, causing confusion. This will also result in getting rid
of the Content-Length header in such abnormal situations and in having
a clean connection.
This must be backported to 1.5 and 1.4.
RFC7230 clarified the behaviour to adopt when facing both a
content-length and a transfer-encoding: chunked in a message. While
haproxy already complied with the method for getting the message
length right, and used to detect improper content-length duplicates,
it still did not remove the content-length header when facing a
transfer-encoding: chunked. Usually it is not a problem since other
agents (clients and servers) are required to parse the message
according to the rules that have been in place since RFC2616 in
1999.
However Régis Leroy reported the existence of at least one such
non-compliant agent so haproxy could be abused to get out of sync
with it on pipelined requests (HTTP request smuggling attack),
making it consider part of a payload as a subsequent request.
The best thing to do is then to remove the content-length according
to RFC7230. It used to be in the todo list with a fixme in the code
while waiting for the standard to stabilize, let's apply it now that
it's published.
Thanks to Régis for bringing that subject to our attention.
This fix must be backported to 1.5 and 1.4.
This is similar to the way email alerts are sent when servers are marked as
DOWN.
Like the log messages corresponding to these state changes the messages
have log level notice. Thus they are suppressed by the default email-alert
level of 'alert'. To allow these messages the email-alert level should
be set to 'notice', 'info' or 'debug'. e.g:
email-alert level notice
"email-alert mailers" and "email-alert to" settings are also required in
order for any email alerts to be sent.
A follow-up patch will document the above.
Signed-off-by: Simon Horman <horms@verge.net.au>
Lower the priority of email alerts for log-health-checks messages from
LOG_NOTICE to LOG_INFO.
This is to allow set-ups with log-health-checks enabled to disable email
for health check state changes while leaving other email alerts enabled.
In order for email alerts to be sent for health check state changes
"log-health-checks" needs to be set and "email-alert level" needs to be 'info'
or lower. "email-alert mailers" and "email-alert to" settings are also
required in order for any email alerts to be sent.
A follow-up patch will document the above.
Signed-off-by: Simon Horman <horms@verge.net.au>
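A minimal sketch combining the settings described above (the mailers
section name and addresses are placeholders) :
backend app
    option log-health-checks
    email-alert mailers mymailers
    email-alert from haproxy@example.com
    email-alert to admin@example.com
    email-alert level info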
The principle of this cache is to have a global cache for all pattern
matching operations which rely on lists (reg, sub, dir, dom, ...). The
input data, the expression and a random seed are used as a hashing key.
The cached entries contains a pointer to the expression and a revision
number for that expression so that we don't accidentally use obsolete
data after a pattern update or a very unlikely hash collision.
Regarding the risk of collisions, 10k entries at 10k req/s mean 1% risk
of a collision after 60 years, that's already much less than the memory's
reliability in most machines and more durable than most admin's life
expectancy. A collision will result in a valid result being returned
for a different entry from the same list. If this is not acceptable,
the cache can be disabled using tune.pattern.cache-size.
A test on a file containing 10k small regex showed that the regex
matching was limited to 6k/s instead of 70k with regular strings.
When enabling the LRU cache, the performance was back to 70k/s.
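The cache is tuned from the global section; for example, matching the 10k
entries mentioned above (setting it to 0 disables the cache) :
global
    tune.pattern.cache-size 10000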
This will be used to detect any change on the pattern list between
two operations, ultimately making it possible to implement a cache
which immediately invalidates obsolete keys after an update. The
revision is simply taken from the timestamp counter to ensure that
even upon a pointer reuse we cannot accidentally come back to the
same (expr,revision) tuple.
The xxhash library provides a very fast and excellent hash algorithm
suitable for many purposes. It excels at hashing large blocks but is
also extremely fast on small ones. It's distributed under a 2-clause
BSD license (GPL-compatible) so it can be included here. Updates are
distributed here :
https://github.com/Cyan4973/xxHash
This will be usable to implement some maps/acl caches for heavy datasets
loaded from files (mostly regex-based but in general anything that cannot
be indexed in a tree).
Commit e16c1b3f changed the argument with which the function
tcpcheck_get_step_id() is called (check instead of server).
This change introduced a regression since now this function would return
0 all the time because of:
if (check->current_step)
return 0;
This patch fixes this issue by inverting the test: you want to return 0
only if current_step is not yet set :)
No backport is needed.
This commit adds 4 new log format variables that parse the
HTTP Request-Line for more specific logging than "%r" provides.
For example, we can parse the following HTTP Request-Line with
these new variables:
"GET /foo?bar=baz HTTP/1.1"
- %HM: HTTP Method ("GET")
- %HV: HTTP Version ("HTTP/1.1")
- %HU: HTTP Request-URI ("/foo?bar=baz")
- %HP: HTTP Request-URI without query string ("/foo")
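For example, a custom log line using the new variables could look like
this (the surrounding format is illustrative) :
frontend www
    mode http
    log-format "%ci:%cp [%t] %HM %HP %HV %ST"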
Since appctx are scheduled out of streams, it's pointless to wake up
the task managing the stream to push updates, they won't be seen. In
fact unit tests work because silent sessions are restarted after 5s of
idle and the exchange is correctly scheduled during startup!
So we need to notify the appctx instead. For this we add a pointer to
the appctx in the peer session.
No backport is needed of course.
Consecutive to the recent changes brought to applets, peers properly
connect but do not exchange data anymore because the stream interface
is not marked as waiting for data.
No backport is needed.
When one of these functions replaces a part of the query string by
a shorter or longer new one, the header parsing is broken. This is
because the start of the first header is not updated.
In the same way, the total length of the request line is not updated.
I don't see any bug caused by this omission, but I guess it is better
to store the correct length.
This bug is only in the development version.
Commit cc87a11 ("MEDIUM: tcp: add register keyword system.") introduced
the registration of new keywords for TCP rulesets. Unfortunately it
replaced the "accept" action with an unconditionnal call to the rule's
action function, resulting in an immediate segfault when using the
"accept" action in a TCP ruleset.
This bug reported by Baptiste Assmann was introduced in 1.6-dev1, no
backport is needed.
If we're going to call the task we don't need to call the appctx anymore
since the task may decide differently in the end and will do the proper
thing using ->update(). This saves one wake up call per session and
may go down to half in case of high concurrency (scheduling races).
The applets don't fiddle with SI_FL_WAIT_ROOM anymore, instead they indicate
what they want, possibly that they failed (eg: WAIT_ROOM), and it's done() /
update() which finally updates the WAIT_* flags according to the channels'
and stream interface's states. This solves the issue of the pauses during a
"show sess" without creating busy loops.
We used to allocate a request buffer so that we could process applets
from process_stream(), and this was causing some trouble because it was
not possible for an analyzer to return an error to an applet, which
we'll need for HTTP/2. Now that we don't call applets anymore from
process_stream() we can simplify this and ensure that a response is
always allocated to process a stream.
It's much easier to centralize this call into the I/O handler than to
do it everywhere with the risk to miss it. Applets are not allowed to
unregister themselves anyway so their SI is still present and it is
possible to update all the context.
Now si->update() is used to update any type of stream interface, whether
it's an applet, a connection or even nothing. We don't call si_applet_call()
anymore at the end of the resync and we don't have the risk that the
stream's task is reinserted into the run queue, which makes the code
a bit simpler.
The stream_int_update_applet() function was simplified to ensure that
it remained compatible with this standardized calling convention. It
was almost copy-pasted from the update code dedicated to connections.
Just like for si_applet_done(), it seems that it should be possible to
merge the two functions except that it would require some slow operations,
except maybe if the type of end point is tested inside the update function
itself.
The applet I/O handlers now rely on si_applet_done() which itself decides
to wake up or sleep the appctx. Now it becomes critical that applet handlers
properly call this on every exit path so that the appctx is removed from the
active list after I/O have been handled. One such call was added to the Lua
socket handler. It used to work without it probably because the main task is
woken up by the parent task but now it's needed.
This is the equivalent of si_conn_wake() but for applets. It will be
called after changes to the stream interface are brought by the applet
I/O handler. Ultimately it will release buffers and may be even wake
the stream's task up if some important changes are detected.
It would be nice to be able to merge it with the connection's wake
function since it mostly manipulates the stream interface, but there
are minor differences (such as how to enable/disable polling on a fd
vs applet) and some specificities to applets (eg: don't wake the
applet up until the output is empty) which would require abstract
functions which would slow down everything.
The new function is called for each round of polling in order to call any
active appctx. For now we pick the stream interface from the appctx's
owner. At the moment there's no appctx queued yet, but we have everything
needed to queue them and remove them.
Applets' functions now only take an appctx in argument, not a
stream interface. This slightly simplifies the code and will be needed
to take the appctx out of the stream interface.
Differentiate between DRAIN and DRAIN (agent) when reporting stats.
This is consistent with the distinction made between DOWN and DOWN (agent).
Signed-off-by: Simon Horman <horms@verge.net.au>
There are some similarities between a weight of zero and the
administratively set drain state: both allow existing connections
to continue while not accepting any new ones.
However, when reporting a server state generally a distinction is made
between state=UP,weight=0 and state=DRAIN,weight=*. This patch makes
stats reporting consistent in this regard.
This patch does not alter the behaviour that if a server's weight
is zero then its stats row is blue when accessed via HTML. This remains
the case regardless of if the state is UP or DRAIN.
Signed-off-by: Simon Horman <horms@verge.net.au>
There is a relationship between the state and colour of a server in
stats, however, it is not a one-to-one relationship and the current
implementation has proved fragile.
This patch attempts to address that problem by clearly separating
state and colour.
A follow-up patch will further distinguish between DRAIN states
and DRAINING colours.
Signed-off-by: Simon Horman <horms@verge.net.au>
Add an enumeration to make the handling of the states of servers
in status messages somewhat clearer.
This is the first of a two-step attempt to disentangle the state and
colour of status information. A subsequent patch will separate state
colours from the states themselves.
This patch should not make any functional changes.
Signed-off-by: Simon Horman <horms@verge.net.au>
Commit 350f487 ("CLEANUP: session: simplify references to chn_{prod,cons}(&s->{req,res})")
introduced a regression causing the cli_conn to be picked from the server
side instead of the client side, so the XFF header is not appended anymore
since the connection is NULL.
Thanks to Reinis Rozitis for reporting this bug. No backport is needed
as it's 1.6-specific.
Commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom
actions") introduced the ability to interrupt and restart processing in
the middle of a TCP/HTTP ruleset. But it doesn't do it in a consistent
way : it checks current_rule_list, immediately dereferences current_rule,
which is only set in certain cases and never cleared. So that broke the
tcp-request content rules when the processing was interrupted due to
missing data, because current_rule was not yet set (segfault) or could
have been inherited from another ruleset if it was used in a backend
(random behaviour).
The proper way to do it is to always set current_rule before dereferencing
it. But we don't want to set it for all rules because we don't want any
action to provide a checkpointing mechanism. So current_rule is set to NULL
before entering the loop, and only used if not NULL and if current_rule_list
matches the current list. This way they both serve as a guard for the other
one. This fix also makes the current rule point to the rule instead of its
list element, as it's much easier to manipulate.
No backport is needed, this is 1.6-specific.
We have to allow 32 or 64 processes depending on the machine's word
size, and on 64-bit machines only the first 32 processes were properly
bound.
This fix should be backported to 1.5.
Pavlos Parissis reported that a sequence of disable/enable on a frontend
performed on the CLI can result in an error if the frontend has several
"bind" lines each bound to different processes. This is because the
resume_listener() function returns a failure for frontends not part of
the current process instead of returning a success to pretend there was
no failure.
This fix should be backported to 1.5.
The poll() functions have become a bit dirty because they now check the
size of the signal queue, the FD cache and the number of tasks. It's not
their job, this must be moved to the caller. In the end it simplifies the
code because the expiration date is now set to now_ms if we must not wait,
and this achieves exactly the same result and is cleaner. The change
looks large due to the change of indent for blocks which were inside an
"if" block.
This patch adds support for error codes 429 and 405 to Haproxy and a
"deny_status XXX" option to "http-request deny" where you can specify which
code is returned with 403 being the default. We really want to do this the
"haproxy way" and hope to have this patch included in the mainline. We'll
be happy to address any feedback on how this is implemented.
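A brief sketch of the new option; the ACL below uses the predefined
METH_GET, and 403 remains the default when deny_status is omitted :
frontend www
    http-request deny deny_status 405 if !METH_GET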
We don't pass sess->origin anymore but the pointer to the previous step. Now
it should be much easier to chain elements together once applets are moved out
of streams. Indeed, the session is only used for configuration and not for the
dynamic chaining anymore.
It's not the stream's job to manipulate the connection's flags, it's
more related to the session that accepted the new connection. And the
only case where we have to do it conditionally is based on the frontend
which is known from the session, thus it makes sense to do it there.
This patch creates a new Map class that makes it possible to perform
lookups in HAProxy maps. This Map class is integrated into the HAProxy
update system, so we can modify the map through the socket.
The functions (req|res)_get_headers() return only the last entry
for each header with the same name. This patch fixes this behavior.
Each header name now contains an array of values.
When the stream is instantiated from an applet, it doesn't necessarily
have a listener. The listener was sparsely used there, just to retrieve
the task function, update the listeners' stats, and set the analysers
and default target, both of which are often zero from applets. Thus
these elements are now initialized with default values that the caller
is free to change if desired.
The auto-forwarding mechanism in case no analyser is set is generic
to the streams. Also the timeouts on the client side are better preset
in the stream initialization as well.
The frontend is generic and does not depend on a file descriptor,
so applying some socket options to the incoming fd is not its role.
Let's move the setsockopt() calls earlier in session_accept_fd()
where others are done as well.
The function was called stream_accept_session(), let's rename it
stream_new() and make it return the newly allocated pointer. It's
more convenient for some callers who need it.
This function was specified as being able to return 3 states, which had
repercussions on the stream accept function. It was used at the time
when the frontend would do the monitoring itself. This is not the case
anymore, so let's simplify this.
Instead of going through some obscure initialization sequences, we now
rely on the stream code to initialize our stream. Some parts are still
a bit tricky as we cannot call the frontend's accept code which is only
made for appctx in input. So part of the initialization past the stream
code is what ought to be in the frontend code instead. Still, even
without this, these are 71 lines that were removed.
In stream_accept_session(), we perform some operations that explicitly
want a connection as the origin, but we'll soon have other types of
origin (eg: applet). Thus change the test to ensure we only call this
code with connections. Additionally, we refrain from calling fe->accept()
if the origin is not a connection, because for now the only fe->accept()
may only use a connection (frontend_accept).
This concerns everything related to accepting a new session and
expiring the embryonic session. There's still a hard-coded call
to stream_accept_session() which could be set somewhere in the
frontend, but for now it's not a problem.
Now that the previous changes were made, we can add a struct task
pointer to stream_complete() and get rid of it in struct session.
The new relations between connection, session and task are like this :
orig -- sess <-- context
 |                   |
 v                   |
conn -- owner ---> task
Some session-specific parts should now move away from stream.
The function now only initializes a session, calls the tcp req connection
rules, and calls stream_complete() to finish initialization. If a handshake
is needed, it is done without allocating the stream at all.
Temporarily, in order to limit the amount of changes, the task allocated
is put into sess->task, and it is used by the connection for the handshake
or is offered to the stream. At this point we set the relation between
sess/task/conn this way :
orig -- sess <-- context
 |       ^  +- task -+  |
 v       |           v  |
conn -- owner      task
The task must not remain in the session and ultimately it is planned to
remove this task pointer from the session because it can be found by
having conn->owner = task, and looping back from sess to conn, and to
find the session from the connection via the task.
Since the tcp-request connection rules don't need the stream anymore, we
can safely move the session-specific stuff earlier and prepare for a split
of session and stream initialization. Some work remains to be done.
This one is not needed anymore since we cannot track the stream counters
prior to reaching these locations. Only session counters may be tracked
and they're properly committed during session_free().
It passes a NULL wherever a stream was needed (acl_exec_cond() and
action_ptr mainly). It can still track the connection rate correctly
and block based on ACLs.
In order to support sessions tracking counters, we first ensure that there
is no overlap between streams' stkctr and sessions', and we allow an
automatic lookup into the session's counters when the stream doesn't
have a counter or when the stream doesn't exist during an access via
a sample fetch. The functions used to update the stream counters only
update them and not the session counters however.
The stick counters in the session will be used for everything not related
to contents, hence the connections / concurrent sessions / etc. They will
be usable by "tcp-request connection" rules even without a stream. For now
they're just allocated and initialized.
Doing so ensures we don't need to use the stream anymore to prepare the
log information to report a failed handshake on an embryonic session.
Thus, prepare_mini_sess_log_prefix() now takes a session in argument.
Now that we have sess->origin to carry that information along, we don't
need to put that into strm->target anymore, so we remove one dependence
on the stream in embryonic connections.
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.
So as of now, sample fetch functions will be called with this :
- sess = NULL, strm = NULL : never
- sess = valid, strm = NULL : tcp-req connection
- sess = valid, strm = valid, strm->txn = NULL : tcp-req content
- sess = valid, strm = valid, strm->txn = valid : http-req / http-res
The registerable http_req_rules / http_res_rules used to require a
struct http_txn at the end. It's redundant with struct stream and
propagates very deep into some parts (ie: it was the reason for lua
requiring l7). Let's remove it now.
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.
The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
Now this one is dynamically allocated. It means that 280 bytes of memory
are saved per TCP stream, but more importantly that it will become
possible to remove the l7 pointer from fetches and converters since
it will be deduced from the stream and will support being null.
A lot of care was taken because it's easy to forget a test somewhere,
and the previous code used to always trust s->txn for being valid, but
all places seem to have been visited.
All HTTP fetch functions check the txn first so we shouldn't have any
issue there even when called from TCP. When branching from a TCP frontend
to an HTTP backend, the txn is properly allocated at the same time as the
hdr_idx.
This one will not necessarily be allocated for each stream, and we want
to use the fact that it equals null to know it's not present so that we
can always deduce its presence from the stream pointer.
This commit only creates the new pool.
The header captures are now general purpose captures since tcp rules
can use them to capture various contents. That removes a dependency
on http_txn that appeared in some sample fetch functions and in the
order by which captures and http_txn were allocated.
Interestingly, the reset of the header captures was done at too many
places as http_init_txn() used to do it while it was done previously
in every call place.
The stream may never be null given that all these functions are called
from sample_process(). Let's remove this now confusing test which
sometimes happens after a dereference was already done.
When s->si[0].end was dereferenced as a connection or anything in
order to retrieve information about the originating session, we'll
now use sess->origin instead so that when we have to chain multiple
streams in HTTP/2, we'll keep accessing the same origin.
Just like for the listener, the frontend is session-wide so let's move
it to the session. There are a lot of places which were changed but the
changes are minimal in fact.
There is now a pointer to the session in the stream, which is NULL
for now. The session pool is created as well. Some parts will move
from the stream to the session now.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved over there. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
These ones were found in the actions to set the query/path/method/uri.
Where it's used, "s" makes one think about session or something like
this, especially when mixed with http_txn.
hlua_run_sample_fetch() uses "struct hlua_smp *s" which starts to become
confusing when "s->s" is used, then hlua_txn_close() uses this for struct
hlua_txn with the same "s->s" everywhere. Let's uniformize everything with
htxn and hsmp as in other places.
Struct hlua_txn is called "htxn" or "ht" everywhere, while here it's
called "hs" which is the name used everywhere for struct "hlua_smp".
Such confusion adds to the dangers of copy-pasting code, so
let's fix the name here.
Last bug was an example of a side effect of abuse of copy-paste, but
there are other places at risk, so better fix all occurrences of sizeof
to really reference the object size in order to limit the risks in the
future.
In hlua_converters_new(), we used to allocate the size of an hlua_txn
instead of hlua_smp, resulting in random crashes with one integer being
randomly overwritten at the end, even when no converter is being used.
Commit a0dc23f ("MEDIUM: http: implement http-request set-{method,path,query,uri}")
forgot to null-terminate the list, resulting in crashes when these actions
are used if the platform doesn't pad the struct with nulls.
Thanks to Gunay Arslan for reporting a detailed trace showing the
origin of this bug.
No backport to 1.5 is needed.
It's documented that these sample fetch functions should count all headers
and/or all values when called with no name but in practice it's not what is
being done as a missing name causes an immediate return and an absence of
result.
This bug is present in 1.5 as well and must be backported.
This library is designed to emit a zlib-compatible stream with no
memory usage and to favor resource savings over compression ratio.
While zlib requires 256 kB of RAM per compression context (and can only
support 4000 connections per GB of RAM), the stateless compression
offered by libslz does not need to retain buffers between subsequent
calls. In theory this slightly reduces the compression ratio but in
practice it does not have that much of an effect since the zlib
window is limited to 32kB.
Libslz is available at :
http://git.1wt.eu/web?p=libslz.git
It was designed for web compression and provides a lot of savings
over zlib in haproxy. Here are the preliminary results on a single
core of a core2-quad 3.0 GHz in 32-bit for only 300 concurrent
sessions visiting the home page of www.haproxy.org (76 kB) with
the default 16kB buffers :
         BW In       BW Out     BW Saved   Ratio   memory VSZ/RSS
zlib     237 Mbps     92 Mbps   145 Mbps   2.58      84M / 69M
slz      733 Mbps    380 Mbps   353 Mbps   1.93     5.9M / 4.2M
So while the compression ratio is lower, the bandwidth savings are
much more important due to the significantly lower compression cost
which makes it possible to consume even more data from the servers. In the
example above, zlib became the bottleneck at 24% of the output
bandwidth. Also the difference in memory usage is obvious.
More tests run on a single core of a core i5-3320M, with 500 concurrent
users and the default 16kB buffers :
At 100% CPU (no limit) :
         BW In       BW Out     BW Saved   Ratio   memory VSZ/RSS   hits/s
zlib     480 Mbps    188 Mbps   292 Mbps   2.55     130M / 101M       744
slz     1700 Mbps    810 Mbps   890 Mbps   2.10    23.7M / 9.7M      2382
At 85% CPU (limited) :
         BW In       BW Out     BW Saved   Ratio   memory VSZ/RSS   hits/s
zlib    1240 Mbps    976 Mbps   264 Mbps   1.27     130M / 100M      1738
slz     1600 Mbps    976 Mbps   624 Mbps   1.64    23.7M / 9.7M      2210
The most important benefit really happens when the CPU usage is
limited by "maxcompcpuusage" or the BW limited by "maxcomprate" :
in order to preserve resources, haproxy throttles the compression
ratio until usage is within limits. Since slz is much cheaper, the
average compression ratio is much higher and the input bandwidth
is quite higher for one Gbps output.
Other tests made with some reference files :
                          BW In       BW Out     BW Saved   Ratio   hits/s
daniels.html       zlib  1320 Mbps    163 Mbps  1157 Mbps   8.10     1925
                   slz   3600 Mbps    580 Mbps  3020 Mbps   6.20     5300
tv.com/listing     zlib   980 Mbps    124 Mbps   856 Mbps   7.90      310
                   slz   3300 Mbps    553 Mbps  2747 Mbps   5.97     1100
jquery.min.js      zlib   430 Mbps    180 Mbps   250 Mbps   2.39      547
                   slz   1470 Mbps    764 Mbps   706 Mbps   1.92     1815
bootstrap.min.css  zlib   790 Mbps    165 Mbps   625 Mbps   4.79      777
                   slz   2450 Mbps    650 Mbps  1800 Mbps   3.77     2400
So on top of saving a lot of memory, slz is constantly 2.5-3.5 times
faster than zlib and results in providing more savings for a fixed CPU
usage. For links smaller than 100 Mbps, zlib still provides a better
compression ratio, at the expense of a much higher CPU usage.
Larger input files provide slightly higher bandwidth for both libs, at
the expense of a bit more memory usage for zlib (it converges to 256kB
per connection).
This function used to take a zlib-specific flag as argument to indicate
whether a buffer flush or end of contents was met, let's split it in two
so that we don't depend on zlib anymore.
This algorithm is exactly the same as "deflate" without the zlib wrapper,
and used as an alternative when the browser wants "deflate". All major
browsers understand it and despite violating the standards, it is known
to work better than "deflate", at least on MSIE and some versions of
Safari. Do not use it in conjunction with "deflate", use either one or
the other since both react to the same Accept-Encoding token. Note that
the lack of Adler32 checksum makes it slightly faster.
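For example, to offer the raw variant instead of the standard one (the
content types listed are only examples) :
defaults
    mode http
    compression algo raw-deflate
    compression type text/html text/css application/javascript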
Thanks to MSIE/IIS, the "deflate" name is ambiguous. According to the RFC
it's a zlib-wrapped deflate stream, but IIS used to send only a raw deflate
stream, which is the only format MSIE understands for "deflate". The other
widely used browsers do support both formats. For this reason some people
prefer to emit a raw deflate stream on "deflate" to serve more users even
it that means violating the standards. Haproxy only follows the standard,
so they cannot do this.
This patch makes it possible to have one algorithm name in the configuration
and another one in the protocol. This will make it possible to have a new
configuration token to add a different algorithm so that users can decide if
they want a raw deflate or the standard one.
There's no reason for exporting identity_* nor deflate_*, they're only
used in the same file. Mark them static, it will make it easier to add
other algorithms.
When checking if the buffer is large enough, we used to rely on a fixed
size that was "apparently" enough. We need to consider the expansion
factor of deflate-encoded streams instead, which is of 5 bytes per 32kB.
The previous value was OK till 128kB buffers but became wrong past that.
It's totally harmless since we always keep the reserve when compressing,
so there's 1kB or so available, which is enough for buffers as large as
6.5 MB, but better fix the check anyway.
This fix could be backported into 1.5 since compression was added there.
Till now we used to rely on a fixed maximum chunk size. Thanks to last
commit we're now free to adjust the chunk's length before sending the
data, so we don't have to use 6 digits all the time anymore, and if
one wants buffers larger than 16 MB it is now possible.
Till now we used to copy the pending outgoing data into the new buffer,
then compute the chunk size, then compress, then fix the chunk size,
then copy the remaining data into the destination buffer. If the
compression would fail for whatever reason (eg: not enough input bytes
to push an extra block), this work still had to be performed for no
added value. It also presents the disadvantage of having to use a fixed
length to encode the chunk size.
Thanks to the body parser changes that went late into 1.5, the buffers
are not modified anymore during these operations. So this patch rearranges
operations so that they're more optimal :
1) init() prepares a new buffer and reserves space in it for pending
outgoing data (no copy) and for chunk size
2) data are compressed
3) only if data were added to the buffer, then the old data are copied
and the chunk size is set.
A few optimisations are still possible to go further :
- decide whether we prefer to copy pending outgoing data from the
old buffer to the new one, or pending incoming compressed data
from the new one to the old one, based on the amount of outgoing
data available. Given that pending outgoing data are rare and the
operation could be complex in the presence of extra input data,
it's probably better to ignore this one ;
- compute the needed length for the chunk size. This would avoid
sending lots of leading zeroes when not needed.
This fixes an issue that occurs when backend servers run on different
addresses or ports and you wish to healthcheck them via a consistent
port. For example, if you allocate backends dynamically as containers
that expose different ports and you use an inetd based healthchecking
component that runs on a dedicated port.
By adding the server address and port to the send-state header, the
healthcheck component can deduce which address and port to check by
reading the X-Haproxy-Server-State header out of the healthcheck and
parsing out the address and port.
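A sketch of the pieces involved; addresses and ports are placeholders,
and the checking agent is expected to parse the address and port out of
the X-Haproxy-Server-State header :
backend app
    option httpchk GET /health
    http-check send-state
    server s1 10.0.0.1:8080 check port 9000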
Lua supports a custom memory allocator. This is very important as it's the
only way we can control the amount of memory allocatable by Lua scripts.
That prevents bogus scripts from eating all of the system's memory.
The value can be enforced using tune.lua.maxmem in the global section.
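For example, assuming the limit is expressed in megabytes (the value and
script path shown are arbitrary) :
global
    lua-load /etc/haproxy/script.lua
    tune.lua.maxmem 20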
This patch adds a global log function. Each log message is written to
stderr and sent to the default syslog server; both actions are performed
according to the configuration.
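As an illustration, a minimal sketch of logging from Lua; core.log()
and the core.info level constant follow the published HAProxy Lua API
and are assumptions relative to this exact commit:
-- Emit one message through haproxy's global log function: it is
-- written to stderr and sent to the configured syslog server.
core.log(core.info, "hello from Lua")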
This class is accessible via the TXN object. It is created only if the
attached proxy is in HTTP mode. It contains all the HTTP manipulation
functions:
- req_get_headers
- req_del_header
- req_rep_header
- req_rep_value
- req_add_header
- req_set_header
- req_set_method
- req_set_path
- req_set_query
- req_set_uri
- res_get_headers
- res_del_header
- res_rep_header
- res_rep_value
- res_add_header
- res_set_header
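For illustration, a minimal sketch of using this class from a Lua
function; the txn.http accessor and the exact calling convention are
assumptions based on the list above:
-- Drop one request header and force another through the HTTP class.
function scrub_headers(txn)
    txn.http:req_del_header("x-internal-token")
    txn.http:req_set_header("x-proxied-by", "haproxy")
end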
This function is a callback for HTTP actions. It creates the
replacement string from a build_logline() format and transforms the
header.
This patch splits the function in two parts, so that the header
transformation and the construction of the replacement string are
separated. We can now transform the header with a replacement string
coming from another source than a build_logline() format.
The first part is the replacement engine. It takes a replacement action
number and a replacement string and processes the action.
The second part is the function called by the 'http-request action'
to replace a request line part. It builds the string used as the
replacement.
This split makes it possible to use the replacement engine in parts of
the code other than the request action. Lua uses it for its own HTTP
action.
These functions used an invalid header parser:
- trailing whitespace was embedded in the replacement regex,
- commas (,) enclosed in double quotes (") were not respected.
This patch replaces this parser with the "official" parser
http_find_header2().
The function http_replace_value() uses the wrong variable to detect
the end of the input string.
This is a regression introduced by the patch "MEDIUM: regex: Remove null
terminated strings." (c9c2daf2).
We need to backport this patch into the 1.5 stable branch.
WT: there is no possibility to overwrite existing data as we only read
past the end of the request buffer, to copy into the trash. The copy
is bounded by buffer_replace2(), just like the replacement performed
by exp_replace(). However if a buffer happens to contain non-zero data
up to the next unmapped page boundary, there's a theoretical risk of
crashing the process, despite this not being reproducible in tests.
The risk is low because "http-request replace-value" did not work due
to this bug, which probably means it's not used yet.
If the Lua code causes an infinite loop with no yield possible, the
clock is not updated. This patch checks the clock when the Lua control
code cannot yield.
This bug was introduced by the commit "MEDIUM: http/tcp: permit to
resume http and tcp custom actions" (bc4c1ac6ad).
Before this patch, the return code of the function was ignored. After
it, a return value of 0 means YIELD, but http_action_set_req_line()
returns 0 on success. This patch changes the return code of this
function.
This will be useful later to state that some listeners have to use
certain decoders (typically an HTTP/2 decoder) regardless of the
regular processing applied to other listeners. For now it simply
defaults to the frontend's default target, and it is used by the
session.
Listener->timeout is a vestigial thing going back to 2007 or so. It
used to be used only by stats and peers frontends to hold a pointer
to the proxy's client timeout. Now that we use regular frontends, we
don't use it anymore.
Some services such as peers and CLI pre-set the target applet immediately
during accept(), and for this reason they're forced to have a dedicated
accept() function which does not even properly follow everything the regular
one does (eg: sndbuf/rcvbuf/linger/nodelay are not set, etc).
Let's store the default target when known into the frontend's config so that
it's session_accept() which automatically sets it.
The peers frontend timeout was mistakenly set on timeout.connect instead
of timeout.client, resulting in no timeout being applied to the peers
connections. The impact is just that peers can establish connections and
remain connected until they speak. Once they start speaking, only one of
them will still be accepted, and old sessions will be killed, so the
problem is limited. This fix should however be backported to 1.5 since
it was introduced in 1.5-dev3 with peers.
When we explicitly write si[0] or si[1], we know whether we're working
with s->req or s->res, so better use that instead of si_ic/si_oc(), to
make the code simpler and more readable.
Some fetches use a proxy name as their first parameter, to identify a
frontend, backend or table. This first argument is declared as mandatory,
but the documentation says it is optional.
The behavior of the function smp_resolve_args() is to use the current
proxy when it matches the required argument.
This patch implements the same behavior in the Lua argument checker for
sample fetches and sample converters.
The default value is stored in a special struct that describes the
map, and is parsed with a special parser. This is useless because
HAProxy already provides a place to store the default value (the args)
and also provides standard parsers for the input types (the args
again).
This patch removes this special storage and replaces it with an
argument. In other words, the args of maps are now declared with their
expected type instead of as strings.
The first argument on the stack is 1, not 0. So the analyzer started
at the wrong argument number, and the error message also reported the
wrong argument number.
It was inappropriate to put this flag on every failed write into an
input buffer because it depends on where it happens. When it's in the
context of an analyser (eg: hlua) it makes sense. When it's in the
context of an applet (eg: dumpstats), it does not make sense, and
it only happens to work because currently applets are scheduled by
the sessions. The proper solution for applets would be to add the
flag SI_FL_WAIT_ROOM on the stream interface.
Thus, we now don't set any flag anymore in bi_put* and it's up to the
caller to either set CF_WAKE_WRITE on the channel or SI_FL_WAIT_ROOM
on the stream interface. Changes were applied to hlua, peers and
dumpstats.
It's now called conn_sock_drain() to make it clear that it only reads
at the sock layer and not at the data layer. The function was too big
to remain inlined and it's used at a few places where size counts.
Currently si_idle_conn_null_cb() has to perform some low-level checks
over the file descriptor and the connection configuration that should
only belong to conn_drain(). Let's move these controls there. The
function now automatically checks for errors and hangups on the file
descriptor for example, and disables recv polling if there's no drain
function at the control layer.
This function is an equivalent to send() which operates over a connection
instead of a file descriptor. It checks that the control layer is ready
and that it's allowed to send. It automatically enables polling if it
cannot send. It simplifies the return checks by returning zero in all
cases where it cannot send so that the caller only has to care about
negative values indicating errors.
Now we only rely on the connection to do the job. This makes the code
cleaner as we don't have any references to the transport shutdown
outside of the connection anymore.
Now that the connection performs the correct controls when shutting down,
use that in the few places where conn->xprt->shutw() was called. The calls
were split between conn_data_shutw() and conn_data_shutw_hard() depending
on the argument. Since the connection flags are updated, we don't need to
call conn_data_stop_send() anymore, instead we just have to call
conn_cond_update_polling().
Stop calling shutdown() on the connection's fd. Note, this also seems
to fix a bug which was harmless, but which consisted in not marking
the connection as shutdown at the socket level until the other side
was shut as well.
In stream_sock_read0(), we used to clear this flag. But the only case
where stream_sock_read0() is called is in reaction with a conn_sock_read0()
event coming from the lower layers, which already clears this flag. So let's
remove this duplicate one and clear one of the few remaining layering
violations in this area.
Released version 1.6-dev1 with the following main changes :
- CLEANUP: extract temporary $CFG to eliminate duplication
- CLEANUP: extract temporary $BIN to eliminate duplication
- CLEANUP: extract temporary $PIDFILE to eliminate duplication
- CLEANUP: extract temporary $LOCKFILE to eliminate duplication
- CLEANUP: extract quiet_check() to avoid duplication
- BUG/MINOR: don't start haproxy on reload
- DOC: Address issue where documentation is excluded due to a gitignore rule.
- BUG/MEDIUM: systemd: set KillMode to 'mixed'
- BUILD: fix "make install" to support spaces in the install dirs
- BUG/MINOR: config: http-request replace-header arg typo
- BUG: config: error in http-response replace-header number of arguments
- DOC: missing track-sc* in http-request rules
- BUILD: lua: missing ifdef related to SSL when enabling LUA
- BUG/MEDIUM: regex: fix pcre_study error handling
- MEDIUM: regex: Use pcre_study always when PCRE is used, regardless of JIT
- BUG/MINOR: Fix search for -p argument in systemd wrapper.
- MEDIUM: Improve signal handling in systemd wrapper.
- DOC: fix typo in Unix Socket commands
- BUG/MEDIUM: checks: external checks can't change server status to UP
- BUG/MEDIUM: checks: segfault with external checks in a backend section
- BUG/MINOR: checks: external checks shouldn't wait for timeout to return the result
- BUG/MEDIUM: auth: fix segfault with http-auth and a configuration with an unknown encryption algorithm
- BUG/MEDIUM: config: userlists should ensure that encrypted passwords are supported
- BUG/MINOR: config: don't propagate process binding for dynamic use_backend
- BUG/MINOR: log: fix request flags when keep-alive is enabled
- BUG/MEDIUM: checks: fix conflicts between agent checks and ssl healthchecks
- MINOR: checks: allow external checks in backend sections
- MEDIUM: checks: provide environment variables to the external checks
- MINOR: checks: update dynamic environment variables in external checks
- DOC: checks: environment variables used by "external-check command"
- BUG/MEDIUM: backend: correctly detect the domain when use_domain_only is used
- MINOR: ssl: load certificates in alphabetical order
- BUG/MINOR: checks: prevent http keep-alive with http-check expect
- MINOR: lua: typo in an error message
- MINOR: report the Lua version in -vv
- MINOR: lua: add a compilation error message when compiled with an incompatible version
- BUG/MEDIUM: lua: segfault when calling haproxy sample fetches from lua
- BUILD: try to automatically detect the Lua library name
- BUILD/CLEANUP: systemd: avoid a warning due to mixed code and declaration
- BUG/MEDIUM: backend: Update hash to use unsigned int throughout
- BUG/MEDIUM: connection: fix memory corruption when building a proxy v2 header
- MEDIUM: connection: add new bit in Proxy Protocol V2
- BUG/MINOR: ssl: rejects OCSP response without nextupdate.
- BUG/MEDIUM: ssl: Fix to not serve expired OCSP responses.
- BUG/MINOR: ssl: Fix OCSP resp update fails with the same certificate configured twice.
- BUG/MINOR: ssl: Fix external function in order not to return a pointer on an internal trash buffer.
- MINOR: add fetchs 'ssl_c_der' and 'ssl_f_der' to return DER formatted certs
- MINOR: ssl: add statement to force some ssl options in global.
- BUG/MINOR: ssl: correctly initialize ssl ctx for invalid certificates
- BUG/MEDIUM: ssl: fix bad ssl context init can cause segfault in case of OOM.
- BUG/MINOR: samples: fix unnecessary memcopy converting binary to string.
- MINOR: samples: adds the bytes converter.
- MINOR: samples: adds the field converter.
- MINOR: samples: add the word converter.
- BUG/MINOR: server: move the directive #endif to the end of file
- BUG/MAJOR: buffer: check the space left is enough or not when input data in a buffer is wrapped
- DOC: fix a few typos
- CLEANUP: epoll: epoll_events should be allocated according to global.tune.maxpollevents
- BUG/MINOR: http: fix typo: "401 Unauthorized" => "407 Unauthorized"
- BUG/MINOR: parse: refer curproxy instead of proxy
- BUG/MINOR: parse: check the validity of size string in a more strict way
- BUILD: add new target 'make uninstall' to support uninstalling haproxy from OS
- DOC: expand the docs for the provided stats.
- BUG/MEDIUM: unix: do not unlink() abstract namespace sockets upon failure.
- MEDIUM: ssl: Certificate Transparency support
- MEDIUM: stats: proxied stats admin forms fix
- MEDIUM: http: Compress HTTP responses with status codes 201,202,203 in addition to 200
- BUG/MEDIUM: connection: sanitize PPv2 header length before parsing address information
- MAJOR: namespace: add Linux network namespace support
- MINOR: systemd: Check configuration before start
- BUILD: ssl: handle boringssl in openssl version detection
- BUILD: ssl: disable OCSP when using boringssl
- BUILD: ssl: don't call get_rfc2409_prime when using boringssl
- MINOR: ssl: don't use boringssl's cipher_list
- BUILD: ssl: use OPENSSL_NO_OCSP to detect OCSP support
- MINOR: stats: fix minor typo in HTML page
- MINOR: Also accept SIGHUP/SIGTERM in systemd-wrapper
- MEDIUM: Add support for configurable TLS ticket keys
- DOC: Document the new tls-ticket-keys bind keyword
- DOC: clearly state that the "show sess" output format is not fixed
- MINOR: stats: fix minor typo fix in stats_dump_errors_to_buffer()
- DOC: httplog does not support 'no'
- BUG/MEDIUM: ssl: Fix a memory leak in DHE key exchange
- MINOR: ssl: use SSL_get_ciphers() instead of directly accessing the cipher list.
- BUG/MEDIUM: Consistently use 'check' in process_chk
- MEDIUM: Add external check
- BUG/MEDIUM: Do not set agent health to zero if server is disabled in config
- MEDIUM/BUG: Only explicitly report "DOWN (agent)" if the agent health is zero
- MEDIUM: Remove connect_chk
- MEDIUM: Refactor init_check and move to checks.c
- MEDIUM: Add free_check() helper
- MEDIUM: Move proto and addr fields struct check
- MEDIUM: Attach tcpcheck_rules to check
- MEDIUM: Add parsing of mailers section
- MEDIUM: Allow configuration of email alerts
- MEDIUM: Support sending email alerts
- DOC: Document email alerts
- MINOR: Remove trailing '.' from email alert messages
- MEDIUM: Allow suppression of email alerts by log level
- BUG/MEDIUM: Do not consider an agent check as failed on L7 error
- MINOR: deinit: fix memory leak
- MINOR: http: export the function 'smp_fetch_base32'
- BUG/MEDIUM: http: tarpit timeout is reset
- MINOR: sample: add "json" converter
- BUG/MEDIUM: pattern: don't load more than once a pattern list.
- MINOR: map/acl/dumpstats: remove the "Done." message
- BUG/MAJOR: ns: HAProxy segfault if the cli_conn is not from a network connection
- BUG/MINOR: pattern: error message missing
- BUG/MEDIUM: pattern: some entries are not deleted with case insensitive match
- BUG/MINOR: ARG6 and ARG7 don't fit in a 32 bits word
- MAJOR: poll: only rely on wake_expired_tasks() to compute the wait delay
- MEDIUM: task: call session analyzers if the task is woken by a message.
- MEDIUM: protocol: automatically pick the proto associated to the connection.
- MEDIUM: channel: wake up any request analyzer on response activity
- MINOR: converters: add a "void *private" argument to converters
- MINOR: converters: give the session pointer as converter argument
- MINOR: sample: add private argument to the struct sample_fetch
- MINOR: global: export function and permits to not resolve DNS names
- MINOR: sample: add function for browsing samples.
- MINOR: global: export many symbols.
- MINOR: includes: fix a lot of missing or useless includes
- MEDIUM: tcp: add register keyword system.
- MEDIUM: buffer: make bo_putblk/bo_putstr/bo_putchk return the number of bytes copied.
- MEDIUM: http: change the code returned by the response processing rule functions
- MEDIUM: http/tcp: permit to resume http and tcp custom actions
- MINOR: channel: functions to get data from a buffer without copy
- MEDIUM: lua: lua integration in the build and init system.
- MINOR: lua: add ease functions
- MINOR: lua: add runtime execution context
- MEDIUM: lua: "com" signals
- MINOR: lua: add the configuration directive "lua-load"
- MINOR: lua: core: create "core" class and object
- MINOR: lua: post initialisation bindings
- MEDIUM: lua: add coroutine as tasks.
- MINOR: lua: add sample and args type converters
- MINOR: lua: txn: create class TXN associated with the transaction.
- MINOR: lua: add shared context in the lua stack
- MINOR: lua: txn: import existing sample-fetches in the class TXN
- MINOR: lua: txn: add lua function in TXN that returns an array of http headers
- MINOR: lua: register and execute sample-fetches in LUA
- MINOR: lua: register and execute converters in LUA
- MINOR: lua: add bindings for tcp and http actions
- MINOR: lua: core: add sleep functions
- MEDIUM: lua: socket: add "socket" class for TCP I/O
- MINOR: lua: core: pattern and acl manipulation
- MINOR: lua: channel: add "channel" class
- MINOR: lua: txn: object "txn" provides two objects "channel"
- MINOR: lua: core: can set the nice of the current task
- MINOR: lua: core: can yield an execution stack
- MINOR: lua: txn: add binding for closing the client connection.
- MEDIUM: lua: Lua initialisation "on demand"
- BUG/MAJOR: lua: send function fails and return bad bytes
- MINOR: remove unused declaration.
- MINOR: lua: remove some #define
- MINOR: lua: use bitfield and macro in place of integer and enum
- MINOR: lua: set skeleton for Lua execution expiration
- MEDIUM: lua: each yielding function returns a wake up time.
- MINOR: lua: adds "forced yield" flag
- MEDIUM: lua: interrupt the Lua execution for running other process
- MEDIUM: lua: change the sleep function core
- BUG/MEDIUM: lua: the execution timeout is ignored in yield case
- DOC: lua: Lua configuration documentation
- MINOR: lua: add the struct session in the lua channel struct
- BUG/MINOR: lua: set buffer if it is nnot avalaible.
- BUG/MEDIUM: lua: reset flags before resuming execution
- BUG/MEDIUM: lua: fix infinite loop about channel
- BUG/MEDIUM: lua: the Lua process is not waked up after sending data on requests side
- BUG/MEDIUM: lua: many errors when we try to send data with the channel API
- MEDIUM: lua: use the Lua-5.3 version of the library
- BUG/MAJOR: lua: some function are not yieldable, the forced yield causes errors
- BUG/MEDIUM: lua: can't handle the response bytes
- BUG/MEDIUM: lua: segfault with buffer_replace2
- BUG/MINOR: lua: check buffers before initializing socket
- BUG/MINOR: log: segfault if there are no proxy reference
- BUG/MEDIUM: lua: sockets don't have buffer to write data
- BUG/MEDIUM: lua: cannot connect socket
- BUG/MINOR: lua: sockets receive behavior doesn't follows the specs
- BUG/BUILD: lua: The strict Lua 5.3 version check is not done.
- BUG/MEDIUM: buffer: one byte miss in buffer free space check
- MEDIUM: lua: make the functions hlua_gethlua() and hlua_sethlua() faster
- MINOR: replace the Core object by a simple model.
- MEDIUM: lua: change the objects configuration
- MEDIUM: lua: create a namespace for the fetches
- MINOR: converters: add function to browse converters
- MINOR: lua: wrapper for converters
- MINOR: lua: replace function (req|get)_channel by a variable
- MINOR: lua: fetches and converters can return an empty string in place of nil
- DOC: lua api
- BUG/MEDIUM: sample: fix random number upper-bound
- BUG/MINOR: stats:Fix incorrect printf type.
- BUG/MAJOR: session: revert all the crappy client-side timeout changes
- BUG/MINOR: logs: properly initialize and count log sockets
- BUG/MEDIUM: http: fetch "base" is not compatible with set-header
- BUG/MINOR: counters: do not untrack counters before logging
- BUG/MAJOR: sample: correctly reinitialize sample fetch context before calling sample_process()
- MINOR: stick-table: make stktable_fetch_key() indicate why it failed
- BUG/MEDIUM: counters: fix track-sc* to wait on unstable contents
- BUILD: remove TODO from the spec file and add README
- MINOR: log: make MAX_SYSLOG_LEN overridable at build time
- MEDIUM: log: support a user-configurable max log line length
- DOC: provide an example of how to use ssl_c_sha1
- BUILD: checks: external checker needs signal.h
- BUILD: checks: kill a minor warning on Solaris in external checks
- BUILD: http: fix isdigit & isspace warnings on Solaris
- BUG/MINOR: listener: set the listener's fd to -1 after deletion
- BUG/MEDIUM: unix: failed abstract socket binding is retryable
- MEDIUM: listener: implement a per-protocol pause() function
- MEDIUM: listener: support rebinding during resume()
- BUG/MEDIUM: unix: completely unbind abstract sockets during a pause()
- DOC: explicitly mention the limits of abstract namespace sockets
- DOC: minor fix on {sc,src}_kbytes_{in,out}
- DOC: fix alphabetical sort of converters
- MEDIUM: stick-table: implement lookup from a sample fetch
- MEDIUM: stick-table: add new converters to fetch table data
- MINOR: samples: add two converters for the date format
- BUG/MAJOR: http: correctly rewind the request body after start of forwarding
- DOC: remove references to CPU=native in the README
- DOC: mention that "compression offload" is ignored in defaults section
- DOC: mention that Squid correctly responds 400 to PPv2 header
- BUILD: fix dependencies between config and compat.h
- MINOR: session: export the function 'smp_fetch_sc_stkctr'
- MEDIUM: stick-table: make it easier to register extra data types
- BUG/MINOR: http: base32+src should use the big endian version of base32
- MINOR: sample: allow IP address to cast to binary
- MINOR: sample: add new converters to hash input
- MINOR: sample: allow integers to cast to binary
- BUILD: report commit ID in git versions as well
- CLEANUP: session: move the stick counters declarations to stick_table.h
- MEDIUM: http: add the track-sc* actions to http-request rules
- BUG/MEDIUM: connection: fix proxy v2 header again!
- BUG/MAJOR: tcp: fix a possible busy spinning loop in content track-sc*
- OPTIM/MINOR: proxy: reduce struct proxy by 48 bytes on 64-bit archs
- MINOR: log: add a new field "%lc" to implement a per-frontend log counter
- BUG/MEDIUM: http: fix inverted condition in pat_match_meth()
- BUG/MEDIUM: http: fix improper parsing of HTTP methods for use with ACLs
- BUG/MINOR: pattern: remove useless allocation of unused trash in pat_parse_reg()
- BUG/MEDIUM: acl: correctly compute the output type when a converter is used
- CLEANUP: acl: cleanup some of the redundancy and spaghetti after last fix
- BUG/CRITICAL: http: don't update msg->sov once data start to leave the buffer
- MEDIUM: http: enable header manipulation for 101 responses
- BUG/MEDIUM: config: propagate frontend to backend process binding again.
- MEDIUM: config: properly propagate process binding between proxies
- MEDIUM: config: make the frontends automatically bind to the listeners' processes
- MEDIUM: config: compute the exact bind-process before listener's maxaccept
- MEDIUM: config: only warn if stats are attached to multi-process bind directives
- MEDIUM: config: report it when tcp-request rules are misplaced
- DOC: indicate in the doc that track-sc* can wait if data are missing
- MINOR: config: detect the case where a tcp-request content rule has no inspect-delay
- MEDIUM: systemd-wrapper: support multiple executable versions and names
- BUG/MEDIUM: remove debugging code from systemd-wrapper
- BUG/MEDIUM: http: adjust close mode when switching to backend
- BUG/MINOR: config: don't propagate process binding on fatal errors.
- BUG/MEDIUM: check: rule-less tcp-check must detect connect failures
- BUG/MINOR: tcp-check: report the correct failed step in the status
- DOC: indicate that weight zero is reported as DRAIN
- BUG/MEDIUM: config: avoid skipping disabled proxies
- BUG/MINOR: config: do not accept more track-sc than configured
- BUG/MEDIUM: backend: fix URI hash when a query string is present
- BUG/MEDIUM: http: don't dump debug headers on MSG_ERROR
- BUG/MAJOR: cli: explicitly call cli_release_handler() upon error
- BUG/MEDIUM: tcp: fix outgoing polling based on proxy protocol
- BUILD/MINOR: ssl: de-constify "ciphers" to avoid a warning on openssl-0.9.8
- BUG/MEDIUM: tcp: don't use SO_ORIGINAL_DST on non-AF_INET sockets
- BUG/BUILD: revert accidental change in the makefile from latest SSL fix
- BUG/MEDIUM: ssl: force a full GC in case of memory shortage
- MEDIUM: ssl: add support for smaller SSL records
- MINOR: session: release a few other pools when stopping
- MINOR: task: release the task pool when stopping
- BUG/MINOR: config: don't inherit the default balance algorithm in frontends
- BUG/MAJOR: frontend: initialize capture pointers earlier
- BUG/MINOR: stats: correctly set the request/response analysers
- MAJOR: polling: centralize calls to I/O callbacks
- DOC: fix typo in the body parser documentation for msg.sov
- BUG/MINOR: peers: the buffer size is global.tune.bufsize, not trash.size
- MINOR: sample: add a few basic internal fetches (nbproc, proc, stopping)
- DEBUG: pools: apply poisonning on every allocated pool
- BUG/MAJOR: sessions: unlink session from list on out of memory
- BUG/MEDIUM: patterns: previous fix was incomplete
- BUG/MEDIUM: payload: ensure that a request channel is available
- BUG/MINOR: tcp-check: don't condition data polling on check type
- BUG/MEDIUM: tcp-check: don't rely on random memory contents
- BUG/MEDIUM: tcp-checks: disable quick-ack unless next rule is an expect
- BUG/MINOR: config: fix typo in condition when propagating process binding
- BUG/MEDIUM: config: do not propagate processes between stopped processes
- BUG/MAJOR: stream-int: properly check the memory allocation return
- BUG/MEDIUM: memory: fix freeing logic in pool_gc2()
- BUG/MAJOR: namespaces: conn->target is not necessarily a server
- BUG/MEDIUM: compression: correctly report zlib_mem
- CLEANUP: lists: remove dead code
- CLEANUP: memory: remove dead code
- CLEANUP: memory: replace macros pool_alloc2/pool_free2 with functions
- MINOR: memory: cut pool allocator in 3 layers
- MEDIUM: memory: improve pool_refill_alloc() to pass a refill count
- MINOR: stream-int: retrieve session pointer from stream-int
- MINOR: buffer: reset a buffer in b_reset() and not channel_init()
- MEDIUM: buffer: use b_alloc() to allocate and initialize a buffer
- MINOR: buffer: move buffer initialization after channel initialization
- MINOR: buffer: only use b_free to release buffers
- MEDIUM: buffer: always assign a dummy empty buffer to channels
- MEDIUM: buffer: add a new buf_wanted dummy buffer to report failed allocations
- MEDIUM: channel: do not report full when buf_empty is present on a channel
- MINOR: session: group buffer allocations together
- MINOR: buffer: implement b_alloc_fast()
- MEDIUM: buffer: implement b_alloc_margin()
- MEDIUM: session: implement a basic atomic buffer allocator
- MAJOR: session: implement a wait-queue for sessions who need a buffer
- MAJOR: session: only allocate buffers when needed
- MINOR: stats: report a "waiting" flags for sessions
- MAJOR: session: only wake up as many sessions as available buffers permit
- MINOR: config: implement global setting tune.buffers.reserve
- MINOR: config: implement global setting tune.buffers.limit
- MEDIUM: channel: implement a zero-copy buffer transfer
- MEDIUM: stream-int: support splicing from applets
- OPTIM: stream-int: try to send pending spliced data
- CLEANUP: session: remove session_from_task()
- DOC: add missing entry for log-format and clarify the text
- MINOR: logs: add a new per-proxy "log-tag" directive
- BUG/MEDIUM: http: fix header removal when previous header ends with pure LF
- MINOR: config: extend the default max hostname length to 64 and beyond
- BUG/MEDIUM: channel: fix possible integer overflow on reserved size computation
- BUG/MINOR: channel: compare to_forward with buf->i, not buf->size
- MINOR: channel: add channel_in_transit()
- MEDIUM: channel: make buffer_reserved() use channel_in_transit()
- MEDIUM: channel: make bi_avail() use channel_in_transit()
- BUG/MEDIUM: channel: don't schedule data in transit for leaving until connected
- CLEANUP: channel: rename channel_reserved -> channel_is_rewritable
- MINOR: channel: rename channel_full() to !channel_may_recv()
- MINOR: channel: rename buffer_reserved() to channel_reserved()
- MINOR: channel: rename buffer_max_len() to channel_recv_limit()
- MINOR: channel: rename bi_avail() to channel_recv_max()
- MINOR: channel: rename bi_erase() to channel_truncate()
- BUG/MAJOR: log: don't try to emit a log if no logger is set
- MINOR: tools: add new round_2dig() function to round integers
- MINOR: global: always export some SSL-specific metrics
- MINOR: global: report information about the cost of SSL connections
- MAJOR: init: automatically set maxconn and/or maxsslconn when possible
- MINOR: http: add a new fetch "query" to extract the request's query string
- MINOR: hash: add new function hash_crc32
- MINOR: samples: provide a "crc32" converter
- MEDIUM: backend: add the crc32 hash algorithm for load balancing
- BUG/MINOR: args: add missing entry for ARGT_MAP in arg_type_names
- BUG/MEDIUM: http: make http-request set-header compute the string before removal
- MEDIUM: args: use #define to specify the number of bits used by arg types and counts
- MEDIUM: args: increase arg type to 5 bits and limit arg count to 5
- MINOR: args: add type-specific flags for each arg in a list
- MINOR: args: implement a new arg type for regex : ARGT_REG
- MEDIUM: regex: add support for passing regex flags to regex_exec_match()
- MEDIUM: samples: add a regsub converter to perform regex-based transformations
- BUG/MINOR: sample: fix case sensitivity for the regsub converter
- MEDIUM: http: implement http-request set-{method,path,query,uri}
- DOC: fix missing closing brackend on regsub
- MEDIUM: samples: provide basic arithmetic and bitwise operators
- MEDIUM: init: continue to enforce SYSTEM_MAXCONN with auto settings if set
- BUG/MINOR: http: fix incorrect header value offset in replace-hdr/replace-value
- BUG/MINOR: http: abort request processing on filter failure
- MEDIUM: tcp: implement tcp-ut bind option to set TCP_USER_TIMEOUT
- MINOR: ssl/server: add the "no-ssl-reuse" server option
- BUG/MAJOR: peers: initialize s->buffer_wait when creating the session
- MINOR: http: add a new function to iterate over each header line
- MINOR: http: add the new sample fetches req.hdr_names and res.hdr_names
- MEDIUM: task: always ensure that the run queue is consistent
- BUILD: Makefile: add -Wdeclaration-after-statement
- BUILD/CLEANUP: ssl: avoid a warning due to mixed code and declaration
- BUILD/CLEANUP: config: silent 3 warnings about mixed declarations with code
- MEDIUM: protocol: use a family array to index the protocol handlers
- BUILD: lua: cleanup many mixed occurrences declarations & code
- BUG/MEDIUM: task: fix recently introduced scheduler skew
- BUG/MINOR: lua: report the correct function name in an error message
- BUG/MAJOR: http: fix stats regression consecutive to HTTP_RULE_RES_YIELD
- Revert "BUG/MEDIUM: lua: can't handle the response bytes"
- MINOR: lua: convert IP addresses to type string
- CLEANUP: lua: use the same function names in C and Lua
- REORG/MAJOR: move session's req and resp channels back into the session
- CLEANUP: remove now unused channel pool
- REORG/MEDIUM: stream-int: introduce si_ic/si_oc to access channels
- MEDIUM: stream-int: add a flag indicating which side the SI is on
- MAJOR: stream-int: only rely on SI_FL_ISBACK to find the requested channel
- MEDIUM: stream-interface: remove now unused pointers to channels
- MEDIUM: stream-int: make si_sess() use the stream int's side
- MEDIUM: stream-int: use si_task() to retrieve the task from the stream int
- MEDIUM: stream-int: remove any reference to the owner
- CLEANUP: stream-int: add si_ib/si_ob to dereference the buffers
- CLEANUP: stream-int: add si_opposite() to find the other stream interface
- REORG/MEDIUM: channel: only use chn_prod / chn_cons to find stream-interfaces
- MEDIUM: channel: add a new flag "CF_ISRESP" for the response channel
- MAJOR: channel: only rely on the new CF_ISRESP flag to find the SI
- MEDIUM: channel: remove now unused ->prod and ->cons pointers
- CLEANUP: session: simplify references to chn_{prod,cons}(&s->{req,res})
- CLEANUP: session: use local variables to access channels / stream ints
- CLEANUP: session: don't needlessly pass a pointer to the stream-int
- CLEANUP: session: don't use si_{ic,oc} when we know the session.
- CLEANUP: stream-int: limit usage of si_ic/si_oc
- CLEANUP: lua: limit usage of si_ic/si_oc
- MINOR: channel: add chn_sess() helper to retrieve session from channel
- MEDIUM: session: simplify receive buffer allocator to only use the channel
- MEDIUM: lua: use CF_ISRESP to detect the channel's side
- CLEANUP: lua: remove the session pointer from hlua_channel
- CLEANUP: lua: hlua_channel_new() doesn't need the pointer to the session anymore
- MEDIUM: lua: remove struct hlua_channel
- MEDIUM: lua: remove hlua_sample_fetch
This patch avoids using relative URLs for admin forms. It allows proxied
instances of the stats admin interface to continue to function, as the
default FORM action is to submit to the current URL.
Adds the ability to include a Signed Certificate Timestamp List in a TLS
extension. The file containing the SCTL must be present at the same path
as the certificate file, suffixed with '.sctl'. This requires OpenSSL
1.0.2 or later.
It is common for REST applications to return status codes other than
200, so compress the other common 2xx responses which might contain
content.
This struct used to carry only a sample fetch function. Thanks to
lua_pushlightuserdata(), we don't need to have the Lua engine allocate
that struct for us and we can simply push our pointer onto the stack.
This makes the code more readable by removing several occurrences of
"f->f->". Just like the previous patch, it comes with the nice effect
of saving about 1.3% of performance when fetching samples from Lua.
Since last cleanups, this one was only used to carry a struct channel.
Removing it makes the code a bit cleaner (no more chn->chn) and easier
to follow (no more abstraction for a common type). Interestingly it
happens to also make the Lua code slightly faster (about 1.5%) when
using channels, probably thanks to less pointer dereferences and maybe
the use of lua_pushlightuserdata().
Now that we can get the session from the channel, let's simplify the
prototype of session_alloc_recv_buffer() to only require the channel.
Both the caller and the function are now simplified.
All functions dealing with connection establishment currently use a
pointer to the stream interface. Now we know it cannot change and is
always s->si[1].
In process_session, we had around 300 accesses to channels and stream-ints
from the session. Not only does this inflate the code due to the large offsets
from the original pointer, but readability can be improved. Let's have 4
local variables for the channels and stream-ints.
These 4 combinations are needlessly complicated since the session already
has direct access to the associated stream interfaces without having to
check an indirect pointer.
The purpose of these two macros will be to pass via the session to
find the relevant stream interfaces so that we don't need to store
the ->cons nor ->prod pointers anymore. Currently they're only defined
so that all references could be removed.
Note that many places need a second pass of clean up so that we don't
have any chn_prod(&s->req) anymore and only &s->si[0] instead, and
conversely for the 3 other cases.
At a few places we need to find one stream interface from the other one.
Instead of passing via the channel, we simply use the session as an
intermediary, which simply results in applying an offset to the pointer.
We go back to the session to get the owner. Here again it's very easy
and is just a matter of relative offsets. Since the owner always exists
and always points to the session's task, we can remove some unneeded
tests.
This new flag "SI_FL_ISBACK" is set only on the back SI and is cleared
on the front SI. That way, just by looking at the SI, it's possible to
know which side it is on.
We'll soon remove direct references to the channels from the stream
interface since everything belongs to the same session, so let's
first not dereference si->ib / si->ob anymore and use macros instead.
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to check whether the channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
In some cases we don't want to know whether a fetch or converter
fails; we just want a valid string. After this patch, we have two sets
of fetches and two sets of converters: txn.f, txn.sf, txn.c and
txn.sc. The versions prefixed by 's' always return a string for any
type, and return an empty string in the error case or when the data
are not available. This is particularly useful when manipulating
headers or cookies.
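A minimal sketch of the difference, assuming the method-call form of
the Lua API (the fetch and log names are taken from the surrounding
commit messages, not verified against this exact revision):
-- txn.f can fail or return non-string types; txn.sf always yields a
-- string, "" on error or missing data, so it is safe to concatenate.
function log_host(txn)
    local host = txn.sf:req_fhdr("host")  -- "" when the header is absent
    core.log(core.info, "host=" .. host)
end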
To add data to a channel, it used to be necessary to proceed in two
steps: first get the channel object, then send the data:
local req = txn:req_channel()
req:send("data\n")
Now the function is converted into a variable containing the req and
res objects, so we can simply write:
txn.req:send("data\n")
This patch implements a wrapper giving access to the converters from
Lua code. The converters are used through the transaction. The
automatically created functions are prefixed with "conv_".
HAProxy proposes many sample fetches. The automatic registration of
the sample fetches could collide with existing Lua functions. This
patch sets a namespace for the sample fetches.
Actually an object is just a userdata value with a metatable. This
model causes some problems, such as the inability to attach Lua-side
data to the object. The new model uses a table as the object base and
stores the userdata at index 0.
The core entry is just a collection of functions; it doesn't depend on
any special variable. This patch simply converts it from an object
contained in a metatable into an object contained in a normal table.
A few function names in Lua had underscores which did not appear in
their C counterparts. Since almost all of them already had similar
names, it's better to unify the naming convention.
For now we don't perform any operation on IP addresses, so at least
we'd like to be able to pass them as strings so that we can log them
or whatever in Lua. Without this patch txn.src(txn) returns "nil" and
now it returns the correct IP address.
Lua 5.3 provides an opaque space associated with each coroutine
stack. This patch uses this space to store the associated
"struct hlua *" pointer.
This makes the retrieval of the associated "struct hlua *" faster
because it replaces a tree lookup with an immediate access to the
data.
This reverts commit cd9084f776.
This commit introduced a regression making it impossible to leave
process_session() during a forced yield because the analyser was always
set on the response even if not needed. The result was a busy loop
making haproxy spin at 100% without even polling anymore in case a
forced yield was performed.
The problem it tried to address (intercept response data from a request
analyser before forwarding) is not a trivial issue to address since
wakeups based on reads will not necessarily happen unless there's write
activity.
Anyway, if functions are attached specifically to a request or to a
response, it's for a reason. So for now let's be clear about the fact
that it's unreliable to try to process data from the opposite channel
until a better solution is found.
Commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom actions")
unfortunately broke the stats applet by moving the clearing of the analyser bit
after processing the applet headers. It used to work only in HTTP/1.1 and not
in HTTP/1.0. This is 1.6-specific, no backport is needed.
Space is unavailable only if the end of the inserted data is strictly
beyond the end of the buffer. If these two values are equal, the space
is available.
This patch fixes the Lua library check: only version 5.3 or later is
allowed.
This bug was introduced by the patch "MEDIUM: lua: use the Lua-5.3
version of the library" with commit id f90838b71a.
The specs say that the receive() function with the "*l" argument must
return a line without the final "\n" (or without "\r\n").
This patch removes these final bytes.
When we try to write data into a session from another session, the
"req" buffer is not allocated. This patch tries to allocate the
buffer; the session waits if the buffer is not yet available.
The HAProxy API allows sending logs without a defined proxy (it is
set to NULL). An incomplete test is done when choosing the log tag,
and an invalid pointer is dereferenced.
When a socket is initialized in the body context, a segfault is
generated because the memory pools are not initialized. This patch
checks whether these memory pools are initialized.
The function buffer_contig_space() returns the contiguous space
available for adding data (at the end of the input side), while the
function hlua_channel_send_yield() needs to insert data starting at p.
Here we introduce a new function bi_space_for_replace() which returns
the amount of space that can be inserted at the head of the input side
with one of the buffer_replace* functions.
This patch provides a function that returns the space available after
buf->p.
The request action can't handle the response traffic because it is
automatically forwarded: forwarding with CHN_INFINITE_FORWARD is set
because no analyzers are registered on the response channel.
This patch automatically registers the request analyzer on the
response channel when it yields. This prevents the automatic transfer
of the response bytes.
The hook function called every nth Lua instruction just does a yield.
Sometimes this yield is not allowed, because some functions are not
compatible with it.
Lua 5.3 makes it possible to know whether a yield is allowed, with the
function lua_isyieldable().
If the processing is interrupted in a non-yieldable state, we try to
yield as soon as possible. The yield may become possible at the end of
the currently executing non-yieldable function, so we request an
interrupt at the end of the current function.
But Lua cannot yield while it is returning from a function, so we set
the interrupt hook to 1 instruction, expecting that the function has
finished by then.
During this time, the execution timeout is still checked.
The Lua-5.3 version of the library adds a function required to fix a
bug with the forced-yield system.
This patch makes it possible to build with the Lua-5.3 library. The
main changes are:
- the "unsigned" type disappears, replaced by a signed type,
- the prototype of the yield function callback changes.
First, we allow the reserved size to be used to write the data that
will be sent. The reserve remains guaranteed because the written data
will be sent quickly, making the reserved room available again.
This guarantees that the send function always has space available to
send data (except if it cannot connect to the server).
The function buffer_replace2() works only on a contiguous buffer, so
this patch also detects whether the required size is contiguous. If
that is not the case, we realign the buffer.
If we are writing into the request buffer, we are not woken up when
the data are forwarded, because that would be useless: the request
analyzers are only woken up when data come in. So, if the request
buffer is full, we set the WAKE_ON_WRITE flag.
Before this patch, each yield in a Lua action set a flag to be woken
up when some activity was detected on the response channel. This
behavior caused loops in the analyzer process.
This patch sets the wake-up on response buffer activity only if we
really want to be woken up on this activity.
This patch resets the flags, except the run flag, before resuming the
Lua execution. If this initialisation is not done, some flags can
remain set at the end of the Lua execution and give bad information.
Check whether the buffer is available, because HAProxy doesn't
allocate the request buffer if it's not required. Sometimes, Lua needs
to write some data to the server before the client sends its data.
This patch checks the expiration of the execution timeout for each
yield source, and also tests the execution timeout before resuming the
execution of the Lua code.
Commit 501260b ("MEDIUM: task: always ensure that the run queue is
consistent") introduced a skew in the scheduler: if a negatively niced
task is woken up, it can be inserted prior to the current index and will
be skipped as long as there is some activity from lower-priority tasks.
The immediate effect is that it's not possible to get access to the stats
under full load until the load goes down.
This is because rq_next constantly moves to more recent positions. The
fix is simple: __task_wakeup() must empty rq_next. The sad thing is that
this issue was fixed during development and missed during the commit.
No backport is needed, this is purely 1.6 stuff.
This patch makes it possible to interrupt the Lua execution every n
instructions. It is useful for controlling the execution time of Lua
code, and allows the HAProxy scheduler to process other tasks waiting
for execution.
This flag indicates that the current yield was returned by the Lua
execution task control. If this flag is set, the current task may quit
but will be put back in the run queue to be re-executed immediately.
This patch modifies the hlua_yieldk() function, adding an argument
that contains a field of yield options.
This is used to ensure that the task doesn't become a zombie when Lua
returns a yield. The yield wrapper ensures that a timer used for
waking up the task is set.
The timer is reset to TICK_ETERNITY when the Lua execution is done.
This first patch makes it possible to configure the Lua execution
expiration. The expiration is configured but not yet enforced; that
will be added in a future patch.
In the future, the Lua execution must return scheduling information.
We want more than one flag, so this converts an integer used with an
enum into an integer used as a bitfield.
In some cases the Lua "send" function fails. This is because the
return value of buffer_replace2() was not tested: the available space
in the buffer was checked, and buffer_replace2() was assumed to take
all the data. In some cases buffer_replace2() cannot take the incoming
data, and it returns the amount of data actually copied.
This patch checks the amount of data really copied by
buffer_replace2() and advances the buffer taking this value into
account.
Gcc complains because the systemd wrapper mixed code and declarations:
"warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]".
When a Lua script calls an internal haproxy sample fetch, it may
segfault in some conditions:
- when a fetch has no argument,
- when there is no room left to store the special type ARGT_STOP in
the argument list (this one shouldn't happen currently as there isn't
any sample fetch with enough arguments to fill the allocated buffer).
Example of Lua code which reproduces a segfault:
core.register_fetches("segfault", function(txn, ...)
return txn.req_ver(txn)
end)
haproxy Lua support begins with Lua 5.2. In order to ease the diagnosis
of compilation errors, a preprocessor error is added when an
incompatible version is used.
The compatibility is determined by the presence of LUA_VERSION_NUM and its
magic value (502 for Lua 5.2.x).
As with the other libraries used by haproxy, it can be useful to
display the Lua version used at compilation time.
A new line is added to "haproxy -vv", which shows if Lua is supported by the
binary, and with which version it was compiled.
Currently, the Lua context is initialized in every session, even if
the session doesn't use Lua. This behavior causes a 5% performance
loss.
This patch initializes the Lua context only if it is used by the
session. The initialization is now on demand.
The code was a bit of a pain to follow because of this, especially
when some of it heavily relies on longjmp... This refreshing makes
it slightly easier to read.
This patch provides a yield function, which hands control back to the
HAProxy scheduler. It is used when the Lua processing consumes a lot
of time.
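A sketch of how a long-running script might use it; core.yield() as
the binding name is an assumption:
-- Periodically give the scheduler a chance to run other tasks.
for i = 1, 1000000 do
    if i % 10000 == 0 then
        core.yield()
    end
end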
The channel class permits manipulation of channels. A channel is a
FIFO buffer between the client and the server. This class provides
functions to read, write, forward, destroy and alter data between the
input and the output of the buffer.
This patch adds the TCP I/O functionality. The implemented class
provides the same functions as the "lua socket" project, making it
network-compatible with other Lua projects. The documentation is
located here:
http://w3.impa.br/~diego/software/luasocket/tcp.html
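A minimal usage sketch following the luasocket-style API described
above; core.tcp() as the constructor name is an assumption:
-- Connect, send a request, read one line back, close.
local sock = core.tcp()
local ok, err = sock:connect("127.0.0.1", 8000)
if ok then
    sock:send("ping\n")
    local answer = sock:receive("*l")  -- one line, CR/LF stripped
    sock:close()
end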
This version of sleep is based on a coroutine: a sleeping task is
started and a signal is registered. This sleep version must disappear,
to be replaced by a version using the internal timers.
This patch makes it possible to write Lua converters. Note that all
the converters declared through Lua are automatically prefixed with
"lua.".
The Lua converters need the current session to be executed, but the
written and executed Lua code must be static and contextless. The TXN
object is not created for the converters.
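A minimal sketch of such a converter, assuming the
core.register_converters() entry point of the Lua API:
-- Registered as "shout"; usable from the configuration as "lua.shout".
-- Static and contextless: no TXN object is available here.
core.register_converters("shout", function(str)
    return string.upper(str)
end)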
This patch adds the browsing of all the HAProxy fetches and creates
the associated Lua functions. The HAProxy internal fetches can be used
in Lua through the class "TXN".
Note that the symbols "-", "+" and "." in the names of the existing
sample fetches are rewritten as "_" in Lua because ".", "-" and "+"
are operators (for example, req.ver becomes req_ver).
The shared context is a global reference dedicated to one Lua stack.
Even when the TXN is not required, the shared context is associated
with a TXN object. In fact, the shared context has no meaning outside
of a TXN context.
This class of functions permits access to all the functions associated
with the transaction, like HTTP headers, HAProxy internal fetches,
etc.
This patch puts in place the skeleton of this class. The class will be
enhanced later.
This system permits executing Lua functions after HAProxy completes
its initialisation. These functions are executed between the end of
the configuration parsing and checking and the start of the
scheduler.
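For illustration, a sketch using the post-initialisation hook;
core.register_init() as the entry point is an assumption:
-- Runs once, between the end of configuration parsing/checking and
-- the start of the scheduler.
core.register_init(function()
    core.log(core.info, "configuration loaded, scheduler about to start")
end)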
This system permits sending signals between Lua tasks. A main Lua
stack can register a signal in a coprocess. When the coprocess
finishes its job, it sends the signal and the associated task is woken
up. If the main Lua execution stack stops (with or without errors),
the list of pending signals is purged.
This is the first step of the Lua integration. We add the needed
files to the HAProxy project. These files contain the main includes,
the Makefile options and empty initialisation functions. It is the Lua
skeleton.
We now have functions to retrieve one block and one line from
either the input or the output part of a buffer. They return
up to two (pointer,length) values in case the buffer wraps.
Later, the processing of some actions will need to be interrupted and
resumed later. This patch permits resuming the actions. The actions
that need to run in resume mode are not yet available; they will come
soon with the Lua patches, so the code added by this patch is
untestable for the moment.
The list of "tcp_exec_req_rules" cannot resume because it is called by
the unresumable function "accept_session".
Currently, this function returns a pointer to the rule that stopped
the evaluation of the rule list. Later we will integrate the
possibility of interrupting and resuming the processing of some
actions, and the current response mode is not sufficient to return the
"interrupt" information. The returned pointer is never used, so this
patch changes the return type of the function to an enum. With this
enum, the function is ready to return the "interrupt" value.
This patch introduces an action keyword registration system for TCP
rulesets similar to what is available for HTTP rulesets. This system
will be useful with Lua.
These modifications are done to resolve cross-dependent includes in
the upcoming Lua code.
<proto/channel.h> misses <types/channel.h>.
<types/acl.h> doesn't use <types/session.h> because the session is
already declared in the file as an undefined pointer.
appsession.c misses <unistd.h> to use write().
Declare the undefined pointer "struct session" for <types/proxy.h>
and <types/queue.h>; these includes don't need the details of this
struct.
The functions "val_payload_lv" and "val_hdr" are useful with
lua. The lua automatic binding for sample fetchs needs to
compare check functions.
The "arg_type_names" permit to display error messages.
Some usages of the converters need to know the attached session, and
Lua needs the session to retrieve its running context. This patch adds
the session as an argument to the converter prototype.
This behavior already exists for the "WAIT_HTTP" analyzer; this patch
just extends the system to any analyzer that would be woken up on
response activity.
When the destination IP is dynamically set, we can't use the "target"
to define the proto. This patch ensures that we always use the protocol
associated with the address family. The proto field was removed from
the server and check structs.
Currently, HAProxy uses the functions "process_runnable_tasks" and
"wake_expired_tasks" to get the next task that can expire.
If a task is added with "task_schedule" or another method during the
execution of another task, the expiration of this new task is not
taken into account, and the execution of this task can come too late.
Currently, HAProxy does not seem to be affected by this bug.
This fix moves the call to process_runnable_tasks() before the timeout
calculation and ensures that all wakeups are processed together. Only
wake_expired_tasks() needs to return a timeout now.
Until now, the TLS ticket keys couldn't be configured and shared
between multiple instances or multiple servers running HAProxy. The
result was that if a request got a TLS ticket from one instance/server
and hit another one afterwards, it had to go through the full SSL
handshake and negotiation.
This patch enables adding a ticket file to the bind line, which will be
used for all SSL contexts created from that bind line. We can use the
same file on all instances or servers to mitigate this issue and have
consistent TLS tickets assigned. Clients will no longer have to negotiate
every time they change the handling process.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
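For reference, a bind line using the new keyword might look like this
(the file paths are hypothetical):
bind :443 ssl crt /etc/haproxy/site.pem tls-ticket-keys /etc/haproxy/ticket.keys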
As failure to connect to the agent check is not sufficient to mark it as
failed it stands to reason that an L7 error shouldn't either.
Without this fix, if an L7 error occurs, for example if connectivity to the
agent is lost immediately after establishing a connection to it, then the
agent check will be considered to have failed and thus may end up with zero
health. Once this has occurred if the primary health check also reaches
zero health, which is likely if connectivity to the server is lost, then
the server will be marked as down and not be marked as up again until a
successful agent check occurs regardless of the success of any primary
health checks.
This behaviour is not correct as a failed agent check should never cause a
server to be marked as down or by extension continue to be marked as down.
Signed-off-by: Simon Horman <horms@verge.net.au>
As found by Thierry Fournier, if a task manages to kill another one and
if this other task is the next one in the run queue, anything can happen,
including a crash, because the scheduler restarts from the saved next
task. For now, there is no such concept of a task killing another one,
but with Lua it will come.
A solution consists in always performing the lookup of the first task in
the scheduler's loop, but it's expensive and costs around 2% of the
performance.
Another solution consists in keeping a global next run queue node and
ensuring that when this task gets removed, it updates this pointer to
the next one. This allows the code to be simplified a bit and, in the
end, slightly increases the performance (0.3-0.5%). The mechanism might still
be usable if we later migrate to a multi-threaded scheduler.
These new sample fetches retrieve the list of header names as they appear
in the request or response. This can be used for debugging, for statistics
as well as an aid to better detect the presence of proxies or plugins on
some browsers, which alter the request compared to a regular browser by
adding or reordering headers.
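For example (a hedged sketch; this assumes the new fetches are exposed as
"req.hdr_names" with an optional delimiter argument), the list of names
can be captured for logging :
http-request capture req.hdr_names(,) len 128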
Commit bf883e0 ("MAJOR: session: implement a wait-queue for sessions who
need a buffer") introduced in 1.6 forgot to initialize the buffer_wait
list when the session is initiated by an applet for a peer, resulting in
a crash. Thanks to Chris Kopp for reporting the issue.
ACL or map entries are not deleted by the "del acl" or "del map" command
if the case insensitive flag is set.
This is because the case insensitive strings are stored in a list while the
default delete function associated with strings looks in a tree. This patch
adds a check of the case insensitive flag and executes the delete function
for lists if it is set.
This patch must be backported to 1.5.
This option disables SSL session reuse when SSL is used to communicate with
the server. It will force the server to perform a full handshake for every
new connection. It's probably only useful for benchmarking, troubleshooting,
and for paranoid users.
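A hedged example (assuming the option is the "no-ssl-reuse" server keyword) :
server app1 192.168.0.11:443 ssl verify none no-ssl-reuse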
This patch adds a new option which allows configuration of the maximum
log level of messages for which email alerts will be sent.
The default is "alert", which is more restrictive than
the previous code, which sent email alerts for all priorities.
That behaviour may be obtained by using the new configuration
option to set the maximum level to "notice" or greater :
email-alert level notice
Signed-off-by: Simon Horman <horms@verge.net.au>
This removes the trailing '.' from both the header and the body of email
alerts.
The main motivation for this change is to make the format of email alerts
generated from srv_set_stopped() consistent with those generated from
set_server_check_status().
Signed-off-by: Simon Horman <horms@verge.net.au>
On Linux since 2.6.37, it's possible to set the socket timeout for
pending outgoing data, with an accuracy of 1 millisecond. This is
pretty handy to deal with dead connections to clients and or servers.
For now we only implement it on the frontend side (bind line) so
that when a client disappears from the net, we're able to quickly
get rid of its connection and possibly release a server connection.
This can be useful with long-lived connections where an application
level timeout is not suited because long pauses are expected (remote
terminals, connection pools, etc).
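A minimal sketch (assuming the setting is exposed as the "tcp-ut" bind
keyword taking a delay) :
bind :8080 tcp-ut 30s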
Thanks to Thijs Houtenbos and John Eckersberg for the suggestion.
This currently does nothing beyond parsing the configuration
and storing in the proxy as there is no implementation of email alerts.
Signed-off-by: Simon Horman <horms@verge.net.au>
Add mailer and mailers structures and allow parsing of
a mailers section into those structures.
These structures will subsequently be freed as it is
not yet possible to reference them in the configuration.
Signed-off-by: Simon Horman <horms@verge.net.au>
The motivation for this is to make checks more independent of each
other to allow further reuse of their infrastructure.
For now server->check and server->agent still always use the same values
for the addr and proto fields so this patch should not introduce any
behavioural changes.
Signed-off-by: Simon Horman <horms@verge.net.au>
Refactor init_check so that an error string is returned
rather than alerts being printed by it. Also move
init_check to checks.c and provide a prototype to allow
it to be used from multiple C files.
Signed-off-by: Simon Horman <horms@verge.net.au>
Remove connect_chk and instead call connect_proc_chk()
and connect_conn_chk(). There no longer seems to be any
value in having a wrapper function here.
Signed-off-by: Simon Horman <horms@verge.net.au>
The value is defined in include/types/global.h to be an unsigned int.
The format specifier used in the printf is for a signed int. This eventually wraps
around.
WT: This bug was introduced in 1.5.
Commit c600204 ("BUG/MEDIUM: regex: fix risk of buffer overrun in
exp_replace()") added a control of failure on the response headers,
but forgot to check for the error during request processing. So if
the filters fail to apply, we could keep the request. It might
cause some headers to silently fail to be added for example. Note
that it's tagged MINOR because a standard configuration cannot make
this case happen.
The fix should be backported to 1.5 and 1.4 though.
Sébastien Rohaut reported that string negation in http-check expect didn't
work as expected.
The misbehaviour is caused by responses with HTTP keep-alive. When the
condition is not met, haproxy awaits more data until the buffer is full or the
connection is closed, resulting in a check timeout when "timeout check" is
lower than the keep-alive timeout on the server side.
In order to avoid the issue, when a "http-check expect" is used, haproxy will
ask the server to disable keep-alive by automatically appending a
"Connection: close" header to the request.
The two http-req/http-resp actions "replace-hdr" and "replace-value" were
expecting exactly one space after the colon, which is wrong. It was causing
the first char not to be seen/modified when no space was present, and empty
headers not to be modified either. Instead of using name->len+2, we must use
ctx->val which points to the first character of the value even if there is
no value.
This fix must be backported into 1.5.
Commit d025648 ("MAJOR: init: automatically set maxconn and/or maxsslconn
when possible") resulted in a case where if enough memory is available,
a maxconn value larger than SYSTEM_MAXCONN could be computed, resulting
in possibly overflowing other systems resources (eg: kernel socket buffers,
conntrack entries, etc). Let's bound any automatic maxconn to SYSTEM_MAXCONN
if it is defined. Note that the value is set to DEFAULT_MAXCONN since
SYSTEM_MAXCONN forces DEFAULT_MAXCONN, thus it is not an error.
If a stick table is defined as below:
stick-table type ip size 50ka expire 300s
HAProxy will stop parsing the size after "50k" and return the value
directly. But such a size format should not be valid. The patch checks
the next character in order to report an error if any trailing character remains.
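For reference, the accepted form is :
stick-table type ip size 50k expire 300s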
Signed-off-by: Godbach <nylzhaowei@gmail.com>
This commit introduces a new category of converters. They are bitwise and
arithmetic operators which support performing basic operations on integers.
Some bitwise operations are supported (and, or, xor, cpl) and some arithmetic
operations are supported (add, sub, mul, div, mod, neg). Some comparators
are provided (odd, even, not, bool) which make it possible to report a match
without having to write an ACL.
The detailed list of new operators as they appear in the doc is :
add(<value>)
Adds <value> to the input value of type unsigned integer, and returns the
result as an unsigned integer.
and(<value>)
Performs a bitwise "AND" between <value> and the input value of type unsigned
integer, and returns the result as an unsigned integer.
bool
Returns a boolean TRUE if the input value of type unsigned integer is
non-null, otherwise returns FALSE. Used in conjunction with and(), it can be
used to report true/false for bit testing on input values (eg: verify the
presence of a flag).
cpl
Takes the input value of type unsigned integer, applies a ones-complement
(flips all bits) and returns the result as an unsigned integer.
div(<value>)
Divides the input value of type unsigned integer by <value>, and returns the
result as an unsigned integer. If <value> is null, the largest unsigned
integer is returned (typically 2^32-1).
even
Returns a boolean TRUE if the input value of type unsigned integer is even
otherwise returns FALSE. It is functionally equivalent to "not,and(1),bool".
mod(<value>)
Divides the input value of type unsigned integer by <value>, and returns the
remainder as an unsigned integer. If <value> is null, then zero is returned.
mul(<value>)
Multiplies the input value of type unsigned integer by <value>, and returns
the product as an unsigned integer. In case of overflow, the higher bits are
lost, leading to seemingly strange values.
neg
Takes the input value of type unsigned integer, computes the opposite value,
and returns the result as an unsigned integer. 0 is identity. This
operator is provided for reversed subtractions : in order to subtract the
input from a constant, simply perform a "neg,add(value)".
not
Returns a boolean FALSE if the input value of type unsigned integer is
non-null, otherwise returns TRUE. Used in conjunction with and(), it can be
used to report true/false for bit testing on input values (eg: verify the
absence of a flag).
odd
Returns a boolean TRUE if the input value of type unsigned integer is odd
otherwise returns FALSE. It is functionally equivalent to "and(1),bool".
or(<value>)
Performs a bitwise "OR" between <value> and the input value of type unsigned
integer, and returns the result as an unsigned integer.
sub(<value>)
Subtracts <value> from the input value of type unsigned integer, and returns
the result as an unsigned integer. Note: in order to subtract the input from
a constant, simply perform a "neg,add(value)".
xor(<value>)
Performs a bitwise "XOR" (exclusive OR) between <value> and the input value
of type unsigned integer, and returns the result as an unsigned integer.
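As a quick illustration (a minimal sketch using the new comparators),
traffic may be split on source port parity :
use_backend pool_even if { src_port,even }
use_backend pool_odd if { src_port,odd }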
As reported by Raphaël Enrici, certificates loaded from a directory are loaded
in a non-predictable order. If no certificate was first loaded from a file, it
can result in different behaviours when haproxy is used in a cluster.
We can also imagine other cases which weren't met yet.
Instead of using readdir(), we can use scandir() and sort files alphabetically.
This ensures a predictable behaviour.
This patch should also be backported to 1.5.
This commit implements the following new actions :
- "set-method" rewrites the request method with the result of the
evaluation of format string <fmt>. There should be very few valid reasons
for having to do so as this is more likely to break something than to fix
it.
- "set-path" rewrites the request path with the result of the evaluation of
format string <fmt>. The query string, if any, is left intact. If a
scheme and authority are found before the path, they are left intact as
well. If the request doesn't have a path ("*"), this one is replaced with
the format. This can be used to prepend a directory component in front of
a path for example. See also "set-query" and "set-uri".
Example :
# prepend the host name before the path
http-request set-path /%[hdr(host)]%[path]
- "set-query" rewrites the request's query string which appears after the
first question mark ("?") with the result of the evaluation of format
string <fmt>. The part prior to the question mark is left intact. If the
request doesn't contain a question mark and the new value is not empty,
then one is added at the end of the URI, followed by the new value. If
a question mark was present, it will never be removed even if the value
is empty. This can be used to add or remove parameters from the query
string. See also "set-path" and "set-uri".
Example :
# replace "%3D" with "=" in the query string
http-request set-query %[query,regsub(%3D,=,g)]
- "set-uri" rewrites the request URI with the result of the evaluation of
format string <fmt>. The scheme, authority, path and query string are all
replaced at once. This can be used to rewrite hosts in front of proxies,
or to perform complex modifications to the URI such as moving parts
between the path and the query string. See also "set-path" and
"set-query".
All of them are handled by the same parser and the same exec function,
which is why they're merged all together. For once, instead of adding
even more entries to the huge switch/case, we used the new facility to
register action keywords. A number of the existing ones should probably
move there as well.
Two commits ago in 7eda849 ("MEDIUM: samples: add a regsub converter to
perform regex-based transformations"), I got caught for the second time
with the inverted case sensitivity usage of regex_comp() : by default it
is case insensitive and passing the "i" flag makes it case sensitive.
I forgot to recheck that case before committing the cleanup. No harm
anyway, nobody had the time to use it.
Make the check used to explicitly report "DOWN (agent)" slightly
more restrictive such that it only triggers if the agent health is zero.
This avoids the following problem.
1. Backend is started disabled, agent check is enabled
2. Backend is enabled using set server vip/rip state ready
3. Health is marked as down using set server vip/rip health down
At this point the http stats page will report "DOWN (agent)"
but the backend being down has nothing to do with the agent check
This problem appears to have been introduced by cf2924bc25
("MEDIUM: stats: report down caused by agent prior to reporting up").
Note that "DOWN (agent)" may also be reported by a more generic conditional
which immediately follows the code changed by this patch.
Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
"disable" starts a server in the disabled state. However, setting the health
of an agent implies that the agent is disabled as well as the server.
This is a problem because the state of the agent is not restored if
the state of the server is subsequently updated leading to an
unexpected state.
For example, if a server is started disabled and then the server
state is set to ready then without this change show stat indicates
that the server is "DOWN (agent)" when it is expected that the server
would be UP if its (non-agent) health check passes.
Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
We can now replace matching regex parts with a string, a la sed. Note
that there are at least 3 different behaviours for existing sed
implementations when matching 0-length strings. Here is the result
of the following operation on each implementation tested :
echo 'xzxyz' | sed -e 's/x*y*/A/g'
GNU sed 4.2.1 => AzAzA
Perl's psed 5.16.1 => AAzAAzA
Busybox v1.11.2 sed => AzAz
The psed behaviour was adopted because it causes the least exceptions
in the code and seems logical from a certain perspective :
- "x" matches x*y* => add "A" and skip "x"
- "z" matches x*y* => add "A" and keep "z", not part of the match
- "xy" matches x*y* => add "A" and skip "xy"
- "z" matches x*y* => add "A" and keep "z", not part of the match
- "" matches x*y* => add "A" and stop here
Anyway, given the incompatibilities between implementations, it's unlikely
that some processing will rely on this behaviour.
There currently is one big limitation : the configuration parser makes it
impossible to pass commas or closing parenthesis (or even closing brackets
in log formats). But that's still quite usable to replace certain characters
or character sequences. It will become more complete once the config parser
is reworked.
This function (and its sister regex_exec_match2()) abstract the regex
execution but make it impossible to pass flags to the regex engine.
Currently we don't use them but we'll need to support REG_NOTBOL soon
(to indicate that we're not at the beginning of a line). So let's add
support for this flag and update the API accordingly.
This one will be used when a regex is expected. It is automatically
resolved after the parsing and compiled into a regex. Some optional
flags are supported in the type-specific flags that should be set by
the optional arg checker. One is used during the regex compilation :
ARGF_REG_ICASE to ignore case.
This is in order to add new types. This patch does not change anything
else. Two remaining (harmless) occurrences of a count of 8 instead of 7
were fixed by this patch : empty_arg_list[] and the for() loop counting
args.
The way http-request/response set-header works is stupid. For a naive
reuse of the del-header code, it removes all occurrences of the header
to be set before computing the new format string. This makes it almost
unusable because it is not possible to append values to an existing
header without first copying them to a dummy header, performing the
copy back and removing the dummy header.
Instead, let's share the same code as add-header and perform the optional
removal after the string is computed. That way it becomes possible to
write things like :
http-request set-header X-Forwarded-For %[hdr(X-Forwarded-For)],%[src]
Note that this change is not expected to have any undesirable impact on
existing configs since if they rely on the bogus behaviour, they don't
work as they always retrieve an empty string.
This fix must be backported to 1.5 to stop the spread of ugly configs.
This converter hashes a binary input sample into an unsigned 32-bit quantity
using the CRC32 hash function. Optionally, it is possible to apply a full
avalanche hash function to the output if the optional <avalanche> argument
equals 1. This converter uses the same functions as used by the various hash-
based load balancing algorithms, so it will provide exactly the same results.
It is provided for compatibility with other software which wants a CRC32 to be
computed on some input keys, so it follows the most common implementation as
found in Ethernet, Gzip, PNG, etc... It is slower than the other algorithms
but may provide a better or at least less predictable distribution.
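A hedged usage sketch (assuming the converter is exposed as
"crc32([<avalanche>])") :
http-request set-header X-Src-Hash %[src,crc32(1)]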
This function will be used to perform CRC32 computations. It was
loosely inspired by the crc32b implementation found here, and focuses on size
and speed at the same time :
http://www.hackersdelight.org/hdcodetxt/crc.c.txt
Much faster table-based versions exist but are pointless for our usage
here, this hash already sustains gigabit speed which is far faster than
what we'd ever need. Better preserve the CPU's cache instead.
This fetch extracts the request's query string, which starts after the first
question mark. If no question mark is present, this fetch returns nothing. If
a question mark is present but nothing follows, it returns an empty string.
This means it's possible to easily know whether a query string is present
using the "found" matching method. This fetch is the completemnt of "path"
which stops before the question mark.
If a memory size limit is enforced using "-n" on the command line and
one or both of maxconn / maxsslconn are not set, instead of using the
build-time values, haproxy now computes the number of sessions that can
be allocated depending on a number of parameters among which :
- global.maxconn (if set)
- global.maxsslconn (if set)
- maxzlibmem
- tune.ssl.cachesize
- presence of SSL in at least one frontend (bind lines)
- presence of SSL in at least one backend (server lines)
- tune.bufsize
- tune.cookie_len
The purpose is to ensure that haproxy will not run out of memory
when maxing out all parameters. If neither maxconn nor maxsslconn are
used, it will consider that 100% of the sessions involve SSL on sides
where it's supported. That means that it will typically optimize maxconn
for SSL offloading or SSL bridging on all connections. This generally
means that the simple act of enabling SSL in a frontend or in a backend
will significantly reduce the global maxconn but in exchange, it
will guarantee that it will not fail.
All metrics may be enforced using #defines to accommodate variations in
SSL libraries or various allocation sizes.
An SSL connection takes some memory when it exists and during handshakes.
We measured up to 16kB for an established endpoint, and up to 76 extra kB
during a handshake. The SSL layer stores these values into the global
struct during initialization. If other SSL libs are used, it's easy to
change these values. Anyway they'll only be used as gross estimates in
order to guess the max number of SSL conns that can be established when
memory is constrained and the limit is not set.
We'll need to know the number of SSL connections, their use and their
cost soon. In order to avoid getting tons of ifdefs everywhere, always
export SSL information in the global section. We add two flags to know
whether or not SSL is used in a frontend and in a backend.
send_log() calls update_hdr() to build a log header. It may happen
that no logger is defined at all but that we try to send a log anyway
(eg: upon startup). This results in a segfault when building the log
header because logline was never allocated.
This bug was revealed by the recent log-tag changes because the logline
is dereferenced after the call to snprintf(). So in 1.5 on most platforms
it has no impact because snprintf() will ignore NULL, but not necessarily
on all platforms.
The fix needs to be backported to 1.5.
It applies to the channel and it doesn't erase outgoing data, only
pending unread data, which is strictly equivalent to what recv()
does with MSG_TRUNC, so that new name is more accurate and intuitive.
This name more accurately reminds that it applies to a channel and not
to a buffer, and that what is returned may be used as a max number of
bytes to pass to recv().
This function's name was poorly chosen and is confusing to the point of
being suspiciously used at some places. The operations it does always
consider the ability to forward pending input data before receiving new
data. This is not obvious at all, especially at some places where it was
used when consuming outgoing data to know if the buffer has any chance
to ever get the missing data. The code needs to be re-audited with that
in mind. Care must be taken with existing code since the polarity of the
function was switched with the renaming.
channel_reserved is confusingly named. It is used to know whether or
not the rewrite area is left intact for situations where we want to
ensure we can use it before proceeding. Let's rename it to fix this
confusion.
In 1.4-dev7, a header removal mechanism was introduced with commit 68085d8
("[MINOR] http: add http_remove_header2() to remove a header value."). Due
to a typo in the function, the beginning of the headers gets desynchronized
if the header preceding the deleted one ends with an LF/CRLF combination
different from the one of the removed header. The reason is that while
rewinding the pointer, we go back by a number of bytes taking into account
the LF/CRLF status of the removed header instead of the previous one. The
case where it fails is in http-request del-header/set-header where multiple
occurrences of a header are present and their LF/CRLF ending
differs from the preceding header. The loop then stops because no more
headers are found given that the names and length do not match.
Another point to take into consideration is that removing headers using
a loop of http_find_header2() and this function is inefficient since we
remove values one at a time while it could be simpler and faster to
remove full header lines. This is something that should be addressed
separately.
This fix must be backported to 1.5 and 1.4. Note that http-send-name-header
relies on this function as well so it could be possible that some of the
issues encountered with it in 1.4 come from this bug.
This is equivalent to what was done in commit 48936af ("[MINOR] log:
ability to override the syslog tag") but this time instead of doing
this globally, it does it per proxy. The purpose is to be able to use
a separate log tag for various proxies (eg: make it easier to route
log messages depending on the customer).
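A minimal sketch (assuming the directive keeps the global "log-tag" syntax
inside a proxy section) :
frontend customer1
bind :8081
log-tag haproxy-cust1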
balance hdr(<name>) provides an option 'use_domain_only' to match only the
domain part in a header (designed for the Host header).
Olivier Fredj reported that the hashes were not the same for
'subdomain.domain.tld' and 'domain.tld'.
This is because the pointer was rewound one step too far, resulting in a hash
calculated against wrong values :
- '.domai' for 'subdomain.domain.tld'
- ' domai' for 'domain.tld' (beginning with the space in the header line)
Another special case is when no dot can be found in the header : the hash will
be calculated against an empty string.
The patch addresses both cases : 'domain' will be used to compute the hash for
'subdomain.domain.tld', 'domain.tld' and 'domain' (using the whole header value
for the last case).
The fix must be backported to haproxy 1.5 and 1.4.
Since commit 3dd6a25 ("MINOR: stream-int: retrieve session pointer from
stream-int"), we can get the session from the task, so let's get rid of
this less obvious function.
commit 9ede66b0 introduced an environment variable (HAPROXY_SERVER_CURCONN) that
was supposed to be dynamically updated, but it was set only once, during its
initialization.
Most of the code provided in this previous patch has been rewritten in order to
easily update the environment variables without reallocating memory during each
check.
Now, HAPROXY_SERVER_CURCONN will contain the current number of connections on
the server at the time of the check.
This is the equivalent of eb9fd51 ("OPTIM: stream_sock: reduce the amount
of in-flight spliced data") whose purpose is to try to immediately send
spliced data if available.
bi_swpbuf() swaps the buffer passed in argument with the one attached to
the channel, but only if this last one is empty. The idea is to avoid a
copy when buffers can simply be swapped.
This setting is used to limit memory usage without causing the alloc
failures caused by "-m". Unexpectedly, tests have shown a performance
boost of up to about 18% on HTTP traffic when limiting the number of
buffers to about 10% of the amount of concurrent connections.
tune.buffers.limit <number>
Sets a hard limit on the number of buffers which may be allocated per process.
The default value is zero which means unlimited. The minimum non-zero value
will always be greater than "tune.buffers.reserve" and should ideally always
be about twice as large. Forcing this value can be particularly useful to
limit the amount of memory a process may take, while retaining a sane
behaviour. When this limit is reached, sessions which need a buffer wait for
another one to be released by another session. Since buffers are dynamically
allocated and released, the waiting time is very short and not perceptible
provided that limits remain reasonable. In fact sometimes reducing the limit
may even increase performance by increasing the CPU cache's efficiency. Tests
have shown good results on average HTTP traffic with a limit to 1/10 of the
expected global maxconn setting, which also significantly reduces memory
usage. The memory savings come from the fact that a number of connections
will not allocate 2*tune.bufsize. It is best not to touch this value unless
advised to do so by an haproxy core developer.
Used in conjunction with the dynamic buffer allocator.
tune.buffers.reserve <number>
Sets the number of buffers which are pre-allocated and reserved for use only
during memory shortage conditions resulting in failed memory allocations. The
minimum value is 2 and is also the default. There is no reason a user would
want to change this value, it's mostly aimed at haproxy core developers.
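As an illustration (a hedged sketch following the 1/10-of-maxconn guidance
above, leaving the reserve at its default of 2) :
global
maxconn 20000
tune.buffers.limit 2000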
We've already experimented with three wake up algorithms when releasing
buffers : the first naive one used to wake up far too many sessions,
causing many of them not to get any buffer. The second approach which
was still in use prior to this patch consisted in waking up either 1
or 2 sessions depending on the number of FDs we had released. And this
was still inaccurate. The third one tried to cover the accuracy issues
of the second and took into consideration the number of FDs the sessions
would be willing to use, but most of the time we ended up waking up too
many of them for nothing, or deadlocking by lack of buffers.
This patch completely removes the need to allocate two buffers at once.
Instead it splits allocations into critical and non-critical ones and
implements a reserve in the pool for this. The deadlock situation happens
when all buffers are allocated for requests pending in a maxconn-limited
server queue, because then there's no more way to allocate buffers for
responses, and these responses are critical to release the server's
connection in order to release the pending requests. In fact maxconn on
a server creates a dependence between sessions and particularly between
oldest session's responses and latest session's requests. Thus, it is
mandatory to get a free buffer for a response in order to release a
server connection, which in turn permits releasing a request buffer.
Since we definitely have non-symmetrical buffers, we need to implement
this logic in the buffer allocation mechanism. What this commit does is
implement a reserve of buffers which can only be allocated for responses
and that will never be allocated for requests. This is made possible by
the requester indicating how much margin it wants to leave after the
allocation succeeds. Thus it is a cooperative allocation mechanism : the
requester (process_session() in general) prefers not to get a buffer in
order to respect other's need for response buffers. The session management
code always knows if a buffer will be used for requests or responses, so
that is not difficult :
- either there's an applet on the initiator side and we really need
the request buffer (since currently the applet is called in the
context of the session)
- or we have a connection and we really need the response buffer (in
order to support building and sending an error message back)
This reserve ensures that we don't take all allocatable buffers for
requests waiting in a queue. The downside is that all the extra buffers
are really allocated to ensure they can be allocated. But with small
values it is not an issue.
With this change, we don't observe any more deadlocks even when running
with maxconn 1 on a server under severely constrained memory conditions.
The code becomes a bit tricky, it relies on the scheduler's run queue to
estimate how many sessions are already expected to run so that it doesn't
wake up everyone with too few resources. A better solution would probably
consist in having two queues, one for urgent requests and one for normal
requests. A failed allocation for a session dealing with an error, a
connection event, or the need for a response (or request when there's an
applet on the left) would go to the urgent request queue, while other
requests would go to the other queue. Urgent requests would be served
from 1 entry in the pool, while the regular ones would be served only
according to the reserve. Despite not yet having this, it works
remarkably well.
This mechanism is quite efficient, we don't perform too many wake up calls
anymore. For 1 million sessions elapsed during massive memory contention,
we observe about 4.5M calls to process_session() compared to 4.0M without
memory constraints. Previously we used to observe up to 16M calls, which
roughly means 12M failures.
During a test run under high memory constraints (limit enforced to 27 MB
instead of the 58 MB normally needed), performance used to drop by 53% prior
to this patch. Now with this patch instead it *increases* by about 1.5%.
The best effect of this change is that by limiting the memory usage to about
2/3 to 3/4 of what is needed by default, it's possible to increase performance
by up to about 18% mainly due to the fact that pools are reused more often
and remain hot in the CPU cache (observed on regular HTTP traffic with 20k
objects, buffers.limit = maxconn/10, buffers.reserve = limit/2).
Below is an example of scenario which used to cause a deadlock previously :
- connection is received
- two buffers are allocated in process_session() then released
- one is allocated when receiving an HTTP request
- the second buffer is allocated then released in process_session()
for request parsing then connection establishment.
- poll() says we can send, so the request buffer is sent and released
- process session gets notified that the connection is now established
and allocates two buffers then releases them
- all other sessions do the same till one cannot get the request buffer
without hitting the margin
- and now the server responds. stream_interface allocates the response
buffer and manages to get it since it's higher priority being for a
response.
- but process_session() cannot allocate the request buffer anymore
=> We could end up with all buffers used by responses so that none may
be allocated for a request in process_session().
When the applet processing leaves the session context, the test will have
to be changed so that we always allocate a response buffer regardless of
the left side (eg: H2->H1 gateway). A final improvement would consist in
being able to only retry the failed I/O operation without waking up a
task, but to date all experiments to achieve this have proven not to be
reliable enough.
A session doesn't need buffers all the time, especially when they're
empty. With this patch, we don't allocate buffers anymore when the
session is initialized, we only allocate them in two cases :
- during process_session()
- during I/O operations
During process_session(), we try hard to allocate both buffers at once
so that we know for sure that a started operation can complete. Indeed,
a previous version of this patch used to allocate one buffer at a time,
but it can result in a deadlock when all buffers are allocated for
requests for example, and there's no buffer left to emit error responses.
Here, if any of the buffers cannot be allocated, the whole operation is
cancelled and the session is added at the tail of the buffer wait queue.
At the end of process_session(), a call to session_release_buffers() is
done so that we can offer unused buffers to other sessions waiting for
them.
For I/O operations, we only need to allocate a buffer on the Rx path.
For this, we only allocate a single buffer but ensure that at least two
are available to avoid the deadlock situation. In case buffers are not
available, SI_FL_WAIT_ROOM is set on the stream interface and the session
is queued. Unused buffers resulting either from a successful send() or
from an unused read buffer are offered to pending sessions during the
->wake() callback.
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WOKEN_RES, and can attempt its allocation again.
We decide to try to wake up as many waiters as the buffers we release, so
that if we release 2 and two waiters each need only one, they both have
their chance. We must never come to the situation where we don't wake
enough tasks up.
It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed due to half a
failure on memory allocation. In this situation, we don't want to move
out of the wait queue the session that was just added, otherwise it
will never get any buffer. Thus, we only force ourselves out of the
queue when freeing the session.
Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
This patch introduces session_alloc_recv_buffer(), session_alloc_buffers()
and session_release_buffers() whose purpose will be to allocate missing
buffers and release unneeded ones around the process_session() and during
I/O operations.
I/O callbacks only need a single buffer for recv operations and none
for send. However we still want to ensure that we don't pick the last
buffer. That's what session_alloc_recv_buffer() is for.
This allocator is atomic in that it always ensures we can get 2 buffers
or fails. Here, if any of the buffers is not ready and cannot be
allocated, the operation is cancelled. The purpose is to guarantee that
we don't enter into the deadlock where all buffers are allocated by the
same side of all sessions.
A queue will have to be implemented for failed allocations. For now
they're just reported as failures.
We'll soon want to release buffers together upon failure so we need to
allocate them after the channels. Let's change this now. There's no
impact on the behaviour, only the error path is unrolled slightly
differently. The same was done in peers.
Doing so ensures that even when no memory is available, we leave the
channel in a sane condition. There's a special case in proto_http.c
regarding the compression, we simply pre-allocate the tmpbuf to point
to the dummy buffer. Not reusing &buf_empty for this allows the rest
of the code to differentiate an empty buffer that's not used from an
empty buffer that results from a failed allocation, which has the same
semantics as a full buffer.
Channels are now created with a valid pointer to a buffer before the
buffer is allocated. This buffer is a global one called "buf_empty" and
of size zero. Thus it prevents any activity from being performed on
the buffer and still ensures that chn->buf may always be dereferenced.
b_free() also resets the buffer to &buf_empty, and was split into
b_drop() which does not reset the buffer.
We don't call pool_free2(pool2_buffers) anymore, we only call b_free()
to do the job. This ensures that we can start to centralize the releasing
of buffers.
It's not clean to initialize the buffer before the channel since it
dereferences one pointer in the channel. Also we'll want to let the
channel pre-initialize the buffer, so let's ensure that the channel
is always initialized prior to the buffers.
b_alloc() now allocates a buffer and initializes it to the size specified
in the pool minus the size of the struct buffer itself. This ensures that
callers do not need to care about buffer details anymore. Also this never
applies memory poisoning, which is slow and useless on buffers.
We'll soon need to be able to switch buffers without touching the
channel, so let's move buffer initialization out of channel_init().
We had the same in compression.c.
Till now this function would only allocate one entry at a time. But with
dynamic buffers we'd like to allocate the number of missing entries to
properly refill the pool.
Let's modify it to take a minimum amount of available entries. This means
that when we know we need at least a number of available entries, we can
ask to allocate all of them at once. It also ensures that we don't move
the pointers back and forth between the caller and the pool, and that we
don't call pool_gc2() for each failed malloc. Instead, it's called only
once and the malloc is only allowed to fail once.
pool_alloc2() used to pick the entry from the pool, fall back to
pool_refill_alloc(), and to perform the poisoning itself, which
pool_refill_alloc() was also doing. While this led to optimal
code size, it imposes memory poisoning on the buffers as well,
which is extremely slow on large buffers.
This patch cuts the allocator in 3 layers :
- a layer to pick the first entry from the pool without falling back to
pool_refill_alloc() : pool_get_first()
- a layer to allocate a dirty area by falling back to pool_refill_alloc()
but never performing the poisoning : pool_alloc_dirty()
- pool_alloc2() which calls the latter and optionally poisons the area
No functional changes were made.
In zlib we track memory usage. The problem is that the way alloc_zlib()
and free_zlib() account for memory is different, resulting in variations
that can lead to negative zlib_mem being reported. The alloc() function
uses the requested size while the free() function uses the pool size. The
difference can happen when pools are shared with other pools of similar
size. The net effect is that zlib_mem can be reported negative with a
slowly decreasing count, and over the long term the limit will not be
enforced anymore.
The fix is simple : let's use the pool size in both cases, which is also
the exact value when it comes to memory usage.
This fix must be backported to 1.5.
create_server_socket() used to dereference objt_server(conn->target),
but if the target is not a server (eg: a proxy) then it's NULL and we
get a segfault. This can be reproduced with a proxy using "dispatch"
with no server, even when namespaces are disabled, because that code
is not #ifdef'd. The fix consists in first checking if the target is
a server.
This fix does not need to be backported, this is 1.6-only.
There's a long-standing bug in pool_gc2(). It tries to protect the pool
against releasing of too many entries but the formula is wrong as it
compares allocated to minavail instead of (allocated-used) to minavail.
Under memory contention, it ends up releasing more than what is granted
by minavail and causes trouble to the dynamic buffer allocator.
This bug is in fact major by itself, but since minavail has never been
used till now, there is no impact at least in mainline. A backport to
1.5 is desired anyway in case any future backport or out-of-tree patch
relies on this.
In stream_int_register_handler(), we call si_alloc_appctx(si) but as a
mistake, instead of checking the return value for a NULL, we test <si>.
This bug was discovered under extreme memory contention (memory for only
two buffers with 500 connections waiting) and after 3 million failed
connections. While it was very hard to produce it, the fix is tagged
major because in theory it could happen when haproxy runs with a very
low "-m" setting preventing from allocating just the few bytes needed
for an appctx. But most users will never be able to trigger it. The
fix was confirmed to address the bug.
This fix must be backported to 1.5.
The path "MAJOR: namespace: add Linux network namespace support" doesn't
permit to use internal data producer like a "peers synchronisation"
system. The result is a segfault when the internal application starts.
This patch fix the commit b3e54fe387
It is introduced in 1.6dev version, it doesn't need to be backported.
By convention, the HAProxy CLI doesn't return a message if the operation
completes successfully. The MAP and ACL commands return a "Done." message,
which pollutes the output during big MAP or ACL injections.
Immo Goltz reported a case of segfault while parsing the config where
we try to propagate processes across stopped frontends (those with a
"disabled" statement). The fix is trivial. The workaround consists in
commenting out these frontends, although not always easy.
This fix must be backported to 1.5.
propagate_processes() has a typo in a condition :
if (!from->cap & PR_CAP_FE)
return;
The return is never taken because each proxy has at least one capability
so !from->cap always evaluates to zero. Most of the time the caller already
checks that <from> is a frontend. In the cases where it's not tested
(use_backend, reqsetbe), the rules have been checked for the context to
be a frontend as well, so in the end it had no nasty side effect.
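For reference, the corrected condition simply adds the missing parentheses :
if (!(from->cap & PR_CAP_FE))
return;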
This should be backported to 1.5.
Since during the parsing stage curproxy always represents the proxy being
operated on, referring to proxy instead appears to be a mistake.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
Willy: commit f2e8ee2b introduced an optimization in the old speculative epoll
code, which implemented its own event cache. It was needed to store that many
events (it was bound to maxsock/4 btw). Now the event cache lives on its own
and we don't need this anymore. And since events are allocated on the kernel
side, we only need to allocate the events we want to return.
As a result, absmaxevents will not be used anymore. Just remove the definition
and the comment of it, replace it with global.tune.maxpollevents. It is also an
optimization of memory usage for large amounts of sockets.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
random() will generate a number between 0 and RAND_MAX. POSIX mandates
RAND_MAX to be at least 32767. GNU libc uses 2^31 - 1 as
RAND_MAX.
In smp_fetch_rand(), a reduction is done with a multiply and shift to
avoid skewing the results. However, the shift was always 32 and hence
the numbers were not distributed uniformly in the specified range. We
fix that by dividing by RAND_MAX+1. gcc is smart enough to turn that
into a shift:
0x000000000046ecc8 <+40>: shr $0x1f,%rax
Since commit 656c5fa7e8 ("BUILD: ssl: disable OCSP when using
boringssl") the OCSP code is bypassed when OPENSSL_IS_BORINGSSL
is defined. The correct thing to do here is to use OPENSSL_NO_OCSP
instead, which is defined for this exact purpose in
openssl/opensslfeatures.h.
This makes haproxy forward compatible if boringssl ever introduces
full OCSP support with the additional benefit that it links fine
against an OCSP-disabled openssl.
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
Using "option tcp-checks" without any rule is different from not using
it at all in that checks are sent with the TCP quick ack mode enabled,
causing servers to log incoming port probes.
This commit fixes this behaviour by disabling quick-ack on tcp-checks
unless the next rule exists and is an expect. All combinations were
tested and now the behaviour is as expected : basic port probes are
now doing a SYN-SYN/ACK-RST sequence.
This fix must be backported to 1.5.
If "option tcp-check" is used and no "tcp-check" rule is specified, we
only look at rule->action which dereferences the proxy's memory and which
can randomly match TCPCHK_ACT_CONNECT or whatever else, causing a check
to fail. This bug is the result of an incorrect fix attempted in commit
f621bea ("BUG/MINOR: tcpcheck connect wrong behavior").
This fix must be backported into 1.5.
tcp_check_main() would condition the polling for writes on check->type,
but this is absurd given that check->type == PR_O2_TCPCHK_CHK since this
is the only way we can get there! This patch removes this confusing test.
The external command accepted 4 arguments, some with the value "NOT_USED" when
not applicable. In order to make the external command more generic, this patch
also provides the values in environment variables. This makes it possible to
provide more information.
Currently, the supported environment variables are :
PATH, as previously provided.
HAPROXY_PROXY_NAME, the backend name
HAPROXY_PROXY_ID, the backend id
HAPROXY_PROXY_ADDR, the first bind address if available (or empty)
HAPROXY_PROXY_PORT, the first bind port if available (or empty)
HAPROXY_SERVER_NAME, the server name
HAPROXY_SERVER_ID, the server id
HAPROXY_SERVER_ADDR, the server address
HAPROXY_SERVER_PORT, the server port if available (or empty)
HAPROXY_SERVER_MAXCONN, the server max connections
HAPROXY_SERVER_CURCONN, the current number of connections on the server
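A hedged configuration sketch (this assumes external checks must be enabled
with the global "external-check" keyword; the script path is hypothetical) :
global
external-check
backend app
option external-check
external-check command /usr/local/bin/check_server.sh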
Previously, external checks required at least one listener in order to
pass the <proxy_address> and <proxy_port> arguments to the external script.
This prevented external checks from being declared in backend sections, and
haproxy rejected the configuration.
The listener is now optional and values "NOT_USED" are passed if no listener is
found. For instance, this is the case with a backend section.
This is specific to the 1.6 branch.
Denys Fedoryshchenko reported a segfault when using certain
sample fetch functions in the "tcp-request connection" rulesets
despite the warnings. This is because some tests for the existence
of the channel were missing.
The fetches which were fixed are :
- req.ssl_hello_type
- rep.ssl_hello_type
- req.ssl_sni
This fix must be backported to 1.5.
Dmitry Sivachenko <trtrmitya@gmail.com> reported that commit 315ec42
("BUG/MEDIUM: pattern: don't load more than once a pattern list.")
relies on an uninitialised variable on the stack. While it used to
work fine during the tests, if the uninitialized variable is non-null,
some patterns may be aggregated if loaded multiple times, resulting in
slower processing, which was the original issue it tried to address.
The fix needs to be backported to 1.5.
Since embryonic sessions were introduced in 1.5-dev12 with commit
2542b53 ("MAJOR: session: introduce embryonic sessions"), a major
bug remained present. If haproxy cannot allocate memory during
session_complete() (for example, no more buffers), it will not
unlink the new session from the sessions list. This will cause
memory corruptions if the memory area from the session is reused
for anything else, and may also cause bogus output on "show sess"
on the CLI.
This fix must be backported to 1.5.
word(<index>,<delimiters>)
Extracts the nth word considering given delimiters from an input string.
Indexes start at 1 and delimiters are a string formatted list of chars.
field(<index>,<delimiters>)
Extracts the substring at the given index considering given delimiters from
an input string. Indexes start at 1 and delimiters are a string formatted
list of chars.
bytes(<offset>[,<length>])
Extracts some bytes from an input binary sample. The result is a
binary sample starting at an offset (in bytes) of the original sample
and optionally truncated at the given length.
Sometimes, either for debugging or for logging we'd like to have a bit
of information about the running process. Here are 3 new fetches for this :
nbproc : integer
Returns an integer value corresponding to the number of processes that were
started (it equals the global "nbproc" setting). This is useful for logging
and debugging purposes.
proc : integer
Returns an integer value corresponding to the position of the process calling
the function, between 1 and global.nbproc. This is useful for logging and
debugging purposes.
stopping : boolean
Returns TRUE if the process calling the function is currently stopping. This
can be useful for logging, or for relaxing certain checks or helping close
certain connections upon graceful shutdown.
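For instance (a small hedged example) :
http-response set-header X-Served-By %[proc]/%[nbproc]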
Currently this is harmless since trash.size is copied from
global.tune.bufsize, but this may soon change when buffers become
more dynamic.
At least for consistency it should be backported to 1.5.
A memory optimization can use the same pattern expression for many
equal pattern lists (same parse method, index method and index_smp
method).
The pattern expression is returned by "pattern_new_expr", but this
function doesn't indicate whether the returned pattern is already in use.
So, the caller function reloads the list of patterns on top of the
existing patterns. This behavior is not a problem with tree-indexed
patterns, but it grows the list-indexed patterns.
This fix adds a "reuse" flag returned by the function "pattern_new_expr".
If the flag is set, the caller assumes that the patterns are already loaded.
This fix must be backported into 1.5.
In order for HTTP/2 not to eat too much memory, we'll have to support
on-the-fly buffer allocation, since most streams will have an empty
request buffer at some point. Supporting allocation on the fly means
being able to sleep inside I/O callbacks if a buffer is not available.
Till now, the I/O callbacks were called from two locations :
- when processing the cached events
- when processing the polled events from the poller
This change cleans up the design a bit further than what was started in
1.5. It now ensures that we never call any iocb from the poller itself
and that instead, events learned by the poller are put into the cache.
The benefit is important in terms of stability : we don't have to care
anymore about the risk that new events are added into the poller while
processing its events, and we're certain that updates are processed at
a single location.
To achieve this, we now modify all the fd_* functions so that instead of
creating updates, they add/remove the fd to/from the cache depending on
its state, and only create an update when the polling status reaches a
state where it will have to change. Since the pollers make use of these
functions to notify readiness (using fd_may_recv/fd_may_send), the cache
is always up to date with the poller.
Creating updates only when the polling status needs to change saves a
significant amount of work for the pollers : a benchmark showed that on
a typical TCP proxy test, the amount of updates per connection dropped
from 11 to 1 on average. This also means that the update list is smaller
and has more chances of not thrashing too many CPU cache lines. The first
observed benefit is a net 2% performance gain on the connection rate.
A second benefit is that when a connection is accepted, it's only when
we're processing the cache, and the recv event is automatically added
into the cache *after* the current one, resulting in this event to be
processed immediately during the same loop. Previously we used to have
a second run over the updates to detect if new events were added to
catch them before waking up tasks.
The next gain will be offered by the next steps on this subject consisting
in implementing an I/O queue containing all cached events ordered by priority
just like the run queue, and to be able to leave some events pending there
as long as needed. That will allow us *not* to perform some FD processing
if it's not the proper time for this (typically keep waiting for a buffer
to be allocated if none is available for a recv()). And by only processing
a small bunch of them, we'll allow priorities to take place even at the I/O
level.
As a result of this change, functions fd_alloc_or_release_cache_entry()
and fd_process_polled_events() have disappeared, and the code dedicated
to checking for new fd events after the callback during the poll() loop
was removed as well. Despite the patch looking large, it's mostly a
change of which function is called upon fd_*() and almost nothing was
added.
When enabling stats, response analysers were set on the request
analyser list, which 1) has no effect, and 2) means we don't have
the response analysers properly set.
In practice these response analysers are set when the connection
to the server or applet is established so we don't need/must not
set them here.
Fortunately this bug had no impact since the flags are distinct,
but it definitely is confusing.
It should be backported to 1.5.
This patch makes it possible to create binds and servers in separate
namespaces. This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).
The setup is something like this:
net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/
The proxy is configured to make server connections through haproxy and sending
the expected source/target addresses to haproxy using the proxy protocol.
The network namespace setup on the haproxy node is something like this:
= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =
= 8< =
$ cat haproxy.cfg
frontend clients
bind 127.0.0.1:50022 namespace 1 transparent
default_backend scb
backend scb
mode tcp
server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =
A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.
A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.
For more documentation please read the documentation included in the patch
itself.
Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
Previously, if hdr_v2->len was less than the length of the protocol
specific address information, we could have read past the end of the
buffer and initialized the sockaddr structure with junk.
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
[WT: this is only tagged medium since proxy protocol is only used from
trusted sources]
This must be backported to 1.5.
Denys Fedoryshchenko reported and diagnosed a nasty bug caused by TCP
captures, introduced in late 1.5-dev by commit 18bf01e ("MEDIUM: tcp:
add a new tcp-request capture directive"). The problem is that we're
using the array of capture pointers initially designed for HTTP usage
only, and that this array was only reset when starting to process an
HTTP request. In a tcp-only frontend, the pointers are not reset, and
if the capture pool is shared, we can very well point to whatever other
memory location, resulting in random crashes when tcp-request content
captures are processed.
The fix simply consists in initializing these pointers when the pools
are prepared.
A workaround for existing versions consists in either disabling TCP
captures in tcp-only frontends, or in forcing the frontends to work in
HTTP mode.
Thanks to Denys for the amount of testing and detailed reports.
This fix must be backported to 1.5.
Tom Limoncelli from Stack Exchange reported a minor bug : the frontend
inherits the LB parameters from the defaults sections. The impact is
that if a "balance" directive uses any L7 parameter in the defaults
sections and the frontend is in TCP mode, a warning is emitted about
their incompatibility. The warning is harmless but a valid, sane config
should never cause any warning to be reported.
This fix should be backported into 1.5 and possibly 1.4.
pcre_study() has been around long before JIT was added. It also seems to
affect performance in some cases (positively).
Below I've attached some test results. The test is based on
http://sljit.sourceforge.net/regex_perf.html (see bottom). It has been modified
to just test pcre_study vs. no pcre_study. Note: this test does not try to
match a specific header; it is instead run over a larger text with more and
less complex patterns to make the differences clearer.
% ./runtest
'mark.txt' loaded. (Length: 19665221 bytes)
-----------------
Regex: 'Twain'
[pcre-nostudy] time: 14 ms (2388 matches)
[pcre-study] time: 21 ms (2388 matches)
-----------------
Regex: '^Twain'
[pcre-nostudy] time: 109 ms (100 matches)
[pcre-study] time: 109 ms (100 matches)
-----------------
Regex: 'Twain$'
[pcre-nostudy] time: 14 ms (127 matches)
[pcre-study] time: 16 ms (127 matches)
-----------------
Regex: 'Huck[a-zA-Z]+|Finn[a-zA-Z]+'
[pcre-nostudy] time: 695 ms (83 matches)
[pcre-study] time: 26 ms (83 matches)
-----------------
Regex: 'a[^x]{20}b'
[pcre-nostudy] time: 90 ms (12495 matches)
[pcre-study] time: 91 ms (12495 matches)
-----------------
Regex: 'Tom|Sawyer|Huckleberry|Finn'
[pcre-nostudy] time: 1236 ms (3015 matches)
[pcre-study] time: 34 ms (3015 matches)
-----------------
Regex: '.{0,3}(Tom|Sawyer|Huckleberry|Finn)'
[pcre-nostudy] time: 5696 ms (3015 matches)
[pcre-study] time: 5655 ms (3015 matches)
-----------------
Regex: '[a-zA-Z]+ing'
[pcre-nostudy] time: 1290 ms (95863 matches)
[pcre-study] time: 1167 ms (95863 matches)
-----------------
Regex: '^[a-zA-Z]{0,4}ing[^a-zA-Z]'
[pcre-nostudy] time: 136 ms (4507 matches)
[pcre-study] time: 134 ms (4507 matches)
-----------------
Regex: '[a-zA-Z]+ing$'
[pcre-nostudy] time: 1334 ms (5360 matches)
[pcre-study] time: 1214 ms (5360 matches)
-----------------
Regex: '^[a-zA-Z ]{5,}$'
[pcre-nostudy] time: 198 ms (26236 matches)
[pcre-study] time: 197 ms (26236 matches)
-----------------
Regex: '^.{16,20}$'
[pcre-nostudy] time: 173 ms (4902 matches)
[pcre-study] time: 175 ms (4902 matches)
-----------------
Regex: '([a-f](.[d-m].){0,2}[h-n]){2}'
[pcre-nostudy] time: 1242 ms (68621 matches)
[pcre-study] time: 690 ms (68621 matches)
-----------------
Regex: '([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]'
[pcre-nostudy] time: 1215 ms (675 matches)
[pcre-study] time: 952 ms (675 matches)
-----------------
Regex: '"[^"]{0,30}[?!\.]"'
[pcre-nostudy] time: 27 ms (5972 matches)
[pcre-study] time: 28 ms (5972 matches)
-----------------
Regex: 'Tom.{10,25}river|river.{10,25}Tom'
[pcre-nostudy] time: 705 ms (2 matches)
[pcre-study] time: 68 ms (2 matches)
In some cases it's more or less the same, but when it's faster, it is by a
huge margin. It always depends on the pattern, the string(s) to match
against, etc.
Signed-off-by: Christian Ruppert <c.ruppert@babiel.com>
Lasse Birnbaum Jensen reported an issue when agent checks are used at the same
time as standard healthchecks when SSL is enabled on the server side.
The symptom is that agent checks try to communicate in SSL while they should
handle raw data. This happens because the transport layer is shared between
all kinds of checks.
To fix the issue, the transport layer is now stored in each check type,
allowing SSL healthchecks to be used when required, while an agent check
always uses the raw_sock implementation.
The fix must be backported to 1.5.
We currently release all pools when a proxy is stopped, except the
connection, pendconn, and pipe pools. Doing so can further reduce the
memory usage of old processes; even though the connection struct is quite
small, there are a lot of them and they can contribute to memory
fragmentation. The pipe pool is very small and limited, and not exported,
so it's not done here.
There's a very common openssl patch on the net meant to significantly
reduce openssl's memory usage. This patch has been provided for many
versions now, and it makes sense to add support for it given that it
is very simple. It only requires adding an extra SSL_MODE flag. Just
like for other flags, if the flag is unknown, it's simply not set. About 44kB
of memory may be saved per SSL session with the patch.
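A sketch of the idea, assuming the flag from the circulating patch is
named SSL_MODE_SMALL_BUFFERS:
	long mode = SSL_MODE_ENABLE_PARTIAL_WRITE;
#ifdef SSL_MODE_SMALL_BUFFERS
	/* patched libs define the flag, unpatched ones leave it unset */
	mode |= SSL_MODE_SMALL_BUFFERS;
#endif
	SSL_CTX_set_mode(ctx, mode);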
When memory becomes scarce and openssl refuses to allocate a new SSL
session, it is worth freeing the pools and trying again instead of
rejecting all incoming SSL connections. This can happen when some
memory usage limits have been assigned to the haproxy process using
-m or with ulimit -m/-v.
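A sketch of the retry logic, assuming the pool flush helper is the
pool_gc2() used elsewhere in this codebase:
ssl = SSL_new(ctx);
if (!ssl) {
	pool_gc2();	/* release unused pool memory, then retry once */
	ssl = SSL_new(ctx);
}
if (!ssl)
	goto err;	/* still out of memory: reject this connection */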
This is mostly an enhancement of previous fix and is worth backporting
to 1.5.
Errors from some SSL context init functions were not handled and
could cause a segfault due to an incomplete SSL context
initialization.
This fix must be backported to 1.5.
HAProxy will crash with the following configuration:
global
...
tune.bufsize 1024
tune.maxrewrite 0
frontend xxx
...
backend yyy
...
cookie cookie insert maxidle 300s
If a client sends a request whose object size is larger than tune.bufsize
(1024 bytes), HAProxy will crash.
After some debugging, the crash was found to be caused by
http_header_add_tail2() -> buffer_insert_line2() while inserting a cookie
at the end of the response headers.
The relevant part of buffer_insert_line2() is shown below:
int buffer_insert_line2(struct buffer *b, char *pos, const char *str, int len)
{
int delta;
delta = len + 2;
if (bi_end(b) + delta >= b->data + b->size)
return 0; /* no space left */
/* first, protect the end of the buffer */
memmove(pos + delta, pos, bi_end(b) - pos);
...
}
Since tune.maxrewrite is 0, HAProxy can receive 1024 bytes at once, which
equals the full buffer size. Under this condition the buffer is full and
bi_end(b) wraps to the start of the buffer, pointing at b->data. As a
result, although there is no space left in the buffer, the check condition
if (bi_end(b) + delta >= b->data + b->size)
will be true; memmove() is then called, (pos + delta) exceeds the end of
the buffer (b->data + b->size), and HAProxy crashes.
Taking buffer_replace2() as a reference, the additional check for the case
where input data wraps in the buffer should also be added to
buffer_insert_line2().
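A sketch of such an additional check, modeled on buffer_replace2();
the exact field usage may differ from the final patch:
/* don't let the moved block cross pending data when input wraps */
if (buffer_not_empty(b) && bi_end(b) + delta > b->p && b->p >= bi_end(b))
	return 0; /* no space left before wrapping data */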
This fix must be backported to 1.5.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
Bug reported by John Leach: no-sslv3 does not work using some certificates.
It appears that the ssl ctx is not updated with the configured options if
the CommonName of the certificate's subject is not found.
It applies only to the first certificate of a configured bind line.
There is no security impact, because only invalid nameless certificates
are concerned.
This fix must be backported to 1.5.
Adds the global statements 'ssl-default-server-options' and
'ssl-default-bind-options' to force some ssl options on 'server' and
'bind' lines.
Currently available options are 'no-sslv3', 'no-tlsv10', 'no-tlsv11',
'no-tlsv12', 'force-sslv3', 'force-tlsv10', 'force-tlsv11',
'force-tlsv12', and 'no-tls-tickets'.
Example:
global
ssl-default-server-options no-sslv3
ssl-default-bind-options no-sslv3
There's an issue when using SO_ORIGINAL_DST to retrieve the original
destination of a connection's address before being translated by
Netfilter's DNAT/REDIRECT or the old TPROXY. SO_ORIGINAL_DST returns
an IPv4 address when the original destination was IPv4 mapped into
IPv6. At first glance it's not a big deal, but it
is for logging and for the proxy protocol, because we then have
two different address families for the source and destination. In
this case, the proxy protocol correctly detects the issue and emits
"UNKNOWN".
In order to fix this, we perform getsockname() first, and only if
the address family is AF_INET, then we perform the getsockopt() call.
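A sketch of the resulting logic (error handling trimmed;
SO_ORIGINAL_DST comes from <linux/netfilter_ipv4.h>):
struct sockaddr_storage addr;
socklen_t len = sizeof(addr);
if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
	return -1;
if (addr.ss_family == AF_INET) {
	/* only a genuine IPv4 socket is queried for its pre-DNAT
	 * destination, so the two families can never disagree */
	len = sizeof(addr);
	getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, &addr, &len);
}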
This fix must be backported to 1.5, and probably even to 1.4 and 1.3.
ssl_c_der : binary
Returns the DER formatted certificate presented by the client when the
incoming connection was made over an SSL/TLS transport layer. When used for
an ACL, the value(s) to match against can be passed in hexadecimal form.
ssl_f_der : binary
Returns the DER formatted certificate presented by the frontend when the
incoming connection was made over an SSL/TLS transport layer. When used for
an ACL, the value(s) to match against can be passed in hexadecimal form.
pcre_study() may return NULL even though it succeeded; in this case error is
NULL, otherwise error is not NULL. Also see man 3 pcre_study.
Previously an ACL pattern of e.g. ".*" would cause an error because
pcre_study() did not find anything to speed up matching and returned
regex->extra = NULL and error = NULL, which in this case was a false
positive. That happened only when PCRE_JIT was enabled for HAProxy but
libpcre had been built without JIT.
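A sketch of the corrected test (the structure field names are assumed):
regex->extra = pcre_study(regex->reg, 0, &error);
if (!regex->extra && error != NULL) {
	/* real failure; NULL with error == NULL only means there
	 * was nothing to speed up */
	return 0;
}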
Signed-off-by: Christian Ruppert <c.ruppert@babiel.com>
[wt: this needs to be backported to 1.5 as well]
When building on openssl-0.9.8, since commit 23d5d37 ("MINOR: ssl: use
SSL_get_ciphers() instead of directly accessing the cipher list.") we get
the following warning :
src/ssl_sock.c: In function 'ssl_sock_prepare_ctx':
src/ssl_sock.c:1592: warning: passing argument 1 of 'SSL_CIPHER_description' discards qualifiers from pointer target type
This is because the openssl API has changed between 0.9.8 and 1.0.1 :
0.9.8: char *SSL_CIPHER_description(SSL_CIPHER *,char *buf,int size);
1.0.1: char *SSL_CIPHER_description(const SSL_CIPHER *,char *buf,int size);
So let's remove the "const" type qualifier to satisfy both versions.
Note that the fix above was backported to 1.5, so this one should as well.
This converter escapes a string to use it as a JSON/ASCII escaped string.
It can read UTF-8 with different behaviors on errors and encode it in
JSON/ASCII.
json([<input-code>])
Escapes the input string and produces an ASCII output string ready to use as a
JSON string. The converter tries to decode the input string according to the
<input-code> parameter. It can be "ascii", "utf8", "utf8s", "utf8p" or
"utf8ps". The "ascii" decoder never fails. The "utf8" decoder detects 3 types
of errors:
- bad UTF-8 sequence (lone continuation byte, bad number of continuation
bytes, ...)
- invalid range (the decoded value is within a UTF-8 prohibited range),
- code overlong (the value is encoded with more bytes than necessary).
The UTF-8 JSON encoding can produce a "too long value" error when the UTF-8
character is greater than 0xffff because the JSON string escape specification
only authorizes 4 hex digits for the value encoding. The UTF-8 decoder exists
in 4 variants designated by a combination of two suffix letters : "p" for
"permissive" and "s" for "silently ignore". The behaviors of the decoders
are :
- "ascii" : never fails ;
- "utf8" : fails on any detected errors ;
- "utf8s" : never fails, but removes characters corresponding to errors ;
- "utf8p" : accepts and fixes the overlong errors, but fails on any other
error ;
- "utf8ps" : never fails, accepts and fixes the overlong errors, but removes
characters corresponding to the other errors.
This converter is particularly useful for building properly escaped JSON for
logging to servers which consume JSON-formatted traffic logs.
Example:
capture request header user-agent len 150
capture request header Host len 15
log-format {"ip":"%[src]","user-agent":"%[capture.req.hdr(1),json]"}
Input request from client 127.0.0.1:
GET / HTTP/1.0
User-Agent: Very "Ugly" UA 1/2
Output log:
{"ip":"127.0.0.1","user-agent":"Very \"Ugly\" UA 1\/2"}
During a tcp connection setup in tcp_connect_server(), we check if
there are pending data to start polling for writes immediately. We
also use the same test to know if we can disable the quick ack and
merge the first data packet with the connection's ACK. This last
case is also valid for the proxy protocol.
The problem lies in the way it's done, as the "data" variable is
improperly completed with the presence of the proxy protocol, resulting
in the connection being polled for data writes if the proxy protocol is
enabled. It's not a big issue per se, except that the proxy protocol
uses the fact that we're polling for data to know if it can use MSG_MORE.
This causes no problem on HTTP/HTTPS, but with banner protocols, it
introduces a 200ms delay if the server waits for the PROXY header.
This has been caused by the connection management changes introduced in
1.5-dev12, specifically commit a1a7474 ("MEDIUM: proxy-proto: don't use
buffer flags in conn_si_send_proxy()"), so this fix must be backported
to 1.5.
Colin Ingarfield reported some unexplainable flags in the logs.
For example, a "LR" termination state was set on a request which was forwarded
to a server, where "LR" means that the request should have been handled
internally by haproxy.
This case happens when at least client-side keep-alive is enabled. Subsequent
requests in the connection will inherit the flags from the previous request.
Two fields are impacted : "termination_state" and "Tt" in the timing events,
where a "+" can be added when a previous request was redispatched.
This is not critical for the service itself but can confuse troubleshooting.
The fix must be backported to 1.5 and 1.4.
Dmitry Sivachenko reported an embarrassing problem where haproxy
would sometimes segfault upon reload. After careful analysis and
code inspection, what happens is related to the "show sess" command
on the CLI, and it is not limited to reload operations only.
When a "show sess" is running, once the output buffer is full, the
stats applet grabs a reference to the session being dumped in order
for the current pointer to be able to advance by itself should this
session disappear while the buffer is full. The applet also uses a
release handler that is called when the applet terminates to release
such references.
The problem is that upon error, the command line parser sets the
applet state to STAT_CLI_O_END indicating it wants to terminate the
processing. Unfortunately, the release handler which is called later
to clean everything up relies on the applet's state to know what
operations were in progress, and as such it does not release the
reference. A later "show sess" or the completion of the task being
watched leads to a LIST_DEL() on the task's list which points to a
location that does not match the applet's reference list anymore,
and the process dies.
One solution to this would be to add a flag to the current applet's
state mentioning it must leave, without affecting the state indicating
the current operation. It's a bit invasive but could be the long term
solution. The short term solution simply consists in calling the
release handler just before changing the state to STAT_CLI_O_END.
That way everything that must be released is released in time.
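A sketch of that short-term fix, with hypothetical handler and state
names:
/* release the references while the state still describes the
 * interrupted dump, then switch to the terminal state */
if (si->release)
	si->release(si);
si->applet.st0 = STAT_CLI_O_END;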
Note that the probability to encounter this issue is very low.
It requires a lot of "show sess" or "show sess all" calls, and
that one of them dies before being completed. That can happen
if "show sess" is run in scripts which truncate the output (eg:
"echo show sess|socat|head"). This could be the worst case as it
almost ensures that haproxy fills a buffer, grabs a reference and
detects the error on the socket.
There's no config-based workaround for this issue, except refraining
from issuing "show sess" on large connection counts or "show sess all".
If it's not possible to block everyone, restricting permissions on
the stats socket ensures only authorized tools can connect.
This fix must be backported to 1.5 and to 1.4 (with some changes in
1.4 since the release function does not exist so the LIST_DEL sequence
must be open-coded).
Special thanks to Dmitry for the fairly complete report.
When the HTTP parser is in state HTTP_MSG_ERROR, we don't know if it was
already initialized or not. If the error happens before HTTP_MSG_RQBEFORE,
random offsets might be present and we don't want to display such random
strings in debug mode.
While it's theoretically possible to randomly crash the process when running
in debug mode here, this bug was not tagged MAJOR because it would not
make sense to run in debug mode in production.
This fix must be backported to 1.5 and 1.4.
Commit 98634f0 ("MEDIUM: backend: Enhance hash-type directive with an
algorithm options") cleaned up the hashing code by using a centralized
function. A bug appeared in get_server_uh() which is the URI hashing
function. Prior to the patch, the function would stop hashing on the
question mark, or on the trailing slash of a maximum directory count.
Consecutive to the patch, this last character is included into the
hash computation. This means that :
GET /0
GET /0?
Are not hashed similarly. The following configuration reproduces it :
mode http
balance uri
server s1 0.0.0.0:1234 redir /s1
server s2 0.0.0.0:1234 redir /s2
Many thanks to Vedran Furac for reporting this issue. The fix must
be backported to 1.5.
MAX_SESS_STKCTR allows one to define the number of stick counters that can
be used in parallel in track-sc* rules. The naming of this macro creates
some confusion because the value there is sometimes used as a max instead
of a count, and the config parser accepts values from 0 to MAX_SESS_STKCTR
and the processing ignores anything tracked on the last one. This means
that by default, track-sc3 is allowed and ignored.
This fix must be backported to 1.5, where the problem only affects
TCP rules.
Paul Taylor and Bryan Talbot found that after commit 419ead8 ("MEDIUM:
config: compute the exact bind-process before listener's maxaccept"),
a backend marked "disabled" would cause the next backend to be skipped
and if it was the last one it would cause a segfault.
The reason is that the commit above changed the "while" loop for a "for"
loop but a "continue" statement still incrementing the current proxy was
left in the code for disabled proxies, causing the next one to be skipped
as well and the last one to try to dereference NULL when seeking ->next.
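A sketch of the pitfall (the per-proxy helper name is hypothetical):
for (curproxy = proxy; curproxy; curproxy = curproxy->next) {
	if (curproxy->state == PR_STSTOPPED) {
		/* curproxy = curproxy->next;  <- leftover from the old
		 * while () loop; combined with the for () increment it
		 * skips one proxy and dereferences NULL after the last */
		continue;
	}
	check_one_proxy(curproxy);
}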
The quick workaround consists in not disabling backends, or adding an
empty dummy one after a disabled section.
This fix must be backported to 1.5.
A segfault was reported with the introduction of the propagate_processes()
function. It was caused when a use_backend rule was declared with a dynamic
name, using a log-format string. The backend is not resolved during the
configuration, which leads to the segfault.
The patch prevents the process binding propagation for such dynamic rules; it
should also be backported to 1.5.
The step number was reported by checking only last_started_step, which
was not set in case of error during the initial connection phase, and
caused "step 1" to be returned with an invalid check type (typically
SEND). So now we first verify that a test was started before returning
this.
In addition to this, the indication of the test type was taken from
current_step instead of last_started_step, so the error description
was matching the next action instead of the one reported in the step
ID. Thus we could get the confusing "step 1 (send)" report below :
tcp-check connect
tcp-check send foo
In order to ease debugging, when the port number is known for a connect,
it is indicated in the error report.
Note that this only affects asynchronous error messages, synchronous ones
are correct.
This fix must be backported to 1.5.
When "option tcp-check" is specified without any tcp-check rules, the
documentation says that it's the same as the default check method. But
the code path is a bit different, and we used to consider that since
the end of rules was reached, the check is always successful regardless
of the connection status.
This patch reorganizes the error detection, and considers the special
case where there's no tcp-check rule as a real L4 check. It also avoids
dereferencing the rule list head as a rule by itself.
While fixing this bug, another one related to the output messages'
accuracy was noticed, it will be fixed in a separate commit and is
much less important.
This bug is also present in 1.5, so this fix must be backported.
propagate_processes() must not be called with unresolved proxies, but
nothing prevents it from being called in check_config_validity(). The
resulting effect is that an unresolved proxy can cause a recursion
loop if called in such a situation, ending with a segfault after the
fatal error report. There's no side effect beyond this.
This patch refrains from calling the function when any error was met.
This bug also affects 1.5, it should be backported.
Commit 179085c ("MEDIUM: http: move Connection header processing earlier")
introduced a regression : the backend's HTTP mode is not considered anymore
when setting the session's HTTP mode, because wait_for_request() is only
called once, when the frontend receives the request (or when the frontend
is in TCP mode, when the backend receives the request).
The net effect is that in some situations when the frontend and the backend
do not work in the same mode (eg: keep-alive vs close), the backend's mode
is ignored.
This patch moves all that processing to a dedicated function, which is
called from the original place, as well as from session_set_backend()
when switching from an HTTP frontend to an HTTP backend in different
modes.
This fix must be backported to 1.5.
Kristoffer Grönlund reported that after my recent update to the
systemd-wrapper, I accidentally left in the debugging code which
consists in disabling the fork :-(
The fix needs to be backported to 1.5 as well since I pushed it
there as well.
Having to use a hard-coded "haproxy" executable name next to the systemd
wrapper is not always convenient, as it's sometimes desirable to run with
multiple versions in parallel.
Thus this patch performs a minor change to the wrapper : if the name ends
with "-systemd-wrapper", then it trims that part off and what remains
becomes the target haproxy executable. That makes it easy to have for
example :
haproxy-1.5.4-systemd-wrapper haproxy-1.5.4
haproxy-1.5.3-systemd-wrapper haproxy-1.5.3
and so on, in a same directory.
This patch also fixes a rare bug caused by readlink() not adding the
trailing zero and leaving possible existing contents, including possibly
a randomly placed "/" which would make it unable to locate the correct
binary. This case is not totally unlikely as I got a \177 a few times
at the end of the executable names, so I could have got a '/' as well.
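For reference, the readlink() pattern involved looks like this (the
path is illustrative); the function neither terminates nor clears the
buffer:
char path[PATH_MAX];
ssize_t n = readlink("/proc/self/exe", path, sizeof(path) - 1);
if (n == -1)
	return -1;
path[n] = '\0';	/* terminate explicitly, overwriting old contents */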
Back-porting to 1.5 is desirable.
If a frontend has any tcp-request content rule relying on request contents
without any inspect delay, we now emit a warning as this will randomly match.
This can be backported to 1.5 as it reduces the support effort.
A config where a tcp-request rule appears after an http-request rule
might seem valid but it is not. So let's report a warning about this
since this case is hard to detect by the naked eye.
Some users want to have a stats frontend with one line per process, but while
100% valid and safe, the config parser emits a warning. Relax this check to
ensure that the warning is only emitted if at least one of the listeners is
bound to multiple processes, or if the directive is placed in a backend called
from multiple processes (since in this case we don't know if it's safe).
This is a continuation of previous patch, the listener's maxaccept is divided
by the number of processes, so it's best if we can swap the two blocks so that
the number of processes is already known when computing the maxaccept value.
When a frontend does not have any bind-process directive, make it
automatically bind to the union of all of its listeners' processes
instead of binding to all processes. That will make it possible to
have the expected behaviour without having to explicitly specify a
bind-process directive.
Note that if the listeners are not bound to a specific process, the
default is still to bind to all processes.
This change could be backported to 1.5 as it simplifies process
management, and was planned to be done during the 1.5 development phase.
We now recursively propagate the bind-process values between frontends
and backends instead of doing it during name resolving. This ensures
that we're able to properly propagate all the bind-process directives
even across "listen" instances, which are not perfectly covered at the
moment, depending on the declaration order.
This basically reverts 3507d5d ("MEDIUM: proxy: only adjust the backend's
bind-process when already set"). It was needed during the transition to
the new process binding method but is causing trouble now because frontend
to backend binding is not properly propagated.
This fix should be backported to 1.5.
Ryan Brock reported that server stickiness did not work for WebSocket
because the cookies and headers are not modified on 1xx responses. He
found that his browser correctly presents the cookies learned on 101
responses, which was not specifically defined in the WebSocket spec,
nor in the cookie spec. 101 is a very special case. Being part of 1xx,
it's an interim response. But within 1xx, it's special because it's
the last HTTP/1 response that transits on the wire, which is different
from 100 or 102 which may appear multiple times. So in that sense, we
can consider it as a final response regarding HTTP/1, and it makes
sense to allow header processing there. Note that we still ensure not
to mangle the Connection header, which is critical for HTTP upgrade to
continue to work smoothly with agents that are a bit picky about what
tokens are found there.
The rspadd rules are now processed for 101 responses as well, but the
cache-control checks are not performed (since no body is delivered).
Ryan confirmed that this patch works for him.
It would make sense to backport it to 1.5 given that it improves end
user experience on WebSocket servers.
My proposal is to let haproxy-systemd-wrapper also accept normal
SIGHUP/SIGTERM signals to play nicely with other process managers
besides just systemd. In my use case, this will be for using with
runit which has the ability to change the signal used for a
"reload" or "stop" command. It also might be worth renaming this
bin to just haproxy-wrapper or something of that sort to separate
itself away from systemd. But that's a different discussion. :)
Commit bb2e669 ("BUG/MAJOR: http: correctly rewind the request body
after start of forwarding") was incorrect/incomplete. It used to rely on
CF_READ_ATTACHED to stop updating msg->sov once data start to leave the
buffer, but this is unreliable because since commit a6eebb3 ("[BUG]
session: clear BF_READ_ATTACHED before next I/O") merged in 1.5-dev1,
this flag is only ephemeral and is cleared once all analysers have
seen it. So we can start updating msg->sov again each time we pass
through this place with new data. With a sufficiently large amount of
data, it is possible to make msg->sov wrap and validate the if()
condition at the top, causing the buffer to advance by about 2GB and
crash the process.
Note that the offset cannot be controlled by the attacker because it is
a sum of millions of small random sizes depending on how many bytes were
read by the server and how many were left in the buffer, only because
of the speed difference between reading and writing. Also, nothing is
written, the invalid pointer resulting from this operation is only read.
Many thanks to James Dempsey for reporting this bug and to Chris Forbes for
narrowing down the faulty area enough to make its root cause analysable.
This fix must be backported to haproxy 1.5.
When an unknown encryption algorithm is used in userlists or the password is
not pasted correctly in the configuration, http authentication silently fails.
An initial check is now performed during the configuration parsing, in order to
verify that the encrypted password is supported. An unsupported password will
fail with a fatal error.
This patch should be backported to 1.4 and 1.5.
Grégoire Morpain reported a segfault when a secured password is used for http
authentication. It was caused by the use of an unsupported encryption algorithm
with libcrypto. In this case, crypt() returns a NULL pointer.
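A sketch of the check (crypt() comes from <crypt.h> or <unistd.h> with
_XOPEN_SOURCE):
char *hash = crypt(passwd, stored);
if (!hash)
	return 0;	/* unsupported algorithm: fail the auth instead
			 * of dereferencing a NULL pointer */
return strcmp(hash, stored) == 0;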
The fix should be backported to 1.4 and 1.5.
This code aims at clearing up the ACL parsing code a bit to make it
more obvious what happens in the case of an ACL keyword and what happens
in the case of a sample expression. Variables prev_type and cur_type were
merged, and ACL keyword processing functions are checked only once.
This patch could be backported into 1.5 after the previous patch, in order
to keep the code more maintainable.
Sample expressions involving converters simply do not work if the
converter changes the sample type from the original keyword. Either
the keyword is a sample fetch keyword and its own type is used, or it's an
ACL keyword, and the keyword's parse/index/match functions are used despite
the converters. Thus it causes such a stupid error :
redirect location / if { date,ltime(%a) -i Fri }
[ALERT] 240/171746 (29127) : parsing [bug-conv.cfg:35] : error detected in proxy 'svc' while parsing redirect rule : error in condition: 'Fri' is not a number.
In fact now in ACLs, the case where the ACL keyword is alone is an exception
(even though the most common one). It's an exception to the sample expression
parsing rules since ACLs allow alternate parsing functions to be redefined.
This fix does two things :
- it voids any references to the ACL keyword when a converter is involved
since we certainly do not want to enforce a parser for a wrong data type ;
- it ensures that for all other cases (sample expressions), the type of
the expression is used instead of the type of the fetch keyword.
A significant cleanup of the code should be done, but this patch only aims
fixing the bug.
The fix should be backported into 1.5 since this appeared along the redesign
of the acl/pattern processing.
Just like the previous patch, this is a remnant of an early implementation.
Also fix the outdated comments above. The fix may be backported to 1.5 though
the bug cannot be triggered, thus it's just a matter of keeping the code clean.
pat_parse_meth() had some remnants of an early implementation attempt for
the patterns: it initialises a trash and never sets the pattern value there.
The result is that a non-standard method cannot be matched anymore. The bug
appeared during the pattern rework in 1.5, so this fix must be backported
there. Thanks to Joe Williams of GitHub for reporting the bug.
This results in a string-based HTTP method match returning true when
it doesn't match and conversely. This bug was reported by Joe Williams.
The fix must be backported to 1.5, though it still doesn't work because
of at least 3-4 other bugs in the long path which leads to building this
pattern list.
Sometimes it would be convenient to have a log counter so that from a log
server we know whether some logs were lost or not. The frontend's log counter
serves exactly this purpose. It's incremented each time a traffic log is
produced. If a log is disabled using "http-request set-log-level silent",
the counter will not be incremented. However, admin logs are not accounted
for. Also, if logs are filtered out before being sent to the server because
of a minimum level set on the log line, the counter will be increased anyway.
The counter is 32-bit, so it will wrap, but that's not an issue considering
that 4 billion logs are rarely in the same file, let alone close to each
other.
There are two sample fetches to get information about the presence of a
client certificate.
ssl_fc_has_crt is true if there is a certificate present in the current
connection
ssl_c_used is true if there is a certificate present in the session.
If a session has stopped and resumed, then ssl_c_used could be true, while
ssl_fc_has_crt is false.
In the client byte of the TLS TLV of Proxy Protocol V2, there is only one
bit to indicate whether a certificate is present on the connection. The
attached patch adds a second bit to indicate the presence for the session.
This maintains backward compatibility.
[wt: this should be backported to 1.5 to help maintain compatibility
between versions]
Before commit bbba2a8ecc (1.5-dev24-8), the tarpit section set the
timeout and returned; after this commit, the tarpit section sets the
timeout and goes to the "done" label, which resets the timeout.
Thanks Bryan Talbot for the bug report and analysis.
This should be backported in 1.5.
Google's boringssl has a different cipher_list, we cannot use it as
in OpenSSL. This is due to the "Equal preference cipher groups" feature:
https://boringssl.googlesource.com/boringssl/+/858a88daf27975f67d9f63e18f95645be2886bfb^!/
also see:
https://www.imperialviolet.org/2014/02/27/tlssymmetriccrypto.html
cipher_list is used in haproxy since commit f46cd6e4ec ("MEDIUM: ssl:
Add the option to use standardized DH parameters >= 1024 bits") to
check if DHE ciphers are used.
So, if boringssl is used, the patch just assumes that there is some
DHE cipher enabled. This will lead to false positives, but that's better
than compiler warnings and crashes.
This may be replaced one day by properly implementing the new-style
cipher_list; in the meantime this workaround allows building and using
boringssl.
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
get_rfc2409_prime_1024() and friends are not available in Google's
boringssl, so use the fallback in that case.
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
Google's boringssl doesn't currently support OCSP, so
disable it if detected.
OCSP support may be reintroduced as per:
https://code.google.com/p/chromium/issues/detail?id=398677
In that case we can simply revert this commit.
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
Google's boringssl doesn't have OPENSSL_VERSION_TEXT, SSLeay_version()
or SSLEAY_VERSION; in fact, it doesn't have any real versioning, it's
just git-based.
So in case we build against boringssl, we can't access those values.
Instead, we just inform the user that HAProxy was built against
boringssl.
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
A couple of typos fixed in 'http-response replace-header':
- an error when counting the number of arguments
- a typo in the alert message
This should be backported to 1.5.
When the child process terminates, it should wake up the associated task to
process the result immediately, otherwise it will be available only when the
task expires.
This fix is specific to the 1.6 branch.
The documentation indicates that external checks can be used in a backend
section, but the code requires a listener to provide information in the script
arguments.
External checks were initialized late, during the first check, leaving some
variables uninitialized in such a scenario, which triggers the segfault when
they are accessed to collect error information.
To prevent the segfault, we now initialize the external checks earlier,
during the process initialization itself, and quit if an error occurs.
This fix is specific to the 1.6 branch.
Mark Brooks reported an issue with external healthchecks, where servers are
never marked as UP. This is due to a typo, which flags a successful check as
CHK_RES_FAILED instead of CHK_RES_PASSED.
This bug is specific to the 1.6 branch.
As a consequence of various recent changes on the sample conversion,
a corner case has emerged where it is possible to wait forever for a
sample in track-sc*.
The issue is caused by the fact that functions relying on sample_process()
don't all exactly work the same regarding the SMP_F_MAY_CHANGE flag and
the output result. Here it was possible to wait forever for an output
sample from stktable_fetch_key() without checking the SMP_OPT_FINAL flag.
As a result, if the client connects and closes without sending the data
while haproxy expects a sample which it believes may still arrive, it will
ignore this impossible case and continue to wait.
This change adds control for SMP_OPT_FINAL before waiting for extra data.
The various relevant functions have been better documented regarding their
output values.
This fix must be backported to 1.5 since it appeared there.
Move all code out of the signal handlers, since this is potentially
dangerous. To make sure the signal handlers behave as expected, use
sigaction() instead of signal(). That also obsoletes messing with
the signal mask after restart.
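A sketch of such a setup; the signal list and handler name are only
illustrative:
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
sa.sa_handler = signal_handler;	/* only sets a flag, no real work */
sigaction(SIGUSR2, &sa, NULL);
sigaction(SIGHUP, &sa, NULL);
sigaction(SIGINT, &sa, NULL);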
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
Searching for the pid file in the list of arguments did not
take flags without parameters into account, like e.g. -de. Because
of this, the wrapper would use a different pid file than haproxy
if such an argument was specified before -p.
The new version can still yield a false positive for some crazy
situations, like your config file name starting with "-p", but
I think this is as good as it gets without using getopt or some
library.
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
Last commit 77d1f01 ("BUG/MEDIUM: connection: fix memory corruption
when building a proxy v2 header") was wrong, using &cn_trash instead
of cn_trash resulting in a warning and the client's SSL cert CN not
being stored at the proper location.
Thanks to Lukas Tribus for spotting this quickly.
This should be backported to 1.5 after the patch above is backported.
Add support for http-request track-sc, similar to what is done in
tcp-request for backends. A new act_prm field was added to HTTP
request rules to store the track params (table, counter). Just
like for TCP rules, the table is resolved while checking for
config validity. The code was mostly copied from the TCP code
with the exception that here we also count the HTTP request count
and rate by hand. Probably something could be factored out in
the future.
It seems like tracking flags should be improved to mark each hook
which tracks a key so that we can have some check points where to
increase counters of the past if not done yet, a bit like is done
for TRACK_BACKEND.
Doing so finally allows the hex converter to be applied to integers as well.
Note that all integers are represented in 32-bit, big endian so that their
conversion remains human readable and portable. A later improvement to the
hex converter could be to make it trim leading zeroes, and/or to only report
a number of least significant bytes.
From time to time it's useful to hash input data (scramble input, or
reduce the space needed in a stick table). This patch provides 3 simple
converters allowing use of the available hash functions to hash input
data. The output is an unsigned integer which can be passed into a header,
a log or used as an index for a stick table. One nice usage is to scramble
source IP addresses before logging when there are requirements to hide them.
IP addresses are a perfect example of fixed size data which we could
cast to binary, yet it was not allowed for lack of a cast function,
even though the opposite was allowed in ACLs. Make that possible both
in sample expressions and in stick tables.
We're using the internal memory representation of base32 here, which is
wrong since these data might be exported to headers for logs or be used
to stick to a server and replicated to other peers. Let's convert base32
to big endian (network representation) when building the binary block.
This mistake is also present in 1.5, it would be better to backport it.
Some users want to add their own data types to stick tables. We don't
want to use a linked list here for performance reasons, so we need to
continue to use an indexed array. This patch allows one to reserve a
compile-time-defined number of extra data types by setting the new
macro STKTABLE_EXTRA_DATA_TYPES to anything greater than zero, keeping
in mind that anything larger will slightly inflate the memory consumed
by stick tables (not per entry though).
Then calling stktable_register_data_store() with the new keyword will
either register a new keyword or fail if the desired entry was already
taken or the keyword already registered.
Note that this patch does not dictate how the data will be used, it only
offers the possibility to create new keywords and have an index to
reference them in the config and in the tables. The caller will not be
able to use stktable_data_cast() and will have to explicitly cast the
table pointers to the expected types. It can be used for experimentation
as well.
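A sketch of a registration call; the exact signature and argument
values are assumptions:
/* built with -DSTKTABLE_EXTRA_DATA_TYPES=1 to reserve one spare slot */
if (stktable_register_data_store(STKTABLE_DATA_TYPES - 1, "my_counter",
                                 STD_T_UINT, ARG_T_STOP) < 0)
	/* slot already taken or keyword already registered */
	exit(1);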
OpenSSL does not free the DH * value returned by the callback specified
with SSL_CTX_set_tmp_dh_callback(), leading to a memory leak for SSL/TLS
connections using Diffie-Hellman Ephemeral key exchange.
This patch fixes the leak by allocating the DH * structs holding the DH
parameters once, at configuration time.
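A sketch of the resulting scheme; the callback merely hands out a
pointer that haproxy owns:
static DH *global_dh;	/* built once while parsing the configuration */
static DH *ssl_get_tmp_dh(SSL *ssl, int is_export, int keylen)
{
	return global_dh;	/* OpenSSL won't free it; released at deinit */
}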
Note: this fix must be backported to 1.5.
Daniel Dubovik reported an interesting bug showing that the request body
processing was still not 100% fixed. If a POST request contained short
enough data to be forwarded at once before trying to establish the
connection to the server, we had no way to correctly rewind the body.
The first visible case is that balancing on a header does not always work
on such POST requests since the header cannot be found. But there are even
nastier implications which are that http-send-name-header would apply to
the wrong location and possibly even affect part of the request's body
due to an incorrect rewinding.
There are two options to fix the problem :
- first one is to force the HTTP_MSG_F_WAIT_CONN flag on all hash-based
balancing algorithms and http-send-name-header, but there's always a
risk that any new algorithm forgets to set it ;
- the second option is to account for the amount of skipped data before
the connection establishes so that we always know the position of the
request's body relative to the buffer's origin.
The second option is much more reliable and fits very well in the spirit
of the past changes to fix forwarding. Indeed, at the moment we have
msg->sov which points to the start of the body before headers are forwarded
and which equals zero afterwards (so it still points to the start of the
body before forwarding data). A minor change consists in always making it
point to the start of the body even after data have been forwarded. It means
that it can get a negative value (so we need to change its type to signed).
In order to avoid wrapping, we only do this as long as the other side of
the buffer is not connected yet.
Doing this definitely fixes the issues above for the requests. Since the
response cannot be rewound we don't need to perform any change there.
This bug was introduced/remained unfixed in 1.5-dev23 so the fix must be
backported to 1.5.
This patch adds two converters :
ltime(<format>[,<offset>])
utime(<format>[,<offset>])
Both use strftime() to emit the output string from an input date. ltime()
provides local time, while utime() provides the UTC time.
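For illustration, the underlying operation is essentially this
standalone sketch, not the converter's actual code:
#include <stdio.h>
#include <time.h>
static void fmt_date(time_t when, const char *fmt, int utc)
{
	char buf[64];
	struct tm tm;
	if (utc)
		gmtime_r(&when, &tm);		/* utime() behaviour */
	else
		localtime_r(&when, &tm);	/* ltime() behaviour */
	strftime(buf, sizeof(buf), fmt, &tm);
	printf("%s\n", buf);
}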
These new converters make it possible to look up any sample expression
in a table, and check whether an equivalent key exists or not, and if it
exists, to retrieve the associated data (eg: gpc0, request rate, etc...).
Till now it was only possible using tracking, but sometimes tracking is
not suited to only retrieving such counters, either because it's done too
early or because too many items need to be checked without necessarily
being tracked.
These converters all take a string on input, and then convert it again to
the table's type. This means that if an input sample is of type IPv4 and
the table is of type IP, it will first be converted to a string, then back
to an IP address. This is a limitation of the current design which does not
allow converters to declare that "any" type is supported on input. Since
strings are the only types which can be cast to any other one, this method
always works.
The following converters were added :
in_table, table_bytes_in_rate, table_bytes_out_rate, table_conn_cnt,
table_conn_cur, table_conn_rate, table_gpc0, table_gpc0_rate,
table_http_err_cnt, table_http_err_rate, table_http_req_cnt,
table_http_req_rate, table_kbytes_in, table_kbytes_out,
table_server_id, table_sess_cnt, table_sess_rate, table_trackers.
Currently we have stktable_fetch_key() which fetches a sample according
to an expression and returns a stick table key, but we also need a function
which does only the second half of it from a known sample. So let's cut the
function in two and introduce smp_to_stkey() to perform this lookup. The
first function was adapted to make use of it in order to avoid code
duplication.
When we were generating a hash, it was done using an unsigned long. When
the hash was used to select a backend, it was sent as an unsigned int.
This made it difficult to predict which backend would be selected.
This patch updates get_hash() and the hash methods to use an unsigned
int, to remain consistent throughout the codebase.
This fix should be backported to 1.5 and probably in part to 1.4.
Abstract namespace sockets ignore the shutdown() call and do not make
it possible to temporarily stop listening. The issue it causes is that
during a soft reload, the new process cannot bind, complaining that the
address is already in use.
This change registers a new pause() function for unix sockets and
completely unbinds the abstract ones since it's possible to rebind
them later. It requires the two previous patches as well as preceding
fixes.
This fix should be backported into 1.5 since the issue appears there.
When a listener resumes operations, supporting a full rebind makes it
possible to perform a full stop as a pause(). This will be used for
pausing abstract namespace unix sockets.
In order to fix the abstract socket pause mechanism during soft restarts,
we'll need to proceed differently depending on the socket protocol. The
pause_listener() function already supports some protocol-specific handling
for the TCP case.
This commit makes this cleaner by adding a new ->pause() function to the
protocol struct, which, if defined, may be used to pause a listener of a
given protocol.
For now, only TCP has been adapted, with the specific code moved from
pause_listener() to tcp_pause_listener().
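A sketch of the hook; the field placement in struct protocol is
illustrative:
struct protocol {
	/* ... existing fields ... */
	int (*pause)(struct listener *l);	/* temporarily disable a
						 * listener, NULL if unneeded */
};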
Jan Seda noticed that abstract sockets are incompatible with soft reload,
because the new process cannot bind and immediately fails. This patch marks
the binding as retryable and not fatal so that the new process can try to
bind again after sending a signal to the old process.
Note that this fix is not enough to completely solve the problem, but it
is necessary. This patch should be backported to 1.5.
This is currently harmless, but when stopping a listener, its fd is
closed but not set to -1, so it is not possible to re-open it again.
Currently this has no impact, but it will have one once the abstract
sockets are modified to perform a complete close on soft-reload.
The fix can be backported to 1.5 and may even apply to 1.4 (protocols.c).
As usual, when touching any is* function, Solaris complains about the
type of the element being checked. Better backport this to 1.5 since
nobody knows what the emitted code looks like since macros are used
instead of functions.
check.c doesn't build on solaris since 98637e5 ("MEDIUM: Add external check")
because sigset_t is unknown. Simply include signal.h to fix the issue.
No need to backport, this is 1.6-only.
When bind() fails (function uxst_bind_listener()), the fail path doesn't
consider the abstract namespace and tries to unlink paths held in
uninitialized memory (tempname and backname). See the strace excerpt;
the strings still hold the path from test1.
===============================================================================================
23722 bind(5, {sa_family=AF_FILE, path=@"test2"}, 110) = -1 EADDRINUSE (Address already in use)
23722 unlink("/tmp/test1.sock.23722.tmp") = -1 ENOENT (No such file or directory)
23722 close(5) = 0
23722 unlink("/tmp/test1.sock.23722.bak") = -1 ENOENT (No such file or directory)
===============================================================================================
This patch should be backported to 1.5.
With all the goodies supported by logformat, people find that the limit
of 1024 chars for log lines is too short. Some servers do not support
larger lines and can simply drop them, so changing the default value is
not always the best choice.
This patch takes a different approach. Log line length is specified per
log server on the "log" line, with a value between 80 and 65535. That
way it's possible to satisfy all needs, even with some fat local servers
and small remote ones.
I've been facing multiple configurations which involved track-sc* rules
in tcp-request content without the "if ..." to force it to wait for the
contents, resulting in random behaviour with contents sometimes retrieved
and sometimes not.
Reading the doc doesn't make it clear either that the tracking will be
performed only if data are already there and that waiting on an ACL is
the only way to avoid this.
Since this behaviour is not natural and we now have the ability to fix
it, this patch ensures that if input data are still moving, instead of
silently dropping them, we naturally wait for them to stabilize up to
the inspect-delay. This way it's not needed anymore to implement an
ACL-based condition to force waiting for data, even though the behaviour
is not changed for when an ACL is present.
The most obvious usage will be when track-sc is followed by any HTTP
sample expression, there's no need anymore for adding "if HTTP".
It's probably worth backporting this to 1.5 to avoid further configuration
issues. Note that it requires previous patch.
stktable_fetch_key() does not indicate whether it returns NULL because
the input sample was not found or because it's unstable. It causes trouble
with track-sc* rules. Just like with sample_fetch_string(), we want it to
be able to give more information to the caller about what it found. Thus,
now we use the pointer to a sample passed by the caller, and fill it with
the information we have about the sample. That way, even if we return NULL,
the caller has the ability to check whether a sample was found and if it is
still changing or not.
We used to only clear flags when reusing the static sample before calling
sample_process(), but that's not enough because there's a context in samples
that can be used by some fetch functions such as auth, headers and cookies,
and not reinitializing it risks that a pointer of a different type is used
in the wrong context.
An example configuration which triggers the case consists in mixing hdr()
and http_auth_group() which both make use of contexts :
http-request add-header foo2 %[hdr(host)],%[http_auth_group(foo)]
The solution is simple: initialize the whole sample, not just the flags.
This fix must be backported into 1.5 since it was introduced in 1.5-dev19.
Baptiste Assmann reported a corner case in the releasing of stick-counters:
we release content-aware counters before logging. In the past it was not a
problem, but since now we can log them it, it prevents one from logging
their value. Simply switching the log production and the release of the
counter fixes the issue.
This should be backported into 1.5.
'ssl_sock_get_common_name' applied to a connection was also renamed
'ssl_sock_get_remote_common_name'. Currently, this function is only used
with protocol PROXYv2 to retrieve the client certificate's common name.
A further usage could be to retrieve the server certificate's common name
on an outgoing connection.
The sample fetch function "base" makes use of the trash which is also
used by set-header/add-header etc... everything which builds a formatted
line. So we end up with some junk in the header if base is in use. Let's
fix this like all other fetches by using a trash chunk instead.
This bug was reported by Baptiste Assmann, and also affects 1.5.
Commit 81ae195 ("[MEDIUM] add support for logging via a UNIX socket")
merged in 1.3.14 introduced a few minor issues with log sockets. All
of them happen only when a failure is encountered when trying to set
up the logging socket (eg: socket family is not available or is
temporarily short in resources).
The first socket which experiences an error causes the socket setup
loop to abort, possibly preventing any log from being sent if it was
the first logger. The second issue is that if this socket finally
succeeds after a second attempt, errors are reported for the wrong
logger (eg: logger #1 failed instead of #2). The last point is that
we now have multiple loggers, and it's a waste of time to walk over
their list for every log while they're almost always properly set up.
So in order to fix all this, let's merge the two lists. If a logger
experiences an error, it simply sends an alert and skips to the next
one. That way they don't prevent messages from being sent and are
all properly accounted for.
This is the 3rd regression caused by the changes below. The latest to
date was reported by Finn Arne Gangstad. If a server responds with no
content-length and the client's FIN is never received, either we leak
the client-side FD or we spin at 100% CPU if timeout client-fin is set.
Enough is enough. The amount of tricks needed to cover these side-effects
starts to look like used toilet paper stacked over a chocolate cake. I
don't want to eat that cake anymore!
All this to avoid reporting a server-side timeout when a client stops
uploading data and haproxy expires faster than the server... A lot of
"ifs" resulting in a technically valid log that doesn't always please
users, and whose alternative causes that many issues for all others
users.
So let's revert this crap merged since 1.5-dev25 :
Revert "CLEANUP: http: don't clear CF_READ_NOEXP twice"
This reverts commit 1592d1e72a.
Revert "BUG/MEDIUM: http: clear CF_READ_NOEXP when preparing a new transaction"
This reverts commit 77d29029af.
Revert "BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called"
This reverts commit 0943757a21.
Revert "BUG/MEDIUM: http: disable server-side expiration until client has sent the body"
This reverts commit 3bed5e9337.
Revert "BUG/MEDIUM: http: correctly report request body timeouts"
This reverts commit b9edf8fbec.
Revert "BUG/MEDIUM: http/session: disable client-side expiration only after body"
This reverts commit b1982e27aa.
If a cleaner AND SAFER way to do something equivalent appears in 1.6-dev,
we *might* consider backporting it to 1.5, but given the vicious bugs that
have surfaced since, I doubt it will happen any time soon.
Fortunately, that crap never made it into 1.4 so no backport is needed.
For some browsers (firefox), an expired OCSP Response causes unwanted behavior.
Haproxy now stops serving an OCSP response if its nextupdate date minus
the supported time skew (#define OCSP_MAX_RESPONSE_TIME_SKEW) is
in the past.
I am not entirely sure that this is a bug, but it seems
to me that it may cause a problem if an agent-check is
configured and there is some kind of error making a connection for it.
Signed-off-by: Simon Horman <horms@verge.net.au>
The support is all based on static responses. This doesn't add any
request / response logic to HAProxy, but allows a way to update
information through the socket interface.
Currently certificates specified using "crt" or "crt-list" on "bind" lines
are loaded as PEM files.
For each PEM file, haproxy checks for the presence of file at the same path
suffixed by ".ocsp". If such file is found, support for the TLS Certificate
Status Request extension (also known as "OCSP stapling") is automatically
enabled. The content of this file is optional. If not empty, it must contain
a valid OCSP Response in DER format. In order to be valid an OCSP Response
must comply with the following rules: it has to indicate a good status,
it has to be a single response for the certificate of the PEM file, and it
has to be valid at the moment of addition. If these rules are not respected
the OCSP Response is ignored and a warning is emitted. In order to identify
which certificate an OCSP Response applies to, the issuer's certificate is
necessary. If the issuer's certificate is not found in the PEM file, it will
be loaded from a file at the same path as the PEM file suffixed by ".issuer"
if it exists otherwise it will fail with an error.
It is possible to update an OCSP Response from the unix socket using:
set ssl ocsp-response <response>
This command is used to update an OCSP Response for a certificate (see "crt"
on "bind" lines). Same controls are performed as during the initial loading of
the response. The <response> must be passed as a base64 encoded string of the
DER encoded response from the OCSP server.
Example:
openssl ocsp -issuer issuer.pem -cert server.pem \
-host ocsp.issuer.com:80 -respout resp.der
echo "set ssl ocsp-response $(base64 -w 10000 resp.der)" | \
socat stdio /var/run/haproxy.stat
This feature is automatically enabled on openssl 0.9.8h and above.
This work was performed jointly by Dirkjan Bussink of GitHub and
Emeric Brun of HAProxy Technologies.
The pcreposix layer (in the pcre project) executes strlen() to find
the length of the string. When we use the "regex_exec*2" functions,
the length is used to add a final \0, then pcreposix runs another
strlen() to compute the length.
If we use the native PCRE API, the length is provided as an
argument, and these operations disappear.
This is useful because PCRE regexes are more used than POSIX regexes.
The new regex function can use string and length. The HAproxy buffer are
not null-terminated, and the use of the regex_exec* functions implies
the add of this null character. This patch replace these function by the
functions which takes a string and length as input.
Only the file "proto_http.c" is changed because it is executed more often
than the others. The file "checks.c" has very low usage, and it is not
worth changing it. Furthermore, the buffers used by "checks.c" are
null-terminated.
This patch removes all references to standard regex in haproxy. The last
remaining references are only in the regex.[ch] files.
In the file src/checks.c, the original function uses a "pmatch" array.
In fact this array is unused. This patch removes it.
This patch renames "regex_exec" to "regex_exec2". It adds new
"regex_exec", "regex_exec_match" and "regex_exec_match2" functions. These
functions can match a regex and return an array containing the matching
parts. Otherwise, they use the compiled method (JIT or PCRE or POSIX).
JIT requires a subject with a length. PCREPOSIX and native POSIX regexes
require a null-terminated subject. The regex_exec* functions are split
into two versions. The first version takes a null-terminated string, but it
executes strlen() on the subject if it is compiled with JIT. The second
version (terminated by "2") takes the subject and the length. This
version adds a null character to the subject if it is compiled with
PCREPOSIX or native POSIX functions.
The documentation of POSIX regex and pcreposix says that the function
returns 0 if the string matches, otherwise it returns REG_NOMATCH. The
REG_NOMATCH macro takes the value 1 with POSIX regex and the value 17
with pcreposix. The documentation of the native PCRE API (used with
JIT) says it returns a negative number if there is no match, otherwise
it returns 0 or a positive number.
This patch also fixes the return codes of the regex_exec* functions. Now,
these functions return true if the string matches, otherwise they return
false.
When I renamed the modify-header action to replace-value, one of them
was mistakenly set to "replace-val" instead. Additionally, differentiation
of the two actions must be done on args[0][8] and not *args[8]. Thanks
Thierry for spotting...
This patch adds two new actions to http-request and http-response rulesets :
- replace-header : replace a whole header line, suited for headers
which might contain commas
- replace-value : replace a single header value, suited for headers
defined as lists.
The match consists of a regex, and the replacement string takes a log-format
and supports back-references.
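For illustration, two such rules might look like this (rule syntax as
documented, header names and expressions arbitrary) :
http-request replace-header Cookie foo=([^;]*);(.*) foo=\1;ip=%[src];\2
http-request replace-value X-Forwarded-For 192\.168\.(.*) 172.16.\1
The first one rewrites the whole Cookie line at once, the second one
rewrites each matching value of the X-Forwarded-For list individually.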
The time statistics computed by previous patches are now reported in the
HTML stats in the tips related to the total sessions for backend and servers,
and as separate columns for the CSV stats.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
Now that we can quote unsafe string, it becomes possible to dump the health
check responses on the CSV page as well. The two new fields are "last_chk"
and "last_agt".
qstr() and cstr() will be used to quote-encode strings. The first one
does it unconditionally. The second one is aimed at CSV files where the
quote-encoding is only needed when the field contains a quote or a comma.
This helper is similar to addr_to_str but
tries to convert the port rather than the address
of a struct sockaddr_storage.
This is in preparation for supporting
an external agent check.
Signed-off-by: Simon Horman <horms@verge.net.au>
The "accept-proxy" statement of bind lines was still limited to version
1 of the protocol, while send-proxy-v2 is now available on the server
lines. This patch adds support for parsing v2 of the protocol on incoming
connections. The v2 header is automatically recognized so there is no
need for a new option.
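A minimal sketch of chaining two instances with version 2 of the protocol
(addresses are illustrative) :
listen front
bind :443
server next 192.168.0.10:443 send-proxy-v2
listen next_hop
bind 192.168.0.10:443 accept-proxy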
This is in order to simplify the PPv2 header parsing code to look more
like the one provided as an example in the spec. No code change was
performed beyond just merging the proxy_addr union into the proxy_hdr_v2
struct.
If haproxy receives a connection over a unix socket and forwards it to
another haproxy instance using proxy protocol v1, it sends an UNKNOWN
protocol, which is rejected by the other side. Make the receiver accept
the UNKNOWN protocol as per the spec, and only use the local connection's
address for this.
As discussed with Dmitry Sivachenko, if a server farm has more than one
active server, uses a guaranteed non-deterministic algorithm (round robin),
and a connection was initiated from a non-persistent connection, there's
no point insisting to reconnect to the same server after a connect failure,
better redispatch upon the very first retry instead of insisting on the same
server multiple times.
The retry delay is only useful when sticking to a same server. During
a redispatch, it's useless and counter-productive if we're sure to
switch to another server, which is almost guaranteed when there's
more than one server and the balancing algorithm is round robin, so
better not pass via the turn-around state in this case. It could be
done as well for leastconn, but there's a risk of always killing the
delay after the recovery of a server in a farm where it's almost
guaranteed to take most incoming traffic. So better only kill the
delay when using round robin.
As discussed with Dmitry Sivachenko, the default 1-second connect retry
delay can be large for situations where the connect timeout is much smaller,
because it means that an active connection reject will take more time to be
retried than a silent drop, and that does not make sense.
This patch changes this so that the retry delay is the minimum of 1 second
and the connect timeout. That way people running with sub-second connect
timeout will benefit from the shorter reconnect.
This new directive captures the specified fetch expression, converts
it to text and puts it into the next capture slot. The capture slots
are shared with header captures so that it is possible to dump all
captures at once or selectively in logs and header processing.
The purpose is to permit logs to contain whatever payload is found in
a request, for example bytes at a fixed location or the SNI of forwarded
SSL traffic.
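As an illustration, capturing the SNI of forwarded SSL traffic might be
done this way (a sketch assuming the tcp-request content ruleset) :
frontend ssl_relay
mode tcp
bind :443
tcp-request inspect-delay 5s
tcp-request content capture req.ssl_sni len 32
default_backend ssl_servers
The captured value is then reported in the logs along with the header
captures.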
This patch adds support for captures with no header name. The purpose
is to allow extra captures to be defined and logged along with the
header captures.
Currently, all callers to sample_fetch_string() call it with SMP_OPT_FINAL.
Now we improve it to support the case where this option is not set, and to
make it return the original sample as-is. The purpose is to let the caller
check the SMP_F_MAY_CHANGE flag in the result and know that it should wait
to get complete contents. Currently this has no effect on existing code.
Similar to previous patches, HTTP header captures are performed when
a TCP frontend switches to an HTTP backend, but are not possible to
report. So let's relax the check to explicitly allow them to be present
in TCP frontends.
Log format is defined in the frontend, and some frontends may be chained to
an HTTP backend. Sometimes it's very convenient to be able to log the HTTP
status code of these HTTP backends. This status is definitely present in
the internal structures, it's just that we used to limit it to be used in
HTTP frontends. So let's simply relax the check to allow it to be used in
TCP frontends as well.
In OpenSSL, the name of a cipher using ephemeral diffie-hellman for key
exchange can start with EDH, but also DHE, EXP-EDH or EXP1024-DHE.
We work around this issue by using the cipher's description instead of
the cipher's name.
Hopefully the description is less likely to change in the future.
When no static DH parameters are specified, this patch makes haproxy
use standardized (rfc 2409 / rfc 3526) DH parameters with prime lengths
of 1024, 2048, 4096 or 8192 bits for DHE key exchange. The size of the
temporary/ephemeral DH key is computed as the minimum of the RSA/DSA server
key size and the value of a new option named tune.ssl.default-dh-param.
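Example (a minimal global section, value illustrative) :
global
tune.ssl.default-dh-param 2048
Here the ephemeral DH key will use at most 2048-bit parameters, further
limited by the server key size as described above.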
Using the systemd daemon mode the parent doesn't exit but waits for
its children without closing its listening sockets.
As linux 3.9 introduced a SO_REUSEPORT option (always enabled in
haproxy if available) this will give unhandled connections problems
after an haproxy reload with open connections.
The problem is that when on reload a new parent is started (-Ds
$oldchildspids), in haproxy.c main there's a call to start_proxies
that, without SO_REUSEPORT, should fail (as the old processes are
already listening) and so a SIGTTOU is sent to the old processes. On this
signal the old children will call (in pause_listener) a shutdown() on
the listening fd. From my tests (if I understand it correctly) this
affects the in-kernel file (so the listen is really disabled for all
the processes, also the parent).
Instead, with SO_REUSEPORT, the call to start_proxies doesn't fail and
so SIGTTOU is never sent. Only SIGUSR1 is sent and the listen isn't
disabled for the parent; only the children will stop listening (with
a call to close()).
So, with SO_REUSEPORT, the old children will close their listening
sockets but will wait for the current connections to finish or
time out, and, as their parent has its listening socket open, the
kernel will schedule some connections on it. These connections will
never be accepted by the parent as it's in the waitpid loop.
This fix will close all the listeners on the parent before entering the
waitpid loop.
Signed-off-by: Simone Gotti <simone.gotti@gmail.com>
MySQL will stop supporting pre-4.1 authentication packets in the future
and is already giving us a hard time regarding non-silencable warnings
which are logged on each health check. Warnings look like the following:
"[Warning] Client failed to provide its character set. 'latin1' will be used
as client character set."
This patch adds basic support for post-4.1 authentication by sending the proper
authentication packet with the character set, along with the QUIT command.
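A configuration sketch (user name and addresses are illustrative) :
backend mysql
mode tcp
option mysql-check user haproxy post-41
server db1 192.168.0.20:3306 check
The "post-41" keyword selects the post-4.1 check described above; the
specified user must exist on the MySQL server.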
Commit b1982e2 ("BUG/MEDIUM: http/session: disable client-side expiration
only after body") was tricky and caused an issue which was fixed by commit
0943757 ("BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are
not called"). But that's not enough, another issue was introduced and further
emphasized by last fix.
The issue is that the CF_READ_NOEXP flag needs to be cleared when waiting
for a new request over that connection, otherwise we cannot expire anymore
an idle connection waiting for a new request.
This explains the neverending keepalives reported by at least 3 different
persons since dev24. No backport is needed.
As reported by Vincent Bernat and Ryan O'Hara, building haproxy with the
option above causes this :
src/dumpstats.c: In function 'stats_dump_sv_stats':
src/dumpstats.c:3059:4: error: format not a string literal and no format arguments [-Werror=format-security]
cc1: some warnings being treated as errors
make: *** [src/dumpstats.o] Error 1
With that option, gcc wants an argument after a string format even when
that string format is a const but not a literal. It can be anything
invalid, for example an integer when a string is expected, it just
wants something. So feed it with something :-(
Dmitry Sivachenko reported that "uint" doesn't build on FreeBSD 10.
On Linux it's defined in sys/types.h and indicated as "old". Just
get rid of the very few occurrences.
One important aspect of SSL performance tuning is the cache size,
but there's no metric to know whether it's large enough or not. This
commit introduces two counters, one for the cache lookups and another
one for cache misses. These counters are reported on "show info" on
the stats socket. This way, it suffices to see the cache misses
counter constantly grow to know that a larger cache could possibly
help.
It's commonly needed to know how many SSL asymmetric keys are computed
per second on either side (frontend or backend), and to know the SSL
session reuse ratio. Now we compute these values and report them in
"show info".
Currently exp_replace() (which is used in reqrep/reqirep) is
vulnerable to a buffer overrun. I have been able to reproduce it using
the attached configuration file and issuing the following command:
wget -O - -S -q http://localhost:8000/`perl -e 'print "a"x4000'`/cookie.php
Str was being checked only in while (str) and it was possible to
read past that when more than one character was being accessed in the
loop.
WT:
Note that this bug is only marked MEDIUM because configurations
capable of triggering this bug are very unlikely to exist at all due
to the fact that most rewrites consist in static string additions
that largely fit into the reserved area (8kB by default).
This fix should also be backported to 1.4 and possibly even 1.3 since
it seems to have been present since 1.1 or so.
Config:
-------
global
maxconn 500
stats socket /tmp/haproxy.sock mode 600
defaults
timeout client 1000
timeout connect 5000
timeout server 5000
retries 1
option redispatch
listen stats
bind :8080
mode http
stats enable
stats uri /stats
stats show-legends
listen tcp_1
bind :8000
mode http
maxconn 400
balance roundrobin
reqrep ^([^\ :]*)\ /(.*)/(.*)\.php(.*) \1\ /\3.php?arg=\2\2\2\2\2\2\2\2\2\2\2\2\2\4
server srv1 127.0.0.1:9000 check port 9000 inter 1000 fall 1
server srv2 127.0.0.1:9001 check port 9001 inter 1000 fall 1
We now retrieve a lot of information from a single line of response, which
can be made up of various words delimited by spaces/tabs/commas. We try to
arrange all this and report anything unusual we detect. The agent now supports :
- "up", "down", "stopped", "fail" for the operational states
- "ready", "drain", "maint" for the administrative states
- any "%" number for the weight
- an optional reason after a "#" that can be reported on the stats page
The line parser and processor should move to its own function so that
we can reuse the exact same one for http-based agent checks later.
When an agent is enabled and forces a down state, it's important to have
this exact information and to report the agent's status, so let's check
the agent before checking the health check.
Commit 671b6f0 ("MEDIUM: Add enable and disable agent unix socket commands")
forgot to update the relevant help messages. This was done in 1.5-dev20, no
backport is needed.
Agent will have the ability to return a weight without indicating an
up/down status. Currently this is not possible, so let's add a 5th
result CHK_RES_NEUTRAL for this purpose. It has been mapped to the
unused HCHK_STATUS_CHECKED which already serves as a neutral delimiter
between initiated checks and those returning a result.
Indicate "Agent" instead of "Health" in health check reports sent when
"option log-health-checks" is set. Also, ensure that any agent check
status change is correctly reported. Till now we used not to emit logs
when the agent could not be reached.
Till now we only had "DOWN" on the stats page, whether it's the agent
or regular checks which caused this status. Let's differentiate the
two with "DOWN (agent)" so that admins know that the agent is causing
this status.
This command supports "agent", "health", "state" and "weight" to adjust
various server attributes as well as changing server health check statuses
on the fly or setting the drain mode.
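Illustrative CLI usage (backend and server names hypothetical) :
$ echo "set server bk_app/srv1 state drain" | socat stdio /var/run/haproxy.sock
$ echo "set server bk_app/srv1 weight 50" | socat stdio /var/run/haproxy.sock
$ echo "set server bk_app/srv1 state ready" | socat stdio /var/run/haproxy.sock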
Instead of enabling/disabling maintenance mode and drain mode separately
using 4 actions, we now offer 3 simplified actions :
- set state to READY
- set state to DRAIN
- set state to MAINT
They have the benefit of reporting the same state as displayed on the page,
and of doing the double-switch atomically eg when switching from drain to
maint.
Note that the old actions are still supported for users running scripts.
This patch adds support for a new "drain" mode. So now we have 3 admin
modes for a server :
- READY
- DRAIN
- MAINT
The drain mode disables load balancing but leaves the server up. It can
coexist with maint, except that maint has precedence. It is also inherited
from tracked servers, so just like maint, it's represented with 2 bits.
New functions were designed to set/clear each flag and to propagate the
changes to tracking servers when relevant, and to log the changes. Existing
functions srv_set_adm_maint() and srv_set_adm_ready() were replaced to make
use of the new functions.
Currently the drain mode is not yet used, however the whole logic was tested
with all combinations of set/clear of both flags in various orders to catch
all corner cases.
srv_is_usable() is broader than the simple weight check it replaces, as it not only considers
the weight but the server's state as well. Future changes will allow a
server to be in drain mode with a non-zero weight, so we should migrate
to use that function instead.
Now that servers have their own states, let's report this one instead of
following the tracked server chain and reporting the tracked server's.
However the tracked server is still used to report x/y when a server is
going up or down. When the agent reports a down state, this one is still
enforced.
Function check_set_server_drain() used to set a server into stopping state.
Now it first checks if all configured checks are UP, and if the possibly
tracked server is not stopped, and only calls set_srv_stopping() after
that. That also simplified the conditions to call the function, and its
logic. The function was also renamed check_notify_stopping() to better
report this change.
Function check_set_server_up() used to set a server up. Now it first
checks if all configured checks are UP, and if all tracked servers are
UP, and only calls set_srv_running() after that. That also simplified
the conditions to call the function, and its logic. The function was
also renamed check_notify_success() to better report this change.
Function check_set_server_down() used to set a server down. Now it first
checks if the health check's result differs from the server's state, and
only calls srv_set_stopped() if the check reports a failure while the
server is not down. Thanks to this, the conditions that were present
around its call could be removed. The function was also renamed
check_notify_failure() to better report this change.
This function was taken from check_set_server_drain(). It does not
consider health checks at all and only sets a server to stopping
provided it's not in maintenance and is not currently stopped. The
resulting state will be STOPPING. The state change is propagated
to tracked servers.
For now the function is not used, but the goal is to split health
checks status from server status and to be able to change a server's
state regardless of health checks statuses.
This function was taken from check_set_server_up(). It does not consider
health checks at all and only sets a server up provided it's not in
maintenance. The resulting state may be either RUNNING or STARTING
depending on the presence of a slowstart or not. The state change is
propagated to tracked servers.
For now the function is not used, but the goal is to split health
checks status from server status and to be able to change a server's
state regardless of health checks statuses.
This function was extracted from check_set_server_down(). It only
manipulates the server state and does not consider the health checks
at all, nor does it modify their status. It takes a reason message to
report in logs, however it passes NULL when recursing through the
trackers chain.
For now the function is not used, but the goal is to split health
checks status from server status and to be able to change a server's
state regardless of health checks statuses.
check_report_srv_status() was removed in favor of check_reason_string()
combined with srv_report_status(). This way we have one function which
is dedicated to check decoding, and another one dedicated to server
status.
srv_adm_append_status() was renamed srv_append_status() since it's no
more dedicated to maintenance mode. It now supports a reason which if
not null is appended to the output string.
We don't want to manipulate check's statuses anymore in functions
which modify the server's state. So since any check is forced to
call set_server_check_status() exactly once to report the result
of the check, it's the best place to update the check's health.
We don't have to handle the maintenance transition here anymore so we
can simplify the functions and conditions. This also means that we don't
need the disable/enable functions but only a function to switch to each
new state.
It's worth mentioning that at this stage there is still confusion
between the server state and the checks states. For example, the health
check's state is adjusted from tracked servers changing state, while it
should not be.
Now that it is possible to know whether a server is in forced maintenance
or inherits its maintenance status from another one, it is possible to
allow server tracking at more than one level. We still provide a loop
detection however.
Note that for the stats it's a bit trickier since we have to report the
check state which corresponds to the state of the server at the end of
the chain.
This change now involves a new flag SRV_ADMF_IMAINT to note that the
maintenance status of a server is inherited from another server. Thus,
we know at each server level in the chain if it's running, in forced
maintenance or in a maintenance status because it tracks another server,
or even in both states.
Disabling a server propagates this flag down to other servers. Enabling
a server flushes the flag down. A server becomes up again once both of
its flags are cleared.
Two new functions "srv_adm_set_maint()" and "srv_adm_set_ready()" are used to
manipulate this maintenance status. They're used by the CLI and the stats
page.
Now the stats page always says "MAINT" instead of "MAINT(via)" and it's
only the chk/down field which reports "via x/y" when the status is
inherited from another server, but it doesn't say it when a server was
forced into maintenance. The CSV output indicates "MAINT (via x/y)"
instead of only "MAINT(via)". This is the most accurate representation.
One important thing is that now entering/leaving maintenance for a
tracking server correctly follows the state of the tracked server.
Checks.c has become a total mess. A number of proxy or server maintenance
and queue management functions were put there probably because they were
used there, but that makes the code untouchable. And that's without saying
that their names do not always relate to what they really do!
So let's do a first pass by moving these ones :
- set_backend_down() => backend.c
- redistribute_pending() => queue.c:pendconn_redistribute()
- check_for_pending() => queue.c:pendconn_grab_from_px()
- shutdown_sessions => server.c:srv_shutdown_sessions()
- shutdown_backup_sessions => server.c:srv_shutdown_backup_sessions()
All of them were moved at once.
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Now we introduce srv->admin and srv->prev_admin which are bitfields
containing one bit per source of administrative status (maintenance only
for now). For the sake of backwards compatibility we implement a single
source (ADMF_FMAINT) but the code already checks any source (ADMF_MAINT)
where the STF_MAINTAIN bit was previously checked. This will later allow
us to add ADMF_IMAINT for maintenance mode inherited from tracked servers.
While doing these changes, it appeared that some places will need to be
revisited when implementing the inherited bit, this concerns all those
modifying the ADMF_FMAINT bit (enable/disable actions on the CLI or stats
page), and the checks to report "via" on the stats page. But currently
the code is harmless.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
Twice in a week I found people were surprised by a "conditional timeout" not
being respected, because they add "if <cond>" after a timeout, and since
they don't see any error nor read the doc, they expect it to work. Let's
make the timeout parser reject extra arguments to avoid these situations.
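For example, a line like the following used to be silently accepted with
the condition ignored, and is now rejected with a parse error (ACL name
illustrative) :
timeout client 30s if some_acl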
The DRAIN status is not inherited between tracked servers, so the stats
page must only use the reported server's status and not the tracked
server's status, otherwise it misleadingly indicates a DRAIN state when
a server tracks a draining server, while this is wrong.
As more or less suspected, commit b1982e2 ("BUG/MEDIUM: http/session:
disable client-side expiration only after body") was hazardous. It
introduced a regression causing client side timeout to expire during
connection retries if it's lower than the time needed to cover the
amount of retries, so clients get a 408 when the connection to the
server fails to establish fast enough.
The reason is that the CF_READ_NOEXP flag is set after the MSG_DONE state
is reached, which protects the timeout from being re-armed, then during
the retries, process_session() clears the flag without calling the analyser
(since there's no activity for it), so the timeouts are rearmed.
Ideally, these one-shot flags should be per-analyser, and the analyser
which sets them would be responsible for clearing them, or they would
automatically be cleared when switching to another analyser. Unfortunately
this is not really possible currently.
What can be done however is to only clear them in the following situations :
- we're going to call analysers
- analysers have all been unsubscribed
This method seems reliable enough and approaches the ideal case well enough.
No backport is needed, this bug was introduced in 1.5-dev25.
When run in daemon mode (i.e. with at least one forked process) and using
the epoll poller, sending USR1 (graceful shutdown) to the worker processes
can cause some workers to start running at 100% CPU. Precondition is having
an established HTTP keep-alive connection when the signal is received.
The cloned (during fork) listening sockets do not get closed in the parent
process, thus they do not get removed from the epoll set automatically
(see man 7 epoll). This can lead to the process receiving epoll events
that it doesn't feel responsible for, resulting in an endless loop around
epoll_wait() delivering these events.
The solution is to explicitly remove these file descriptors from the epoll
set. To not degrade performance, care was taken to only do this when
necessary, i.e. when the file descriptor was cloned during fork.
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
[wt: a backport to 1.4 could be studied though chances to catch the bug are low]
This is a minor fix, but the SSL_CTX_set_options() and
SSL_CTX_set_mode() functions take a long, not an int parameter. As
SSL_OP_ALL is now (since OpenSSL 1.0.0) defined as 0x80000BFFL, I think
it is worth fixing.
Thomas Heil reported that previous commit 07fcaaa ("MINOR: fix a few
memory usage errors") make haproxy crash when req* rules are used. As
diagnosed by Cyril Bonté, this commit introduced a regression which
makes haproxy free the memory areas allocated for regex even when
they're going to be used, resulting in the crashes.
This patch does three things :
- undo the free() on the valid path
- add regfree() on the error path but only when regcomp() succeeds
- rename err_code to ret_code to avoid confusing the valid return
path with an error path.
We used to call srv_is_usable() with either the current state and weights
or the previous ones. This causes trouble for future changes, so let's first
split it in two variants :
- srv_is_usable(srv) considers the current status
- srv_was_usable(srv) considers the previous status
Detecting that a server's status has changed is a bit messy, as well
as it is to commit the status changes. We'll have to add new conditions
soon and we'd better avoid to multiply the number of touched locations
with the high risk of forgetting them.
This commit introduces :
- srv_lb_status_changed() to report if the status changed from the
previously committed one ;
- srv_lb_commit_status() to commit the current status
The function is now used by all load-balancing algorithms.
This flag is only a copy of (srv->uweight == 0), so better get rid of
it to reduce some of the confusion that remains in the code, and use
a simple function to return this state based on this weight instead.
Function set_server_check_status() is very weird. It is called at the
end of a check to update the server's state before the new state is even
calculated, and possibly to log status changes, only if the proxy has
"option log-health-checks" set.
In order to do so, it employs an exhaustive list of the combinations
which can lead to a state change, while in practice almost all of
them may simply be deduced from the change of check status. Better,
some changes of check status are currently not detected while they
can be very valuable (eg: changes between L4/L6/TOUT/HTTP 500 for
example).
The doc was updated to reflect this.
Also, a minor change was made to consider s->uweight and not s->eweight
as meaning "DRAIN" since eweight can be null without the DRAIN mode (eg:
throttle, NOLB, ...).
Having both "active or backup DOWN" and "not checked" on the left side of
the color caption inflates the whole header block for no reason. Simply
move them both on the same line and reduce the header height.
Abuse of copy-paste has made "tcp-check expect binary" consider a
buffer starting with \0 as empty! Thanks to Lukas Benes for reporting
this problem and confirming the fix.
This is 1.5-only, no backport is needed.
John-Paul Bader reported a stupid regression in 1.5-dev25, we
forgot to check that global.stats_fe is initialized before visiting
its sockets, resulting in a crash.
No backport is needed.
It appears that many people consider that the default match for a fetch
returning a string is "exact match string", aka "str". This patch sets this
match as the default for strings.
Commit f465994198 removed the "via" link when a tracking server is in maintenance, but
still calculated an empty link that no one can use. We can safely remove it.
The "via" column includes a link to the tracked server but instead of closing
the link with a </a> tag, a new tag is opened.
The fix for this typo should also be backported to 1.4
Any ACL option beginning with a '-' is considered as a pattern
if it does not match a known option. This is a big problem because
typos are silently ignored, such as "-" or "-mi".
Better clearly check the complete option name and report a parsing
error if the option is unknown.
Long-lived sessions are often subject to half-closed sessions resulting in
a lot of sessions appearing in FIN_WAIT state in the system tables, and no
way for haproxy to get rid of them. This typically happens because clients
suddenly disconnect without sending any packet (eg: FIN or RST was lost in
the path), and while the server detects this using an applicative heart
beat, haproxy does not close the connection.
This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
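Example (values illustrative) :
defaults
timeout client 30s
timeout client-fin 5s
timeout server 30s
timeout server-fin 5s
Here a half-closed client connection is released after 5 seconds instead
of lingering for the full 30-second client timeout.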
Plain "tcp" health checks sent to a unix socket cause two connect()
calls to be made, one to connect, and a second one to verify that the
connection properly established. But with unix sockets, we get
immediate notification of success, so we can avoid this second
attempt. However we need to ensure that we'll visit the connection
handler even if there's no remaining handshake pending, so for this
we claim we have some data to send in order to enable polling for
writes if there's no more handshake.
Being able to map prefixes to values is already used for IPv4/IPv6
but was not yet used with strings. It can be very convenient to map
directories to server farms but large lists may be slow.
By using ebmb_insert_prefix() and ebmb_lookup_longest(), we can
insert strings with their own length as a prefix, and lookup
candidate strings and ensure that the longest matching one will
be returned, which is the longest string matching the entry.
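A sketch of the use case, assuming the map_beg converter and a
hypothetical map file :
use_backend %[path,map_beg(/etc/haproxy/dirs.map,bk_default)]
with /etc/haproxy/dirs.map containing :
/static bk_static
/static/img bk_images
Thanks to the longest-match lookup, a request for /static/img/logo.png
picks bk_images rather than bk_static.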
These sockets are the same as Unix sockets except that there's no need
for any filesystem access. The address may be whatever string both sides
agree upon. This can be really convenient for inter-process communications
as well as for chaining backends to frontends.
These addresses are forced by prepending their address with "abns@" for
"abstract namespace".
We've had everything in place for this for a while now, we just missed
the connect function for UNIX sockets. Note that in order to connect to
a UNIX socket inside a chroot, the path will have to be relative to the
chroot.
UNIX sockets connect about twice as fast as TCP sockets (or consume
about half of the CPU at the same rate). This is interesting for
internal communications between SSL processes and HTTP processes
for example, or simply to avoid allocating source ports on the
loopback.
The tcp_connect_probe() function is still used to probe a dataless
connection, but it is compatible so that's not an issue for now.
Health checks are not yet fully supported since they require a port.
We used to have is_addr() in place to validate sometimes the existence
of an address, sometimes a valid IPv4 or IPv6 address. Replace them
carefully so that is_inet_addr() is used wherever we can only use an
IPv4/IPv6 address.
Currently, mixing an IPv4 and an IPv6 address in checks happens to
work by pure luck because the two protocols use the same functions
at the socket level and both use IPPROTO_TCP. However, they're
definitely wrong as the protocol for the check address is retrieved
from the server's address.
Now the protocol assigned to the connection is the same as the one
the address in use belongs to (eg: the server's address or the
explicit check address).
The RDP cookie extractor compares the 32-bit address from the request
to the address of each server in the farm without first checking that
the server's address is IPv4. This is a leftover from the IPv4 to IPv6
conversion. It's harmless as it's unlikely that IPv4 and IPv6 servers
will be mixed in an RDP farm, but better fix it.
This patch does not need to be backported.
Till now a warning was emitted if the "stats bind-process" was not
specified when nbproc was greater than 1. Now we can be much finer
and only emit a warning when at least one of the stats sockets is bound
to more than one process at a time.
Now that we know what processes a "bind" statement is attached to, we
have the ability to avoid starting some of them when they're not on the
proper process. This feature is disabled when running in foreground
however, so that debug mode continues to work with everything bound to
the first and only process.
The main purpose of this change is to finally allow the global stats
sockets to be each bound to a different process.
It can also be used to force haproxy to use different sockets in different
processes for the same IP:port. The purpose is that under Linux 3.9 and
above (and possibly other OSes), when multiple processes are bound to the
same IP:port via different sockets, the system is capable of performing
a perfect round-robin between the socket queues instead of letting any
process pick all the connections from a queue. This results in a smoother
load balancing and may achieve a higher performance with a large enough
maxaccept setting.
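A sketch combining both uses (paths and ports illustrative) :
global
nbproc 2
stats socket /var/run/haproxy1.sock process 1
stats socket /var/run/haproxy2.sock process 2
frontend fe
bind :80 process 1
bind :80 process 2
Each stats socket is then served by exactly one process, and under Linux
3.9+ the two :80 sockets benefit from kernel-level round-robin
distribution.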
When a process list is specified on either the proxy or the bind lines,
the latter is refined to the intersection of the two. A warning is emitted
if no intersection is found, and the situation is fixed by either falling
back to the first process of the proxy or to all processes.
When a bind-process setting is present in a frontend or backend, we
now verify that the specified process range at least shares one common
process with those defined globally by nbproc. Then if the value is
set, it is reduced to the one enforced by nbproc.
A warning is emitted if process count does not match, and the fix is
done the following way :
- if a single process was specified in the range, it's remapped to
process #1
- if more than one process was specified, the binding is removed
and all processes are usable.
Note that since backends may inherit their settings from frontends,
depending on the declaration order, they may or may not be reported
as warnings.
Some consistency checks cannot be performed between frontends, backends
and peers at the moment because there is no way to check for intersection
between processes bound to some processes when the number of processes is
higher than the number of bits in a word.
So first, let's limit the number of processes to the machine's word size.
This means nbproc will be limited to 32 on 32-bit machines and 64 on 64-bit
machines. This is far more than enough considering that configs rarely go
above 16 processes due to scalability and management issues, so 32 or 64
should be fine.
This way we'll ensure we can always build a mask of all the processes a
section is bound to.
By default, a proxy's bind_proc is zero, meaning "bind to all processes".
It's only when not zero that its process list is restricted. So we don't
want the frontends to enforce the value on the backends when the backends
are still set to zero.
Now, haproxy exits with an error saying:
Unable to initialize the lock for the shared SSL session cache. You can retry using
the global statement 'tune.ssl.force-private-cache' but it could increase the CPU
usage due to renegotiation if nbproc > 1.
These comparison functions could match different strings as equal if the chunk was shorter
than the string. Those functions are currently only used for SSL's
certificate DN entry extract.
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions. It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read. Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections. Support for PROXY protocol V1 remains unchanged.
John-Paul Bader reported a nasty segv which happens after a few hours
when SSL is enabled under a high load. Fortunately he could catch a
stack trace, systematically looking like this one :
(gdb) bt full
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
s = (struct session *) 0x80337f800
conn = <value optimized out>
flags = 41997063
new_updt = <value optimized out>
old_updt = 1
e = <value optimized out>
status = 0
fd = 53999616
nbfd = 279
wait_time = <value optimized out>
updt_idx = <value optimized out>
en = <value optimized out>
eo = <value optimized out>
count = 78
sr = <value optimized out>
sw = <value optimized out>
rn = <value optimized out>
wn = <value optimized out>
The variable "flags" in conn_fd_handler() holds a copy of connection->flags
when entering the function. These flags indicate 41997063 = 0x0280d307 :
- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake
So this is a handshake in progress. And the only way to reach line 88
is for the handshake to complete without error. So we know for sure that
ssl_sock_handshake() was called and completed the handshake then removed
the CO_FL_SSL_WAIT_HS flag from the connection. With these flags,
ssl_sock_handshake() only calls SSL_do_handshake() and returns. So
that means that the problem is necessarily in data->init().
The fd is wrong as reported but is simply mis-decoded as it's the lower
half of the last function pointer.
What happens in practice is that there's an issue with the way we deal
with embryonic sessions during their conversion to regular sessions.
Since they have no stream interface at the beginning, the pointer to
the connection is temporarily stored into s->target. Then during their
conversion, the first stream interface is properly initialized and the
connection is attached to it, then s->target is set to NULL.
The problem is that if anything fails in session_complete(), the
session is left in this intermediate state where s->target is NULL,
and kill_mini_session() is called afterwards to perform the cleanup.
It needs the connection, that it finds in s->target which is NULL,
dereferences it and dies. The only reasons for dying here are a problem
on the TCP connection when doing the setsockopt(TCP_NODELAY) or a
memory allocation issue.
This patch implements a solution consisting in restoring s->target in
session_complete() on the error path. That way embryonic sessions that
were valid before calling it are still valid after.
The bug was introduced in 1.5-dev20 by commit f8a49ea ("MEDIUM: session:
attach incoming connection to target on embryonic sessions"). No backport
is needed.
Special thanks to John for his numerous tests and traces.
Previously, pthread process-shared locks were used by default
if USE_SYSCALL_FUTEX was not specified.
This patch implements an OS-independent kind of lock:
an active spinlock is used if USE_SYSCALL_FUTEX is not specified.
The old behavior is still available if USE_PTHREAD_PSHARED=1.
Process-shared mutexes seem not to be supported on some OSes (FreeBSD).
This patch checks errors on mutex lock init to fall back
on a private session cache (per-process cache) in error cases.
1.5-dev24 introduced SSL_CTX_set_msg_callback(), which came with OpenSSL
0.9.7. A build attempt with an older one failed and we're still compatible
with 0.9.6 in 1.5.
During some tests in multi-process mode under Linux, it appeared that
issuing "disable frontend foo" on the CLI to pause a listener would
make the shutdown(read) of certain processes disturb another process
listening on the same socket, resulting in a 100% CPU loop. What
happens is that accept() returns EAGAIN without accepting anything.
Fortunately, we see that epoll_wait() reports EPOLLIN+EPOLLRDHUP
(likely because the FD points to the same file in the kernel), so we
can use that to stop the other process from trying to accept connections
for a short time and try again later, hoping for the situation to change.
We must not disable the FD otherwise there's no way to re-enable it.
Additionally, during these tests, a loop was encountered on EINVAL which
was not caught. Now if we catch an EINVAL, we proceed the same way, in
case the socket is re-enabled later.
It's the final part of the 2 previous patches. We prevent the server from
timing out if we still have some data to pass to it. That way, even if the
server runs with a short timeout and the client with a large one, the server
side timeout will only start to count once the client sends everything. This
ensures we don't report a 504 before the server gets the whole request.
It is not certain whether the 1.4 state machine is fully compatible with
this change. Since the purpose is only to ensure that we never report a
server error before a client error if some data are missing from the client
and when the server-side timeout is smaller than or equal to the client's,
it's probably not worth attempting the backport.
This is the continuation of previous patch "BUG/MEDIUM: http/session:
disable client-side expiration only after body".
This one takes care of properly reporting the client-side read timeout
when waiting for a body from the client. Since the timeout may happen
before or after the server starts to respond, we have to take care of
the situation in three different ways :
- if the server does not read our data fast enough, we emit a 504
if we're waiting for headers, or we simply break the connection
if headers were already received. We report either sH or sD
depending on whether we've seen headers or not.
- if the server has not yet started to respond, but has read all of
the client's data and we're still waiting for more data from the
client, we can safely emit a 408 and abort the request ;
- if the server has already started to respond (thus it's a transfer
timeout during a bidirectional exchange), then we silently break
the connection, and only the session flags will indicate in the
logs that something went wrong with client or server side.
This bug is tagged MEDIUM because it touches very sensible areas, however
its impact is very low. It might be worth performing a careful backport
to 1.4 once it has been confirmed that everything is correct and that it
does not introduce any regression.
For a very long time, back in the v1.3 days, we used to rely on a trick
to avoid expiring the client side while transferring a payload to the
server. The problem was that if a client was able to quickly fill the
buffers, and these buffers took some time to reach the server, the
client should not expire while not sending anything.
In order to cover this situation, the client-side timeout was disabled
once the connection to the server was OK, since it implied that we would
at least expire on the server if required.
But there is a drawback to this : if a client stops uploading data before
the end, its timeout is not enforced and we only expire on the server's
timeout, so the logs report a 504.
Since 1.4, we have message body analysers which ensure that we know whether
all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE).
So we can fix this problem by disabling the client-side or server-side
timeout at the end of the transfer for the respective side instead of
having it unconditionally in session.c during all the transfer.
With this, the logs now report the correct side for the timeout. Note that
this patch is not enough, because another issue remains : the HTTP body
forwarders do not abort upon timeout, they simply rely on the generic
handling from session.c. So for now, the session is still aborted when
reaching the server timeout, but the culprit is properly reported. A
subsequent patch will address this specific point.
This bug was tagged MEDIUM because of the changes performed. The issue
it fixes is minor however. After some cooling down, it may be backported
to 1.4.
It was reported by and discussed with Rachel Chavez and Patrick Hemmer
on the mailing list.
Previously ssl_fc_unique_id and ssl_bc_unique_id returned a string encoded
in base64 of the RFC 5929 TLS unique identifier. This patch modifies those
fetches to return the ID directly in its original binary format. The user can
choose to encode it in base64 using the converter.
i.e. : ssl_fc_unique_id,base64
ssl_f_sha1 is a binary fetch used to return the SHA-1 fingerprint of
the certificate presented by the frontend when the incoming connection was
made over an SSL/TLS transport layer. This can be used to know which
certificate was chosen using SNI.
Adds ssl fetches and ACLs for outgoing SSL/Transport layer connections with their
docs:
ssl_bc, ssl_bc_alg_keysize, ssl_bc_cipher, ssl_bc_protocol, ssl_bc_unique_id,
ssl_bc_session_id and ssl_bc_use_keysize.
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bonté managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
Commit fc6c032 ("MEDIUM: global: add support for CPU binding on Linux ("cpu-map")")
merged into 1.5-dev13 involves a useless test that clang reports as a warning. The
"low" variable cannot be negative here. Issue reported by Charles Carter.
Commit 5338eea ("MEDIUM: pattern: The match function browse itself the
list or the tree") changed the return type of pattern matching functions.
One enum was left over in pat_match_auth(). Fortunately, this one equals
zero where a null pointer is expected, so it's cast correctly.
This bug, detected and reported by Charles Carter, was introduced in 1.5-dev23,
no backport is needed.
Till now we used to return a pointer to a rule, but that makes it
complicated to later add support for registering new actions which
may fail. For example, the redirect may fail if the response is too
large to fit into the buffer.
So instead let's return a verdict. But we needed the pointer to the
last rule to get the address of a redirect and to get the realm used
by the auth page. So these pieces of code have moved into the function
and they produce a verdict.
Both use exactly the same mechanism, except for the choice of the
default realm to be emitted when none is selected. It can be achieved
by simply comparing the ruleset with the stats' for now. This achieves
a significant code reduction and further, removes the dependence on
the pointer to the final rule in the caller.
The "block" rules are redundant with http-request rules because they
are performed immediately before and do exactly the same thing as
"http-request deny". Moreover, this duplication has led to a few
minor stats accounting issues fixed lately.
Instead of keeping the two rule sets, we now build a list of "block"
rules that we compile as "http-request block" and that we later insert
at the beginning of the "http-request" rules.
The only user-visible change is that in case of a parsing error, the
config parser will now report "http-request block rule" instead of
"blocking condition".
Some of the remaining interleaving of request processing after the
http-request rules can now safely be removed, because all remaining
actions are mutually exclusive.
So we can move together all those related to an intercepting rule,
then proceed with stats, then with req*.
We still keep an issue with stats vs reqrep which forces us to
keep the stats split in two (detection and action). Indeed, from the
beginning, stats are detected before rewriting and not after. But a
reqdeny rule would stop stats, so in practice we have to first detect,
then perform the action. Maybe we'll be able to kill this in version
1.6.
Till now the Connection header was processed in the middle of the http-request
rules and some reqadd rules. It used to force some http-request actions to be
cut in two parts.
Now with keep-alive, not only that doesn't make any sense anymore, but it's
becoming a total mess, especially since we need to know the headers contents
before proceeding with most actions.
The real reason it was not moved earlier is that the "block" or "http-request"
rules can see a different version if some fields are changed there. But that
is already not reliable anymore since the values observed by the frontend
differ from those in the backend.
This patch is the equivalent of commit f118d9f ("REORG: http: move HTTP
Connection response header parsing earlier") but for the request side. It
has been tagged MEDIUM as it could theoretically slightly affect some setups
relying on corner cases or invalid setups, though this does not make real
sense and is highly unlikely.
The session's backend request counters were incremented after the block
rules while these rules could increment the session's error counters,
meaning that we could have more errors than requests reported in a stick
table! Commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking L7
information") is the most responsible for this.
This bug is 1.5-specific and does not need any backport.
"block" rules used to build the whole response and forgot to increment
the denied_req counters. By jumping to the general "deny" label created
in previous patch, it's easier to fix this.
The issue was already present in 1.3 and remained unnoticed, in part
because few people use "block" nowadays.
Continue the cleanup of http-request post-processing to remove some
of the interleaved tests. Here we set up a few labels to deal with
the deny and tarpit actions and avoid interleaved ifs.
We still have a plate of spaghetti in the request processing rules.
All http-request rules are executed at once, then some responses are
built interlaced with other rules that used to be there in the past.
Here, reqadd is executed after an http-req redirect rule is *decided*,
but before it is *executed*.
So let's match the doc and config checks, to put the redirect actually
before the reqadd completely.
There's no point in open-coding the sending of 100-continue in
the stats initialization code, better simply rely on the function
designed to process the message body which already does it.
Commit 844a7e7 ("[MEDIUM] http: add support for proxy authentication")
merged in v1.4-rc1 added the ability to emit a status code 407 in auth
responses, but forgot to set the same status in the logs, which still
contain 401.
The bug is harmless, no backport is needed.
A switch from a TCP frontend to an HTTP backend initializes the HTTP
transaction. txn->hdr_idx.size is used by hdr_idx_init() but not
necessarily initialized yet here, because the first call to hdr_idx_init()
is in fact placed in http_init_txn(). Moving it before the call is
enough to fix it. We also remove the useless extra confusing call
to hdr_idx_init().
The bug was introduced in 1.5-dev8 with commit ac1932d ("MEDIUM:
tune.http.maxhdr makes it possible to configure the maximum number
of HTTP headers"). No backport to stable is needed.
Last fix did address the issue for inlined patterns, but it was not
enough because the flags are lost as well when updating patterns
dynamically over the CLI.
Also if the same file was used once with -i and another time without
-i, their references would have been merged and both would have used
the same matching method.
It appears that patterns have two types of flags. The first ones
relate to the pattern matching, and the second ones to the pattern
storage. The pattern matching flags are the same for all the patterns
of one expression, so now they are stored in the expression. The
storage flags are information returned by the pattern matching
function. This information is relative to each entry and is stored
in the "struct pattern".
Now, the expression matching flags are forwarded to the parse
and index functions. These flags are stored during the
configuration parsing, and they are used during the parse and
index actions.
This issue was introduced in dev23 with the major pattern rework,
and is a continuation of commit a631fc8 ("BUG/MAJOR: patterns: -i
and -n are ignored for inlined patterns"). No backport is needed.
These flags are only passed to pattern_read_from_file() which
loads the patterns from a file. The functions used to parse the
patterns from the current line do not provide the means to pass
the pattern flags so they're lost.
This issue was introduced in dev23 with the major pattern rework,
and was reported by Graham Morley. No backport is needed.
Dmitry Sivachenko reported this nice warning :
src/pattern.c:2243:43: warning: if statement has empty body [-Wempty-body]
if (&ref2->list == &pattern_reference);
^
src/pattern.c:2243:43: note: put the semicolon on a separate line to silence
this warning
It was merged as is with the code from commit af5a29d ("MINOR: pattern:
Each pattern is identified by unique id").
So it looks like we can reassign an ID which is still in use because of
this.
Previous patch only focused on parsing the packet right and blocking
it, so it relaxed one test on the packet length. The difference is
not usable for attacking but the logs will not report an attack for
such cases, which is probably bad. Better report all known invalid
packet cases.
Recent commit f51c698 ("MEDIUM: ssl: implement a workaround for the
OpenSSL heartbleed attack") did not always work well, because OpenSSL
is fun enough for not testing errors before sending data... So the
output sometimes contained some data.
The OpenSSL code relies on the max_send_fragment value to limit the
packet length. The code ensures that a value of zero will result in
no single byte leaking. So we're forcing this instead and that
definitely fixes the issue. Note that we need to set it the hard
way since the regular API checks for valid values.
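A minimal sketch of the idea, assuming an OpenSSL 1.0.x build where the
SSL struct fields are still accessible (the function name is illustrative):

    #include <openssl/ssl.h>

    /* Force the maximum send fragment to zero so that not a single byte
     * of response can leak. SSL_set_max_send_fragment() refuses values
     * below 512, hence the direct field assignment ("the hard way").
     */
    static void ssl_block_output(SSL *ssl)
    {
        ssl->max_send_fragment = 0;
    }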
Building with a version of openssl without heartbeat gives this since
latest 29f037d ("MEDIUM: ssl: explicitly log failed handshakes after a
heartbeat") :
src/ssl_sock.c: In function 'ssl_sock_msgcbk':
src/ssl_sock.c:188: warning: unused variable 'conn'
Simply declare conn inside the ifdef. No backport is needed.
The latest commit about set-map/add-acl/... causes this warning for
me :
src/proto_http.c: In function 'parse_http_req_cond':
src/proto_http.c:8863: warning: implicit declaration of function 'strndup'
src/proto_http.c:8863: warning: incompatible implicit declaration of built-in function 'strndup'
src/proto_http.c:8890: warning: incompatible implicit declaration of built-in function 'strndup'
src/proto_http.c:8917: warning: incompatible implicit declaration of built-in function 'strndup'
src/proto_http.c:8944: warning: incompatible implicit declaration of built-in function 'strndup'
Use my_strndup() instead of strndup() which is not portable. No backport
needed.
This reverts commit 9ece05f590.
Sander Klein reported an important performance regression with this
patch applied. It is not yet certain what is exactly the cause but
let's not break other setups now and sort this out after dev24.
The commit was merged into dev23, no need to backport.
Using the previous callback, it's trivial to block the heartbeat attack,
first we control the message length, then we emit an SSL error if it is
out of bounds. A special log is emitted, indicating that a heartbleed
attack was stopped so that they are not confused with other failures.
That way, haproxy can protect itself even when running on an unpatched
SSL stack. Tests performed with openssl-1.0.1c indicate a total success.
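For illustration, a hedged sketch of such a bounds check inside an OpenSSL
message callback, assuming the 1.0.1-era TLS1_RT_HEARTBEAT content type (a
heartbeat record carries a 1-byte type, a 2-byte payload length, the
payload, and at least 16 bytes of padding):

    #include <openssl/ssl.h>

    /* returns 0 when the record is consistent, 1 when the advertised
     * payload length exceeds what the record can hold (heartbleed) */
    static int heartbeat_is_attack(const unsigned char *rec, size_t len)
    {
        unsigned int payload;

        if (len < 3)
            return 1;
        payload = (rec[1] << 8) | rec[2];
        /* type(1) + length(2) + payload + padding(>=16) must fit */
        return payload + 1 + 2 + 16 > len;
    }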
Add a callback to receive the heartbeat notification. There, we add
SSL_SOCK_RECV_HEARTBEAT flag on the ssl session if a heartbeat is seen.
If a handshake fails, we log a different message to mention the fact that
a heartbeat was seen. The test is only performed on the frontend side.
The http_(res|req)_keywords_register() functions allow to register
new keywords.
You need to declare a keyword list:

    struct http_req_action_kw_list test_kws = {
        .scope = "testscope",
        .kw = {
            { "test", parse_test },
            { NULL,   NULL },      /* end of list */
        }
    };

and a parsing function:

    int parse_test(const char **args, int *cur_arg, struct proxy *px,
                   struct http_req_rule *rule, char **err)
    {
        /* action_function is the custom function executed for each
         * request matched by the rule */
        rule->action = HTTP_REQ_ACT_CUSTOM_STOP;
        rule->action_ptr = action_function;
        return 0;
    }

Then register the list:

    http_req_keywords_register(&test_kws);
The HTTP_REQ_ACT_CUSTOM_STOP action stops evaluation of rules after
your rule, HTTP_REQ_ACT_CUSTOM_CONT permits the evaluation of rules
after your rule.
This patch allows manipulation of ACL and MAP content thanks to any
information available in a session: source IP address, HTTP request or
response header, etc...
It's an "on the fly" update of the map/acl contents. This means
it does not survive a reload or restart of HAProxy.
Finn Arne Gangstad suggested that we should have the ability to break
keep-alive when the target server has reached its maxconn and that a
number of connections are present in the queue. After some discussion
around his proposed patch, the following solution was suggested : have
a per-proxy setting to fix a limit to the number of queued connections
on a server after which we break keep-alive. This ensures that even in
high latency networks where keep-alive is beneficial, we try to find a
different server.
This patch is partially based on his original proposal and implements
this configurable threshold.
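A hedged configuration sketch, assuming the per-proxy directive introduced
here is "max-keep-alive-queue" (the threshold and addresses are illustrative):

    backend app
        # break keep-alive and look for another server once more than
        # 16 requests are queued on the target server
        max-keep-alive-queue 16
        server s1 192.168.1.10:80 maxconn 100
        server s2 192.168.1.11:80 maxconn 100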
Commit bed410e ("MAJOR: http: centralize data forwarding in the request path")
has woken up an issue in redirects, where msg->next is not reset when flushing
the input buffer. The result is an attempt to forward a negative amount of
data, making haproxy crash.
This bug does not seem to affect versions prior to dev23, so no backport is
needed.
These ones report a string as "HTTP/1.0" or "HTTP/1.1" depending on the
version of the request message or the response message, respectively.
The purpose is to be able to emit custom log lines reporting this version
in a persistent way.
We used to emit either 1.0 or 1.1 depending on whether we were sending
chunks or not. This condition is useless, better always send 1.1. Also
that way at least clients and intermediary proxies know we speak 1.1.
The "Connection: close" header is still set anyway.
Currently, the parsing of the HTTP Connection header for the response
is performed at the same place as the rule sets, which means that after
parsing the beginning of the response, we still have no information on
whether the response is keep-alive compatible or not. Let's do that
earlier.
Note that this is the same code that was moved in the previous function,
both of them are always called in a row so no change of behaviour is
expected.
A future change might consist in having a late analyser to perform the
late header changes such as mangling the connection header. It's quite
painful that currently this is mixed with the rest of the processing
such as filters.
This allows the stats page to work in keep-alive mode and to be
compressed. At compression ratios up to 80%, it's quite interesting
for large pages.
We ensure to skip filters because we don't want to unexpectedly block
a response nor to mangle response headers.
In version 1.3.4, we got the ability to split configuration parts between
frontends and backends. The stats was attached to the backend and a control
was made to ensure that it was used only in a listen or backend section, but
not in a frontend.
The documentation clearly says that the statement may only be used in the
backend.
But since that same version above, the defaults stats configuration is
only filled in the frontend part of the proxy and not in the backend's.
So a backend will not get stats which are enabled in a defaults section,
despite what the doc says. However, a frontend configured after a defaults
section will get stats and will not emit the warning!
There were many technical limitations in 1.3.4 making it impossible to
have the stats working both in the frontend and backend, but now this has
become a total mess.
It's common however to see people create a frontend with a perfectly
working stats configuration which only emits a warning stating that it
might not work, adding to the confusion. Most people workaround the tricky
behaviour by declaring a "listen" section with no server, which was the
recommended solution in 1.3 where it was even suggested to add a dispatch
address to avoid a warning.
So the right solution seems to do the following :
- ensure that the defaults section's settings apply to the backends,
as documented ;
- let the frontends work in order not to break existing setups relying
on the defaults section ;
- officially allow stats to be declared in frontends and remove the
warning
This patch should probably not be backported since it's not certain that
1.4 is fully compatible with having stats in frontends and backends (which
was really made possible thanks to applets).
All the code inherited from version 1.1 still holds a lot of sessions
called "t" because in 1.1 they were tasks. This naming is very annoying
and sometimes even confusing, for example in code involving tables.
Let's get rid of this once for all and before 1.5-final.
Nothing changed beyond just carefully renaming these variables.
OK, for once the compiler cannot easily know this one, and certain
versions are emitting this harmless warning :
src/dumpstats.c: In function 'http_stats_io_handler':
src/dumpstats.c:4507:19: warning: 'last_fwd' may be used uninitialized in this function [-Wmaybe-uninitialized]
It's useless to process 100-continue in the middle of response filters
because there's no info in the 100 response itself, and it could even
make things worse. So better use it as it is, an interim response
waiting for the next response, thus we just have to put it into
http_wait_for_response(). That way we ensure to have a valid response
in this function.
Since commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes"), we got another bug with 100-continue responses.
If the final response comes in the same packet as the 100, then the rest of
the buffer is not processed since there is no wake-up event.
In fact the change above uncovered the real culprit, which is more
likely session.c: it should detect that an earlier analyser was set
and loop back to it.
A cleaner fix would be better, but setting the flag works fine.
This issue was introduced in 1.5-dev22, no backport is needed.
Patches c623c17 ("MEDIUM: http: start to centralize the forwarding code")
and bed410e ("MAJOR: http: centralize data forwarding in the request path")
merged into 1.5-dev23 cause transfers to be silently aborted after the
server timeout due to the fact that the analysers are woken up when the
timeout strikes and they believe they have nothing more to do, so they're
terminating the transfer.
No backport is needed.
This basically reimplements commit f3221f9 ("MEDIUM: stats: add support
for HTTP keep-alive on the stats page") which was reverted by commit
51437d2 after Igor Chan reported a broken stats page caused by the bug
fixed by the previous commit.
The errors reported by Igor Chan on the stats interface in chunked mode
were caused by data wrapping at the wrong place in the buffer. It could
be reliably reproduced by picking random buffer sizes until the issue
appeared (for a given conf, 5300 with 1024 maxrewrite was OK).
The issue is that the stats interface uses bi_putchk() to emit data,
which relies on bi_putblk(). This code checks the largest part that can
be emitted while preserving the rewrite reserve, but uses that result to
compute the wrapping offset, which is wrong. If some data remain present
in the buffer, the wrapping may be allowed and will happen before the
end of the buffer, leaving some old data in the buffer.
The reason it did not happen before keep-alive is simply that the buffer
was much less likely to contain older data. It also used to happen only
for certain configs after a certain amount of time because the size of
the counters used to inflate the output till the point wrapping started
to happen.
The fix is trivial, buffer_contig_space_with_res() simply needs to be
replaced by buffer_contig_space().
Note that peers were using the same function so it is possible that they
were affected as well.
This issue was introduced in 1.5-dev8. No backport to stable is needed.
Commit f003d37 ("BUG/MINOR: http: don't report client aborts as server errors")
attempted to fix a longstanding issue by which some client aborts could be
logged as server errors. Unfortunately, one of the tests involved there also
catches truncated server responses, which are reported as client aborts.
Instead, only check that the client has really closed using the abortonclose
option, just as is done in the request path (which means that the close was
propagated to the server).
The faulty fix above was introduced in 1.5-dev15, and was backported into
1.4.23.
Thanks to Patrick Hemmer for reporting this issue with traces showing the
root cause of the problem.
Till now there was no check against misplaced use-server rules, and
no warning was emitted, adding to the confusion. They're processed
just after the use_backend rules, or more exactly at the same level
but for the backend.
Recently, the http-request ruleset started to be used a lot and some
bug reports were caused by misplaced http-request rules because there
was no warning if they're after a redirect or use_backend rule. Let's
fix this now. http-request rules are just after the block rules.
Since it became possible to use log-format expressions in use_backend,
having a mandatory condition becomes annoying because configurations
are full of "if TRUE". Let's relax the check to accept no condition
like many other keywords (eg: redirect).
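For example, both forms are now accepted:

    use_backend bk_app if TRUE    # previously mandatory condition
    use_backend bk_app            # now allowed, like "redirect"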
Cyril Bonté reported that the "lastsess" field of a stats-only backend
was never updated. In fact the same is true for any applet and anything
not a server. Also, lastsess was not updated for a server reusing its
connection for a new request.
Since the goal of this field is to report recent activity, it's better
to ensure that all accesses are reported. The call has been moved to
the code validating the session establishment instead, since everything
passes there.
Commit ad90351 ("MINOR: http: Add the "language" converter to for use with accept-language")
introduced a typo in parse_qvalue :
    if (*end)
        *end = qvalue;

while it should be :

    if (end)
        *end = qvalue;
Since end is tested for being NULL. This crashes when selecting the
compression algorithm since end is NULL here. No backport is needed,
this is just in latest 1.5-dev.
The forwarding code is never obvious to enter into for newcomers, so
better improve the documentation about how states are chained and what
happens for each of them.
Doing so avoids calling channel_forward() for each part of the chunk
parsing and lowers the number of calls to channel_forward() to only
one per buffer, resulting in about 11% performance increase on small
chunks forwarding rate.
The call to flush the compression buffers only needs to be done when
entering the final states or when leaving with missing data. After
that, if trailers are present, they have to be forwarded.
Now we have valid buffer offsets, we can use them to safely parse the
input and only forward when needed. Thus we can get rid of the
consumed_data accumulator, and the code now works both for chunked and
content-length, even with a server feeding one byte at a time (which
systematically broke the previous one).
It's worth noting that 0<CRLF> must always be sent after the end of data
(ie: chunk_len==0), and that the trailing CRLF is sent only in
content-length mode, because in chunked mode we'll have to pass trailers.
This is basically a revert of commit 667c2a3 ("BUG/MAJOR: http: compression
still has defects on chunked responses").
The latest changes applied to message pointers should have got rid of all
the issues that were making the compression of partial chunks unreliable.
Currently, we forward headers only if the incoming message is still before
HTTP_MSG_CHUNK_SIZE, otherwise they'll be considered as data. In practice
this is always true for the response since there's no data inspection, and
for the request there is no compression so there's no problem with forwarding
them as data.
But the principle is incorrect and will make it difficult to later add data
processing features. So better fix it now.
The new principle is simple :
- if headers were not yet forwarded, forward them now.
- while doing so, check if we need to update the state
If for some reason, the compression returns an error, the compression
is not deinitialized which also means that any pending data are not
flushed and could be lost, especially in the chunked-encoded case.
No backport is needed.
When a parsing error was encountered in a chunked response, we failed
to properly deinitialize the compression context. There was no impact
till now since compression of chunked responses was disabled. No backport
is needed.
Thanks to the last updates on the message pointers, it is now safe again to
enable forwarding of the request headers while waiting for the connection to
complete because we know how to safely rewind this part.
So this patch slightly modifies what was done in commit 80a92c0 ("BUG/MEDIUM:
http: don't start to forward request data before the connect") to let up to
msg->sov bytes be forwarded when waiting for the connection. The resulting
effect is that a POST request may now be sent with the connect's ACK, which
still saves a packet and may even be useful later when TFO is supported.
In order to avoid abusively relying on buf->o to guess how many bytes to
rewind during a redispatch, we now clear msg->sov. Thus the meaning of this
field is exactly "how many bytes of headers are left to be forwarded". It
is still possible to rewind because msg->eoh + msg->eol equal that value
before scheduling the forwarding, so we can always subtract them.
http_body_rewind() returns the number of bytes to rewind before buf->p to
find the message's body. It relies on http_hdr_rewind() to find the beginning
and adds msg->eoh + msg->eol which are always safe.
http_data_rewind() does the same to get the beginning of the data, which
differs from above when a chunk is present. It uses the function above and
adds msg->sol.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
http_uri_rewind() returns the number of bytes to rewind before buf->p to
find the URI. It relies on http_hdr_rewind() to find the beginning and
is just here to simplify operations.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
http_hdr_rewind() returns the number of bytes to rewind before buf->p to
find the beginning of headers. At the moment it's not exact as it still
relies on buf->o, assuming that no other data from a past message were
pending there, but it's what was done till there.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
http_body_bytes() returns the number of bytes of the current message body
present in the buffer. It is compatible with being called before and after
the headers are forwarded.
This is done to centralize further ->sov changes.
We used to have msg->sov updated for every chunk that was parsed. The issue
is that we want to be able to rewind after chunks were parsed in case we need
to redispatch a request and perform a new hash on the request or insert a
different server header name.
Currently, msg->sov and msg->next make parallel progress. We reached a point
where they're always equal because msg->next is initialized from msg->sov,
and is subtracted msg->sov's value each time msg->sov bytes are forwarded.
So we can now ensure that msg->sov can always be replaced by msg->next for
every state after HTTP_MSG_BODY where it is used as a position counter.
This allows us to keep msg->sov untouched whatever the number of chunks that
are parsed, as is needed to extract data from POST request (eg: url_param).
However, we still need to know the starting position of the data relative to
the body, which differs by the chunk size length. We use msg->sol for this
since it's now always zero and unused in the body.
So with this patch, we have the following situation :
- msg->sov = msg->eoh + msg->eol = size of the headers including last CRLF
- msg->sol = length of the chunk size if any. So msg->sov + msg->sol = DATA.
- msg->next corresponds to the byte being inspected based on the current
state and is always >= msg->sov before starting to forward anything.
Since sov and next are updated in case of header rewriting, a rewind will
fix them both when needed. Of course, ->sol has no reason for changing in
such conditions, so it's fine to keep it relative to msg->sov.
In theory, even if a redispatch has to be performed, a transformation
occurring on the request would still work because the data moved would
still appear at the same place relative to buf->p.
This function is only a parser, it must start to parse at the next character
and only update the outgoing relative pointers, but not expect the buffer to
be aligned with the next byte to be parsed.
It's important to fix this otherwise we cannot use this function to parse
chunks without starting to forward data.
There are still some pending issues in the gzip compressor, and fixing
them requires a better handling of intermediate parsing states.
Another issue to deal with is the rewinding of a buffer during a redispatch
when a load balancing algorithm involves L7 data because the exact amount of
data to rewind is not clear. At the moment, this is handled by unwinding all
pending data, which cannot work in responses due to pipelining.
Last, having a first analysis which parses the body and another one which
restarts from where the parsing was left is wrong. Right now it only works
because we never both parse and transform in the same direction. But that
is wrong anyway.
In order to address the first issue, we'll have to use msg->eoh + msg->eol
to find the end of headers, and we still need to store the information about
the forwarded header length somewhere (msg->sol might be reused for this).
msg->sov may only be used for the start of data and not for subsequent chunks
if possible. This first implies that we stop sharing it with header length,
and stop using msg->sol there. In fact we don't need it already as it is
always zero when reaching the HTTP_MSG_BODY state. It was only updated to
reflect a copy of msg->sov.
So now as a first step into that direction, this patch ensures that msg->sol
is never re-assigned after being set to zero and is not used anymore when
we're dealing with HTTP processing and forwarding. We'll later reuse it
differently but for now it's secured.
The patch does nothing magic, it only removes msg->sol everywhere it was
already zero and avoids setting it. In order to keep the sov-sol difference,
it now resets sov after forwarding data. In theory there's no problem here,
but the patch is still tagged major because that code is complex.
One of the issues we face when we need to either forward headers only
before compressing, or rewind the stream during a redispatch is to know
the proper length of the request headers. msg->eoh always has the total
length up to the last CRLF, and we never know whether the request ended
with a single LF or a standard CRLF. This makes it hard to rewind the
headers without explicitly checking the bytes in the buffer.
Instead of doing so, we now use msg->eol to carry the length of the last
CRLF (either 1 or 2). Since it is not modified at all after HTTP_MSG_BODY,
and was only left in an undefined state, it is safe to use at any moment.
Thus, the complete header length to forward or to rewind now is always
msg->eoh + msg->eol.
Content-length encoded message bodies are trivial to deal with, but
chunked-encoded will require improvements, so let's separate the code
flows between the two to ease next steps. The behaviour is not changed
at all, the code is only rearranged.
This is the continuation of previous patch. Now that full buffers are
not rejected anymore, let's wait for at least the advertised chunk or
body length to be present or the buffer to be full. When either
condition is met, the message processing can go forward.
Thus we don't need to use url_param_post_limit anymore, which was passed
in the configuration as an optional <max_wait> parameter after the
"check_post" value. This setting was necessary when the feature was
implemented because there was no support for parsing message bodies.
The argument is now silently ignored if set in the configuration.
http_process_request_body() currently expects a request body containing
exactly an expected message body. This was done in order to support load
balancing on a unique POST parameter but the way it's done still suffers
from some limitations. One of them is that there is no guarantee that the
accepted message will contain the appropriate string if it starts with
another parameter. But at the same time it will reject a message when the
buffer is full.
So as a first step, we don't reject anymore message bodies that fill the
buffer.
Use HAProxy's exit status as the systemd wrapper's exit status instead
of always returning EXIT_SUCCESS, permitting the use of systemd's
`Restart = on-failure' logic.
Use standard error for logging messages, as it seems that this gets
messages to the systemd journal more reliably. Also use systemd's
support for specifying log levels via stderr to apply different levels
to messages.
Re-execute the systemd wrapper on SIGUSR2 and before reloading HAProxy,
making it possible to load a completely new version of HAProxy
(including a new version of the systemd wrapper) gracefully.
Since the wrapper accepts no command-line arguments of its own,
re-execution is signaled using the HAPROXY_SYSTEMD_REEXEC environment
variable.
This is primarily intended to help seamless upgrades of distribution
packages.
Since commit 4d4149c ("MEDIUM: counters: support passing the counter
number as a fetch argument"), the sample fetch sc_tracked(num) became
equivalent to sc[0-9]_tracked, by using the same smp_fetch_sc_tracked()
function.
This was theorically made possible after the series of changes starting
with commit a65536ca ("MINOR: counters: provide a generic function to
retrieve a stkctr for sc* and src."). Unfortunately, while all other
functions were changed to use the generic primitive smp_fetch_sc_stkctr(),
smp_fetch_sc_tracked() was forgotten and is not able to differentiate
between sc_tracked, src_tracked and sc[0-9]_tracked. The resulting mess is
that if sc_tracked is used, the counter number is assumed to be 47 because
that's what remains after subtracting "0" from char "_".
Fix this by simply relying on the generic function as should have been
done. The bug was introduced in 1.5-dev20. No backport is needed.
If the unique-id value is missing, the build_logline() function dumps
garbage, because the function lf_text() is bypassed. This function is
responsible for dumping '-' when the value is not present, and for
setting the '"' quotes around the displayed value.
This fixes the bug reported by Julien Vehent.
language(<value[;value[;value[;...]]]>[,<default>])
Returns the value with the highest q-factor from a list as
extracted from the "accept-language" header using "req.fhdr".
Values with no q-factor have a q-factor of 1. Values with a
q-factor of 0 are dropped. Only values which belong to the
list of semi-colon delimited <values> will be considered. If
no value matches the given list and a default value is
provided, it is returned. Note that language names may have
a variant after a dash ('-'). If this variant is present in
the list, it will be matched, but if it is not, only the base
language is checked. The match is case-sensitive, and the
output string is always one of those provided in arguments.
The ordering of arguments is meaningless, only the ordering
of the values in the request counts, as the first value among
multiple sharing the same q-factor is used.
Example :

    # this configuration switches to the backend matching a
    # given language based on the request :
    acl de req.fhdr(accept-language),language(de;es;fr;en) de
    acl es req.fhdr(accept-language),language(de;es;fr;en) es
    acl fr req.fhdr(accept-language),language(de;es;fr;en) fr
    acl en req.fhdr(accept-language),language(de;es;fr;en) en
    use_backend german  if de
    use_backend spanish if es
    use_backend french  if fr
    use_backend english if en
    default_backend choose_your_language
The function addr_to_stktable_key() doesn't consider the expected
type of key. If the stick-table key is based on IPv6 addresses
and the input is IPv4, the returned key is an IPv4 address and its
length is 4 bytes, while a 16-byte key is expected.
This patch considers the expected key type and tries to convert IPv4
to IPv6 and IPv6 to IPv4 according to the expected key.
This fixes the bug reported by Apollon Oikonomopoulos.
This bug was introduced somewhere in the 1.5-dev process.
Lukas reported another OpenBSD complaint about this use of sprintf() that
I missed :
src/ssl_sock.o(.text+0x2a79): In function `bind_parse_crt':
src/ssl_sock.c:3015: warning: sprintf() is often misused, please use snprintf()
This one was even easier to handle. Note that some of these calls could
be simplified by checking the snprintf output size instead of doing the
preliminary size computation.
This patch adds standardized (RFC 2409 / RFC 3526) DH parameters with
prime lengths of 1024, 2048, 3072, 4096, 6144 and 8192 bits, chosen
based on the private key size.
When compiled with USE_GETADDRINFO, make sure we use getaddrinfo(3) to
perform name lookups. On default dual-stack setups this will change the
behavior of using IPv6 first. Global configuration option
'nogetaddrinfo' can be used to revert to deprecated gethostbyname(3).
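Example global section reverting to the old resolver (directive name from
the patch above):

    global
        nogetaddrinfo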
OpenBSD complains this way due to strncat() :
src/haproxy-systemd-wrapper.o(.text+0xd5): In function `spawn_haproxy':
src/haproxy-systemd-wrapper.c:33: warning: strcat() is almost always misused, please use strlcat()
In fact, the code before strncat() here is wrong, because it may
dereference a NULL if /proc/self/exe is not readable. So fix it
and get rid of strncat() at the same time.
No backport is needed.
OpenBSD complains about this use of sprintf() :
src/proto_http.o(.text+0xb0e6): In function `http_process_request':
src/proto_http.c:4127: warning: sprintf() is often misused, please use snprintf()
Here there's no risk as the strings are way shorter than the buffer size
but let's fix it anyway.
OpenBSD complains about our use of sprintf() here :
src/checks.o(.text+0x44db): In function `process_chk':
src/checks.c:766: warning: sprintf() is often misused, please use snprintf()
This case was not really clean since the introduction of global.node BTW.
Better change the API to support a size argument in the function and enforce
the limit.
A few occurrences of sprintf() were causing harmless warnings on OpenBSD :
src/cfgparse.o(.text+0x259e): In function `cfg_parse_global':
src/cfgparse.c:1044: warning: sprintf() is often misused, please use snprintf()
These ones were easy to get rid of, so better do it.
OpenBSD complains about the use of sprintf in human_time() :
src/standard.o(.text+0x1c40): In function `human_time':
src/standard.c:2067: warning: sprintf() is often misused, please use snprintf()
We can easily get around this by having a pointer to the end of the string and
using snprintf() instead.
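A hedged illustration of that pattern (all names invented for the example):

    #include <stdio.h>

    /* emit "<days>d<hours>h" into a fixed buffer, bounding every write
     * by the room left instead of using sprintf() */
    static void fmt_uptime(char *buf, size_t size, unsigned days, unsigned hours)
    {
        char *p = buf;
        char *end = buf + size;

        p += snprintf(p, end - p, "%ud", days);
        if (p >= end)
            return; /* output truncated, stop here */
        snprintf(p, end - p, "%uh", hours);
    }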
OpenBSD complains about our use of strcpy() in standard.c. The checks
were OK and we didn't fall into the category of "almost always misused",
but it's very simple to fix it so better do it before a problem happens.
src/standard.o(.text+0x26ab): In function `str2sa_range':
src/standard.c:718: warning: strcpy() is almost always misused, please use strlcpy()
SNI is a TLS extension and requires at least TLSv1.0 or later, however
the version in the record layer may be SSLv3, not necessarily TLSv1.0.
GnuTLS for example does this.
Relax the record layer version check in smp_fetch_ssl_hello_sni() to
allow fetching SNI values from clients indicating SSLv3 in the record
layer (maintaining the TLSv1.0+ check in the actual handshake version).
This was reported and analyzed by Pravin Tatti.
The TLS unique id, or unique channel binding, is a byte string that can be
pulled from a TLS connection and it is unique to that connection. It is
defined in RFC 5929 section 3. The value is used by various upper layer
protocols as part of an extra layer of security. For example XMPP
(RFC 6120) and EST (RFC 7030).
Add the ssl_fc_unique_id keyword and corresponding sample fetch method.
Value is retrieved from OpenSSL and base64 encoded as described in RFC
5929 section 3.
Constructions such as sc0_get_gpc0(foo) allow to look up the same key as
the current key but in an alternate table. A check was missing to ensure
we already have a key, resulting in a crash if this lookup is performed
before the associated track-sc rule.
This bug was reported on the mailing list by Neil@iamafreeman and
narrowed down further by Lukas Tribus and Thierry Fournier.
This bug was introduced in 1.5-dev20 by commit "0f791d4 MEDIUM: counters:
support looking up a key in an alternate table".
RFC 1945 (§4.1) defines an HTTP/0.9 request ("Simple-Request") as:
Simple-Request = "GET" SP Request-URI CRLF
HAProxy tries to automatically upgrade HTTP/0.9 requests to
HTTP/1.0, by appending "HTTP/1.0" to the request and setting the
Request-URI to "/" if it was not present. The latter however is
RFC-incompatible, as HTTP/0.9 requests must already have a Request-URI
according to the definition above. Additionally,
http_upgrade_v09_to_v10() does not check whether the request method is
indeed GET (the mandatory method for HTTP/0.9).
As a result, any single- or double-word request line is regarded as a
valid HTTP request. We fix this by failing in http_upgrade_v09_to_v10()
if the request method is not GET or the request URI is not present.
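A hedged sketch of the resulting check (helper name and arguments are
illustrative):

    #include <string.h>

    /* RFC 1945: Simple-Request = "GET" SP Request-URI CRLF.
     * Refuse the 0.9->1.0 upgrade for anything else. */
    static int may_upgrade_v09(const char *meth, size_t mlen, const char *uri)
    {
        if (mlen != 3 || memcmp(meth, "GET", 3) != 0)
            return 0; /* method must be GET */
        if (!uri || !*uri)
            return 0; /* Request-URI is mandatory in HTTP/0.9 */
        return 1;
    }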
The cfgparse.c file becomes huge, and a large part of it comes from the
server keyword parser. Since the configuration is a bit more modular now,
move this parser to server.c.
This patch also moves the check of the "server" keyword earlier in the
supported keywords list, resulting in a slightly faster config parsing
for configs with large numbers of servers (about 10%).
No functional change was made, only the code was moved.
We have a use case where we look up a customer ID in an HTTP header
and direct it to the corresponding server. This can easily be done
using ACLs and use_backend rules, but the configuration becomes
painful to maintain when the number of customers grows to a few
tens or even a several hundreds.
We realized it would be nice if we could make the use_backend
resolve its name at run time instead of config parsing time, and
use a similar expression as http-request add-header to decide on
the proper backend to use. This permits the use of prefixes or
even complex names in backend expressions. If no name matches,
then the default backend is used. Doing so allowed us to get rid
of all the use_backend rules.
Since there are some config checks on the use_backend rules to see
if the referenced backend exists, we want to keep them to detect
config errors in normal config. So this patch does not modify the
default behaviour and proceeds this way :
- if the backend name in the use_backend directive parses as a log
format rule, it's used as-is and is resolved at run time ;
- otherwise it's a static name which must be valid at config time.
There was the possibility of doing this with the use-server directive
instead of use_backend, but it seems like use_backend is more suited
to this task, as it can be used for other purposes. For example, it
becomes easy to serve a customer-specific proxy.pac file based on the
customer ID by abusing the errorfile primitive :
    use_backend bk_cust_%[hdr(X-Cust-Id)] if { hdr(X-Cust-Id) -m found }
    default_backend bk_err_404

    backend bk_cust_1
        errorfile 200 /etc/haproxy/static/proxy.pac.cust1
Signed-off-by: Bertrand Jacquin <bjacquin@exosec.fr>
This patch permits registering new sections in haproxy's configuration
file. This works like all the "keyword" registrations: it is used during
haproxy initialization, typically from "__attribute__((constructor))"
functions.
The function url2sa() quickly converts URLs such as http://<ip>:<port>
into a struct sockaddr_storage. This patch adds:
- https support
- the ability to return the parsed length
- IPv6 support
- synchronous DNS resolution, only during haproxy startup.
The fast IPv4 conversion path is kept. IPv6 is slower, because the
standard IPv6 parser function is used.
This function is used to dynamically update all the patterns attached
to one file. It is atomic. All parsing or indexation failures are
reported in the haproxy logs.
For outgoing connections initiated from an applet, there might not be
any listener. It's the case with peers, which resort to a hack consisting
in making the session's listener point to the peer. This listener is only
used for statistics now, so it's much easier to check for its presence.
This patch replaces the word <name> with the word <file>. This word defines
the string returned by "show map/acl". This patch also updates the
documentation to explain how the map or acl identifier is composed.
Till now we didn't consider "q=". It's problematic because the first
effect is that compression tokens were not even matched if it was
present.
It is important to parse it correctly because we still want to allow
a user-agent to send "q=0" to explicitly disable a compressor, or to
specify its preferences.
Now, q-values are respected in order of precedence, and when several
q-values are equal, the first occurrence is used.
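For example, with this patch a request carrying:

    Accept-Encoding: gzip;q=0, deflate;q=0.5

now explicitly disables gzip (q=0) and selects deflate, whereas the
tokens were previously not matched at all when "q=" was present.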
The ACL changes made in the last patchset force the execution
of each pattern matching function. The function pat_match_nothing
was not meant to be executed; it was just used as a flag checked
by the ACL execution code. Now this function is executed and
always returns false.
This patch makes it work as expected: it now returns the boolean
status of the received sample, just as was done previously in the
ACL code.
This bug is part of the patchset just merged. It does not need
to be backported.
This patch replaces a lot of pointers with pattern matching identifiers. If
the declared ACL uses only the predefined pattern matching functions, the
registration function takes the functions provided by "pattern.c" and
identified by PAT_LATCH_*.
If the acl uses its own functions, they can still be declared, and
the acl registration doesn't change them.
This flag is no longer used. The last places using it were the displays
of pattern matching results in the CLI commands "get map" and "get acl".
The first parameter of these commands is the reference of the file used
to perform the lookup.
The function str2net() runs a DNS resolution if a valid IP address cannot
be parsed. The DNS function used is the standard libc function, which
performs a blocking request. Blocking requests are not compatible with
the haproxy architecture.
str2net() is used at runtime through the "socket".
This patch removes the DNS resolution at runtime.
The indexation functions now accept duplicates. This way it is possible
to always have some consistency between lists and trees. The "add" command
will always add, regardless of any previous existence. A newly added
duplicate will not be used, because both trees and lists retrieve keys in
insertion order. Thus the "add" operation will always succeed (as long as
there is enough memory).
If an acl is shared with a map, the "add acl" command must be blocked
because it does not take a sample among its parameters. The absence of
this parameter can cause errors in the corresponding maps.
The pointer <regstr> was only used to compare and identify the original
regex string with the patterns. Now the patterns have a reference map
containing this original string, so it is useless to store this value
twice.
Before this patch, this function tried to add values in best-effort mode:
if the parsing of a value failed, the operation continued until the end.
Now, this function stops on the first error and leaves the pattern in a
coherent state.
This patch adds a new display type. This display returns an allocated
string which is freed once flushed into the buffers. This permits
returning the content of "memprintf(err, ...)" messages.
The pat_ref_add functions were changed to return errors.
The format of acl files is not the same as the format of map
files. In some cases the same file can be used, but this is ambiguous
for the user because the patterns are not the expected ones.
The acl and map functions do the same work when parsing files. This
patch merges that code into a single function.
Note that the function map_read_entries_from_file() in the file "map.c"
is moved to the function pat_ref_read_from_file_smp() in the file
"pattern.c". The code of this function is not modified; only the
name and the argument order have changed.
Each displayed pattern is associated with the value of its pattern
reference. This value can be used for deleting the entry. It is useful
with complex regexes: users are not forced to type the regex with all
its ambiguous and escaped chars on the CLI.
The find_smp functions search for the smp using the value of the
pat_ref_elt pointer. The pat_find_smp_* functions are no longer used.
The function pattern_find_smp() knows all pattern indexation methods
and performs the lookup itself.
All the pattern delete functions can use their reference to the original
"struct pat_ref_elt" to find the element to remove. The functions
pat_del_list_str() and pat_del_meth() were deleted because, after
this modification, they had the same code as pat_del_list_ptr().
Before this patch, the "get map/acl" function try to convert and display
the sample. This behavior is not efficient because some type like the
regex cannot be reversed and displayed as string.
This patch display the original stored reference.
Now, each pattern entry knows the original "struct pat_ref_elt" from
which it was built. This permits deleting each pattern entry without
confusion: each reference can use its pointer to be targeted.
The function pattern_add() is only used by pat_ref_push(). This patch
removes the function pattern_add() and merges its code into
pat_ref_push().
The pattern references are stored with two identifiers: the unique_id and
the reference.
The reference identifies a file. Each use of the same file name points to
the same reference. A file can be registered many times. If the file is
modified, all its dependents are modified as well. The reference can be
used with maps or acls.
The unique_id identifies inline acls. The unique_id is unique for each acl.
You cannot force the same id twice in the configuration file; doing so
reports an error.
The format of the acl and map listing through the "socket" has changed
to display these new ids.
This patch extracts the expect_type variable from "struct pattern" into
"struct pattern_head". This variable is set during the declaration of
ACLs and MAPs. With this change, the function "pat_parse_len()" becomes
useless and can be replaced by "pat_parse_int()".
Implicit ACLs by default rely on the fetch's output type, so let's simply do
the same for all other ones. It has been verified that they all match.
Sometimes the same pattern file is used with the same index, parse and
parse_smp functions. If these two conditions are met, the two patterns
are identical and the same struct can be used.
This patch adds the following socket command line options:

    show acl [<id>]
    clear acl <id>
    get acl <id> <pattern>
    del acl <id> <pattern>
    add acl <id> <pattern>

The system used for maps is reused for the pattern functions.
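Hedged usage example over the stats socket (socket path and acl file are
illustrative):

    $ echo "show acl" | socat stdio /var/run/haproxy.sock
    $ echo "add acl /etc/haproxy/whitelist.lst 10.0.0.3" | socat stdio /var/run/haproxy.sock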
Some functions need to change the sample associated with a pattern. This
new pointer permits returning a pointer to the sample pointer, so the
caller can use or change the value.
This commit adds a delete function for patterns. It looks up all
instances of the pattern to delete and deletes them all. The fetch
keyword declarations have been extended to point to the appropriate
delete function.
This commit adds a second tree node in the pattern struct and uses it to
index IPv6 addresses. It replicates the feature already available in the
list. If an IPv4 sample does not match the IPv4 tree, we try to convert
it to IPv6 by prefixing it with "::ffff"; after this operation, the match
function performs a lookup in the IPv6 tree. If an IPv6 sample does not
match the IPv6 tree, we try to convert IPv6 addresses prefixed by
"2002:IPv4", "::ffff:IPv4" and "::0000:IPv4" back to IPv4; after this
operation, the match function performs a lookup in the IPv4 tree.
The match function knows the format of the pattern. The pattern can be
stored in a list or in a tree. The pattern matching function itself uses
the proper entry point and indexation type.
Each pattern matching function returns the struct pattern that matched. If
the "fill" flag is set, the struct pattern is filled; otherwise the
content of this struct must not be used.
With this feature, the general pattern matching function no longer needs
exceptions for building the "struct pattern".
The original "get map" display function set the comma separator after each
displayed word. This is not convenient because we cannot know whether the
displayed word is the last one.
The new system sets the comma separator before each displayed word, and an
independent "\n" is emitted at the end of the function.
Before this commit, the pattern_exec_match() function returned the
associated sample, the associated struct pattern or the associated struct
pattern_tree. This was complex to use, because the caller had to check
the type of the information returned.
Now the function always returns a "struct pattern". If <fill> is not set,
only the value of the pointer can be used as a boolean (NULL or not). If
<fill> is set, the <smp> pointer and the pattern information can be used.
If information must be duplicated, it is stored in the trash buffer.
Otherwise, the pattern can point to existing strings.
The method is currently stored using two types: an integer if the method
is known, and a string if it is not. The fetch is declared as UINT, but
in some cases it can provide STR.
This patch creates a new type called METH. This type contains an integer
for known methods and a string for the other ones. It can be used with
automatic converters.
The pattern matching can expect a method.
During the free or prune functions, the http_meth pattern is freed. This
patch initialises the freed pointer to NULL.
The operations applied on types SMP_T_CSTR and SMP_T_STR are the same,
but the check code and the declarations are duplicated, because actions
must be declared for both SMP_T_C* and SMP_T_*. The declared actions and
checks are the same; this complicates the code. Only the "conv" functions
can change from "C*" to "*".
Now, if a function needs to modify the input string, it can call the new
function smp_dup(), which duplicates the data into a trash buffer.
The pattern parse functions put the parsed result in a "struct pattern"
without memory allocation. If the pattern must reference the input data
unchanged, the pattern points to the parsed string. If buffers are
needed to store translated data, the trash buffer is used. The indexation
function allocates the memory later, if needed.
Before this patch, the indexation function checked the declared pattern
matching function and indexed the data according to it. This makes it
hard to add new indexation modes.
This commit adds dedicated indexation functions. Each struct pattern is
associated with one indexation function, which permits indexing the data
according to the type of pattern and the type of match.
This commit separates the "struct list" used for chaining from the "struct
pattern" which contains the pattern data. Later, this change will permit
manipulating lists and trees with the same "struct pattern".
Each pattern parser now takes only one string. This change is propagated
to the prototype of the function "pattern_register()". Now it is called
with just one string and no longer needs to browse the array of args.
After the previous patches, the "pat_parse_strcat()" function disappears,
and the "pat_parse_int()" and "pat_parse_dotted_ver()" functions no
longer use the "opaque" argument and take only one string as input.
So, after this patch, each pattern parser no longer uses the opaque
variable and takes only one string as input. This patch changes the
prototype of the pattern parsing functions.
Now, the "char **args" is replaced by a "char *arg", the "int *opaque"
is removed, and these functions return 1 on success and 0 on failure.
The goal of this patch is to simplify the prototype of the
"pat_pattern_*()" functions. I want to replace the argument "char
**args" with a simple "char *arg" and remove the "opaque" argument.
"pat_parse_int()" and "pat_parse_dotted_ver()" are the only pattern
parsers using the "opaque" argument and taking more than one string
argument from char **args. These specificities are only used with ACLs.
Other systems using these pattern parsers (MAP and CLI) just use one
string to describe a range.
These two functions can read a range, but both the min and the max must
be specified. This patch extends the syntax to describe a range with an
implicit min or max. This is used for operators like "lt", "le", "gt"
and "ge". The syntax is the following:

    ":x" -> no min to "x"
    "x:" -> "x" to no max
This patch moves the parsing of the comparison operator from the
functions "pat_parse_int()" and "pat_parse_dotted_ver()" to the acl
parser. The acl parser reads the operator and the values and builds a
volatile string readable by the functions "pat_parse_int()" and
"pat_parse_dotted_ver()". The transformation is done with these rules:

If the parser is "pat_parse_int()":

    "eq x" -> "x"
    "le x" -> ":x"
    "lt x" -> ":y" (with y = x - 1)
    "ge x" -> "x:"
    "gt x" -> "y:" (with y = x + 1)

If the parser is "pat_parse_dotted_ver()":

    "eq x.y" -> "x.y"
    "le x.y" -> ":x.y"
    "lt x.y" -> ":w.z" (with w.z = x.y - 1)
    "ge x.y" -> "x.y:"
    "gt x.y" -> "w.z:" (with w.z = x.y + 1)

Note that if "y" is not present, it is assumed to be "0".
Now "pat_parse_int()" and "pat_parse_dotted_ver()" accept only one
pattern, and the "opaque" variable is no longer used. The prototype of
the pattern parsers can now be changed.
This patch removes the limit of 32 groups. It also permits using the
standard "pat_parse_str()" function in place of "pat_parse_strcat()".
"pat_parse_strcat()" is no longer used and is removed. Before this
patch, groups were stored in a bitfield; now they are stored in a
list of strings. The matching is slower, but the number of groups is
low and the list of allowed groups is generally short.
The fetch function "smp_fetch_http_auth_grp()" used with the name
"http_auth_group" return valid username. It can be used as string for
displaying the username or with the acl "http_auth_group" for checking
the group of the user.
Maybe the names of the ACL and fetch methods are no longer suitable, but
I keep the current names for conserving the compatibility with existing
configurations.
The function "userlist_postinit()" is created from verification code
stored in the big function "check_config_validity()". The code is
adapted to the new authentication storage system and it is moved in the
"src/auth.c" file. This function is used to check the validity of the
users declared in groups and to check the validity of groups declared
on the "user" entries.
This resolve function is executed before the check of all proxy because
many acl needs solved users and groups.
The ACL keyword returned by find_acl_kw() is checked for having a
valid ->parse() function. This dates back to 2007 when ACLs were reworked
in order to differentiate old and new keywords. This check is
inappropriate and confusing since all keywords have a parser now.
The bin2str cast gives the hexadecimal representation of the binary
content when it is used as string. This was inherited from the
stick-table casts without realizing that it was a mistake. Indeed,
it breaks string processing on binary contents, preventing any _reg,
_beg, etc from working.
For example, with an HTTP GET request, the fetch "req.payload(0,3)"
returns the 3 bytes "G", "E", and "T" in binary. If this fetch is
used with regex, it is automatically converted to "474554" and the
regex is applied on this string, so it never matches.
This commit changes the cast so that bin2str does not convert the
contents anymore, and returns a string type. The contents can thus
be matched as is, and the NULL character continues to mark the end
of the string to avoid any issue with some string-based functions.
This commit could almost have been marked as a bug fix since it
does what the doc says.
Note that in case someone would rely on the hex encoding, then the
same behaviour could be achieved by appending ",hex" after the sample
fetch function (brought by previous patch).
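For example, anyone relying on the old implicit encoding could write
(hedged, matching the GET example above):

    acl is_get req.payload(0,3),hex -m str 474554

instead of the new direct form:

    acl is_get req.payload(0,3) -m str GET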
This new filter converts BIN type to its hexadecimal
representation in STR type. It is used to keep the
compatibility with the original bin2str cast.
It will be useful when bin2str changes to copy the
string as-is without encoding anymore.
The binary samples are sometimes copied as is into http headers.
A sample can contain bytes not allowed by the HTTP RFC for
header content, for example if it was extracted from binary data.
The resulting http request can thus be invalid.
This issue does not yet happen because haproxy currently (mistakenly)
hex-encodes binary data, so it is not really possible to retrieve
invalid HTTP chars.
The solution consists in hex-encoding all non-printable chars prefixed
by a '%' sign.
No backport is needed since existing code is not affected yet.
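A hedged sketch of the escaping rule (function name illustrative; this
sketch also escapes '%' itself to keep the encoding reversible, a detail
the actual patch may handle differently):

    #include <ctype.h>
    #include <stddef.h>

    /* copies <len> bytes from <src> to <dst> (of size <size>), emitting
     * "%XX" for every non-printable byte; returns the output length */
    static size_t encode_hdr_value(char *dst, size_t size,
                                   const unsigned char *src, size_t len)
    {
        static const char hex[] = "0123456789ABCDEF";
        size_t out = 0;
        size_t i;

        for (i = 0; i < len; i++) {
            if (isprint(src[i]) && src[i] != '%') {
                if (out + 1 >= size)
                    break;
                dst[out++] = src[i];
            }
            else {
                if (out + 3 >= size)
                    break;
                dst[out++] = '%';
                dst[out++] = hex[src[i] >> 4];
                dst[out++] = hex[src[i] & 0x0F];
            }
        }
        if (out < size)
            dst[out] = '\0';
        return out;
    }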
cfg_parse_listen() currently checks for duplicated proxy names.
Now that we have a tree for this, we can use it.
The config load time was further reduced by a factor of 1.6, making it
now about 4.5 times faster than what it was without the trees.
In fact it was the last CPU-intensive processing involving proxy
names. Now the only remaining point is the automatic fullconn
computation which can be bypassed by having a fullconn in the
defaults section, reducing the load time by another 10x.
Large configurations can take time to parse when thousands of backends
are in use. Let's store all the proxies in trees.
findproxy_mode() has been modified to use the tree for lookups, which
has divided the parsing time by about 2.5. But many lookups are still
present at many places and need to be dealt with.
Currently there are two places where the compression context is released,
one in session_free() and another one in http_end_txn_clean_session().
Both of them call http_end_txn(), either directly or via http_reset_txn(),
and this function is made for this exact purpose. So let's centralize the
call there instead.
Currently, "balance url_param check_post" randomly works. If the client
sends chunked data and there's another chunk after the one containing the
data, http_request_forward_body() will advance msg->sov and move the start
of data to the beginning of the last chunk, and get_server_ph_post() will
not find the data.
In order to avoid this, we add an HTTP_MSGF_WAIT_CONN flag whose goal is
to prevent the forwarding code from parsing until the connection is
confirmed, so that we're certain not to fail on a redispatch. Note that
we need to force channel_auto_connect() since the output buffer is empty
and a previous analyser might have stopped auto-connect.
The flag is currently set whenever some L7 POST analysis is needed for a
connect() so that it correctly addresses all corner cases involving a
possible rewind of the buffer, waiting for a better fix.
Note that this has been broken for a very long time. Even all 1.4 versions
seem broken but differently, with ->sov pointing to the end of the arguments.
So the fix should be considered for backporting to all stable releases,
possibly including 1.3 which works differently.
Julien Vehent reported that the log format '%{+Q}hr' displays the value
terminated by two '"' chars, like this: '"value""'. This patch simply
removes the second quote.
This bug is old and 1.5-specific, but users of older 1.5 versions may be
interested in a backport.
The parser checks for the end of line by comparing against the null
character. In fact, the end of line can also be '\r' or '\n'.
The effect is that empty lines are loaded and indexed in maps.
The bug was introduced by commit d5f624dd ("MEDIUM: sample:
add the "map" converter") in 1.5-dev20. No backport is needed.
smp_fetch_res_comp_algo() returns the name of the compression algorithm
in use. The output type is set to SMP_T_STR instead of SMP_T_CSTR, which
causes any transformation to be operated without a cast. Fortunately,
the current converters do not overwrite a zero-sized area, so the result
is an empty string. Fix this to have SMP_T_CSTR instead so that the cast
is always performed using a copy before any transformation is done.
I was testing haproxy-1.5-dev22 on SmartOS (an illumos-based system)
and ran into a problem. There's a small window after non-blocking
connect() is called, but before the TCP connection is established,
where recv() may return ENOTCONN. On Linux, the behaviour here seems
to be always to return EAGAIN. The fix is relatively trivial, and
appears to make haproxy work reliably on current SmartOS (see patch
below). It's possible that other UNIX platforms exhibit this
behaviour as well.
Note: the equivalent was already done for send() in commit 0ea0cf6
("BUG: raw_sock: also consider ENOTCONN in addition to EAGAIN").
Both patches should be backported to 1.4.
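A hedged sketch of the resulting receive path (flow and names simplified,
not the literal patch) :
  #include <errno.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Returns <0 on a real error, 0 when nothing can be read yet (the
   * caller should poll and retry), else the number of bytes received.
   */
  static ssize_t raw_recv(int fd, void *buf, size_t count)
  {
      ssize_t ret = recv(fd, buf, count, 0);

      if (ret < 0) {
          /* ENOTCONN may be seen on some systems (eg: illumos) while a
           * non-blocking connect() is still in progress: treat it like
           * EAGAIN instead of a fatal error.
           */
          if (errno == EAGAIN || errno == ENOTCONN || errno == EINTR)
              return 0;
          return -1;
      }
      return ret;
  }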
Let's set IP_FREEBIND on IPv6 sockets as well, this works since Linux 3.3
and doesn't require CAP_NET_ADMIN privileges (IPV6_TRANSPARENT does).
This allows unprivileged users to bind to non-local IPv6 addresses, which
can be useful when setting up the listening sockets or when connecting
to backend servers with a specific, non-local source IPv6 address (at that
point we usually dropped root privileges already).
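For illustration, a hedged sketch of what the socket setup boils down to
(simplified, error handling omitted) :
  #include <netinet/in.h>
  #include <sys/socket.h>

  static int create_v6_socket(void)
  {
      int one = 1;
      int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);

      /* IP_FREEBIND is an IPPROTO_IP-level option but it also applies
       * to AF_INET6 sockets since Linux 3.3, without CAP_NET_ADMIN.
       */
      if (fd >= 0)
          setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one));
      return fd;
  }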
Disabled backends don't have their symbols resolved. We must not initialize
their peers section since they're not valid and instead still contain the
section's name.
There are other places where such unions are still in use, and other similar
errors might still happen. Ideally we should get rid of all of them during
the config parsing stage, which is quite sensitive already.
Since commit 0ce3aa0c ("MEDIUM: acl: implement payload and payload_lv"),
the payload and payload_lv ACL patterns were declared as strings because
at this date there was no support for binary patterns. At this time, these
ACLs were not reliably usable due to the binary-to-string cast involved,
and because it was not possible to specify the direction of the match.
Since recent evolutions, the new fetch methods "req.payload" and
"res.payload" have leveraged the ambiguity and were of type "binary",
with an implicit ACL mapping of the same type. The doc also states
that "payload" is an alias for "req.payload" etc... while these two
don't share the same type.
Better fix this mess before it's too late. "payload" and "payload_lv"
return a binary content, so their ACLs must by default use a binary
pattern. That way they behave like their "req." and "res." sisters.
This change might break some configs making use of these, but there's
almost a zero probability that anyone managed to use them to match
exact strings, so in practice the change should be safe.
Finn Arne Gangstad reported that commit 6b726adb35 ("MEDIUM: http: do
not report connection errors for second and further requests") breaks
support for serving static files by abusing the errorfile 503 statement.
Indeed, a second request over a connection sent to any server or backend
returning 503 would silently be dropped.
The proper solution consists in adding a flag on the session indicating
that the server connection was reused, and to only avoid the error code
in this case.
Since 1.5-dev20, we have a working server-side keep-alive and an option
"prefer-last-server" to indicate that we explicitly want to reuse the
same server as the last one. Unfortunately this breaks the redispatch
feature because assign_server() insists on reusing the same server as
the first one attempted even if the connection failed to establish.
A simple solution consists in only considering the last connection if
it was connected. Otherwise there is no reason for being interested in
reusing the same server.
Commits e0d1bfb ("[MINOR] Allow shutdown of sessions when a server
becomes unavailable") and eb2c24a ("MINOR: checks: add on-marked-up
option") mentionned that the directive was supported in default-server
but while it can be stated there, it's ignored because the config value
is not copied from the default server upon creation of a new server.
Moving the statement to the "server" lines works fine though. Thanks
to Baptiste Assmann for reporting and diagnosing this bug.
These features were introduced in 1.5-dev6 and 1.5-dev10 respectively,
so no backport is needed.
Igor Chan reported a very interesting bug which was triggered by the
recent dynamic size change in SSL.
The OpenSSL API refuses to send less data than any failed previous
attempt. So what's happening is that if an SSL_write() in streaming
mode sends 5kB of data and the openssl layer cannot send them all,
it returns SSL_ERROR_WANT_WRITE, which haproxy reacts to by enabling
polling on the file descriptor. In the meantime, haproxy may detect
that the buffer was almost full and will disable streaming mode. Upon
write notification, it will try to send again, but less data this
time (limited to tune.ssl_max_record). OpenSSL disagrees with this
and returns a generic error SSL_ERROR_SSL.
The solution which was found consists in adding a flag to the SSL
context to remind that we must not shrink writes after a failed
attempt. Thus, if EAGAIN is encountered, the next send() will not
be limited in order to retry the same size as before.
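A hedged sketch of the workaround (the flag name is hypothetical, not the
actual haproxy identifier) :
  #include <openssl/ssl.h>

  #define SSL_FL_DONT_SHRINK 0x1 /* hypothetical: a failed write is pending */

  /* Never retry a shorter write than a previously failed one, since
   * OpenSSL would reject it with a generic SSL_ERROR_SSL.
   */
  static int ssl_send(SSL *ssl, const void *ptr, int try, int max_record,
                      int streamer, unsigned int *flags)
  {
      int ret;

      if (!(*flags & SSL_FL_DONT_SHRINK) && !streamer && try > max_record)
          try = max_record; /* small records for interactive traffic */

      ret = SSL_write(ssl, ptr, try);
      if (ret <= 0 && SSL_get_error(ssl, ret) == SSL_ERROR_WANT_WRITE)
          *flags |= SSL_FL_DONT_SHRINK; /* must retry the same size */
      else if (ret > 0)
          *flags &= ~SSL_FL_DONT_SHRINK;
      return ret;
  }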
Cyril Bonté reported that despite commit 0dbbf317 which attempted
to fix the crash when a peers section has no name, we still get a
segfault after the error message when parsing the peers. The reason
is that the returned error code is ERR_FATAL and not ERR_ABORT, so
the parsing continues while the section was not initialized.
This is 1.5-specific, no backport is needed.
Peers with integer stick tables are breaking the keys received. This is due to
the fact that the sender converts the key with htonl() but the receiver doesn't
convert the value back to its original format.
Peers appeared in haproxy-1.5, no backport is needed.
Sometimes it can be useful to generate a random value, at least
for debugging purposes, but also to take routing decisions or to
pass such a value to a backend server.
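For example (illustrative config, assuming the new fetch is named "rand") :
http-request set-header X-Lucky-Number %[rand(1000)]
would pass a pseudo-random integer between 0 and 999 to the server in a
header.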
The ability to globally override the default client and server cipher
suites has been requested multiple times since the introduction of SSL.
This commit adds two new keywords to the global section for this :
- ssl-default-bind-ciphers
- ssl-default-server-ciphers
It is still possible to preset them at build time by setting the macros
LISTEN_DEFAULT_CIPHERS and CONNECT_DEFAULT_CIPHERS.
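For example (cipher string purely illustrative) :
global
ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:AES128-SHA
ssl-default-server-ciphers ECDHE-RSA-AES128-GCM-SHA256:AES128-SHA
Any "ciphers" argument explicitly set on a "bind" or "server" line still
overrides these defaults.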
The new tune.idletimer value allows one to set a different value for
idle stream detection. The default value remains set to one second.
It is possible to disable it using zero, and to change the default
value at build time using DEFAULT_IDLE_TIMER.
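For example (illustrative) :
global
tune.idletimer 500
sets idle stream detection to 500 milliseconds, while a value of 0 disables
it entirely.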
Disabling the streamer flags after an idle period will help TCP proxies
to better adapt to the streams they're forwarding, especially with SSL
where this will allow the SSL sender to use smaller records. This is
typically used to optimally relay HTTP and derivatives such as SPDY or
HTTP/2 in pure TCP mode when haproxy is used as an SSL offloader.
This idea was first proposed by Ilya Grigorik on the haproxy mailing
list, and his tests seem to confirm the improvement :
https://www.mail-archive.com/haproxy@formilux.org/msg12576.html
tcp-check must not reinitialize the SSL stack upon each check!
It's done once after the config parsing and leaks memory and eats
performance when done upon every check.
This bug was introduced in 1.5-dev22, no backport is needed.
It happens that the latest change broke some monitoring tools which expect the
field to be found at the same position as indicated in the doc. Let's move
it to the last column instead.
I forgot to remove one human_time() in the CSV output for the backend's
lastsess entry in previous patch, which caused the value to be reported
as "1m18s" for example instead of 78.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.
This attempts to address the requirement in the ROADMAP
- add a last activity date for each server (req/resp) that will be
displayed in the stats. It will be useful with soft stop.
The stats page reports this as time elapsed since last session. This
change does not adequately address the requirement for long running
sessions (websocket, RDP, etc.).
By having the stream interface pass the CF_STREAMER flag to the
snd_buf() primitive, we're able to tell the send layer whether
we're sending large chunks or small ones.
We use this information in SSL to adjust the max record dynamically.
This results in small chunks respecting tune.ssl.maxrecord at the
beginning of a transfer or for small transfers, with an automatic
switch to full records if the exchanges last long. This allows the
receiver to parse HTML contents on the fly without having to retrieve
16kB of data, which is even more important with small initcwnd since
the receiver does not need to wait for round trips to start fetching
new objects. However, sending large files still produces large chunks.
For example, with tune.ssl.maxrecord = 2859, we see 5 write(2885)
sent in two segments each and 6 write(16421).
This idea was first proposed on the haproxy mailing list by Ilya Grigorik.
This prevents us from passing other useful info and requires the
upper levels to know these flags. Let's use a new flags category
instead : CO_SFL_*. For now, only MSG_MORE has been remapped.
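A hedged sketch of the new flag category (values illustrative) :
  /* send flags for snd_buf(), decoupled from the socket-layer MSG_* values */
  #define CO_SFL_MSG_MORE  0x0001  /* more data expected: may map to MSG_MORE */
  #define CO_SFL_STREAMER  0x0002  /* large continuous transfer in progress */
The transport's snd_buf() translates CO_SFL_MSG_MORE back to MSG_MORE for
plain sockets, while the SSL layer can use the streamer hint to choose the
record size as described above.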
When no check type is configured (so the basic connection check), we
want the connection success to be immediately reported. Unfortunately,
it did not happen because in this case the connection is not registered
for read nor for write, and the wake_srv() callback does not handle this
case where no data transfer was requested. However, having option tcp-check
hides this problem because the check type follows a different setup mode,
by having check->type != 0 and the connection believing it must try to
send data.
The effect was that without any option, checks would succeed only at the
end of the check interval. So let's just add the wake-up condition.
This bug appeared with the recent polling changes, no backport is needed.
As a workaround, using "option tcp-check" fixes the problem.
Useless strncpy calls were done in those two sample fetches; the
"struct chunk" allows us to dump the specified length.
The encode_string() in capture.req.uri was judged inappropriate and was
deleted.
The return type was fixed to SMP_T_CSTR.
A typo made first step of a tcpcheck to be a connect step. This patch
prevents this behavior. The bug was introduced in 1.5-dev22 with
"tcp-check connect" and only affects these directives. No backport is
needed.
Add 2 sample fetches allowing one to extract the method and the URI of an
HTTP request.
FIXME: the sample fetch parser can't add the LW_REQ requirement; at
the moment this flag is set automatically whenever sample fetches are used.
Note: also fixed the alphabetical order of other capture.req.* keywords
in the doc.
Released version 1.5-dev22 with the following main changes :
- MEDIUM: tcp-check new feature: connect
- MEDIUM: ssl: Set verify 'required' as global default for servers side.
- MINOR: ssl: handshake optim for long certificate chains.
- BUG/MINOR: pattern: pattern comparison executed twice
- BUG/MEDIUM: map: segmentation fault with the stats's socket command "set map ..."
- BUG/MEDIUM: pattern: Segfault in binary parser
- MINOR: pattern: move functions for grouping pat_match_* and pat_parse_* and add documentation.
- MINOR: standard: The parse_binary() returns the length consumed and its documentation is updated
- BUG/MINOR: payload: the patterns of the acl "req.ssl_ver" are not parsed with the correct function.
- BUG/MEDIUM: pattern: "pat_parse_dotted_ver()" set bad expect_type.
- BUG/MINOR: sample: The c_str2int converter does not fail if the entry is not an integer
- BUG/MEDIUM: http/auth: Sometimes the authentication credentials can be mixed between two requests
- MINOR: doc: Bad cli function name.
- MINOR: http: smp_fetch_capture_header_* fetch captured headers
- BUILD: last release inadvertently prepended a "+" in front of the date
- BUG/MEDIUM: stream-int: fix the keep-alive idle connection handler
- BUG/MEDIUM: backend: do not re-initialize the connection's context upon reuse
- BUG: Revert "OPTIM/MEDIUM: epoll: fuse active events into polled ones during polling changes"
- BUG/MINOR: checks: successful check completion must not re-enable MAINT servers
- MINOR: http: try to stick to same server after status 401/407
- BUG/MINOR: http: always disable compression on HTTP/1.0
- OPTIM: poll: restore polling after a poll/stop/want sequence
- OPTIM: http: don't stop polling for read on the client side after a request
- BUG/MEDIUM: checks: unchecked servers could not be enabled anymore
- BUG/MEDIUM: stats: the web interface must check the tracked servers before enabling
- BUG/MINOR: channel: CHN_INFINITE_FORWARD must be unsigned
- BUG/MINOR: stream-int: do not clear the owner upon unregister
- MEDIUM: stats: add support for HTTP keep-alive on the stats page
- BUG/MEDIUM: stats: fix HTTP/1.0 breakage introduced in previous patch
- Revert "MEDIUM: stats: add support for HTTP keep-alive on the stats page"
- MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
- OPTIM: session: set the READ_DONTWAIT flag when connecting
- BUG/MINOR: http: don't clear the SI_FL_DONT_WAKE flag between requests
- MINOR: session: factor out the connect time measurement
- MEDIUM: session: prepare to support earlier transitions to the established state
- MEDIUM: stream-int: make si_connect() return an established state when possible
- MINOR: checks: use an inline function for health_adjust()
- OPTIM: session: put unlikely() around the freewheeling code
- MEDIUM: config: report a warning when multiple servers have the same name
- BUG: Revert "OPTIM: poll: restore polling after a poll/stop/want sequence"
- BUILD/MINOR: listener: remove a glibc warning on accept4()
- BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage
- BUILD: listener: fix recent accept4() again
- BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9
- BUG/MEDIUM: polling: ensure we update FD status when there's no more activity
- MEDIUM: listener: fix polling management in the accept loop
- MINOR: protocol: improve the proto->drain() API
- MINOR: connection: add a new conn_drain() function
- MEDIUM: tcp: report in tcp_drain() that lingering is already disabled on close
- MEDIUM: connection: update callers of ctrl->drain() to use conn_drain()
- MINOR: connection: add more error codes to report connection errors
- MEDIUM: tcp: report connection error at the connection level
- MEDIUM: checks: make use of chk_report_conn_err() for connection errors
- BUG/MEDIUM: unique_id: HTTP request counter is not stable
- DOC: fix misleading information about SIGQUIT
- BUG/MAJOR: fix freezes during compression
- BUG/MEDIUM: stream-interface: don't wake the task up before end of transfer
- BUILD: fix VERDATE exclusion regex
- CLEANUP: polling: rename "spec_e" to "state"
- DOC: add a diagram showing polling state transitions
- REORG: polling: rename "spec_e" to "state" and "spec_p" to "cache"
- REORG: polling: rename "fd_spec" to "fd_cache"
- REORG: polling: rename the cache allocation functions
- REORG: polling: rename "fd_process_spec_events()" to "fd_process_cached_events()"
- MAJOR: polling: rework the whole polling system
- MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
- MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
- MEDIUM: connection: add check for readiness in I/O handlers
- MEDIUM: stream-interface: the polling flags must always be updated in chk_snd_conn
- MINOR: stream-interface: no need to call fd_stop_both() on error
- MEDIUM: connection: no need to recheck FD state
- CLEANUP: connection: use conn_ctrl_ready() instead of checking the flag
- CLEANUP: connection: use conn_xprt_ready() instead of checking the flag
- CLEANUP: connection: fix comments in connection.h to reflect new behaviour.
- OPTIM: raw-sock: don't speculate after a short read if polling is enabled
- MEDIUM: polling: centralize polled events processing
- MINOR: polling: create function fd_compute_new_polled_status()
- MINOR: cli: add more information to the "show info" output
- MEDIUM: listener: add support for limiting the session rate in addition to the connection rate
- MEDIUM: listener: apply a limit on the session rate submitted to SSL
- REORG: stats: move the stats socket states to dumpstats.c
- MINOR: cli: add the new "show pools" command
- BUG/MEDIUM: counters: flush content counters after each request
- BUG/MEDIUM: counters: fix stick-table entry leak when using track-sc2 in connection
- MINOR: tools: add very basic support for composite pointers
- MEDIUM: counters: stop relying on session flags at all
- BUG/MINOR: cli: fix missing break in command line parser
- BUG/MINOR: config: correctly report when log-format headers require HTTP mode
- MAJOR: http: update connection mode configuration
- MEDIUM: http: make keep-alive + httpclose be passive mode
- MAJOR: http: switch to keep-alive mode by default
- BUG/MEDIUM: http: fix regression caused by recent switch to keep-alive by default
- BUG/MEDIUM: listener: improve detection of non-working accept4()
- BUILD: listener: add fcntl.h and unistd.h
- BUG/MINOR: raw_sock: correctly set the MSG_MORE flag
A new tcp-check rule type: connect.
It allows HAProxy to test applications which stand on multiple ports or
multiple applications load-balanced through the same backend.
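For example, an illustrative check of a POP+SMTP pair behind the same
backend :
backend mail
option tcp-check
tcp-check connect port 110
tcp-check expect string +OK
tcp-check connect port 25
tcp-check expect rstring ^220
server mail1 192.168.0.10 check
Each "tcp-check connect" opens a new connection to the given port, and the
rules that follow it apply to that connection.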
Due to a typo, the MSG_MORE flag used to replace MSG_NOSIGNAL and
MSG_DONTWAIT. Fortunately, sockets are always marked non-blocking,
so the loss of MSG_DONTWAIT is harmless, and the NOSIGNAL is covered
by the interception of the SIGPIPE. So no issue could have been
caused by this bug.
On ARM, glibc does not implement accept4() and simply returns ENOSYS,
which was not caught as a reason to fall back to accept(), resulting
in a spinning process since poll() would immediately report the FD again.
Let's change the error detection mechanism to save the broken status
of the syscall into a local variable that is used to fall back to the
legacy accept().
In addition to this, since the code was becoming a bit messy, the
duplication around accept4() was removed, so now the fallback code and
the legacy code are the same. This will also increase bug report accuracy
if needed.
This is 1.5-specific, no backport is needed.
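A hedged sketch of the detection logic (variable and helper names assumed) :
  #define _GNU_SOURCE /* accept4() */
  #include <errno.h>
  #include <fcntl.h>
  #include <sys/socket.h>

  static int accept4_broken; /* set once accept4() proves unusable */

  static int do_accept(int fd, struct sockaddr *addr, socklen_t *len)
  {
      int cfd = -1;

      if (!accept4_broken) {
          cfd = accept4(fd, addr, len, SOCK_NONBLOCK);
          if (cfd == -1 &&
              (errno == ENOSYS || errno == EINVAL || errno == EPERM))
              accept4_broken = 1; /* remember, and fall back forever */
      }
      if (accept4_broken) {
          cfd = accept(fd, addr, len);
          if (cfd != -1)
              fcntl(cfd, F_SETFL, O_NONBLOCK);
      }
      return cfd;
  }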
Yesterday's commit 70dffda ("MAJOR: http: switch to keep-alive mode by default")
broke HTTP/1.0 handling without keep-alive when keep-alive is enabled both in
the frontend and in the backend.
Before this patch, it used to work because tunnel mode was the default one,
so if no mode was present in the frontend and a mode was set in the backend,
the backend was the first one to parse the header. This is what the original
patch tried to do with keep-alive by default, causing the version and the
connection header to be ignored if both the frontend and the backend were
running in keep-alive mode.
The fix consists in always parsing the header in non-tunnel mode, and
processing the rest of the logic at least once, and again if the
backend works in a different mode than the frontend.
This is 1.5-specific, no backport is needed.
The authentication function "get_http_auth()" extracts credentials from
the request and keeps these values in a shared cache. It sets a flag
in the session indicating that the authentication has already been
parsed and that the values stored in the cache are available. If this
flag is set, the authorization header is not re-parsed and the shared
cache is used.
If two requests are processed simultaneously, the first one checks the
credentials. After this, the second request also checks its own
credentials and changes the data stored in the shared cache. When the
first request re-checks the credentials (for many reasons), they have
changed. The change can introduce a segfault.
This patch deactivates the cache upon success. When we need
authentication information from a request, it is re-parsed and
re-decoded. However, a failure to retrieve credentials is still
cached to avoid useless lookups.
This fix needs to be backported to 1.4 as well.
Since we support HTTP keep-alive, there is no more reason for staying
in tunnel mode by default. It is confusing for new users and creates
more issues than it solves. Option "http-tunnel" is available to force
to use it if really desired.
Switching to KA by default has implied changing the value of some
option flags and some transaction flags so that value zero (default)
matches keep-alive. That explains why more code has been changed than
expected. Tests have been run on the 25 combinations of frontend and
backend options, plus a few with option http-pretend-keepalive, and
no anomaly was found.
The relation between frontend and backends remains the same. Options
have been updated to take precedence over http-keep-alive which is now
implicit.
All references in the doc to haproxy not supporting keep-alive have
been fixed, and the doc for config options has been updated.
There's no particular reason for having keep-alive + httpclose combine
into forceclose when set in different frontend/backend sections, since
keep-alive does not close anything by default. Let's have this
combination remain httpclose only.
At the very beginning of haproxy, there was "option httpclose" to make
haproxy add a "Connection: close" header in both directions to invite
both sides to agree on closing the connection. It did not work with some
rare products, so "option forceclose" was added to do the same and actively
close the connection. Then client-side keep-alive was supported, so option
http-server-close was introduced. Now we have keep-alive with a fourth
option, not to mention the implicit tunnel mode.
The connection configuration has become a total mess because all the
options above may be combined together, despite almost everyone thinking
they cancel each other, judging from the common problem reports on the
mailing list. Unfortunately, re-reading the doc shows that it's not clear
at all that options may be combined, and the opposite seems more obvious
since they're compared. The most common issue is options being set in the
defaults section that are not negated in other sections, but are just
combined when the user expects them to be overloaded. The migration to
keep-alive by default will only make things worse.
So let's start to address the first problem. A transaction can only work in
5 modes today :
- tunnel : haproxy doesn't bother with what follows the first req/resp
- passive close : option httpclose
- forced close : option forceclose
- server close : option http-server-close with keep-alive on the client side
- keep-alive : option http-keep-alive, end to end
All 16 combinations for each section fall into one of these cases. Same for
the 256 combinations resulting from frontend+backend different modes.
With this patch, we're doing something slightly different, which will not
change anything for users with valid configs, and will only change the
behaviour for users with unsafe configs. The principle is that these options
may no longer be combined, and that the latest one always overrides all the
other ones, including those inherited from the defaults section. The "no
option xxx" statement is still supported to cancel one option and fall back
to the default one. It is mainly needed to ignore defaults sections (eg:
force the tunnel mode). The frontend+backend combinations have not changed.
So for example, the following configuration used to put the connection
into forceclose :
defaults http
mode http
option httpclose
frontend foo
option http-server-close
=> http-server-close+httpclose = forceclose before this patch! Now
the frontend's config replaces the defaults config and results in
the more expected http-server-close.
All 25 combinations of the 5 modes in (frontend,backend) have been
successfully tested.
In order to prepare for upcoming changes, a new "option http-tunnel" was
added. It currently only voids all other options, and has the lowest
precedence when mixed with another option in another frontend/backend.
If no CA file is specified on a server line, the config parser will show an
error.
Adds a command line option '-dV' to re-set verify 'none' as the global
default on the server side (previous behavior).
Also adds 'ssl-server-verify' global statement to set global default to
'none' or 'required'.
WARNING: this changes the default verify mode from "none" to "required" on
the server side, and it *will* break insecure setups.
When using some log-format directives in header insertion without HTTP mode,
the config parser used to report a cryptic message about option httplog being
downgraded to tcplog and with "(null):0" as the file name and line number.
This is because the lfs_file and lfs_line were not properly set for some valid
use cases of log-format directives. Now we cover http-request and http-response
as well.
Yesterday's commit 12833bb ("MINOR: cli: add the new "show pools" command")
missed a "break" statement causing trouble to the "show map" command. Spotted
by Thierry Fournier.
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.
The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.
Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.
A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
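A hedged sketch of such composite pointer helpers (names assumed from the
companion tools patch) :
  #include <stdint.h>

  /* A composite address: a pointer whose 2 spare low bits carry flags.
   * This relies on the pointed object being at least 4-byte aligned.
   */
  static inline void *caddr_from_ptr(void *ptr, unsigned int flags)
  {
      return (void *)((uintptr_t)ptr | (flags & 3));
  }

  static inline void *caddr_to_ptr(const void *caddr)
  {
      return (void *)((uintptr_t)caddr & ~(uintptr_t)3);
  }

  static inline unsigned int caddr_to_flags(const void *caddr)
  {
      return (unsigned int)((uintptr_t)caddr & 3);
  }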
In 1.5-dev19, commit e25c917 ("MEDIUM: counters: add support for tracking
a third counter") introduced the third track counter. However, there was
a hard-coded test in the accept() error path to release only sc0 and sc1.
So it seems that if tracking sc2 at the connection level and deciding to
reject once the track-sc2 has been done, there could be some leaking of
stick-table entries which remain marked used forever, thus which can never
be purged nor expired. There's no memory leak though, it's just that
entries are unexpirable forever.
The simple solution consists in removing the test and always calling
the inline function which iterates over all entries.
One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking
L7 information") brought support for tracking L7 information in tcp-request
content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters:
correctly unbind the counters tracked by the backend") used to flush the
backend counters after processing a request.
While that earliest patch was correct at the time, it became wrong after
the second patch was merged. The code does what it says, but the concept
is flawed. "TCP request content" rules are evaluated for each HTTP request
over a single connection. So if such a rule in the frontend decides to
track any L7 information or to track L4 information when an L7 condition
matches, then it is applied to all requests over the same connection even
if they don't match. This means that a rule such as :
tcp-request content track-sc0 src if { path /index.html }
will count one request for index.html, and another one for each of the
objects present on this page that are fetched over the same connection
which sent the initial matching request.
Worse, it is possible to make the code do stupid things by using multiple
counters:
tcp-request content track-sc0 src if { path /foo }
tcp-request content track-sc1 src if { path /bar }
Just sending two requests first, one with /foo, one with /bar, shows
twice the number of requests for all subsequent requests. Just because
both of them persist after the end of the request.
So the decision to flush backend-tracked counters was not the correct
one. In practice, what is important is to flush content-based rules
since they are the ones evaluated for each request.
Doing so requires new flags in the session however, to keep track of
which stick-counter was tracked by what ruleset. A later change might
make this easier to maintain over time.
This bug is 1.5-specific, no backport to stable is needed.
show pools
Dump the status of internal memory pools. This is useful to track memory
usage when suspecting a memory leak for example. It does exactly the same
as the SIGQUIT when running in foreground except that it does not flush
the pools.
Just like the previous commit, we sometimes want to limit the rate of
incoming SSL connections. While it can be done for a frontend, it was
not possible for a whole process, which makes sense when multiple
processes are running on a system to serve multiple customers.
The new global "maxsslrate" setting is usable to fix a limit on the
session rate going to the SSL frontends. The limit applies before
the SSL handshake and not after, so that it saves the SSL stack from
expensive key computations that would finally be aborted before being
accounted for.
The same setting may be changed at run time on the CLI using
"set rate-limit ssl-session global".
It's sometimes useful to be able to limit the connection rate on a machine
running many haproxy instances (eg: per customer) but it removes the ability
for that machine to defend itself against a DoS. Thus, better also provide a
limit on the session rate, which does not include the connections rejected by
"tcp-request connection" rules. This permits to have much higher limits on
the connection rate without having to raise the session rate limit to insane
values.
The limit can be changed on the CLI using "set rate-limit sessions global",
or in the global section using "maxsessrate".
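For example (illustrative values) :
global
maxsessrate 1000
maxsslrate 200
limits the whole process to 1000 new sessions per second, and to 200 new
SSL sessions per second, independently of the per-frontend settings.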
In addition to previous outputs, we also emit the cumulated number of
connections, the cumulated number of requests, the maximum allowed
SSL connection concurrency, the current number of SSL connections and
the cumulated number of SSL connections. This will help troubleshoot
systems which experience memory shortage due to SSL.
If the string does not start with a number, the converter now fails.
Otherwise, it converts as many characters as possible into a number and
stops at the first character which is not a digit.
This is a regression introduced by the patches "MINOR: pattern: Each
pattern sets the expected input type" and "MEDIUM: acl: Last patch
change the output type". The expected value is SMP_T_CSTR in place of
SMP_T_UINT.
This bug impacts all the ACLs using the parser "pat_parse_dotted_ver()".
The two ACLs are "req_ssl_ver()" and "req.ssl_ver()".
This is a recent bug, no backport is needed.
This function is used to compute the new polling state based on
the previous state. All pollers have to do this in their update
loop, so better centralize the logic for it.
Currently, each poll loop handles the polled events the same way,
resulting in a lot of duplicated, complex code. Additionally, epoll
was the only one to handle newly created FDs immediately.
So instead, let's move that code to fd.c in a new function dedicated
to this task : fd_process_polled_events(). All pollers now use this
function.
This is the reimplementation of the "done" action : when we experience
a short read, we're almost certain that we've exhausted the system's
buffers and that we'll meet an EAGAIN if we attempt to read again. If
the FD is not yet polled, the stream interface already takes care of
stopping the speculative read. When the FD is already being polled, we
have two options :
- either we're running from a level-triggered poller, in which case
we'd rather report that we've reached the end so that we don't
speculate over the poller and let it report next time data are
available ;
- or we're running from an edge-triggered poller in which case we
have no choice and have to see the EAGAIN to re-enable events.
At the moment we don't have any edge-triggered poller, so it's desirable
to avoid speculative I/O that we know will fail.
Note that this must not be ported to SSL since SSL hides the real
readiness of the file descriptor.
Thanks to this change, we observe no EAGAIN anymore during keep-alive
transfers, and failed recvfrom() are reduced by half in http-server-close
mode (the client-facing side is always being polled and the second recv
can be avoided). Doing so results in about 5% performance increase in
keep-alive mode. Similarly, we used to have up to about 1.6% of EAGAIN
on accept() (1/maxaccept), and these have completely disappeared under
high loads.
It's easier and safer to rely on conn_xprt_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !xprt have been removed, as the
XPRT_READY flag itself guarantees xprt is set.
It's easier and safer to rely on conn_ctrl_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !ctrl have been removed, as the
CTRL_READY flag itself guarantees ctrl is set.
We already have everything in the connection flags using the
CO_FL_DATA_*_ENA bits combined with the fd's ready state, so
we do not need to check fdtab[fd].ev anymore. This considerably
simplifies the connection handling logic since it doesn't
have to mix connection flags with past polling states.
We don't need to call fd_stop_both() since we already call
conn_cond_update_polling() which will do it. This call was introduced by
commit d29a066 ("BUG/MAJOR: connection: always recompute polling status
upon I/O").
We used to only update the polling flags in data phase, but after that
we could update other flags. It does not seem possible to trigger a
bug here but it's not very safe either. Better always keep them up to
date.
The recv/send callbacks must check for readiness themselves instead of
having their callers do it. This will strengthen the test and will also
ensure we never refrain from calling a handshake handler because a
direction is being polled while the other one is ready.
We simply remove these functions and replace their calls with the
appropriate ones :
- if we're in the data phase, we can simply report wait on the FD
- if we're in the socket phase, we may also have to signal the
desire to read/write on the socket because it might not be
active yet.
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.
For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. The area is sensitive enough that it's better
not to make the change more complex at this point.
Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.
This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.
Several new state manipulation functions have been introduced or
adapted :
- fd_want_{recv,send} : enable receiving/sending on the FD regardless
of its state (sets the ACTIVE flag) ;
- fd_stop_{recv,send} : stop receiving/sending on the FD regardless
of its state (clears the ACTIVE flag) ;
- fd_cant_{recv,send} : report a failure to receive/send on the FD
corresponding to EAGAIN (clears the READY flag) ;
- fd_may_{recv,send} : report the ability to receive/send on the FD
as reported by poll() (sets the READY flag) ;
Some functions are used to report the current FD status :
- fd_{recv,send}_active
- fd_{recv,send}_ready
- fd_{recv,send}_polled
Some functions were removed :
- fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()
The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
know they can try to access the file descriptor to get this information.
In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.
The following pollers have been updated :
ev_select() : done, built, tested on Linux 3.10
ev_poll() : done, built, tested on Linux 3.10
ev_epoll() : done, built, tested on Linux 3.10 & 3.13
ev_kqueue() : done, built, tested on OpenBSD 5.2
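A hedged sketch of one direction's state (bit names and values
illustrative) :
  /* 3 bits fully describe one direction of an FD */
  #define FD_EV_ACTIVE  0x01  /* we want to use the FD (fd_want_*) */
  #define FD_EV_READY   0x02  /* FD is ready; cleared on EAGAIN (fd_cant_*) */
  #define FD_EV_POLLED  0x04  /* FD is subscribed to the poller */
Typically an FD needs polling when it is active but not ready; once poll()
reports it, fd_may_recv()/fd_may_send() set READY again, and since the bits
are kept across disable/enable sequences, a previously met EAGAIN is not
forgotten.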
We're completely changing the way FDs will be polled. There will be no
more speculative I/O since we'll know the exact FD state, so these will
only be cached events.
First, let's fix a few field names which become confusing. "spec_e" was
used to store a speculative I/O event state. Now we'll store the whole
R/W states for the FD there. "spec_p" was used to store a speculative
I/O cache position. Now let's clearly call it "cache".
We're completely changing the way FDs will be polled. First, let's fix
a few field names which become confusing. "spec_e" was used to store a
speculative I/O event state. Now we'll store the whole R/W states for
the FD there.
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") was not correct. It used to wake up the task
as soon as there was some write activity and the flag was set, even if there
were still some data to be forwarded. This resulted in process_session()
being called a lot when transferring chunk-encoded HTTP responses made of
very large chunks.
The purpose of the flag is to wake up only a task waiting for some
room and not the other ones, so it's totally counter-productive to
wake it up as long as there are data to forward because the task
will not be allowed to write anyway.
Also, the commit above was taking some risks by not considering
certain events anymore (eg: state != SI_ST_EST). While such events
are not used at the moment, if some new features were developed
in the future relying on these, it would be better that they could
be notified when subscribing to the WAKE_WRITE event, so let's
restore the condition.
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") introduced this new CF_WAKE_WRITE flag that
an analyser which requires some free space to write must set if it wants
to be notified.
Unfortunately, some places were missing. More specifically, the
compression engine can rarely be stuck by a lack of output space,
especially when dealing with non-compressible data. It then has to
stop until some pending data are flushed and for this it must set
the CF_WAKE_WRITE flag. But these cases were missed by the commit
above.
Fortunately, this change was introduced very recently and never
released, so the impact was limited.
Huge thanks to Sander Klein who first reported this issue and who kindly
and patiently provided lots of traces and test data that made it possible
to reproduce, analyze, then fix this issue.
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.
A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.
It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.
So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instantiated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").
This feature is 1.5-specific, no backport to 1.4 is needed.
The fetch "req.ssl_ver" is not declared as explicit acl. If it is used
as implicit ACL, the acl engine detect SMP_T_UINT output type and choose
to use the default interger parser: pat_parse_int(). This fetch needs the
parser pat_parse_dotted_ver().
This patch declare explicit ACL named "req.ssl_ver" that use the good
parser function pat_parse_dotted_ver().
Checks used not to precisely report the errors that were detected at the
connection layer (eg: too many SSL connections). Using chk_report_conn_err()
makes this possible.
Now when a connection error happens, it is reported in the connection
so that upper layers know exactly what happened. This is particularly
useful with health checks and resources exhaustion.
Actually the value returned by this function is never used; all the
callers just check whether the result is non-zero. Before this patch, the
function returned the length of the produced content. This value is not
useful because it is returned twice: the first time in the return value and
the second time in the <binstrlen> argument. Now the function returns
the number of bytes consumed from <source>.
The functions pat_parse_* must return 0 on failure and the number of
elements consumed from **text on success. The function pat_parse_bin()
returned 0 or the length parsed, which caused a segfault. The fix simply
applies the double negation operator "!!" to the result of
pat_parse_bin() so that the return value matches the expected one.
Now we can more safely rely on the connection state to decide how to
drain and what to do when data are drained. Callers don't need to
manipulate the file descriptor's state anymore.
Note that it also removes the need for the fix ea90063 ("BUG/MEDIUM:
stream-int: fix the keep-alive idle connection handler") since conn_drain()
correctly sets the polling flags.
When an incoming shutdown or error is detected, we know that we
can safely close without disabling lingering. Do it in tcp_drain()
so that we don't have to do it from each and every caller.
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
< 0 if EAGAIN was met
= 0 if some data remain
> 0 if a shutdown was received
The accept loop used to force fd_poll_recv() even in places where it
was not completely appropriate (eg: unexpected errors). It does not
yet cause trouble but will do with the upcoming polling changes. Let's
use it only where relevant now. EINTR/ECONNABORTED do not result in
poll() anymore but the failed connection is simply skipped (this code
dates from 1.1.32 when error codes were first considered).
Some rare unexplained busy loops were observed on versions up to 1.5-dev19.
It happens that if a file descriptor happens to be disabled for both read and
write while it was speculatively enabled for both and this without creating a
new update entry, there will be no way to remove it from the speculative I/O
list until some other changes occur. It is suspected that a double sequence
such as enable_both/disable_both could have led to this situation where an
update cancels itself and does not clear the spec list in the poll loop.
While it is unclear what I/O sequence may cause this situation to arise, it
is safer to always add the FD to the update list if nothing could be done on
it so that the next poll round will automatically take care of it.
This is 1.5-specific, no backport is needed.
Recent commit abf08d9 ("BUG/MAJOR: connection: fix mismatch between rcv_buf's
API and usage") accidentely broke SSL by relying on an uninitialized value to
enter the read loop.
Many thanks to Cyril Bonté and Steve Ruiz for reporting this issue.
The value of the variable "appctx->ctx.map.ent" is used after the loop,
but its value has changed by then. The variable "value" is initialized and
contains the correct value.
This is a recent bug, no backport is needed.
Steve Ruiz reported some reproducible crashes with HTTP health checks
on a certain page returning a huge length. The traces he provided
clearly showed that the recv() call was performed twice for a total
size exceeding the buffer's length.
Cyril Bonté tracked down the problem to be caused by the full buffer
size being passed to rcv_buf() in event_srv_chk_r() instead of passing
just the remaining amount of space. Indeed, this change happened during
the connection rework in 1.5-dev13 with the following commit :
f150317 MAJOR: checks: completely use the connection transport layer
But one of the problems is also that the comments at the top of the
rcv_buf() functions suggest that the caller only has to ensure the
requested size doesn't overflow the buffer's size.
Also, these functions already have to care about the buffer's size to
handle wrapping free space when there are pending data in the buffer.
So let's change the API instead to more closely match what could be
expected from these functions :
- the caller asks for the maximum amount of bytes it wants to read ;
This means that only the caller is responsible for enforcing the
reserve if it wants to (eg: checks don't).
- the rcv_buf() functions fix their computations to always consider
this size as a max, and always perform validity checks based on
the buffer's free space.
As a result, the code is simplified and reduced, and made more robust
for callers which now just have to care about whether they want the
buffer to be filled or not.
Since the bug was introduced in 1.5-dev13, no backport to stable versions
is needed.
The accept4() Linux syscall requires _GNU_SOURCE on ix86, otherwise
it emits a warning. On other archs including x86_64, this problem
doesn't happen. Thanks to Charles Carter from Sigma Software for
reporting this.
If the pattern is set as case insensitive, the string comparison
is executed twice. The first time is an insensitive comparison, the
second one is sensitive.
This is a recent bug, no backport is needed.
A config where multiple servers have the same name in the same backend is
prone to a number of issues : logs are not really exploitable, stats get
really tricky and even harder to change, etc...
In fact, it can be safe to have the same name between multiple servers only
when their respective IDs are known and used. So now we detect this situation
and emit a warning for the first conflict detected per server if any of the
servers uses an automatic ID.
The code which enables tunnel mode or TCP transfers is rarely used
and at most once per session. Putting it in an unlikely() clause
reduces the length of the hot path of process_session() which is
already quite long, and also slightly reduces its overall size.
Some measurements show a steady gain of about 0.2% thanks to this.
This function is called twice per request, and does almost always nothing.
Better use an inline version to avoid entering it when we can.
About 0.5% additional performance was gained this way.
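A hedged sketch of the inline wrapper (types are haproxy's, exact body
assumed) :
  /* cheap inline: only enter the real function when "observe" is set,
   * which is the rare case.
   */
  static inline void health_adjust(struct server *s, short status)
  {
      if (s->observe)
          __health_adjust(s, status);
  }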
si_connect() used to only return SI_ST_CON. But it already detects
connection reuse and is the function which avoids calling connect().
So it already knows when the connection is valid and reused. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.
Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.
Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close mode since
we don't use this faster path.
At the moment it is possible in sess_prepare_conn_req() to switch to the
established state when the target is an applet. But sess_update_stream_int()
will soon also have the ability to set the established state via
connect_server() when a connection is reused, leading to a synchronous
connect.
So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which
really matches what's done in the lower layers.
Currently there are 3 places in the code where t_connect is set after
switching to state SI_ST_EST, and a fourth one will soon come. Since
all these places lead to an immediate call to sess_establish() to
complete the session establishment, better move that measurement
there.
It's a bit hazardous to wipe out all channel flags; this flag should
be left intact as it protects against recursive calls. Fortunately,
we have no possibility to meet this situation with current applets,
but better fix it before it becomes an issue.
This bug has been there for a long time, but it doesn't seem worth
backporting the fix.
As soon as we connect to the server, we want to limit the number of
recvfrom() on the response path because most of the time a single
call will retrieve enough information.
At the moment this is only done in the HTTP response parser, after
some reads have already failed, which is too late. We need to do
that at the earliest possible instant. It was already done for the
request side by frontend_accept() for the first request, and by
http_reset_txn() for the next requests.
Thanks to this change, there are no more failed recvfrom() calls in
keep-alive mode.
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.
The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.
That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.
In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.
The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).
That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
This reverts commit f3221f99ac.
Igor reported some very strange breakage of his stats page which is
clearly caused by the chunking, though I don't see at first glance
what could be wrong. Better revert it for now.
In theory the principle is simple as we just need to send HTTP chunks
if the client is 1.1 compatible. In practice it's harder because we
have to append a CR LF after each block of data and we're never sure
to have the room for this. In order not to have to deal with this, we
instead send the CR LF prior to each chunk size. The only issue is for
the first chunk, and for this reason we avoid sending the empty header
line when using chunked encoding.
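A hedged sketch of the emission using haproxy's chunk API (usage
simplified) :
  /* prepend the CRLF terminating the previous chunk to the next chunk's
   * size, so that no room needs to be reserved after the data itself.
   */
  chunk_printf(&trash, "\r\n%x\r\n", (unsigned int)data_len);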
Since the applet rework and the removal of the inter-task applets,
we must not clear the stream-interface's owner task anymore otherwise
we risk a crash when maintaining keep-alive with an applet. This is
not possible right now so there is no impact yet, but this bug is not
easy to track down. No backport is needed.
When enabling a tracked server via the web interface, we must first
check if the server tracks another one and the state of this tracked
server, just like the command line does.
Failure to do so causes incorrect logs to be emitted when the server
is enabled :
[WARNING] 361/212556 (2645) : Server bck2/srv3 is DOWN via bck2/srv2. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 361/212603 (2645) : Server bck2/srv3 is DOWN for maintenance.
--> enable server now
[WARNING] 361/212606 (2645) : Server bck2/srv3 is UP (leaving maintenance).
With this fix, it's correct now :
[WARNING] 361/212805 (2666) : Server bck2/srv3 is DOWN via bck2/srv2. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 361/212813 (2666) : Server bck2/srv3 is DOWN for maintenance.
--> enable server now
[WARNING] 361/212821 (2666) : Server bck2/srv3 is DOWN via bck2/srv2. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
It does not seem necessary to backport this fix: considering that it
depends on extremely fragile behaviours, there are more risks of breakage
caused by a backport than by the current inconvenience.
Recent fix 02541e8 (BUG/MEDIUM: checks: servers must not start in
slowstart mode) failed to consider one case : a server which is not
checked at all can be disabled and has to support being enabled
again. So we must also enter the set_server_up() function when the
checks are totally disabled.
No backport is needed.
We used to unconditionally disable client-side polling after the client
has posted its request. The goal was to avoid subscribing the file
descriptor to the poller for nothing.
This is perfect for the HTTP close mode where we know we won't have to
read on the client side anymore. However, when keep-alive is maintained
with the client, this makes the situation worse. Indeed, after the first
response, we'll have to wait for the client to send a next request and
since this is never immediate, we'll certainly poll. So what happens is
that polling is enabled after a response and disabled after a request,
so the polling is constantly alternating, which is very expensive with
epoll_ctl().
The solution implemented in this patch consists in only disabling the
polling if the client-side is not in keep-alive mode. That way we have
the best of both worlds. In close, we really close, and in keep-alive,
we poll only once.
The performance gained by this change is important, with haproxy jumping
from 158kreq/s to 184kreq/s (+16%) in HTTP keep-alive mode on a machine
which at best does 222k/s in raw TCP mode.
With this patch and the previous one, a keep-alive run with a fast
enough server (or enough concurrent connections to cover the connect
time) does no epoll_ctl() anymore during a run of ab -k. The net
measured gain is 19%.
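A minimal sketch of the logic described above; the flag test and helper
names here are illustrative, not HAProxy's actual API:

    /* After the request has been received: only stop polling the
     * client when the connection will not be reused. In keep-alive
     * mode a next request is expected anyway, so polling stays
     * enabled and epoll_ctl() is no longer called on each round trip.
     */
    if (!client_side_keepalive(s))        /* hypothetical predicate */
        stop_client_recv_polling(s);      /* hypothetical helper */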
Compression is normally disabled on HTTP/1.0 since it does not
support chunked encoded responses. But the test was incomplete, and
Bertrand Jacquin reported a case where if the server responded using
1.1 to a 1.0 request, then haproxy would still compress (and of
course the client could not understand the response).
No backport is needed, this is 1.5-specific.
In HTTP keep-alive mode, if we receive a 401, we still have a chance
of being able to send the visitor again to the same server over the
same connection. This is required by some broken protocols such as
NTLM, and anyway whenever there is an opportunity for sending the
challenge to the proper place, it's better to do it (at least it
helps with debugging).
If a server is switched to maintenance mode while a check is in progress,
the successful completion of the check must not switch it back up. This
is still a consequence of using the same function set_server_up() for
every state change. Bug reported by Igor at owind.
This fix should be backported to 1.4 which is affected as well.
This reverts commit 2f877304ef.
This commit is OK for clear text traffic but causes trouble with SSL
when buffers are smaller than SSL buffers. Since the issue it addresses
will be gone once the polling redesign is complete, there's no reason
for trying to workaround temporary inefficiencies. Better remove it.
If we reuse a server-side connection, we must not reinitialize its context nor
try to enable send_proxy. At the moment HTTP keep-alive over SSL fails on the
first attempt because the SSL context was cleared, so it only worked after a
retry.
Commit 2737562 (MEDIUM: stream-int: implement a very simplistic idle
connection manager) implemented an idle connection handler. In the
case where all data is drained from the server, it fails to disable
polling, resulting in a busy spinning loop.
Thanks to Sander Klein and Guillaume Castagnino for reporting this bug.
No backport is needed.
Idle connections are not monitored right now. So if a server closes after
a response without advertising it, it won't be detected until a next
request wants to use the connection. This is a bit problematic because
it unnecessarily maintains file descriptors and sockets in an idle
state.
This patch implements a very simple idle connection manager for the stream
interface. It presents itself as an I/O callback. The HTTP engine enables
it when it recycles a connection. If a close or an error is detected on the
underlying socket, it tries to drain as much data as possible from the socket,
detects the close, responds with a close as well, then detaches from the
stream interface.
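A rough sketch of such a callback; this is simplified to raw sockets,
while the real handler plugs into the stream interface and connection
layers:

    #include <errno.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void idle_conn_io_cb(int fd)
    {
        char buf[1024];
        ssize_t ret;

        /* drain as much data as possible from the socket */
        do {
            ret = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        } while (ret > 0);

        if (ret == 0 || (ret < 0 && errno != EAGAIN && errno != EINTR)) {
            /* close (or error) detected: respond with a close and
             * detach from the stream interface (hypothetical helper)
             */
            close(fd);
            idle_conn_detach(fd); /* hypothetical */
        }
    }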
In 1.5-dev20, commit bb9665e (BUG/MEDIUM: checks: ensure we can enable
a server after boot) tried to fix a side effect of having both regular
checks and agent checks condition the up state propagation to servers.
Unfortunately it was still not fine because after this fix, servers
which make use of slowstart start in this mode. We must not check
the agent's health if agent checks are not enabled, and likewise,
we must not check the regular check's health if they are not enabled.
Reading the code, it seems like we could avoid entering this function
at all if (s->state & SRV_RUNNING) is not satisfied. Let's reserve
this for a later patch if needed.
Thanks to Sander Klein for reporting this abnormal situation.
Since commit b805f71 (MEDIUM: sample: let the cast functions set their
output type), the output type of a fetch function is automatically
considered and passed to the next converter. A bug introduced in
1.5-dev9 with commit f853c46 (MEDIUM: pattern/acl: get rid of
temp_pattern in ACLs) was revealed by this last one : the output type
remained string instead of UINT, causing the cast function to try to
cast the contents and to crash on a NULL deref.
Note: this fix was made after a careful review of all fetch functions.
A few non-trivial ones had their comments amended to clearly indicate
the output type.
There are very few users of http_proxy, and all of them complain about
the same thing : the request is passed unmodified to the server (in its
proxy form), and it is not possible to fix it using reqrep rules because
http_proxy happens after.
So let's have http_proxy fix the URL it has analysed to get rid of the
scheme and the host part. This will do what users of this feature expect.
A null pointer assignment was missing after a free in commit 7148ce6 (MEDIUM:
pattern: Extract the index process from the pat_parse_*() functions), causing
a double free after loading a file of string patterns.
This bug was introduced in 1.5-dev20, no backport is needed.
Thanks to Sander Klein for reporting this bug and providing the config
needed to trigger it.
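The pattern in question is the classic one:

    free(ptr);
    ptr = NULL;  /* a later free(ptr) is now harmless */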
The memset() was put here to corrupt memory for a debugging test,
it's not needed anymore and was unfortunately committed. It does
not harm anyway, it probably just slightly affects performance.
On several browsers, the monospace font used to display numbers in tips
is not very readable. Since the numbers are aligned anyway, there is too
little benefit in using such a font.
In HTTP keep-alive, if we face a connection error to the server while sending
the request, the error should not be reported, and the client-side connection
should simply be closed, so that the client knows it can retry. This can happen if
the server has too short a keep-alive timeout and quits at the same moment the
new request comes in.
When the load balancing algorithm in use is not deterministic, and a previous
request was sent to a server to which haproxy still holds a connection, it is
sometimes desirable that subsequent requests on a same session go to the same
server as much as possible. Note that this is different from persistence, as
we only indicate a preference which haproxy tries to apply without any form
of guarantee. The real use is for keep-alive connections sent to servers. When
this option is used, haproxy will try to reuse the same connection that is
attached to the server instead of rebalancing to another server, causing a
close of the connection. This can make sense for static file servers. It does
not make much sense to use this in combination with hashing algorithms.
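This paragraph reads like the documentation for the "prefer-last-server"
option; assuming that is the directive being added, an illustrative use
would be :

    backend static
        mode http
        option http-keep-alive
        option prefer-last-server
        balance roundrobin
        server s1 192.168.1.10:80
        server s2 192.168.1.11:80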
This commit allows an existing server-side connection to be reused if
it matches the same target. Basic controls are performed; right now
we do not allow a connection to be reused when dynamic source binding is
in use or when the destination address or port is dynamic (eg: proxy
mode). Later we'll have to also disable connection sharing when PROXY
protocol is being used or when non-idempotent requests are processed.
When a connection to the server is complete, if the transaction
requests keep-alive mode, we don't shut the connection and we just
reinitialize the stream interface in order to be able to reuse the
connection afterwards.
Note that the server connection count is decremented, just like the
backend's, and that we still try to wake up waiters. But that makes
sense considering that we'll eventually be able to immediately pass
idle connections to waiters.
When allocating a new connection, only the caller knows whether it's
acceptable to reuse the previous one or not. Let's pass this information
to si_alloc_conn() which will do the cleanup if the connection is not
acceptable.
This new option enables HTTP keep-alive processing on the connections.
It can be overridden by http-server-close, httpclose and forceclose.
Right now full-chain keep-alive is not yet implemented, but we need
the option to work on it. The doc will come later.
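Assuming the directive is "option http-keep-alive" (consistent with the
http-server-close/httpclose family named above), a minimal illustrative
use :

    defaults
        mode http
        option http-keep-alive
        timeout http-keep-alive 10s  # illustrative value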
It's common to observe a recv() call on the client side just after
the connect() has been issued to the server side when running in
server close mode. The reason is that the whole request has been sent
and the shutw() has been queued in the channel, so the request message
switches to the MSG_CLOSED state, which didn't disable reading. Let's
do it now. That way the reading will only be re-enabled after the
response is transferred to the client. However if abortonclose is set,
we still leave it enabled.
strace shows a lot of EAGAIN on small response messages. This
is caused by the fact that the READ_DONTWAIT flag is not set
on response message, it's only there when we want to flush
pending data.
For small responses, it's a waste of CPU cycles to call recv()
for nothing since most of the time, everything we'll need will
be in the first response. Also, this will offer more opportunities
for using splice() to transfer data.
Right now we see many places doing their own setsockopt(SO_LINGER).
Better only do it just before the close() in fd_delete(). For this
we add a new flag on the file descriptor, indicating if it's safe or
not to linger. If not (eg: after a connect()), then the setsockopt()
call is automatically performed before a close().
The flag automatically turns to safe when receiving a read0.
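The pattern being centralized is the standard sockets one :

    #include <sys/socket.h>
    #include <unistd.h>

    /* hard-close: disable lingering before close() */
    static void close_nolinger(int fd)
    {
        struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

        setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
        close(fd);
    }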
conn_xprt_ready() reports if the transport layer is ready.
conn_ctrl_ready() reports if the control layer is ready.
The stream interface uses si_conn_ready() to report that the
underlying connection is ready. This will be used for connection
reuse in keep-alive mode.
Since the recent addition of map updates, haproxy does not build anymore
on Solaris because "s_addr" is a #define :
src/dumpstats.c: In function `stats_map_lookup':
src/dumpstats.c:4688: error: syntax error before '.' token
src/dumpstats.c:4781: error: `S_un' undeclared (first use in this function)
src/dumpstats.c:4781: error: (Each undeclared identifier is reported only once
src/dumpstats.c:4781: error: for each function it appears in.)
make: *** [src/dumpstats.o] Error 1
Simply rename the variable.
The is* macros must not be given a plain char on Solaris; unsigned char
is OK. Casting char to int is wrong as well since we can get a negative
value.
src/log.c: In function `parse_logformat_string':
src/log.c:454: warning: subscript has type `char'
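In short, the portable rule from the C standard is that the is*()
functions take an int whose value must fit in an unsigned char (or be
EOF) :

    #include <ctype.h>

    int is_space_at(const char *p)
    {
        /* wrong: isspace(*p) -- *p may be negative where char is signed */
        return isspace((unsigned char)*p);  /* correct on all platforms */
    }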
Gcc 3.4 warns that mask may be used uninitialized in pattern.c. This
is wrong since it's used in the same condition as its assignment,
although it's not necessarily obvious for the compiler. Fix this by
initializing the value.
This was introduced by recent commit 01cdcd4a so no backport is needed.
Since recent commit f79c817 (MAJOR: connection: add two new flags to
indicate readiness of control/transport) and the surrounding commits,
the session initialization has been slightly delayed and the control
layer of the connection is not yet initialized when processing the
rules.
We need to move that minimal initialization a bit above.
The bug was introduced with latest changes, no backport is needed.
If a server is disabled in configuration and another one tracks it,
the tracking server must not inherit the MAINT flag, otherwise it needs to
be explicitly enabled afterwards. Just remove this to fix the issue.
Since commit 58c3297 (MEDIUM: Set rise and fall of agent checks to 1),
due to a bogus condition, it became impossible to re-enable a server
that was disabled in the configuration if no agent was enabled. The
reason is that in this case, the agent's health was zero while the
condition expected it to be at least one to consider the action.
Let's fix this by only considering the health of checks that are enabled.
The agent is able to retrieve some weight information from the server
and will eventually be able to force the server into maintenance mode.
It doesn't seem logical to have it depend on the health check being
configured, as for some servers it might very well make sense to only
fetch the weight from the server's load regardless of the health.
So let's stop disabling the agent checks when health checks are disabled.
Till now, a configuration required at least one health check in the
whole config file to create the agent tasks. Now we start them even
if no health check is enabled.
Health checks can now be paused. This is the status they get when the
server is put into maintenance mode, which is more logical than relying
on the server's state at some places. It will be needed to allow agent
checks to run when health checks are disabled (currently not possible).
start_checks() used to consider only the health check intervals to
compute the start interval, so if an agent had a faster check than
all health checks, it would be significantly delayed.
Having the check state partially stored in the server doesn't help.
Some functions such as srv_getinter() rely on the server being checked
to decide what check frequency to use, instead of relying on the check
being configured. So let's get rid of SRV_CHECKED and SRV_AGENT_CHECKED
and only use the check's states instead.
At the moment, health checks and agent checks are tied : no agent
check is emitted if no health check is enabled. Other parameters
are considered in the condition for letting checks run. Knowing, for
each check (agent and regular), whether it is enabled/disabled and
configured or not will let us enable them selectively. Now
we can already emit an error when trying to enable an unconfigured
agent.
The flag CHK_STATE_RUNNING is misleading as one may believe it means
the state is enabled (just like SRV_RUNNING). Let's rename these two
flags CHK_ST_INPROGRESS and CHK_ST_DISABLED.
We used to have up to 4 sets of flags which were almost all exclusive
to report a check result. And the names were inherited from the old
server states, adding to the confusion. Let's replace that with an
enum handling only the possible combinations :
SRV_CHK_UNKNOWN => CHK_RES_UNKNOWN
SRV_CHK_FAILED => CHK_RES_FAILED
SRV_CHK_PASSED => CHK_RES_PASSED
SRV_CHK_PASSED | SRV_CHK_DISABLE => CHK_RES_CONDPASS
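Expressed as an enum (the values are inferred from the mapping above;
the real declaration may differ slightly) :

    enum chk_result {
        CHK_RES_UNKNOWN = 0,  /* no result yet */
        CHK_RES_FAILED,       /* check failed */
        CHK_RES_PASSED,       /* check succeeded */
        CHK_RES_CONDPASS,     /* check conditionally passed */
    };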
Server tracking uses the same "tracknext" list for servers tracking
another one and for the servers being tracked. This caused an issue
which was fixed by commit f39c71c ([CRITICAL] fix server state tracking:
it was O(n!) instead of O(n)), consisting in ensuring that a server is
being checked before walking down the list, so that we don't propagate
the up/down information via servers being part of the track chain.
But the root cause is the fact that all servers share the same list.
The correct solution consists in having a list head for the tracked
servers and a list of next tracking servers. This simplifies the
propagation logic, especially for the case where status changes might
be passed to individual servers via the CLI.
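A sketch of the resulting layout (field names are illustrative) :

    struct server {
        struct server *track;     /* the server we track, if any         */
        struct server *trackers;  /* head of list of servers tracking us */
        struct server *tracknext; /* next server in our tracker's list   */
        /* ... */
    };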
The get_trash_chunk() function is convenient and is sometimes used even
to get a temporary string. While the chunk is initialized, the string
may contain some random garbage that some code might retrieve if it uses
chunk->str directly without checking ->len. This is what happened in checks
after commit 25e2ab5 (MEDIUM: checks: centralize error reporting). It's not
easy to guess at first, so better pre-initialize the string with a zero.
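Conceptually the fix is just :

    struct chunk *t = get_trash_chunk();
    /* t->len is 0, but t->str may still hold stale bytes: */
    t->str[0] = '\0';  /* make ->str safe to use as an empty string */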
It's becoming increasingly difficult to ignore unwanted function returns in
debug code with gcc. Now even when you try to work around it, it suggests a
way to write your code differently. For example :
src/frontend.c:187:65: warning: if statement has empty body [-Wempty-body]
if (write(1, trash.str, trash.len) < 0) /* shut gcc warning */;
^
src/frontend.c:187:65: note: put the semicolon on a separate line to silence this warning
1 warning generated.
This is totally unacceptable: this code already had to be written this way
to shut it up in earlier versions. And now it comments on the form ? What's
the purpose of the C language if you can't write the code that does what
you want anymore ?
Emeric proposed to just keep a global variable to drain such useless results
so that gcc stops complaining all the time it believes people who write code
are monkeys. The solution is acceptable because the useless assignment is done
only in debug code so it will not impact performance. This patch implements
this, until gcc becomes even "smarter" to detect that we tried to cheat.
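A sketch of the trick; the variable and macro names are made up here :

    #include <unistd.h>

    /* global sink for return values we deliberately ignore in debug code */
    int unused_result_sink;

    #define IGNORE_RESULT(expr) \
        do { unused_result_sink = (expr); } while (0)

    /* usage, with the write() call from the warning above: */
    IGNORE_RESULT(write(1, trash.str, trash.len));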
The ACL expression parser recently became a huge mess like a
spaghetti plate. The keyword is looked up at the beginning, then
sample fetches are processed, then an expression is initialized,
then arguments and converters are parsed but only if the keyword
was an ACL one, etc... Lots of "if" and redundant variables
everywhere making it hard to read and follow.
Let's move the args/conv parsing just after the keyword lookup.
At least now it's consistent that when we leave this if/else
statement, we have a sample expression initialized and fully
parsed, wherever the elements came from.
Just like for the last commit, we need to fix the ACL argument parser so
that it lets the lower layer do the job of referencing unresolved arguments
and correctly report the type of missing arguments.
Some errors may be reported about missing mandatory arguments when some
sample fetch arguments are marked as mandatory and implicit (eg: proxy
names such as in table_cnt or be_conn).
In practice the argument parser already handles all these situations very
well, it's just that the sample fetch parser wants to go beyond its role
and performs some checks that it should not. Simply removing these
useless checks lets make_arg_list() create the correct argument types
when such types are encountered.
This regression was introduced by the recent use of sample_parse_expr()
in ACLs which makes use of its own argument parser, while previously
the arguments were parsed in the ACL function itself. No backport is
needed.
Doing so ensures that we're consistent between all the functions in the whole
chain. This is important so that we can extract the argument parsing from this
function.
This patch adds map manipulation commands to the socket interface.
add map <map> <key> <value>
Add the value <value> in the map <map>, at the entry corresponding to
the key <key>. This command does not verify if the entry already
exists.
clear map <map>
Remove all entries from the map <map>.
del map <map> <key>
Delete all the map entries corresponding to the <key> value in the map
<map>.
set map <map> <key> <value>
Modify the value corresponding to each key <key> in a map <map>. The
new value is <value>.
show map [<map>]
Dump info about map converters. Without argument, the list of all
available maps is returned. If a <map> is specified, its content is
dumped.
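Illustrative usage over the stats socket; the socket path and map file
below are examples :

$ echo "add map /etc/haproxy/geo.map 10.0.0.1 europe" | socat stdio /var/run/haproxy.sock
$ echo "show map /etc/haproxy/geo.map" | socat stdio /var/run/haproxy.sock
$ echo "del map /etc/haproxy/geo.map 10.0.0.1" | socat stdio /var/run/haproxy.sock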
We'll need to pass patterns on the CLI for lookups. Till now there was no
need for backslashes, so now is the time to support them just like in the
config file.
With this patch, patterns can be compiled for two modes :
- match
- lookup
The match mode is used for example in ACLs or maps. The lookup mode
is used to lookup a key for pattern maintenance. For example, looking
up a network is different from looking up one address belonging to
this network.
A special case is made for regex. In lookup mode they return the input
regex string and do not compile the regex.
Now, the pat_parse_*() functions parse the incoming data. The input
"pattern" struct can be preallocated. If the parser needs to add some
buffers, it allocates memory.
The function pattern_register() runs the parser, processes the key
indexation and associates the "sample_storage" used by maps.
This patch removes the compatibility check between the input type and the
match method. Now, it checks whether a cast from the input type to the
output type exists, and the pattern_exec_match() function applies casts
before each pattern matching.
This is used later for increasing the compatibility with incoming
sample types. When multiple compatible types are supported, one
is arbitrarily used (eg: UINT).
Applying inet_pton() to input contents is not reliable because the
function requires a zero-terminated string. While inet_pton() will
stop when contents do not match an IPv6 address anymore, it could
theoretically read past the end of a buffer if the data to be converted
was at the end of a buffer (this cannot happen right now thanks to
the reserve at the end of the buffer). At least the conversion does
not work.
Fix this by using buf2ip6() instead, which copies the string into a
padded area.
This bug came with recent commit b805f71 (MEDIUM: sample: let the
cast functions set their output type), no backport is needed.
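The safe pattern, which is essentially what buf2ip6() provides, sketched :

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>

    /* copy <len> bytes into a padded, NUL-terminated buffer before
     * handing them to inet_pton()
     */
    static int to_ipv6(const char *data, size_t len, struct in6_addr *out)
    {
        char tmp[INET6_ADDRSTRLEN];

        if (len >= sizeof(tmp))
            return 0;
        memcpy(tmp, data, len);
        tmp[len] = '\0';
        return inet_pton(AF_INET6, tmp, out) == 1;
    }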
There is a mix-up between input type of the data and input type of the
map file. This mix-up causes all pattern matching functions based
on "string" (reg, beg, end, ...) to fail.
This bug came with commit d5f624d (MEDIUM: sample: add the "map" converter),
no backport is needed.
The agent refrains from reading the server's response until the server
closes, but if the server waits for the client to close, the response
is never read. Let's try to fetch a whole line before deciding to wait
more.
The function stktable_init() will return 0 if create_pool() returns NULL. Since
the returned value of this function is ignored, HAProxy will crash if the pool
of stick table is NULL and stksess_new() is called to allocate a new stick
session. It is a better choice to check the returned value and make HAProxy exit
with an alert message if any error is caught.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
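The shape of the check, for illustration only; the variable and error
helpers here are assumptions, not necessarily the real call site :

    if (!stktable_init(&curproxy->table)) {        /* assumed call site */
        Alert("Failed to initialize stick-table for proxy '%s'.\n",
              curproxy->id);                       /* assumed helpers */
        exit(1);
    }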
The original code is indented with spaces and not aligned with the
preceding line. The convention in HAProxy is to indent with tabs.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
We must not report incomplete data if the buffer is not full, otherwise
we can abort some processing on the stats socket when dealing with massive
amounts of commands.
There is a compiler warning after commit 1b6e75fa84 ("MEDIUM: haproxy-
systemd-wrapper: Use haproxy in same directory"):
src/haproxy-systemd-wrapper.c: In function ‘locate_haproxy’:
src/haproxy-systemd-wrapper.c:28:10: warning: ignoring return value of ‘readlink’, declared with attribute warn_unused_result [-Wunused-result]
Fix the compiler warning by checking the return value of readlink().
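The shape of the fix, for illustration; the use of /proc/self/exe and
the exact buffer handling are assumptions, and note that readlink()
does not NUL-terminate its output :

    #include <limits.h>
    #include <unistd.h>

    static void locate_self(char *path, size_t size)
    {
        ssize_t len = readlink("/proc/self/exe", path, size - 1);

        if (len == -1)
            path[0] = '\0';   /* report the error or use a fallback */
        else
            path[len] = '\0'; /* readlink() does not NUL-terminate */
    }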
SSL and keep-alive will need to be able to fail on allocation errors,
and the stream interface did not allow to report such a cause. The flag
will then be "RC" as already documented.
This reduces its size, even though the freed space is not reused by
anything else. However it
will significantly improve the debugger's output since we'll now get
real state values.
The default case had to be enabled in the parsers because gcc tries
to optimize the switch/case and noticed some values were missing from
the enums and emitted a warning.
Here again we had some oversized and misaligned entries. The method
and the status don't need 4 bytes each, and there was a hole after
the status that does not exist anymore. That's 8 additional bytes
saved from http_txn and as much for the session.
Also some fields were slightly moved to present better memory access
patterns resulting in a steady 0.5% performance increase.
When dumping a session, it can be useful to know what applet it is
connected to instead of having just the appctx pointer. We also
report st0/st1/st2 to help debugging.
Currently, all states, all status codes and a few constants used in
the peers are all prefixed with "PEER_SESSION_". It's confusing because
there is no way to know which one is a state, a status code or anything
else. Thus, let's rename them this way :
PEER_SESS_ST_* : states
PEER_SESS_SC_* : status codes
Additionally the states have been numbered from zero, contiguously.
This will allow us not to have to deal with the stream interface
initialization anymore and to ease debugging using enums.
Some applet users don't need to initialize their applet, they just want
to route the traffic there just as if it were a server. Since applets
are now connected to from session.c, let's simply ensure that when
connecting, the applet in si->end matches the target, and allocate
one there if it's not already done. In case of error, we force the
status code to resource and connection so that it's clear that it
happens because of a memory shortage.
From now on, a call to stream_int_register_handler() causes a call
to si_alloc_appctx() and returns an initialized appctx for the
current stream interface. If one was previously allocated, it is
released. If the stream interface was attached to a connection, it
is released as well.
The appctx are allocated from the same pools as the connections, because
they're substantially smaller in size, and we can't have both a connection
and an appctx on an interface at any moment.
In case of memory shortage, the call may return NULL, which is already
handled by all consumers of stream_int_register_handler().
The field appctx was removed from the stream interface since we only
rely on the endpoint now. On 32-bit, the stream_interface size went down
from 108 to 44 bytes. On 64-bit, it went down from 144 to 64 bytes. This
represents a memory saving of 160 bytes per session.
It seems that a later improvement could be to move the call to
stream_int_register_handler() to session.c for most cases.
The task returned by stream_int_register_handler() is never used, however we
always need to access the appctx afterwards. So make it return the appctx
instead. We already plan for it to fail, which is the reason for the addition
of a few tests and the possibility for the HTTP analyser to return a status
code 500.
We're about to remove si->appctx, so first let's replace all occurrences
of its usage with a dynamic extract from si->end. A lot of code was changed
by search-n-replace, but the behaviour was intentionally not altered.
The code surrounding calls to stream_int_register_handler() was slightly
changed since we can only use si->end *after* the registration.
We used to have two very similar functions for sending a PROXY protocol
line header. The reason is that the default one relies on the stream
interface to retrieve the other end's address, while the "local" one
performs a local address lookup and sends that instead (used by health
checks).
Now that the send_proxy_ofs is stored in the connection and not the
stream interface, we can make the local_send_proxy rely on it and
support partial sends. This also simplifies the code by removing the
local_send_proxy function, making health checks use send_proxy_ofs,
resulting in the removal of the CO_FL_LOCAL_SPROXY flag, and the
associated test in the connection handler. The other flag,
CO_FL_SI_SEND_PROXY was renamed without the "SI" part so that it
is clear that it is not dedicated anymore to a usage with a stream
interface.
Till now the send_proxy_ofs field remained in the stream interface,
but since the dynamic allocation of the connection, it makes a lot
of sense to move that into the connection instead of the stream
interface, since it will not be statically allocated for each
session.
Also, it turns out that moving it to the connection fills an alignment
hole on 64 bit architectures so it does not consume more memory, and
removing it from the stream interface was an opportunity to correctly
reorder fields and reduce the stream interface's size from 160 to 144
bytes (-10%). This is 32 bytes saved per session.
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.
As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.
The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.
The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.
The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.
A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).