Commit 5fddab0 ("OPTIM: stream_interface: disable reading when
CF_READ_DONTWAIT is set") improved the connection layer's efficiency
back in 1.5-dev13 by avoiding successive read attempts on an active
FD. But by disabling this on a polled FD, it causes an unpleasant
side effect which is that the FD that was subscribed to polling is
suddenly stopped and may need to be re-enabled once the kernel
starts to slow down on data eviction (eg: saturated server at the
other end, bursty traffic caused by too large maxpollevents).
This behaviour is observable with persistent connections when there
is a large enough connection count so that there's no data in the
early connection and polling is required, because there are then
up to 4 epoll_ctl() calls per request. It's important that the
server is slower than haproxy to cause some delays when reading
response.
The current connection layer as designed in 1.6 with the FD cache
doesn't require this trick anymore, though it still benefits from
it when it saves an FD from being uselessly polled. But compared
to the increased cost of enabling and disabling poll all the time,
it's still better to disable it. In some cases it's possible to
observe a performance increase as high as 30% by avoiding this
epoll_ctl() dance.
In the end we only want to disable it when the FD is speculatively
read and not when it's polled. For this we introduce a new function
__conn_data_done_recv() which is used to indicate that we're done
with recv() and not interested in new attempts. If/when we later
support event-triggered epoll, this function will have to change
a bit to do the same even in the polled case.
A quick test with keep-alive requests run on a dual-core / dual-
thread Atom shows a significant improvement :
single process, 0 bytes :
before: Requests per second: 12243.20 [#/sec] (mean)
after: Requests per second: 13354.54 [#/sec] (mean)
single process, 4k :
before: Requests per second: 9639.81 [#/sec] (mean)
after: Requests per second: 10991.89 [#/sec] (mean)
dual process, 0 bytes (unstable) :
before: Requests per second: 16900-19800 ~ 17600 [#/sec] (mean)
after: Requests per second: 18600-21400 ~ 20500 [#/sec] (mean)
It already returns an empty string when the field is empty, but as a
preventive measure we should do the same when the string itself is a
NULL. While it is not supposed to happen, it will make the code more
resistant against failed allocations and unexpected results.
This fix should be backported to 1.7.
Historically we used to have the stick counters processing put into
session.c which became stream.c. But a big part of it is now in
stick-table.c (eg: converters) but despite this we still have all
the sample fetch functions in stream.c
These parts do not depend on the stream anymore, so let's move the
remaining chunks to stick-table.c and have cleaner files.
What remains in stream.c is everything needed to attach/detach
trackers to the stream and to update the counters while the stream
is being processed.
There's no more reason to keep tcp rules processing inside proto_tcp.c
given that there is nothing in common there except these 3 letters : tcp.
The tcp rules are in fact connection, session and content processing rules.
Let's move them to "tcp-rules" and let them live their life there.
We used to have 3 types of counters with a huge overlap :
- listener counters : stats collected for each bind line
- proxy counters : union of the frontend and backend counters
- server counters : stats collected per server
It happens that quite a good part was common between listeners and
proxies due to the frontend counters being updated at the two locations,
and that similarly the server and proxy counters were overlapping and
being updated together.
This patch cleans this up to propose only two types of counters :
- fe_counters: used by frontends and listeners, related to
incoming connections activity
- be_counters: used by backends and servers, related to outgoing
connections activity
This allowed to remove some non-sensical counters from both parts. For
frontends, the following entries were removed :
cum_lbconn, last_sess, nbpend_max, failed_conns, failed_resp,
retries, redispatches, q_time, c_time, d_time, t_time
For backends, this ones was removed : intercepted_req.
While doing this it was discovered that we used to incorrectly report
intercepted_req for backends in the HTML stats, which was always zero
since it's never updated.
Also it revealed a few inconsistencies (which were not fixed as they
are harmless). For example, backends count connections (cum_conn)
instead of sessions while servers count sessions and not connections.
Over the long term, some extra cleanups may be performed by having
some counters update functions touching both the server and backend
at the same time, as well as both the frontend and listener, to
ensure that all sides have all their stats properly filled. The stats
dump will also be able to factor the dump functions by counter types.
Reinhard Vicinus reported that the reported average response times cannot
be larger than 16s due to the double multiply being performed by
swrate_add() which causes an overflow very quickly. Indeed, with N=512,
the highest average value is 16448.
One solution proposed by Reinhard is to turn to long long, but this
involves 64x64 multiplies and 64->32 divides, which are extremely
expensive on 32-bit platforms.
There is in fact another way to avoid the overflow without using larger
integers, it consists in avoiding the multiply using the fact that
x*(n-1)/N = x-(x/N).
Now it becomes possible to store average values as large as 8.4 millions,
which is around 2h18mn.
Interestingly, this improvement also makes the code cheaper to execute
both on 32 and on 64 bit platforms :
Before :
00000000 <swrate_add>:
0: 8b 54 24 04 mov 0x4(%esp),%edx
4: 8b 0a mov (%edx),%ecx
6: 89 c8 mov %ecx,%eax
8: c1 e0 09 shl $0x9,%eax
b: 29 c8 sub %ecx,%eax
d: 8b 4c 24 0c mov 0xc(%esp),%ecx
11: c1 e8 09 shr $0x9,%eax
14: 01 c8 add %ecx,%eax
16: 89 02 mov %eax,(%edx)
After :
00000020 <swrate_add>:
20: 8b 4c 24 04 mov 0x4(%esp),%ecx
24: 8b 44 24 0c mov 0xc(%esp),%eax
28: 8b 11 mov (%ecx),%edx
2a: 01 d0 add %edx,%eax
2c: 81 c2 ff 01 00 00 add $0x1ff,%edx
32: c1 ea 09 shr $0x9,%edx
35: 29 d0 sub %edx,%eax
37: 89 01 mov %eax,(%ecx)
This fix may be backported to 1.6.
When dealing with many proxies, it's hard to spot response errors because
all internet-facing frontends constantly receive attacks. This patch now
makes it possible to demand that only request or response errors are dumped
by appending "request" or "reponse" to the show errors command.
The function log format emit its own error message using Alert(). This
patch replaces this behavior and uses the standard HAProxy error system
(with memprintf).
The benefits are:
- cleaning the log system
- the logformat can ignore the caller (actually the caller must set
a flag designing the caller function).
- Make the usage of the logformat function easy for future components.
Commit 1866d6d ("MEDIUM: ssl: Add support for OpenSSL 1.1.0")
introduced support for openssl 1.1.0 and temporarily broke 0.9.8.
In the end the port was not very hard given that the only cause of
build failures were functions supposedly absent from 0.9.8 that in
fact did exist.
Thus, adding a new #if to move these functions for versions older
than 0.9.8 was enough to fix the trouble. It received very light
testing, basically only an SSL bridge decrypting and re-encrypting
traffic, and checking that everything looks right. That said, the
functions specific to 0.9.8 here compared to 1.0.x are only
SSL_SESSION_set1_id_context(), EVP_PKEY_base_id(), and
X509_PUBKEY_get0_param().
Until now, the function parse_logformat_string() never fails. It
send warnings when it parses bad format, and returns expression in
best effort.
This patch replaces warnings by alert and returns a fail code.
Maybe the warning mode is designed for a compatibility with old
configuration versions. If it is the case, now this compatibility
is broken.
[wt: no, the reason is that an alert must cause a startup failure,
but this will be OK with next patch]
The log-format function parse_logformat_string() takes file and line
for building parsing logs. These two parameters are embedded in the
struct proxy curproxy, which is the current parsing context.
This patch removes these two unused arguments.
Remove export of the fucntion parse_logformat_var_args() and
parse_logformat_var(). These functions are a part of the
logformat parser, and this export is useless.
We get this when Lua is disabled, just a missing include.
In file included from src/queue.c:18:0:
include/proto/server.h:51:39: warning: 'struct appctx' declared inside parameter list [enabled by default]
This way we don't have any more state specific to a given yieldable
command. The other commands should be easier to move as they only
involve a parser.
It really belongs to proto_http.c since it's a dump for HTTP request
and response errors. Note that it's possible that some parts do not
need to be exported anymore since it really is the only place where
errors are manipulated.
The table dump code was a horrible mess, with common parts interleaved
all the way to deal with the various actions (set/clear/show). A few
error messages were still incorrect, as the "set" operation did not
update them so they would still report "unknown action" (now fixed).
The action was now passed as a private argument to the CLI keyword
which itself is copied into the appctx private field. It's just an
int cast to a pointer.
Some minor issues were noticed while doing this, for example when dumping
an entry by key, if the key doesn't exist, nothing is printed, not even
the table's header. It's unclear whether this was intentional but it
doesn't really match what is done for data-based dumps. It was left
unchanged for now so that a later fix can be backported if needed.
Enum entries STAT_CLI_O_TAB, STAT_CLI_O_CLR and STAT_CLI_O_SET were
removed.
Move the "show info" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_info_to_buffer() function
is now static again. Note, we don't need proto_ssl anymore in cli.c.
Move the "show stat" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_stat_to_buffer() function
is now static again.
Move 'show sess' CLI functions to stream.c and use the cli keyword API
to register it on the CLI.
[wt: the choice of stream vs session makes sense because since 1.6 these
really are streams that we're dumping and not sessions anymore]
Several CLI commands require a frontend, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Several CLI commands require a server, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Move map and acl CLI functions to map.c and use the cli keyword API to
register actions on the CLI. Then remove the now unused individual
"add" and "del" keywords.
proto/dumpstats.h has been split in 4 files:
* proto/cli.h contains protypes for the CLI
* proto/stats.h contains prototypes for the stats
* types/cli.h contains definition for the CLI
* types/stats.h contains definition for the stats
These functions will be needed by "show sess" on the CLI, let's make them
globally available. It's important to note that due to the fact that we
still do not set the data and transport layers' names in the structures,
we still have to rely on some exports just to match the pointers. This is
ugly but is preferable to adding many includes since the short-term goal
is to get rid of these tests by having proper names in place.
uint16_t instead of u_int16_t
None ISO fields of struct tm are not present, but
by zeroyfing it, on GNU and BSD systems tm_gmtoff
field will be set.
[wt: moved the memset into each of the date functions]
Setting an FD to -1 when closed isn't the most easily noticeable thing
to do when we're chasing accidental reuse of a stale file descriptor.
Instead set it to that large a negative value that it will overflow the
fdtab and provide an analysable core at the moment the issue happens.
Care was taken to ensure it doesn't overflow nor change sign on 32-bit
machines when multiplied by fdtab, and that it also remains negative for
the various checks that exist. The value equals 0xFDDEADFD which happens
to be easily spotted in a debugger.
The bug described in commit 568743a ("BUG/MEDIUM: stream-int: completely
detach connection on connect error") was not a stream-interface layer bug
but a connection layer bug. There was exactly one place in the code where
we could change a file descriptor's status without first checking whether
it is valid or not, it was in conn_stop_polling(). This one is called when
the polling status is changed after an update, and calls fd_stop_both even
if we had already closed the file descriptor :
1479388298.484240 ->->->->-> conn_fd_handler > conn_cond_update_polling
1479388298.484240 ->->->->->-> conn_cond_update_polling > conn_stop_polling
1479388298.484241 ->->->->->->-> conn_stop_polling > conn_ctrl_ready
1479388298.484241 conn_stop_polling < conn_ctrl_ready
1479388298.484241 ->->->->->->-> conn_stop_polling > fd_stop_both
1479388298.484242 ->->->->->->->-> fd_stop_both > fd_update_cache
1479388298.484242 ->->->->->->->->-> fd_update_cache > fd_release_cache_entry
1479388298.484242 fd_update_cache < fd_release_cache_entry
1479388298.484243 fd_stop_both < fd_update_cache
1479388298.484243 conn_stop_polling < fd_stop_both
1479388298.484243 conn_cond_update_polling < conn_stop_polling
1479388298.484243 conn_fd_handler < conn_cond_update_polling
The problem with the previous fix above is that it break the http_proxy mode
and possibly even some Lua parts and peers to a certain extent ; all outgoing
connections where the target address is initially copied into the outgoing
connection which experience a retry would use a random outgoing address after
the retry because closing and detaching the connection causes the target
address to be lost. This was attempted to be addressed by commit 0857d7a
("BUG/MAJOR: stream: properly mark the server address as unset on connect
retry") but it used to only solve the most visible effect and not the root
cause.
Prior to this fix, it was possible to cause this config to keep CLOSE_WAIT
for as long as it takes to expire a client or server timeout (note the
missing client timeout) :
listen test
mode http
bind :8002
server s1 127.0.0.1:8001
$ tcploop 8001 L0 W N20 A R P100 S:"HTTP/1.1 200 OK\r\nContent-length: 0\r\n\r\n" &
$ tcploop 8002 N200 C T W S:"GET / HTTP/1.0\r\n\r\n" O P10000 K
With this patch, these CLOSE_WAIT properly vanish when both processes leave.
This commit reverts the two fixes above and replaces them with the proper
fix in connection.h. It must be backported to 1.6 and 1.5. Thanks to
Robson Roberto Souza Peixoto for providing very detailed traces showing
some obvious inconsistencies leading to finding this bug.
This pointer will be used for storing private context. With this,
the same executed function can handle more than one keyword. This
will be very useful for creation Lua cli bindings.
The release function is called when the command is terminated (give
back the hand to the prompt) or when the session is broken (timeout
or client closed).
Commit d7c9196 ("MAJOR: filters: Add filters support") removed sample.h
from proto_http.h, but it has become necessary as of commit fd7edd3
("MINOR: Move http method enum from proto_http to sample") in order
to have HTTP_METH_*. Due to this, the "debug/flags" utility doesn't
build anymore.
A new "option spop-check" statement has been added to enable server health
checks based on SPOP HELLO handshake. SPOP is the protocol used by SPOE filters
to talk to servers.
SPOE makes possible the communication with external components to retrieve some
info using an in-house binary protocol, the Stream Processing Offload Protocol
(SPOP). In the long term, its aim is to allow any kind of offloading on the
streams. This first version, besides being experimental, won't do lot of
things. The most important today is to validate the protocol design and lay the
foundations of what will, one day, be a full offload engine for the stream
processing.
So, for now, the SPOE can offload the stream processing before "tcp-request
content", "tcp-response content", "http-request" and "http-response" rules. And
it only supports variables creation/suppression. But, in spite of these limited
features, we can easily imagine to implement a SSO solution, an ip reputation
service or an ip geolocation service.
Internally, the SPOE is implemented as a filter. So, to use it, you must use
following line in a proxy proxy section:
frontend my-front
...
filter spoe [engine <name>] config <file>
...
It uses its own configuration file to keep the HAProxy configuration clean. It
is also a easy way to disable it by commenting out the filter line.
See "doc/SPOE.txt" for all details about the SPOE configuration.
It does the opposite of 'set-var' action/converter. It is really useful for
per-process variables. But, it can be used for any scope.
The lua function 'unset_var' has also been added.
Now it is possible to use variables attached to a process. The scope name is
'proc'. These variables are released only when HAProxy is stopped.
'tune.vars.proc-max-size' directive has been added to confiure the maximum
amount of memory used by "proc" variables. And because memory accounting is
hierachical for variables, memory for "proc" vars includes memory for "sess"
vars.
This function, unsurprisingly, sets a variable value only if it already
exists. In other words, this function will succeed only if the variable was
found somewhere in the configuration during HAProxy startup.
It will be used by SPOE filter. So an agent will be able to set a value only for
existing variables. This prevents an agent to create a very large number of
unused variables to flood HAProxy and exhaust the memory reserved to variables..
This code has been moved from haproxy.c to sample.c and the function
release_sample_expr can now be called from anywhere to release a sample
expression. This function will be used by the stream processing offload engine
(SPOE).
A scope is a section name between square bracket, alone on its line, ie:
[scope-name]
...
The spaces at the beginning and at the end of the line are skipped. Comments at
the end of the line are also skipped.
When a scope is parsed, its name is saved in the global variable
cfg_scope. Initially, cfg_scope is NULL and it remains NULL until a valid scope
line is parsed.
This feature remains unused in the HAProxy configuration file and
undocumented. However, it will be used during SPOE configuration parsing.
This feature will be used by the stream processing offload engine (SPOE) to
parse dedicated configuration files without mixing HAProxy sections with SPOE
sections.
So, here we can back up all sections known by HAProxy, unregister all of them
and add new ones, dedicted to the SPOE. Once the SPOE configuration file parsed,
we can roll back all changes by restoring HAProxy sections.
New callbacks have been added to handle creation and destruction of filter
instances:
* 'attach' callback is called after a filter instance creation, when it is
attached to a stream. This happens when the stream is started for filters
defined on the stream's frontend and when the backend is set for filters
declared on the stream's backend. It is possible to ignore the filter, if
needed, by returning 0. This could be useful to have conditional filtering.
* 'detach' callback is called when a filter instance is detached from a stream,
before its destruction. This happens when the stream is stopped for filters
defined on the stream's frontend and when the analyze ends for filters defined
on the stream's backend.
In addition, the callback 'stream_set_backend' has been added to know when a
backend is set for a stream. It is only called when the frontend and the backend
are not the same. And it is called for all filters attached to a stream
(frontend and backend).
Finally, the TRACE filter has been updated.
It is very common when validating a configuration out of production not to
have access to the same resolvers and to fail on server address resolution,
making it difficult to test a configuration. This option simply appends the
"none" method to the list of address resolution methods for all servers,
ensuring that even if the libc fails to resolve an address, the startup
sequence is not interrupted.
This will allow a server to automatically fall back to an explicit numeric
IP address when all other methods fail. The address is simply specified in
the address list.
This new setting supports a comma-delimited list of methods used to
resolve the server's FQDN to an IP address. Currently supported methods
are "libc" (use the regular libc's resolver) and "last" (use the last
known valid address found in the state file).
The list is implemented in a 32-bit integer, because each init-addr
method only requires 3 bits. The last one must always be SRV_IADDR_END
(0), allowing to store up to 10 methods in a single 32 bit integer.
Note: the doc is provided at the end of this series.
This adds new "hold" timers : nx, refused, timeout, other. This timers
will be used to tell HAProxy to keep an erroneous response as valid for
the corresponding period. For now they're only configured, not enforced.
It will be important to help debugging some DNS resolution issues to
know why a server was marked down, so let's make the function support
a 3rd argument with an indication of the reason. Passing NULL will keep
the message as-is.
This flag has to be set when an IP address resolution fails (either
using libc at start up or using HAProxy's runtime resolver). This will
automatically trigger the administrative status "MAINT", through the
global mask SRV_ADMF_MAINT.
Server addresses are not resolved anymore upon the first pass so that we
don't fail if an address cannot be resolved by the libc. Instead they are
processed all at once after the configuration is fully loaded, by the new
function srv_init_addr(). This function only acts on the server's address
if this address uses an FQDN, which appears in server->hostname.
For now the function does two things, to followup with HAProxy's historical
default behavior:
1. apply server IP address found in server-state file if runtime DNS
resolution is enabled for this server
2. use the DNS resolver provided by the libc
If none of the 2 options above can find an IP address, then an error is
returned.
All of this will be needed to support the new server parameter "init-addr".
For now, the biggest user-visible change is that all server resolution errors
are dumped at once instead of causing a startup failure one by one.
In the last release a lot of the structures have become opaque for an
end user. This means the code using these needs to be changed to use the
proper functions to interact with these structures instead of trying to
manipulate them directly.
This does not fix any deprecations yet that are part of 1.1.0, it only
ensures that it can be compiled against that version and is still
compatible with older ones.
[wt: openssl-0.9.8 doesn't build with it, there are conflicts on certain
function prototypes which we declare as inline here and which are
defined differently there. But openssl-0.9.8 is not supported anymore
so probably it's OK to go without it for now and we'll see later if
some users still need it. Emeric has reviewed this change and didn't
spot anything obvious which requires special care. Let's try it for
real now]
The only reason wurfl/wurfl.h was needed outside of wurfl.c was to expose
wurfl_handle which is a pointer to a structure, referenced by global.h.
By just storing a void* there instead, we can confine all wurfl code to
wurfl.c, which is really nice.
WURFL is a high-performance and low-memory footprint mobile device
detection software component that can quickly and accurately detect
over 500 capabilities of visiting devices. It can differentiate between
portable mobile devices, desktop devices, SmartTVs and any other types
of devices on which a web browser can be installed.
In order to add WURFL device detection support, you would need to
download Scientiamobile InFuze C API and install it on your system.
Refer to www.scientiamobile.com to obtain a valid InFuze license.
Any useful information on how to configure HAProxy working with WURFL
may be found in:
doc/WURFL-device-detection.txt
doc/configuration.txt
examples/wurfl-example.cfg
Please find more information about WURFL device detection API detection
at https://docs.scientiamobile.com/documentation/infuze/infuze-c-api-user-guide
Right now there is an issue with the way the maintenance flags are
propagated upon startup. They are not propagate, just copied from the
tracked server. This implies that depending on the server's order, some
tracking servers may not be marked down. For example this configuration
does not work as expected :
server s1 1.1.1.1:8000 track s2
server s2 1.1.1.1:8000 track s3
server s3 1.1.1.1:8000 track s4
server s4 wtap:8000 check inter 1s disabled
It results in s1/s2 being up, and s3/s4 being down, while all of them
should be down.
The only clean way to process this is to run through all "root" servers
(those not tracking any other server), and to propagate their state down
to all their trackers. This is the same algorithm used to propagate the
state changes. It has to be done both to compute the IDRAIN flag and the
IMAINT flag. However, doing so requires that tracking servers are not
marked as inherited maintenance anymore while parsing the configuration
(and given that it is wrong, better drop it).
This fix also addresses another side effect of the bug above which is
that the IDRAIN/IMAINT flags are stored in the state files, and if
restored while the tracked server doesn't have the equivalent flag,
the servers may end up in a situation where it's impossible to remove
these flags. For example in the configuration above, after removing
"disabled" on server s4, the other servers would have remained down,
and not anymore with this fix. Similarly, the combination of IMAINT
or IDRAIN with their respective forced modes was not accepted on
reload, which is wrong as well.
This bug has been present at least since 1.5, maybe even 1.4 (it came
with tracking support). The fix needs to be backported there, though
the srv-state parts are irrelevant.
This commit relies on previous patch to silence warnings on startup.
We used to have 7 different character classes, each was 256 bytes long,
resulting in almost 2kB being used in the L1 cache. It's as cheap to
test a bit than to check the byte is not null, so let's store a 7-bit
composite value and check for the respective bits there instead.
The executable is now 4 kB smaller and the performance on small
objects increased by about 1% to 222k requests/second with a config
involving 4 http-request rules including 1 header lookup, one header
replacement, and 2 variable assignments.
There's no reason to use the stream anymore, only the appctx should be
used by a peer. This was a leftover from the migration to appctx and it
caused some confusion, so let's totally drop it now. Note that half of
the patch are just comment updates.
For active servers, this is the sum of the eweights of all active
servers before this one in the backend, and
[srv->cumulative_weight .. srv_cumulative_weight + srv_eweight) is a
space occupied by this server in the range [0 .. lbprm.tot_wact), and
likewise for backup servers with tot_wbck. This allows choosing a
server or a range of servers proportional to their weight, by simple
integer comparison.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
0 will mean no balancing occurs; otherwise it represents the ratio
between the highest-loaded server and the average load, times 100 (i.e.
a value of 150 means a 1.5x ratio), assuming equal weights.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
This commit introduces "tcp-request session" rules. These are very
much like "tcp-request connection" rules except that they're processed
after the handshake, so it is possible to consider SSL information and
addresses rewritten by the proxy protocol header in actions. This is
particularly useful to track proxied sources as this was not possible
before, given that tcp-request content rules are processed after each
HTTP request. Similarly it is possible to assign the proxied source
address or the client's cert to a variable.
This is in order to make integration of tcp-request-session cleaner :
- tcp_exec_req_rules() was renamed tcp_exec_l4_rules()
- LI_O_TCP_RULES was renamed LI_O_TCP_L4_RULES
(LI_O_*'s horrible indent was also fixed and a provision was left
for L5 rules).
These are denied conns. Strangely this wasn't emitted while it used to be
available for a while. It corresponds to the number of connections blocked
by "tcp-request connection reject".
To register a new cli keyword, you need to declare a cli_kw_list
structure in your source file:
static struct cli_kw_list cli_kws = {{ },{
{ { "test", "list", NULL }, "test list : do some tests on the cli", test_parsing, NULL },
{ { NULL }, NULL, NULL, NULL, NULL }
}};
And then register it:
cli_register_kw(&cli_kws);
The first field is an array of 5 elements, where you declare the
keywords combination which will match, it must be ended by a NULL
element.
The second field is used as a usage message, it will appear in the help
of the cli, you can set it to NULL if you don't want to show it, it's a
good idea if you want to overwrite some existing keywords.
The two last fields are callbacks.
The first one is used at parsing time, you can use it to parse the
arguments of your keywords and print small messages. The function must
return 1 in case of a failure, otherwise 0:
#include <proto/dumpstats.h>
static int test_parsing(char **args, struct appctx *appctx)
{
struct chunk out;
if (!*args[2]) {
appctx->ctx.cli.msg = "Error: the 3rd argument is mandatory !";
appctx->st0 = STAT_CLI_PRINT;
return 1;
}
chunk_reset(&trash);
chunk_printf(&trash, "arg[3]: %s\n", args[2]);
chunk_init(&out, NULL, 0);
chunk_dup(&out, &trash);
appctx->ctx.cli.err = out.str;
appctx->st0 = STAT_CLI_PRINT_FREE; /* print and free in the default cli_io_handler */
return 0;
}
The last field is the IO handler callback, it can be set to NULL if you
want to use the default cli_io_handler() otherwise you can write your
own. You can use the private pointer in the appctx if you need to store
a context or some data. stats_dump_sess_to_buffer() is a good example of
IO handler, IO handlers often use the appctx->st2 variable for the state
machine. The handler must return 0 in case it have to be recall later
otherwise 1.
During the stick-table teaching process which occurs at reloading/restart time,
expiration dates of stick-tables entries were not synchronized between peers.
This patch adds two new stick-table messages to provide such a synchronization feature.
As these new messages are not supported by older haproxy peers protocol versions,
this patch increments peers protol version, from 2.0 to 2.1, to help in detecting/supporting
such older peers protocol implementations so that new versions might still be able
to transparently communicate with a newer one.
[wt: technically speaking it would be nice to have this backported into 1.6
as some people who reload often are affected by this design limitation, but
it's not a totally transparent change that may make certain users feel
reluctant to upgrade older versions. Let's let it cook in 1.7 first and
decide later]
With Linux officially introducing SO_REUSEPORT support in 3.9 and
its mainstream adoption we have seen more people running into strange
SO_REUSEPORT related issues (a process management issue turning into
hard to diagnose problems because the kernel load-balances between the
new and an obsolete haproxy instance).
Also some people simply want the guarantee that the bind fails when
the old process is still bound.
This change makes SO_REUSEPORT configurable, introducing the command
line argument "-dR" and the noreuseport configuration directive.
A backport to 1.6 should be considered.
To avoid issues when porting code to some architecture, we need to know
the endianess the structures are currently used.
This patch simply had a short notice before those structures to report
endianess and ease contributor's job.
New DNS response parser function which turn the DNS response from a
network buffer into a DNS structure, much easier for later analysis
by upper layer.
Memory is pre-allocated at start-up in a chunk dedicated to DNS
response store.
New error code to report a wrong number of queries in a DNS response.
struct dns_query_item: describes a DNS query record
struct dns_answer_item: describes a DNS answer record
struct dns_response_packet: describes a DNS response packet
DNS_MIN_RECORD_SIZE: minimal size of a DNS record
DNS_MAX_QUERY_RECORDS: maximum number of query records we allow.
For now, we send one DNS query per request.
DNS_MAX_ANSWER_RECORDS: maximum number of records we may found in a
response
WIP dns: new MAX values
Current implementation of HAProxy's DNS resolution expect only 512 bytes
of data in the response.
Update DNS_MAX_UDP_MESSAGE to match this.
Backport: can be backported to 1.6
This function can replace update_server_addr() where the need to change the
server's port as well as the IP address is required.
It performs some validation before performing each type of change.
Introduction of 3 new server flags to remember if some parameters were set
during configuration parsing.
* SRV_F_CHECKADDR: this server has a check addr configured
* SRV_F_CHECKPORT: this server has a check port configured
* SRV_F_AGENTADDR: this server has a agent addr configured
HAProxy used to deduce port used for health checks when parsing configuration
at startup time.
Because of this way of working, it makes it complicated to change the port at
run time.
The current patch changes this behavior and makes HAProxy to choose the
port used for health checking when preparing the check task itself.
A new type of error is introduced and reported when no port can be found.
There won't be any impact on performance, since the process to find out the
port value is made of a few 'if' statements.
This patch also introduces a new check state CHK_ST_PORT_MISS: this flag is
used to report an error in the case when HAProxy needs to establish a TCP
connection to a server, to perform a health check but no TCP ports can be
found for it.
And last, it also introduces a new stream termination condition:
SF_ERR_CHK_PORT. Purpose of this flag is to report an error in the event when
HAProxy has to run a health check but no port can be found to perform it.
Trie now uses a dataset structure just like Pattern, so this has been
defined in includes/types/global.h for both Pattern and Trie where it
was just Pattern.
In src/51d.c all functions used by the Trie implementation which need a
dataset as an argument now use the global dataset. The
fiftyoneDegreesDestroy method has now been replaced with
fiftyoneDegreesDataSetFree which is common to Pattern and Trie. In
addition, two extra dataset init status' have been added to the switch
statement in init_51degrees.
A few log format fields were declared but never used, so let's drop
them, the whole list is confusing enough already :
LOG_FMT_VARIABLE, LOG_FMT_T, LOG_FMT_CONN, LOG_FMT_QUEUES.
Tq is the time between the instant the connection is accepted and a
complete valid request is received. This time includes the handshake
(SSL / Proxy-Protocol), the idle when the browser does preconnect and
the request reception.
This patch decomposes %Tq in 3 measurements names %Th, %Ti, and %TR
which returns respectively the handshake time, the idle time and the
duration of valid request reception. It also adds %Ta which reports
the request's active time, which is the total time without %Th nor %Ti.
It replaces %Tt as the total time, reporting accurate measurements for
HTTP persistent connections.
%Th is avalaible for TCP and HTTP sessions, %Ti, %TR and %Ta are only
avalaible for HTTP connections.
In addition to this, we have new timestamps %tr, %trg and %trl, which
log the date of start of receipt of the request, respectively in the
default format, in GMT time and in local time (by analogy with %t, %T
and %Tl). All of them are obviously only available for HTTP. These values
are more relevant as they more accurately represent the request date
without being skewed by a browser's preconnect nor a keep-alive idle
time.
The HTTP log format and the CLF log format have been modified to
use %tr, %TR, and %Ta respectively instead of %t, %Tq and %Tt. This
way the default log formats now produce the expected output for users
who don't want to manually fiddle with the log-format directive.
Example with the following log-format :
log-format "%ci:%cp [%tr] %ft %b/%s h=%Th/i=%Ti/R=%TR/w=%Tw/c=%Tc/r=%Tr/a=%Ta/t=%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
The request was sent by hand using "openssl s_client -connect" :
Aug 23 14:43:20 haproxy[25446]: 127.0.0.1:45636 [23/Aug/2016:14:43:20.221] test~ test/test h=6/i=2375/R=261/w=0/c=1/r=0/a=262/t=2643 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
=> 6 ms of SSL handshake, 2375 waiting before sending the first char (in
fact the time to type the first line), 261 ms before the end of the request,
no time spent in queue, 1 ms spend connecting to the server, immediate
response, total active time for this request = 262ms. Total time from accept
to close : 2643 ms.
The timing now decomposes like this :
first request 2nd request
|<-------------------------------->|<-------------- ...
t tr t tr ...
---|----|----|----|----|----|----|----|----|--
: Th Ti TR Tw Tc Tr Td : Ti ...
:<---- Tq ---->: :
:<-------------- Tt -------------->:
:<--------- Ta --------->:
The function ipcpy() simply duplicates the IP address found in one
struct sockaddr_storage into an other struct sockaddr_storage.
It also update the family on the destination structure.
Memory of destination structure must be allocated and cleared by the
caller.
FreeBSD prefers to use IPPROTO_TCP over SOL_TCP, just like it does
with their *_IP counterparts. It's worth noting that there are a few
inconsistencies between SOL_TCP and IPPROTO_TCP in the code, eg on
TCP_QUICKACK. The two values are the same but it's worth applying
what implementations recommend.
No backport is needed, this was uncovered by the recent tcp_info stuff.
Recent commit 93b227d ("MINOR: listener: add the "accept-netscaler-cip"
option to the "bind" keyword") introduced an include of netinet/ip.h
which requires in_systm.h on OpenBSD. No backport is needed.
It is sometimes needed in application server environments to easily tell
if a source is local to the machine or a remote one, without necessarily
knowing all the local addresses (dhcp, vrrp, etc). Similarly in transparent
proxy configurations it is sometimes desired to tell the difference between
local and remote destination addresses.
This patch adds two new sample fetch functions for this :
dst_is_local : boolean
Returns true if the destination address of the incoming connection is local
to the system, or false if the address doesn't exist on the system, meaning
that it was intercepted in transparent mode. It can be useful to apply
certain rules by default to forwarded traffic and other rules to the traffic
targetting the real address of the machine. For example the stats page could
be delivered only on this address, or SSH access could be locally redirected.
Please note that the check involves a few system calls, so it's better to do
it only once per connection.
src_is_local : boolean
Returns true if the source address of the incoming connection is local to the
system, or false if the address doesn't exist on the system, meaning that it
comes from a remote machine. Note that UNIX addresses are considered local.
It can be useful to apply certain access restrictions based on where the
client comes from (eg: require auth or https for remote machines). Please
note that the check involves a few system calls, so it's better to do it only
once per connection.
At some places, smp_dup() is inappropriately called to ensure a modification
is possible while in fact we only need to ensure the sample may be modified
in place. Let's provide smp_is_rw() to check for this capability and
smp_make_rw() to perform the smp_dup() when it is not the case.
Note that smp_is_rw() will also try to add the trailing zero on strings when
needed if possible, to avoid a useless duplication.
These functions ensure that the designated sample is "safe for use",
which means that its size is known, its length is correct regarding its
size, and that strings are properly zero-terminated.
smp_is_safe() only checks (and optionally sets the trailing zero when
needed and possible). smp_make_safe() will call smp_dup() after
smp_is_safe() fails.
Vedran Furac reported a strange problem where the "base" sample fetch
would not always work for tracking purposes.
In fact, it happens that commit bc8c404 ("MAJOR: stick-tables: use sample
types in place of dedicated types") merged in 1.6 exposed a fundamental
bug related to the way samples use chunks as strings. The problem is that
chunks convey a base pointer, a length and an optional size, which may be
zero when unknown or when the chunk is allocated from a read-only location.
The sole purpose of this size is to know whether or not the chunk may be
appended new data. This size cause some semantics issue in the sample,
which has its own SMP_F_CONST flag to indicate read-only contents.
The problem was emphasized by the commit above because it made use of new
calls to smp_dup() to convert a sample to a table key. And since smp_dup()
would only check the SMP_F_CONST flag, it would happily return read-write
samples indicating size=0.
So some tests were added upon smp_dup() return to ensure that the actual
length is smaller than size, but this in fact made things even worse. For
example, the "sni" server directive does some bad stuff on many occasions
because it limits len to size-1 and effectively sets it to -1 and writes
the zero byte before the beginning of the string!
It is therefore obvious that smp_dup() needs to be modified to take this
nature of the chunks into account. It's not enough but is needed. The core
of the problem comes from the fact that smp_dup() is called for 5 distinct
needs which are not always fulfilled :
1) duplicate a sample to keep a copy of it during some operations
2) ensure that the sample is rewritable for a converter like upper()
3) ensure that the sample is terminated with a \0
4) set a correct size on the sample
5) grow the sample in case it was extracted from a partial chunk
Case 1 is not used for now, so we can ignore it. Case 2 indicates the wish
to modify the sample, so its R/O status must be removed if any, but there's
no implied requirement that the chunk becomes larger. Case 3 is used when
the sample has to be made compatible with libc's str* functions. There's no
need to make it R/W nor to duplicate it if it is already correct. Case 4
can happen when the sample's size is required (eg: before performing some
changes that must fit in the buffer). Case 5 is more or less similar but
will happen when the sample by be grown but we want to ensure we're not
bound by the current small size.
So the proposal is to have different functions for various operations. One
will ensure a sample is safe for use with str* functions. Another one will
ensure it may be rewritten in place. And smp_dup() will have to perform an
inconditional duplication to guarantee at least #5 above, and implicitly
all other ones.
This patch only modifies smp_dup() to make the duplication inconditional. It
is enough to fix both the "base" sample fetch and the "sni" server directive,
and all use cases in general though not always optimally. More patches will
follow to address them more optimally and even better than the current
situation (eg: avoid a dup just to add a \0 when possible).
The bug comes from an ambiguous design, so its roots are old. 1.6 is affected
and a backport is needed. In 1.5, the function already existed but was only
used by two converters modifying the data in place, so the bug has no effect
there.
Similar to "escape_chunk", this function tries to prefix all characters
tagged in the <map> with the <escape> character. The specified <string>
contains the input to be escaped.
This enables tracking of sticky counters from current response. The only
difference from "http-request track-sc" is the <key> sample expression
can only make use of samples in response (eg. res.*, status etc.) and
samples below Layer 6.
If an action wrapper stops the processing of the transaction
with a txn_done() function, the return code of the action is
"continue". So the continue can implies the processing of other
like adding headers. However, the HTTP content is flushed and
a segfault occurs.
This patchs add a flag indicating that the Lua code want to
stop the processing, ths flags is forwarded to the haproxy core,
and other actions are ignored.
Must be backported in 1.6
The function txn_done() ends a transaction. It does not make
sense to call this function from a lua sample-fetch wrapper,
because the role of a sample-fetch is not to terminate a
transaction.
This patch modify the role of the fucntion txn_done() if it
is called from a sample-fetch wrapper, now it just ends the
execution of the Lua code like the done() function.
Must be backported in 1.6
Alexander Lebedev reported that the response bit is set on SPARC when
DNS queries are sent. This has been tracked to the endianess issue, so
this patch makes the code portable.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
Alexander Lebedev reported that the DNS parser crashes in 1.6 with a bus
error on Sparc when it receives a response. This is obviously caused by
some alignment issues. The issue can also be reproduced on ARMv5 when
setting /proc/cpu/alignment to 4 (which helps debugging).
Two places cause this crash in turn, the first one is when the IP address
from the packet is compared to the current one, and the second place is
when the address is assigned because an unaligned address is passed to
update_server_addr().
This patch modifies these places to properly use memcpy() and memcmp()
to manipulate the unaligned data.
Nenad Merdanovic found another set of places specific to 1.7 in functions
in_net_ipv4() and in_net_ipv6(), which are used to compare networks. 1.6
has the functions but does not use them. There we perform a temporary copy
to a local variable to fix the problem. The type of the function's argument
is wrong since it's not necessarily aligned, so we change it for a const
void * instead.
This fix must be backported to 1.6. Note that in 1.6 the code is slightly
different, there's no rec[] array, the pointer is used directly from the
buffer.
Originally, tcphdr's source and dest from Linux were used to get the
source and port which led to a build issue on BSD oses.
To avoid side problems related to network then we just use an internal
struct as we need only those two fields.
When realloc fails to allocate memory, the original pointer is not
freed. Sometime people override the original pointer with the pointer
returned by realloc which is NULL in case of failure. This results
in a memory leak because the memory pointed by the original pointer
cannot be freed.
This configures the client-facing connection to receive a NetScaler
Client IP insertion protocol header before any byte is read from the
socket. This is equivalent to having the "accept-netscaler-cip" keyword
on the "bind" line, except that using the TCP rule allows the PROXY
protocol to be accepted only for certain IP address ranges using an ACL.
This is convenient when multiple layers of load balancers are passed
through by traffic coming from public hosts.
When NetScaler application switch is used as L3+ switch, informations
regarding the original IP and TCP headers are lost as a new TCP
connection is created between the NetScaler and the backend server.
NetScaler provides a feature to insert in the TCP data the original data
that can then be consumed by the backend server.
Specifications and documentations from NetScaler:
https://support.citrix.com/article/CTX205670https://www.citrix.com/blogs/2016/04/25/how-to-enable-client-ip-in-tcpip-option-of-netscaler/
When CIP is enabled on the NetScaler, then a TCP packet is inserted just after
the TCP handshake. This is composed as:
- CIP magic number : 4 bytes
Both sender and receiver have to agree on a magic number so that
they both handle the incoming data as a NetScaler Client IP insertion
packet.
- Header length : 4 bytes
Defines the length on the remaining data.
- IP header : >= 20 bytes if IPv4, 40 bytes if IPv6
Contains the header of the last IP packet sent by the client during TCP
handshake.
- TCP header : >= 20 bytes
Contains the header of the last TCP packet sent by the client during TCP
handshake.
The previous dump algorithm was not trying to yield when the buffer is
full, it's not a problem with the TLS_TICKETS_NO which is 3 by default
but it can become one if the buffer size is lowered and if the
TLS_TICKETS_NO is increased.
The index of the latest ticket dumped is now stored to ensure we can
resume the dump after a yield.
The function stats_tlskeys_list() can meet an undefined behavior when
called with appctx->st2 == STAT_ST_LIST, indeed the ref pointer is used
uninitialized.
However this function was using NULL in appctx->ctx.tlskeys.ref as a
flag to dump every tickets from every references. A real flag
appctx->ctx.tlskeys.dump_all is now used for this behavior.
This patch delete the 'ref' variable and use appctx->ctx.tlskeys.ref
directly.
The 'set-src' action was not available for tcp actions The action code
has been converted into a function in proto_tcp.c to be used for both
'http-request' and 'tcp-request connection' actions.
Both http and tcp keywords are registered in proto_tcp.c
htonll()/ntohll() already exist on Solaris 11 with a different declaration,
causing a build error as reported by Jonathan Fisher. They used to exist on
OSX with a #define which allowed us to detect them. It was a bad idea to give
these functions a name subject to conflicts like this. Simply rename them
my_htonll()/my_ntohll() to definitely get rid of the conflict.
This patch must be backported to 1.6.
DNS requests (using the internal resolver) are corrupted since commit
e2f8497716 ("BUG/MINOR: dns: fix DNS header definition").
Fix it by defining the struct in network byte order, while complying
with RFC 2535, section 6.1.
First reported by Eduard Vopicka on discourse.
This must be backported to 1.6 (1.6.5 is affected).
Commit 108b1dd ("MEDIUM: http: configurable http result codes for
http-request deny") introduced in 1.6-dev2 was incomplete. It introduced
a new field "rule_deny_status" into struct http_txn, which is filled only
by actions "http-request deny" and "http-request tarpit". It's then used
in the deny code path to emit the proper error message, but is used
uninitialized when the deny comes from a "reqdeny" rule, causing random
behaviours ranging from returning a 200, an empty response, or crashing
the process. Often upon startup only 200 was returned but after the fields
are used the crash happens. This can be sped up using -dM.
There's no need at all for storing this status in the http_txn struct
anyway since it's used immediately after being set. Let's store it in
a temporary variable instead which is passed as an argument to function
http_req_get_intercept_rule().
As an extra benefit, removing it from struct http_txn reduced the size
of this struct by 8 bytes.
This fix must be backported to 1.6 where the bug was detected. Special
thanks to Falco Schmutz for his detailed report including an exploitable
core and a reproducer.
When compiled with GCC 6, the IP address specified for a frontend was
ignored and HAProxy was listening on all addresses instead. This is
caused by an incomplete copy of a "struct sockaddr_storage".
With the GNU Libc, "struct sockaddr_storage" is defined as this:
struct sockaddr_storage
{
sa_family_t ss_family;
unsigned long int __ss_align;
char __ss_padding[(128 - (2 * sizeof (unsigned long int)))];
};
Doing an aggregate copy (ss1 = ss2) is different than using memcpy():
only members of the aggregate have to be copied. Notably, padding can be
or not be copied. In GCC 6, some optimizations use this fact and if a
"struct sockaddr_storage" contains a "struct sockaddr_in", the port and
the address are part of the padding (between sa_family and __ss_align)
and can be not copied over.
Therefore, we replace any aggregate copy by a memcpy(). There is another
place using the same pattern. We also fix a function receiving a "struct
sockaddr_storage" by copy instead of by reference. Since it only needs a
read-only copy, the function is converted to request a reference.
'channel_analyze' callback has been removed. Now, there are 2 callbacks to
surround calls to analyzers:
* channel_pre_analyze: Called BEFORE all filterable analyzers. it can be
called many times for the same analyzer, once at each loop until the
analyzer finishes its processing. This callback is resumable, it returns a
negative value if an error occurs, 0 if it needs to wait, any other value
otherwise.
* channel_post_analyze: Called AFTER all filterable analyzers. Here, AFTER
means when an analyzer finishes its processing. This callback is NOT
resumable, it returns a negative value if an error occurs, any other value
otherwise.
Pre and post analyzer callbacks are not automatically called. 'pre_analyzers'
and 'post_analyzers' bit fields in the filter structure must be set to the right
value using AN_* flags (see include/types/channel.h).
The flag AN_RES_ALL has been added (AN_REQ_ALL already exists) to ease the life
of filter developers. AN_REQ_ALL and AN_RES_ALL include all filterable
analyzers.
Now, to call an analyzer in 'process_stream' function, we should use
FLT_ANALAYZE or ANALYZE macros, depending if this is a filterable analyzer or
not.
Instead of calling 'channel_analyze' callback with the flag AN_FLT_HTTP_HDRS,
now we use the new callback 'http_headers'. This change is done because
'channel_analyze' callback will be removed in a next commit.
As suggested by Pavlos, it's too bad that we didn't have a %Td log
format tag given that there are a few mentions of Td corresponding
to the data transmission time already in the doc, so this is now done.
Just like the other specifiers, we report -1 if the connection failed
before reaching the data transmission state.
int list_append_word(struct list *li, const char *str, char **err)
Append a copy of string <str> (inside a wordlist) at the end of
the list <li>.
The caller is responsible for freeing the <err> and <str> copy memory
area using free().
On failure : return 0 and <err> filled with an error message.
Conforming to RFC 2535, section 6.1. This is not an important bug as
those fields don't seem to be set to something else than 0 and to be
checked on answers.
This is the same issue as "show servers state", where the result is incorrect
it the data can't fit in one buffer. The similar fix is applied, to restart
the data processing where it stopped as buffers are sent to the client.
This fix should be backported to haproxy 1.6
It was reported that the unix socket command "show servers state" returned an
empty response while "show servers state <backend>" worked.
In fact, both cases can reproduce the issue. It happens when the response can't
fit in one buffer.
The fix consists in processing the response in several steps, as it is done in
some others commands, by restarting where it was stopped after the buffer is
sent to the client.
This fix should be backported to haproxy 1.6
In 1.4-dev3, commit 31971e5 ("[MEDIUM] add support for infinite forwarding")
made it possible to configure the lower layer to forward data indefinitely
by setting the forward size to CHN_INFINITE_FORWARD (4GB-1). By then larger
chunk sizes were not supported so there was no confusion in the usage of the
function.
Since 1.5 we support 64-bit content-lengths and chunk sizes and the function
has grown to support 64-bit arguments, though it still limits a single pass
to 32-bit quantities (what fit in the channel's to_forward field). The issue
now becomes that a 4GB-1 content-length can be confused with infinite
forwarding (in fact it's 4GB-1+what was already in the buffer). It causes a
visible effect when transferring this exact size because the transfer rate
is lower than with other sizes due in part to the disabling of the Nagle
algorithm on the sendto() call.
In theory with keep-alive it should prevent a second request from being
processed after such a transfer, but since the analysers are still present,
the forwarding analyser properly counts down the remaining size to transfer
and ultimately the transaction gets correctly reset so there is no visible
effect.
Since the root cause of the issue is an API problem (lack of distinction
between a real valid length and a magic value), this patch modifies the API
to have a new dedicated function called channel_forward_forever() to program
a permanent forwarding. The existing function __channel_forward() was modified
to properly take care of the requested sizes and ensure it 1) never overflows
and 2) never reaches CHN_INFINITE_FORWARD by accident.
It is worth noting that the function used to have a bug causing a 2GB
forward to be scheduled if it was called with less data than what is present
in buf->i. Fortunately this bug couldn't be triggered with existing code.
This fix should be backported to 1.6 and 1.5. While it also theorically
affects 1.4, it's better not to backport it there, as the risk of breaking
large object transfers due to significant API differences is high, compared
to the fact that the largest supported objects (4GB-1) are just slower to
transfer.
Unfortunately, commit 169c470 ("BUG/MEDIUM: channel: fix miscalculation of
available buffer space (3rd try)") was still not enough to completely
address the issue. It fell into an integer comparison trap. Contrary to
expectations, chn->to_forward may also have the sign bit set when
forwarding regular data having a large content-length, resulting in
an incomplete check of the result and of the reserve because the with
to_forward very large, to_forward+o could become very small and also
the reserve could become positive again and make channel_recv_limit()
return a negative value.
One way to reproduce this situation is to transfer a large file (> 2GB)
with http-keep-alive or http-server-close, without splicing, and ensure
that the server uses content-length instead of chunks. The transfer
should stall very early after the first buffer has been transferred
to the client.
This fix now properly checks 1) for an overflow caused by summing o and
to_forward, and 2) for o+to_forward being smaller or larger than maxrw
before performing the subtract, so that all sensitive operations are
properly performed on 33-bit arithmetics.
The code was subjected again to a series of tests using inject+httpterm
scanning a wide range of object sizes (+10MB after each new request) :
$ printf "new page 1\nget 127.0.0.1:8002 / s=%%s0m\n" | \
inject64 -o 1 -u 1 -f /dev/stdin
With previous fix, the transfer would suddenly stop when reaching 2GB :
hits ^hits hits/s ^h/s bytes kB/s last errs tout htime sdht ptime
203 1 2 1 216816173354 2710202 3144892 0 0 685.0 0.0 685.0
205 2 2 2 219257283186 2706880 2441109 0 0 679.5 6.5 679.5
205 0 2 0 219257283186 2673836 0 0 0 0.0 0.0 0.0
205 0 2 0 219257283186 2641622 0 0 0 0.0 0.0 0.0
205 0 2 0 219257283186 2610174 0 0 0 0.0 0.0 0.0
Now it's fine even past 4 GB.
Many thanks to Vedran Furac for reporting this issue early with a common
access pattern helping to troubleshoot this.
This fix must be backported to 1.6 and 1.5 where the commit above was
already backported.
This function returns non-zero if the channel is congested with data in
transit waiting for leaving, indicating to the caller that it should wait
for the reserve to be released before starting to process new data in
case it needs the ability to append data. This is meant to be used while
waiting for a clean response buffer before processing a request.
Add opaque data between the filter keyword registrering and the parsing
function. This opaque data allow to use the same parser with differents
registered keywords. The opaque data is used for giving data which mainly
makes difference between the two keywords.
It will be used with Lua keywords registering.
This is very useful in complex architecture systems where HAproxy
is balancing DB connections for example. We want to keep the maxconn
high in order to avoid issues with queueing on the LB level when
there is slowness on another part of the system. Example is a case of
an architecture where each thread opens multiple DB connections, which
if get stuck in queue cause a snowball effect (old connections aren't
closed, new ones cannot be established). These connections are mostly
idle and the DB server has no problem handling thousands of them.
Allowing us to dynamically set maxconn depending on the backend usage
(LA, CPU, memory, etc.) enables us to have high maxconn for situations
like above, but lowering it in case there are real issues where the
backend servers become overloaded (cache issues, DB gets hit hard).
Latest fix 8a32106 ("BUG/MEDIUM: channel: fix miscalculation of available
buffer space (2nd try)") did happen to fix some observable issues but not
all of them in fact, some corner cases still remained and at least one user
reported a busy loop that appeared possible, though not easily reproducible
under experimental conditions.
The remaining issue is that we still consider min(i, to_fwd) as the number
of bytes in transit, but in fact <i> is not relevant here. Indeed, what
matters is that we can read everything we want at once provided that at
the end, <i> cannot be larger than <size-maxrw> (if it was not already).
This is visible in two cases :
- let's have i=o=max/2 and to_fwd=0. Then i+o >= max indicates that the
buffer is already full, while it is not since once <o> is forwarded,
some space remains.
- when to_fwd is much larger than i, it's obvious that we can fill the
buffer.
The only relevant part in fact is o + to_fwd. to_fwd will ensure that at
least this many bytes will be moved from <i> to <o> hence will leave the
buffer, whatever the number of rounds it takes.
Interestingly, the fix applied here ensures that channel_recv_max() will
now equal (size - maxrw - i + to_fwd), which is indeed what remains
available below maxrw after to_fwd bytes are forwarded from i to o and
leave the buffer.
Additionally, the latest fix made it possible to meet an integer overflow
that was not caught by the range test when forwarding in TCP or tunnel
mode due to to_forward being added to an existing value, causing the
buffer size to be limited when it should not have been, resulting in 2
to 3 recv() calls when a single one was enough. The first one was limited
to the unreserved buffer size, the second one to the size of the reserve
minus 1, and the last one to the last byte. Eg with a 2kB buffer :
recvfrom(22, "HTTP/1.1 200\r\nConnection: close\r"..., 1024, 0, NULL, NULL) = 1024
recvfrom(22, "23456789.123456789.123456789.123"..., 1023, 0, NULL, NULL) = 1023
recvfrom(22, "5", 1, 0, NULL, NULL) = 1
This bug is still present in 1.6 and 1.5 so the fix should be backported
there.
The condition to poll for receive as implemented in channel_may_recv()
is still incorrect. If buf->o is null and buf->i is slightly larger than
chn->to_forward and at least as large as buf->size - maxrewrite, then
reading will be disabled. It may slightly delay some data delivery by
having first to forward pending bytes, but may also cause some random
issues with analysers that wait for some data before starting to forward
what they correctly parsed. For instance, a body analyser may be prevented
from seeing the data that only fits in the reserve.
This bug may also prevent an applet's chk_rcv() function from being called
when part of a buffer is released. It is possible (though not verified)
that this participated to some peers frozen session issues some people
have been facing.
This fix should be backported to 1.6 and 1.5 to ensure better coherency
with channel_recv_limit().
Commit 9c06ee4 ("BUG/MEDIUM: channel: don't schedule data in transit for
leaving until connected") took care of an issue involving POST in conjunction
with http-send-name-header, where we absolutely never want to touch the
reserve until we're sure not to touch the buffer contents anymore, which
is indicated by the output stream-interface being connected.
But channel_may_recv() was not equipped with such a test, so in some
situations it might decide that it is possible to poll for reads, and
later channel_recv_limit() will decide it's not possible to read,
causing a loop. So we must add a similar test there.
Since the fix above was backported to 1.6 and 1.5, this fix must as well.
There's quite some inconsistency in the internal API. listener_accept()
which is the main accept() function returns void but is declared as int
in the include file. It's assigned to proto->accept() for all stream
protocols where an int is expected but the result is never checked (nor
is it documented by the way). This proto->accept() is in turn assigned
to fd->iocb() which is supposed to return an int composed of FD_WAIT_*
flags, but which is never checked either.
So let's fix all this mess :
- nobody checks accept()'s return
- nobody checks iocb()'s return
- nobody sets a return value
=> let's mark all these functions void and keep the current ones intact.
Additionally we now include listener.h from listener.c to ensure we won't
silently hide this incoherency in the future.
Note that this patch could/should be backported to 1.6 and even 1.5 to
simplify debugging sessions.
Commit 999f643 ("BUG/MEDIUM: channel: fix miscalculation of available buffer
space.") introduced a bug which made output data to be ignored when computing
the remaining room in a buffer. The problem is that channel_may_recv()
properly considers them and may declare that the FD may be polled for read
events, but once the even strikes, channel_recv_limit() called before recv()
says the opposite. In 1.6 and later this case is automatically caught by
polling loop detection at the connection level and is harmless. But the
backport in 1.5 ends up with a busy polling loop as soon as it becomes
possible to have a buffer with this conflict. In order to reproduce it, it
is necessary to have less than [maxrewrite] bytes available in a buffer, no
forwarding enabled (end of transfer) and [buf->o >= maxrewrite - free space].
Since this heavily depends on socket buffers, it will randomly strike users.
On 1.5 with 8kB buffers it was possible to reproduce it with httpterm using
the following command line :
$ (printf "GET /?s=675000 HTTP/1.0\r\n\r\n"; sleep 60) | \
nc6 --rcvbuf-size 1 --send-only 127.0.0.1 8002
This bug is only medium in 1.6 and later but is major in the 1.5 backport,
so it must be backported there.
Thanks to Nenad Merdanovic and Janusz Dziemidowicz for reporting this issue
with enough elements to help understand it.
There are two issues with error captures. The first one is that the
capture size is still hard-coded to BUFSIZE regardless of any possible
tune.bufsize setting and of the fact that frontends only capture request
errors and that backends only capture response errors. The second is that
captures are allocated in both directions for all proxies, which start to
count a lot in configs using thousands of proxies.
This patch changes this so that error captures are allocated only when
needed, and of the proper size. It also refrains from dumping a buffer
that was not allocated, which still allows to emit all relevant info
such as flags and HTTP states. This way it is possible to save up to
32 kB of RAM per proxy in the default configuration.
This patch splits the function stats_dump_be_stats() in two parts. The
part is called stats_fill_be_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_sv_stats() in two parts. The
extracted part is called stats_fill_sv_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_li_stats() in two parts. The
extracted part is called stats_fill_li_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_fe_stats() in two parts. The
extracted part is called stats_fill_fe_stats(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch splits the function stats_dump_info_to_buffer() in two parts. The
extracted part is called stats_fill_info(), and just fill the stats buffer.
This split allows the usage of preformated stats in other parts of HAProxy
like the Lua.
This patch adds a Lua post initialisation wrapper. It already exists for
pure Lua function, now it executes also C. It is useful for doing things
when the configuration is ready to use. For example we can can browse and
register all the proxies.
All the HAProxy Lua object are declared with the same pattern:
- Add the function __tosting which dumps the object name
- Register the name in the Lua REGISTRY
- Register the reference ID
These action are refactored in on function. This remove some
lines of code.
The functions
- hlua_class_const_int()
- hlua_class_const_str()
- hlua_class_function()
are use for common class registration actions.
The function 'hlua_dump_object()' is generic dump name function.
These functions can be used by all the HAProxy objects, so I move
it into the safe functions file.
Avoiding harmful memcpy call if the allocation failed.
Resetting the size which avoids further harmful freeing
invalid pointer. Closer to the comment behavior description.
The strftime() function can call tzset() internally on some platforms.
When haproxy is chrooted, the /etc/localtime file is not found, and some
implementations will clobber the content of the current timezone.
The GMT offset is computed by diffing the times returned by gmtime_r() and
localtime_r(). These variants are guaranteed to not call tzset() and were
already used in haproxy while chrooted, so they should be safe.
This patch must be backported to 1.6 and 1.5.
GMT offset used in local time formats was computed at startup, but was not updated when DST status changed while running.
For example these two RFC5424 syslog traces where emitted 5 seconds apart, just before and after DST changed:
<14>1 2016-03-27T01:59:58+01:00 bunch-VirtualBox haproxy 2098 - - Connect ...
<14>1 2016-03-27T03:00:03+01:00 bunch-VirtualBox haproxy 2098 - - Connect ...
It looked like they were emitted more than 1 hour apart, unlike with the fix:
<14>1 2016-03-27T01:59:58+01:00 bunch-VirtualBox haproxy 3381 - - Connect ...
<14>1 2016-03-27T03:00:03+02:00 bunch-VirtualBox haproxy 3381 - - Connect ...
This patch should be backported to 1.6 and partially to 1.5 (no fix needed in log.c).
This emits the field positions, names and types. It is more convenient
than the default output for a parser that doesn't know all the fields. It
simply relies on stats_emit_typed_data_field() and stats_emit_field_tags()
added by previous patch for the output. A new stats format flag was added,
STAT_FMT_TYPED, which is set when the "typed" keyword is specified on the
CLI.
New function stats_emit_typed_data_field() does exactly like
stats_emit_raw_data_field() except that it also prints the data
type after a colon. This will be used to print using the typed
format.
And function stats_emit_field_tags() appends a 3-letter code
describing the origin, nature, and scope, followed by an optional
delimiter. This will be particularly convenient to dump typed
data.
It's easier to have a new flag in <flags> to indicate whether or not we
want to display the admin column in HTML dumps. We already have similar
flags to show the version or the legends.
We're preparing for various data types for each stats field as they
appear in the CSV output. For now we only cover the regular types handled
by printf, so we have 32 and 64 bit ints and counters, strings, and of
course "empty" to indicate that there's nothing in the field and which
guarantees that any accessed entry will return 0.
More types will surely come later so that some fields are properly
represented. For example, we could see limits where only the value 0
doesn't show up, or human time, etc.
This is the continuation of previous patch called "BUG/MAJOR: samples:
check smp->strm before using it".
It happens that variables may have a session-wide scope, and that their
session is retrieved by dereferencing the stream. But nothing prevents them
from being used from a streamless context such as tcp-request connection,
thus crashing the process. Example :
tcp-request connection accept if { src,set-var(sess.foo) -m found }
In order to fix this, we have to always ensure that variable manipulation
only happens via the sample, which contains the correct owner and context,
and that we never use one from a different source. This results in quite a
large change since a lot of functions are inderctly involved in the call
chain, but the change is easy to follow.
This fix must be backported to 1.6, and requires the last two patches.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily. This requires from a lot of call places
to initialize 4 fields, and it was even forgotten at a few places.
This patch provides a convenient helper to initialize all these fields
at once, making it easy to prepare a new sample from a previous one for
example.
A few call places were cleaned up to make use of it. It will be needed
by further fixes.
At one place in the Lua code, it was moved earlier because we used to
call sample casts with a non completely initialized sample, which is
not clean eventhough at the moment there are no consequences.
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily.
The problem is that earlier commit 87b0966 ("REORG/MAJOR: session:
rename the "session" entity to "stream"") had split the session and
stream resulting in the possibility for smp->strm to be NULL before
the stream was initialized. This is what happens in tcp-request
connection rulesets, as discovered by Baptiste.
The sample fetch functions must now check that smp->strm is valid
before using it. An alternative could consist in using a dummy stream
with nothing in it to avoid some checks but it would only result in
deferring them to the next step anyway, and making it harder to detect
that a stream is valid or the dummy one.
There is still an issue with variables which requires a complete
independant fix. They use strm->sess to find the session with strm
possibly NULL and passed as an argument. All call places indirectly
use smp->strm to build strm. So the problem is there but the API needs
to be changed to remove this duplicate argument that makes it much
harder to know what pointer to use.
This fix must be backported to 1.6, as well as the next one fixing
variables.
The recent addition of "show env" on the CLI has revealed an interesting
design bug. Chunks are supposed to support a negative length to indicate
that they carry no data. chunk_printf() sets this size to -1 if the string
is too large for the buffer. At a few places in the http engine we may end
up with trash.len = -1. But bi_putchk(), chunk_appendf() and a few other
chunks consumers don't consider this case as possible and will use such a
chunk, possibly restoring an invalid string or trying to copy -1 bytes.
This fix takes care of clarifying the situation in a backportable way
where such sizes are used, so that a negative length indicating an error
remains present until the chunk is reinitialized or overwritten. But a
cleaner design adjustment needs to be done so that there's a clear contract
on how to use these chunks. At first glance it doesn't seem *that* useful
to support negative sizes, so probably this is what should change.
This fix must be backported to 1.6 and 1.5.
the function server_parse_addr_change_request() contain an hardcoded
updater source "stats command". this function can be called from other
sources than the "stats command", so this patch make this argument
generic.
The commit 87b096 renames the functions srv_shutdown_backup_sessions()
and srv_shutdown_sessions() to srv_shutdown_backup_streams() and
srv_shutdown_streams().
The header file <proto/servers.h> does not repport these changes.
This bug should be repported in the 1.6 branch, even if it is useless
because new dev are frozen.
This patch introduces a configurable connection timeout for mailers
with a new "timeout mail <time>" directive.
Acked-by: Simon Horman <horms@verge.net.au>
This options prioritize th choice of an ip address matching a network. This is
useful with clouds to prefer a local ip. In some cases, a cloud high
avalailibility service can be announced with many ip addresses on many
differents datacenters. The latency between datacenter is not negligible, so
this patch permitsto prefers a local datacenter. If none address matchs the
configured network, another address is selected.
DNS selection preferences are actually declared inline in the
struct server. There are copied from the server struct to the
dns_resolution struct for each resolution.
Next patchs adds new preferences options, and it is not a good
way to copy all the configuration information before each dns
resolution.
This patch extract the configuration preference from the struct
server and declares a new dedicated struct. Only a pointer to this
new striuict will be copied before each dns resolution.
Concat object is based on "luaL_Buffer". The luaL_Buffer documentation says:
During its normal operation, a string buffer uses a variable number of stack
slots. So, while using a buffer, you cannot assume that you know where the
top of the stack is. You can use the stack between successive calls to buffer
operations as long as that use is balanced; that is, when you call a buffer
operation, the stack is at the same level it was immediately after the
previous buffer operation. (The only exception to this rule is
luaL_addvalue.) After calling luaL_pushresult the stack is back to its level
when the buffer was initialized, plus the final string on its top.
So, the stack cannot be manipulated between the first call at the function
"luaL_buffinit()" and the last call to the function "luaL_pushresult()" because
we cannot known the stack status.
In other way, the memory used by these functions seems to be collected by GC, so
if the GC is triggered during the usage of the Concat object, it can be used
some released memory.
This patch rewrite the Concat class without the "luaL_Buffer" system. It uses
"userdata()" forr the memory allocation of the buffer strings.
This allows the tcp connection to send multiple SYN packets, so 1 lost
packet does not cause the mail to be lost. It changes the socket timeout
from 2 to 10 seconds, this allows for 3 syn packets to be send and
waiting a little for their reply.
This patch should be backported to 1.6.
Acked-by: Simon Horman <horms@verge.net.au>
Using environment variables in configuration files can make troubleshooting
complicated because there's no easy way to verify that the variables are
correct. This patch introduces a new "show env" command which displays the
whole environment on the CLI, one variable per line.
The socket must at least have level operator to display the environment.
The +E mode escapes characters '"', '\' and ']' with '\' as prefix. It
mostly makes sense to use it in the RFC5424 structured-data log formats.
Example:
log-format-sd %{+Q,+E}o\ [exampleSDID@1234\ header=%[capture.req.hdr(0)]]
This patch moves the function hlua_checkudata which check that
an object contains the expected class_reference as metatable.
This function is commonly used by all the lua functions.
The function hlua_metatype is also moved.
This parser takes a string containing an HTTP date. It returns
a broken-down time struct. We must considers considers this
time as GMT. Maybe later the timezone will be taken in account.
When Lua executes functions from its API, these can throws an error.
These function must be executed in a special environment which catch
these error, otherwise a critical error (like segfault) can raise.
This patch add a c file called "hlua_fcn.c" which collect all the
Lua/c function needing safe environment for its execution.
Now, filter's configuration (.id, .conf and .ops fields) is stored in the
structure 'flt_conf'. So proxies own a flt_conf list instead of a filter
list. When a filter is attached to a stream, it gets a pointer on its
configuration. This avoids mixing the filter's context (owns by a stream) and
its configuration (owns by a proxy). It also saves 2 pointers per filter
instance.
Now, http_parse_chunk_size and http_skip_chunk_crlf return the number of bytes
parsed on success. http_skip_chunk_crlf does not use msg->sol anymore.
On the other hand, http_forward_trailers is unchanged. It returns >0 if the end
of trailers is reached and 0 if not. In all cases (except if an error is
encountered), msg->sol contains the length of the last parsed part of the
trailer headers.
Internal doc and comments about msg->sol has been updated accordingly.
Before, functions to filter HTTP body (and TCP data) were called from the moment
at least one filter was attached to the stream. If no filter is interested by
these data, this uselessly slows data parsing.
A good example is the HTTP compression filter. Depending of request and response
headers, the response compression can be enabled or not. So it could be really
nice to call it only when enabled.
So, now, to filter HTTP/TCP data, a filter must use the function
register_data_filter. For TCP streams, this function can be called only
once. But for HTTP streams, when needed, it must be called for each HTTP request
or HTTP response.
Only registered filters will be called during data parsing. At any time, a
filter can be unregistered by calling the function unregister_data_filter.
From the stream point of view, this new structure is opaque. it hides filters
implementation details. So, impact for future optimizations will be reduced
(well, we hope so...).
Some small improvements has been made in filters.c to avoid useless checks.
This new analyzer will be called for each HTTP request/response, before the
parsing of the body. It is identified by AN_FLT_HTTP_HDRS.
Special care was taken about the following condition :
* the frontend is a TCP proxy
* filters are defined in the frontend section
* the selected backend is a HTTP proxy
So, this patch explicitly add AN_FLT_HTTP_HDRS analyzer on the request and the
response channels when the backend is a HTTP proxy and when there are filters
attatched on the stream.
This patch simplifies http_request_forward_body and http_response_forward_body
functions.
For Chunked HTTP request/response, the body filtering can be really
expensive. In the worse case (many chunks of 1 bytes), the filters overhead is
of 3 calls per chunk. If http_data callback is useful, others are just
informative.
So these callbacks has been removed. Of course, existing filters (trace and
compression) has beeen updated accordingly. For the HTTP compression filter, the
update is quite huge. Its implementation is closer to the old one.
When no filter is attached to the stream, the CPU footprint due to the calls to
filters_* functions is huge, especially for chunk-encoded messages. Using macros
to check if we have some filters or not is a great improvement.
Furthermore, instead of checking the filter list emptiness, we introduce a flag
to know if filters are attached or not to a stream.
HTTP compression has been rewritten to use the filter API. This is more a PoC
than other thing for now. It allocates memory to work. So, if only for that, it
should be rewritten.
In the mean time, the implementation has been refactored to allow its use with
other filters. However, there are limitations that should be respected:
- No filter placed after the compression one is allowed to change input data
(in 'http_data' callback).
- No filter placed before the compression one is allowed to change forwarded
data (in 'http_forward_data' callback).
For now, these limitations are informal, so you should be careful when you use
several filters.
About the configuration, 'compression' keywords are still supported and must be
used to configure the HTTP compression behavior. In absence of a 'filter' line
for the compression filter, it is added in the filter chain when the first
compression' line is parsed. This is an easy way to do when you do not use other
filters. But another filter exists, an error is reported so that the user must
explicitly declare the filter.
For example:
listen tst
...
compression algo gzip
compression offload
...
filter flt_1
filter compression
filter flt_2
...
HTTP compression will be moved in a true filter. To prepare the ground, some
functions have been moved in a dedicated file. Idea is to keep everything about
compression algos in compression.c and everything related to the filtering in
flt_http_comp.c.
For now, a header has been added to help during the transition. It will be
removed later.
Unused empty ACL keyword list was removed. The "compression" keyword
parser was moved from cfgparse.c to flt_http_comp.c.
This patch adds the support of filters in HAProxy. The main idea is to have a
way to "easely" extend HAProxy by adding some "modules", called filters, that
will be able to change HAProxy behavior in a programmatic way.
To do so, many entry points has been added in code to let filters to hook up to
different steps of the processing. A filter must define a flt_ops sutrctures
(see include/types/filters.h for details). This structure contains all available
callbacks that a filter can define:
struct flt_ops {
/*
* Callbacks to manage the filter lifecycle
*/
int (*init) (struct proxy *p);
void (*deinit)(struct proxy *p);
int (*check) (struct proxy *p);
/*
* Stream callbacks
*/
void (*stream_start) (struct stream *s);
void (*stream_accept) (struct stream *s);
void (*session_establish)(struct stream *s);
void (*stream_stop) (struct stream *s);
/*
* HTTP callbacks
*/
int (*http_start) (struct stream *s, struct http_msg *msg);
int (*http_start_body) (struct stream *s, struct http_msg *msg);
int (*http_start_chunk) (struct stream *s, struct http_msg *msg);
int (*http_data) (struct stream *s, struct http_msg *msg);
int (*http_last_chunk) (struct stream *s, struct http_msg *msg);
int (*http_end_chunk) (struct stream *s, struct http_msg *msg);
int (*http_chunk_trailers)(struct stream *s, struct http_msg *msg);
int (*http_end_body) (struct stream *s, struct http_msg *msg);
void (*http_end) (struct stream *s, struct http_msg *msg);
void (*http_reset) (struct stream *s, struct http_msg *msg);
int (*http_pre_process) (struct stream *s, struct http_msg *msg);
int (*http_post_process) (struct stream *s, struct http_msg *msg);
void (*http_reply) (struct stream *s, short status,
const struct chunk *msg);
};
To declare and use a filter, in the configuration, the "filter" keyword must be
used in a listener/frontend section:
frontend test
...
filter <FILTER-NAME> [OPTIONS...]
The filter referenced by the <FILTER-NAME> must declare a configuration parser
on its own name to fill flt_ops and filter_conf field in the proxy's
structure. An exemple will be provided later to make it perfectly clear.
For now, filters cannot be used in backend section. But this is only a matter of
time. Documentation will also be added later. This is the first commit of a long
list about filters.
It is possible to have several filters on the same listener/frontend. These
filters are stored in an array of at most MAX_FILTERS elements (define in
include/types/filters.h). Again, this will be replaced later by a list of
filters.
The filter API has been highly refactored. Main changes are:
* Now, HA supports an infinite number of filters per proxy. To do so, filters
are stored in list.
* Because filters are stored in list, filters state has been moved from the
channel structure to the filter structure. This is cleaner because there is no
more info about filters in channel structure.
* It is possible to defined filters on backends only. For such filters,
stream_start/stream_stop callbacks are not called. Of course, it is possible
to mix frontend and backend filters.
* Now, TCP streams are also filtered. All callbacks without the 'http_' prefix
are called for all kind of streams. In addition, 2 new callbacks were added to
filter data exchanged through a TCP stream:
- tcp_data: it is called when new data are available or when old unprocessed
data are still waiting.
- tcp_forward_data: it is called when some data can be consumed.
* New callbacks attached to channel were added:
- channel_start_analyze: it is called when a filter is ready to process data
exchanged through a channel. 2 new analyzers (a frontend and a backend)
are attached to channels to call this callback. For a frontend filter, it
is called before any other analyzer. For a backend filter, it is called
when a backend is attached to a stream. So some processing cannot be
filtered in that case.
- channel_analyze: it is called before each analyzer attached to a channel,
expects analyzers responsible for data sending.
- channel_end_analyze: it is called when all other analyzers have finished
their processing. A new analyzers is attached to channels to call this
callback. For a TCP stream, this is always the last one called. For a HTTP
one, the callback is called when a request/response ends, so it is called
one time for each request/response.
* 'session_established' callback has been removed. Everything that is done in
this callback can be handled by 'channel_start_analyze' on the response
channel.
* 'http_pre_process' and 'http_post_process' callbacks have been replaced by
'channel_analyze'.
* 'http_start' callback has been replaced by 'http_headers'. This new one is
called just before headers sending and parsing of the body.
* 'http_end' callback has been replaced by 'channel_end_analyze'.
* It is possible to set a forwarder for TCP channels. It was already possible to
do it for HTTP ones.
* Forwarders can partially consumed forwardable data. For this reason a new
HTTP message state was added before HTTP_MSG_DONE : HTTP_MSG_ENDING.
Now all filters can define corresponding callbacks (http_forward_data
and tcp_forward_data). Each filter owns 2 offsets relative to buf->p, next and
forward, to track, respectively, input data already parsed but not forwarded yet
by the filter and parsed data considered as forwarded by the filter. A any time,
we have the warranty that a filter cannot parse or forward more input than
previous ones. And, of course, it cannot forward more input than it has
parsed. 2 macros has been added to retrieve these offets: FLT_NXT and FLT_FWD.
In addition, 2 functions has been added to change the 'next size' and the
'forward size' of a filter. When a filter parses input data, it can alter these
data, so the size of these data can vary. This action has an effet on all
previous filters that must be handled. To do so, the function
'filter_change_next_size' must be called, passing the size variation. In the
same spirit, if a filter alter forwarded data, it must call the function
'filter_change_forward_size'. 'filter_change_next_size' can be called in
'http_data' and 'tcp_data' callbacks and only these ones. And
'filter_change_forward_size' can be called in 'http_forward_data' and
'tcp_forward_data' callbacks and only these ones. The data changes are the
filter responsability, but with some limitation. It must not change already
parsed/forwarded data or data that previous filters have not parsed/forwarded
yet.
Because filters can be used on backends, when we the backend is set for a
stream, we add filters defined for this backend in the filter list of the
stream. But we must only do that when the backend and the frontend of the stream
are not the same. Else same filters are added a second time leading to undefined
behavior.
The HTTP compression code had to be moved.
So it simplifies http_response_forward_body function. To do so, the way the data
are forwarded has changed. Now, a filter (and only one) can forward data. In a
commit to come, this limitation will be removed to let all filters take part to
data forwarding. There are 2 new functions that filters should use to deal with
this feature:
* flt_set_http_data_forwarder: This function sets the filter (using its id)
that will forward data for the specified HTTP message. It is possible if it
was not already set by another filter _AND_ if no data was yet forwarded
(msg->msg_state <= HTTP_MSG_BODY). It returns -1 if an error occurs.
* flt_http_data_forwarder: This function returns the filter id that will
forward data for the specified HTTP message. If there is no forwarder set, it
returns -1.
When an HTTP data forwarder is set for the response, the HTTP compression is
disabled. Of course, this is not definitive.
The serial number for a generated certificate was computed using the requested
servername, without any variable/random part. It is not a problem from the
moment it is not regenerated.
But if the cache is disabled or when the certificate is evicted from the cache,
we may need to regenerate it. It is important to not reuse the same serial
number for the new certificate. Else clients (especially browsers) trigger a
warning because 2 certificates issued by the same CA have the same serial
number.
So now, the serial is a static variable initialized with now_ms (internal date
in milliseconds) and incremented at each new certificate generation.
(Ref MPS-2031)
in function 'si_connect', an existing connection is reused (and considered as
established) only when there are some pending data in the output channel.
This can be problem when filters are used, because a filter can choose to not
forward data immediatly. So when we try to initiate a connection to a server,
the output channel can be empty. In this situation, if the connection already
exists, it is not considered as established and nothing happens. If the stream
interface is in the state SI_ST_ASS, this leads to an infinite loop in
process_stream because it remains in this state.
This patch fixes this problem. Now, in 'si_connect', we always reuse an existing
connection, whether or not there are pending data in the output channel.
Usually it's desirable to merge similarly sized pools, which is the
reason why their size is rounded up to the next multiple of 16. But
for the buffers this is problematic because we add the size of
struct buffer to the user-requested size, and the rounding results
in 8 extra bytes that are usable in the end. So the user gets more
bytes than asked for, and in case of SSL it results in short writes
for the extra bytes that are sent above multiples of 16 kB.
So we add a new flag MEM_F_EXACT to request that the size is not
rounded up when creating the entry. Thus it doesn't disable merging.
The function channel_recv_limit() relies on channel_reserved() which
itself relies on channel_in_transit(). Individually they're OK but
combined they're doing the wrong thing.
The problem is that we refrain from filling buffers while to_forward
is even much larger than the buffer because of a semantic issue along
the call chain. This is particularly visible when offloading SSL on
moderately large files (1 MB), though it is also visible on clear text.
Twice the number of recv() calls are made compared to what is needed,
and the typical performance drops by 15-20% in SSL in 1.6 and later,
and no directly measurable drop in 1.5 except when using strace.
There's no need for all these intermediate functions, so let's get
rid of them and reimplement channel_recv_limit() from scratch in a
safer way.
This fix needs to be backported to 1.6 and 1.5 (at least). Note that in
1.5 the function is called buffer_recv_limit() and it may differ a bit.
This function should return a 16-bit type as that is the type for
dns header id.
Also because it is doing an uint16 unpack big-endian operation.
Backport: can be backported to 1.6
Signed-off-by: Thiago Farina <tfarina@chromium.org>
Signed-off-by: Baptiste Assmann <bedis9@gmail.com>
Introduction of a new function in the LRU cache source file.
Purpose of this function is to be used to delete a number of entries in
the cache. 'number' is defined by the caller and the key removed are
taken at the tail of the tree
We have csv_enc() but there's no way to append some CSV-encoded data
to an existing chunk, so here we modify the existing function for this
and create an inlined version of csv_enc() which first resets the output
chunk. It will be handy to append data to an existing chunk without
having to use an extra temporary chunk, or to encode multiple strings
into a single chunk with chunk_newstr().
The patch is quite small, in fact most changes are typo fixes in the
comments.
chunk_initstr() prepares a read-only chunk from a string of
fixed length. Thus it must be prepared to accept a read-only
string on the input, otherwise the caller has to force-cast
some const char* and that's not a good idea.
These two new functions will make it easier to manipulate small strings
from within functions, because at many places, multiple short strings
are needed which do not deserve a malloc() nor a free(), and alloca()
is often discouraged. Since we already have trash chunks, it's convenient
to be able to allocate substrings from a chunk and use them later since
our functions already perform all the length checks. chunk_newstr() adds
a trailing zero at the end of a chunk and returns the pointer to the next
character, which can be used as an independant string. chunk_strcat()
does what it says.
Since thus function bears the name of a well-known string function, it
must at least promise compatible semantics. Here it means always adding
the trailing zero so that anyone willing to use chunk->str as a regular
string can do it. Of course the zero is not counted in the chunk's length.
chunk_dup() was affected by two bugs at once related to dst->size :
- first, it didn't check dst->size to know if it could free(dst->str),
so using it on a statically allocated chunk would cause a free(constant)
and crash the process ;
- second, it didn't properly set dst->size, possibly causing smaller
strings not to be properly reported in a chunk that was previously
used for something else.
Fortunately, neither of these situations ever happened since the function
is rarely used.
In the process of doing this, we even allocate one more byte for a
trailing zero if the input chunk was not full, so that the copied
string can safely be reused by standard string functions.
The bug was introduced in 1.3.4 nine years ago with this commit :
0f77253 ("[MINOR] store HTTP error messages into a chunk array")
It's better to backport this fix in case a future fix relies on it.
The function http_reply_and_close has been added in proto_http.c to wrap calls
to stream_int_retnclose. This functions will be modified when the filters will
be added.
If a sample fetch needing http_txn is called from an HTTP Lua applet,
the result will be invalid and may even cause a crash because some HTTP
data can be forwarded and the HTTP txn is no longer valid.
Here the solution is to ensure that a fetch called from Lua never
needs http_txn. This is done thanks to a new flag HLUA_F_MAY_USE_HTTP
which indicates whether or not it is safe to call a fetch which needs
HTTP.
This fix needs to be backported to 1.6.
This patch converts a boolean "int" to a bitfiled. The main
reason is to save space in the struct if another flag may will
be require.
Note that this patch is required for next fix and will need to be
backported to 1.6.
When memmax is forced using "-m", the per-process memory limit is enforced
using setrlimit(), but this value is not used to compute the automatic
maxconn limit. In addition, the per-process memory limit didn't consider
the fact that the shared SSL cache only needs to be accounted once.
The doc was also fixed to clearly state that "-m" is global and not per
process. It makes sense because people who use -m want to protect the
system's resources regardless of whatever appears in the configuration.
Since commit fd7edd3 ("MINOR: Move http method enum from proto_http to sample")
proto_http.h needs to include sample.h. This can be backported to 1.6 though
it doesn't affect existing code.
On freebsd, the macro LIST_PREV already exists in the header file
<sys/queue.h>, and this makes a build error.
This patch removes the macros before declaring it. This ensure
that the error doesn't occurs.
It is possible to create a http capture rule which points to a capture slot
id which does not exist.
Current patch prevent this when parsing configuration and prevent running
configuration which contains such rules.
This configuration is now invalid:
frontend f
bind :8080
http-request capture req.hdr(User-Agent) id 0
default_backend b
this one as well:
frontend f
bind :8080
declare capture request len 32 # implicit id is 0 here
http-request capture req.hdr(User-Agent) id 1
default_backend b
It applies of course to both http-request and http-response rules.
Causes HAProxy to emit a static string to the agent on every check,
so that you can independently control multiple services running
behind a single agent port.
The direction (request or response) is not propagated in the
sample fecthes called throught Lua. This patch adds the direction
status in some structs (hlua_txn and hlua_smp) to make sure that
the sample fetches will be called with all the information.
The converters can not access to a TXN object, so there are not
impacted the direction. However, the samples used as input of the
Lua converter wrapper are initiliazed with the direction. Thereby,
the struct smp stay consistent.
[wt: needs to be backported to 1.6]
When DEBUG_MEMORY_POOLS is used, we now use the link pointer at the end
of the pool to store a pointer to the pool, and to control it during
pool_free2() in order to serve four purposes :
- at any instant we can know what pool an object was allocated from
when examining memory, hence how we should possibly decode it ;
- it serves to detect double free when they happen, as the pointer
cannot be valid after the element is linked into the pool ;
- it serves to detect if an element is released in the wrong pool ;
- it serves as a canary, to detect if some buffers experienced an
overflow before being release.
All these elements will definitely help better troubleshoot strange
situations, or at least confirm that certain conditions did not happen.
When debugging a core file, it's sometimes convenient to be able to
visit the released entries in the pools (typically last released
session). Unfortunately the first bytes of these entries are destroyed
by the link elements of the pool. And of course, most structures have
their most accessed elements at the beginning of the structure (typically
flags). Let's add a build-time option DEBUG_MEMORY_POOLS which allocates
an extra pointer in each pool to put the link at the end of each pool
item instead of the beginning.
Sometimes analysing a core file isn't easy due to shared memory pools.
Let's add a build option to disable this. It's not enabled by default,
it could be backported to older versions.
Basically, it's ill-defined and shouldn't really be used going forward.
We can't guarantee that resolvers will do the 'legwork' for us and
actually resolve CNAMES when we request the ANY query-type. Case in point
(obfuscated, clearly):
PRODUCTION! ahayworth@secret-hostname.com:~$
dig @10.11.12.53 ANY api.somestartup.io
; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
;; QUESTION SECTION:
;api.somestartup.io. IN ANY
;; ANSWER SECTION:
api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
;; AUTHORITY SECTION:
somestartup.io. 166687 IN NS ns-1254.awsdns-28.org.
somestartup.io. 166687 IN NS ns-1884.awsdns-43.co.uk.
somestartup.io. 166687 IN NS ns-440.awsdns-55.com.
somestartup.io. 166687 IN NS ns-577.awsdns-08.net.
;; Query time: 1 msec
;; SERVER: 10.11.12.53#53(10.11.12.53)
;; WHEN: Mon Oct 19 22:02:29 2015
;; MSG SIZE rcvd: 242
HAProxy can't handle that response correctly.
Rather than try to build in support for resolving CNAMEs presented
without an A record in an answer section (which may be a valid
improvement further on), this change just skips ANY record types
altogether. A and AAAA are much more well-defined and predictable.
Notably, this commit preserves the implicit "Prefer IPV6 behavior."
Furthermore, ANY query type by default is a bad idea: (from Robin on
HAProxy's ML):
Using ANY queries for this kind of stuff is considered by most people
to be a bad practice since besides all the things you named it can
lead to incomplete responses. Basically a resolver is allowed to just
return whatever it has in cache when it receives an ANY query instead
of actually doing an ANY query at the authoritative nameserver. Thus
if it only received queries for an A record before you do an ANY query
you will not get an AAAA record even if it is actually available since
the resolver doesn't have it in its cache. Even worse if before it
only got MX queries, you won't get either A or AAAA
This is equivalent to commit 2af207a ("MEDIUM: tcp: implement tcp-ut
bind option to set TCP_USER_TIMEOUT") except that this time it works
on the server side. The purpose is to detect dead server connections
even when checks are rare, disabled, or after a soft reload (since
checks are disabled there as well), and to ensure client connections
will get killed faster.
I would like to contribute the following fix to enable the Linux s390x
platform. The fix was built against today's git master. I've attached the
patch for review. Depending on your buildbot/jenkins/? requirements I can
set up a virtual machine for automated building/testing of the package in
this environment.
This is done by adding EVP_PKEY_EC type in supported types for the CA private
key when we get the message digest used to sign a generated X509 certificate.
So now, we support DSA, RSA and EC private keys.
And to be sure, when the type of the private key is not directly supported, we
get its default message digest using the function
'EVP_PKEY_get_default_digest_nid'.
We also use the key of the default certificate instead of generated it. So we
are sure to use the same key type instead of always using a RSA key.
First, the LRU cache must be initialized after the configuration parsing to
correctly set its size.
Next, the function 'ssl_sock_set_generated_cert' returns -1 when an error occurs
(0 if success). In that case, the caller is responsible to free the memory
allocated for the certificate.
Finally, when a SSL certificate is generated by HAProxy but cannot be inserted
in the cache, it must be freed when the SSL connection is closed. This happens
when 'tune.ssl.ssl-ctx-cache-size' is set to 0.
htonll and ntohll were defined in 5b4dd683cb but on osx they are already
defined in sys/_endian.h. So, we check if they are defined before
declaring them.
[wt: no backport needed]
When debugging an issue, sometimes it can be useful to be able to use
byte 0 to poison memory areas, resulting in the same effect as a calloc().
This patch changes the default mem_poison_byte to -1 to disable it so that
all positive values are usable.
Michael Ezzell reported a bug causing haproxy to segfault during startup
when trying to send syslog message from Lua. The function __send_log() can
be called with *p that is NULL and/or when the configuration is not fully
parsed, as is the case with Lua.
This patch fixes this problem by using individual vectors instead of the
pre-generated strings log_htp and log_htp_rfc5424.
Also, this patch fixes a problem causing haproxy to write the wrong pid in
the logs -- the log_htp(_rfc5424) strings were generated at the haproxy
start, but "pid" value would be changed after haproxy is started in
daemon/systemd mode.
This patch adds a new RFC5424-specific log-format for the structured-data
that is automatically send by __send_log() when the sender is in RFC5424
mode.
A new statement "log-format-sd" should be used in order to set log-format
for the structured-data part in RFC5424 formatted syslog messages.
Example:
log-format-sd [exampleSDID@1234\ bytes=\"%B\"\ status=\"%ST\"]
The function __send_log() iterates over senders and passes the header as
the first vector to sendmsg(), thus it can send a logger-specific header
in each message.
A new logger arguments "format rfc5424" should be used in order to enable
RFC5424 header format. For example:
log 10.2.3.4:1234 len 2048 format rfc5424 local2 info
At the moment we have to call snprintf() for every log line just to
rebuild a constant. Thanks to sendmsg(), we send the message in 3 parts:
time-based header, proxy-specific hostname+log-tag+pid, session-specific
message.
A new function introduced meant to be called during general deinit phase.
During the configuration parsing, the section entries are all allocated.
This new function free them.
The tune.maxrewrite parameter used to be pre-initialized to half of
the buffer size since the very early days when buffers were very small.
It has grown to absurdly large values over the years to reach 8kB for a
16kB buffer. This prevents large requests from being accepted, which is
the opposite of the initial goal.
Many users fix it to 1024 which is already quite large for header
addition.
So let's change the default setting policy :
- pre-initialize it to 1024
- let the user tweak it
- in any case, limit it to tune.bufsize / 2
This results in 15kB usable to buffer HTTP messages instead of 8kB, and
doesn't affect existing configurations which already force it.
This new target can be called from the frontend or the backend. It
is evaluated just before the backend choice and just before the server
choice. So, the input stream or HTTP request can be forwarded to a
server or to an internal service.
This patch only declares the execution timeout variable. The applet
must be respect or ignore it. For example an applet who doing some
network accesses and doesn't control its execution yime should use
this timeout.
this patch adds init function for the applet. the HAProxy standard
applet (peers, stats, lua) have no standard way for the applet
initialisation. This new function pointer have for goal to try to
standardize the applet initialization.
This flag is used by custom actions to know that they're called for the
first time. The only case where it's not set is when they're resuming
from a yield. It will be needed to let them know when they have to
allocate some resources.
The garbage collector is a little bit heavy to run, and it was added
only for cosockets. This patch prevent useless executions when no
cosockets are used.
This new flag indicates to a custom action that it must not yield because
it will not be called anymore. This addresses an issue introduced by commit
bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom actions"),
which made it possible to yield even after the last call and causes Lua
actions not to be stopped when the session closes. Note that the Lua issue
is not fixed yet at this point. Also only TCP rules were handled, for now
HTTP rules continue to let the action yield since we don't know whether or
not it is a final call.
Since commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp
custom actions"), some actions may yield and be called back when new
information are available. Unfortunately some of them may continue to
yield because they simply don't know that it's the last call from the
rule set. For this reason we'll need to pass a flag to the custom
action to pass such information and possibly other at the same time.
While the SI_ST_DIS state is set *after* doing the close on a connection,
it was set *before* calling release on an applet. Applets have no internal
flags contrary to connections, so they have no way to detect they were
already released. Because of this it happened that applets were closed
twice, once via si_applet_release() and once via si_release_endpoint() at
the end of a transaction. The CLI applet could perform a double free in
this case, though the situation to cause it is quite hard because it
requires that the applet is stuck on output in states that produce very
few data.
In order to solve this, we now assign the SI_ST_DIS state *after* calling
->release, and we refrain from doing so if the state is already assigned.
This makes applets work much more like connections and definitely avoids
this double release.
In the future it might be worth making applets have their own flags like
connections to carry their own state regardless of the stream interface's
state, especially when dealing with connection reuse.
No backport is needed since this issue was caused by the rearchitecture
in 1.6.
This function is a callback made only for calls from the applet handler.
Rename it to remove confusion. It's currently called from the Lua code
but that's not correct, we should call the notify and update functions
instead otherwise it will not enable the applet again.
This one is not needed anymore as what it used to do is either
completely covered by the new stream_int_notify() function, or undesired
and inherited from the past as a side effect of introducing the
connections.
This update is theorically never called since it's assigned only when
nothing is connected to the stream interface. However a test has been
added to si_update() to stay safe if some foreign code decides to call
si_update() in unsafe situations.
stream_int_notify() was taken from the common part between si_conn_wake_cb()
and si_applet_done(). It is designed to report activity to a stream from
outside its handler. It'll generally be used by lower layers to report I/O
completion but may also be used by remote streams if the buffer processing
is shared.
Now the call to stream_int_update() is moved to si_update(), which
is exclusively called from the stream, so that the socket layer may
be updated without updating the stream layer. This will later permit
to call it individually from other places (other tasks or applets for
example).
This function is designed to be called from within the stream handler to
update the channels' expiration timers and the stream interface's flags
based on the channels' flags. It needs to be called only once after the
channels' flags have settled down, and before they are cleared, though it
doesn't harm to call it as often as desired (it just slightly hurts
performance). It must not be called from outside of the stream handler,
as what it does will be used to compute the stream task's expiration.
The code was taken directly from stream_int_update_applet() and
stream_int_update_conn() which had exactly the same one except for
applet-specific or connection-specific status update.
This is not a real run queue and we're facing ugly bugs because
if this : if a an applet removes another applet from the queue,
typically the next one after itself, the list iterator loops
forever because the list's backup pointer is not valid anymore.
Before creating a run queue, let's rename this list.
This private configuration pointer is used for storing some configuration
data associated the keyword, So many keywords can use the same parse
function, and this one can use a discriminator.
Mailing list participant "mlist" reported negative conn_cur values in
stick tables as the result of "tcp-request connection track-sc". The
reason is that after the stick entry it copied from the session to the
stream, both the session and the stream grab a reference to the entry
and when the stream ends, it decrements one reference and one connection,
then the same is done for the session.
In fact this problem was already encountered slightly differently in the
past and addressed by Thierry using the patch below as it was believed by
then to be only a refcount issue since it was the observable symptom :
827752e "BUG/MEDIUM: stick-tables: refcount error after copying SC..."
In reality the problem is that the stream must touch neither the refcount
nor the connection count for entries it inherits from the session. While
we have no way to tell whether a track entry was inherited from the session
(since they're simply memcpy'd), it is possible to prevent the stream from
touching an entry that already exists in the session because that's a
guarantee that it was inherited from it.
Note that it may be a temporary fix. Maybe in the future when a session
gives birth to multiple streams we'll face a situation where a session may
be updated to add more tracked entries after instanciating some streams.
The correct long-term fix is to mark some tracked entries as shared or
private (or RO/RW). That will allow the session to track more entries
even after the same trackers are being used by early streams.
No backport is needed, this is only caused by the session/stream split in 1.6.
Added support for version 3.2 of 51Degrees C library.
Added fields to store HTTP header names important to device detection
other than User-Agent.
Included a pool of worksets for use with Pattern device detection.
Added the definition of CHECK_HTTP_MESSAGE_FIRST and the declaration of
smp_prefetch_http to the header.
Changed smp_prefetch_http implementation to remove the static qualifier.
This directive gives HAProxy the ability to use the either the global
server-state-file directive or a local one using server-state-file-name to
load server states.
The state can be saved right before the reload by the init script, using
the "show servers state" command on the stats socket redirecting output into
a file.
This new global section directive is used to store the path to the file
where HAProxy will be able to retrieve server states across reloads.
The file pointed by this path is used to store a file which can contains
state of all servers from all backends.
This new global directive can be used to provide a base directory where
all the server state files could be loaded.
If a server state file name starts with a slash '/', then this directive
must not be applied.
new command 'show servers state' which dumps all variable parameters
of a server during an HAProxy process life.
Purpose is to dump current server state at current run time in order to
read them right after the reload.
The format of the output is versionned and we support version 1 for now.
Introduces a few new macros used by server state save and application accros reloads:
- currently used state server file format version
- currently used state server file header fields
- MIN and MAX value for version number
- maximum number of fields that could be found in a server-state file
- an arbitrary state-file max line length
When a server is disabled in the configuration using the "disabled"
keyword, a single flag is positionned: SRV_ADMF_CMAINT (use to be
SRV_ADMF_FMAINT)..
That said, when providing the first version of this code, we also
changed the SRV_ADMF_MAINT mask to match any of the possible MAINT
cases: SRV_ADMF_FMAINT, SRV_ADMF_IMAINT, SRV_ADMF_CMAINT
Since SRV_ADMF_CMAINT is never (and is not supposed to be) altered at
run time, once a server has this flag set up, it can never ever be
enabled again using the stats socket.
In order to fix this, we should:
- consider SRV_ADMF_CMAINT as a simple flag to report the state in the
old configuration file (will be used after a reload to deduce the
state of the server in a new running process)
- enabling both SRV_ADMF_CMAINT and SRV_ADMF_FMAINT when the keyword
"disabled" is in use in the configuration
- update the mask SRV_ADMF_MAINT as it was before, to only match
SRV_ADMF_FMAINT and SRV_ADMF_IMAINT.
The following patch perform the changes above.
It allows fixing the regression without breaking the way the up coming
feature (seamless server state accross reloads) is going to work.
Note: this is 1.6-only, no backport needed.
Regex header file is missing in types/action.h
Repported by Conrad Hoffmann
I cannot build the current dev's master HEAD (ec3c37d) because of this error:
> In file included from include/proto/proto_http.h:26:0,
> from src/stick_table.c:26:
> include/types/action.h:102:20: error: field ‘re’ has incomplete type
> struct my_regex re; /* used by replace-header and replace-value */
> ^
> Makefile:771: recipe for target 'src/stick_table.o' failed
> make: *** [src/stick_table.o] Error 1
The struct act_rule defined in action.h includes a full struct my_regex
without #include-ing regex.h. Both gcc 5.2.0 and clang 3.6.2 do not allow this.
Some DNS response may be valid from a protocol point of view but may not
contain any IP addresses.
This patch gives a new flag to the function dns_get_ip_from_response to
report such case.
It's up to the upper layer to decide what to do with this information.
Some DNS responses may be valid from a protocol point of view, but may
not contain any information considered as interested by the requester..
Purpose of the flag DNS_RESP_NO_EXPECTED_RECORD introduced by this patch is
to allow reporting such situation.
When this happens, a new DNS query is sent with a new query type.
For now, the function only expect A and AAAA query types which is enough
to cover current cases.
In a next future, it will be up to the caller to tell the function which
query types are expected.
The function does a bunch of things among which resolving environment
variables, skipping address family specifiers and trimming port ranges.
It is the only one which sees the complete host name before trying to
resolve it. The DNS resolving code needs to know the original hostname,
so we modify this function to optionally provide it to the caller.
Note that the function itself doesn't know if the host part was a host
or an address, but str2ip() knows that and can be asked not to try to
resolve. So we first try to parse the address without resolving and
try again with resolving enabled. This way we know if the address is
explicit or needs some kind of resolution.
This patch introduces a new internal response state about the analysis
of a DNS response received by a server.
It is dedicated to report to above layer that the response is
'truncated'.
This patch updates the dns_nameserver structure to integrate a counter
dedicated to 'truncated' response sent by servers.
Such response are important to track, since HAProxy is supposed to
replay its request.
Current DNS client code implementation doesn't take care of response
flags setup by the server.
This patch introduces a couple of bitmasks one can use to retrieve the
truncated flag and the reply code available in the 2-bytes flag field.
3 variables of the dns_resolution structure are set to 'time_t' type.
Since they are all set by 'now_ms' and used as 'ticks' in HAProxy's
internal, it is safer to set them to the same type than now_ms:
'unsigned int'.
When converting the "method" fetch to a string, we used to get an empty
string if the first character was not an upper case. This was caused by
the lookup function which returns HTTP_METH_NONE when a lookup is not
possible, and this method being mapped to an empty string in the array.
This is a totally stupid mechanism, there's no reason for having the
result depend on the first char. In fact the message parser already
checks that the syntax matches an HTTP token so we can only land there
with a valid token, hence only HTTP_METH_OTHER should be returned.
This fix should be backported to all actively supported branches.
Before this patch, two type of custom actions exists: ACT_ACTION_CONT and
ACT_ACTION_STOP. ACT_ACTION_CONT is a non terminal action and ACT_ACTION_STOP is
a terminal action.
Note that ACT_ACTION_STOP is not used in HAProxy.
This patch remove this behavior. Only type type of custom action exists, and it
is called ACT_CUSTOM. Now, the custion action can return a code indicating the
required behavior. ACT_RET_CONT wants that HAProxy continue the current rule
list evaluation, and ACT_RET_STOP wants that HAPRoxy stops the the current rule
list evaluation.
This patch is inspired by Bowen Ni's proposal and it is based on his first
implementation:
With Lua integration in HAProxy 1.6, one can change the request method,
path, uri, header, response header etc except response line.
I'd like to contribute the following methods to allow modification of the
response line.
[...]
There are two new keywords in 'http-response' that allows you to rewrite
them in the native HAProxy config. There are also two new APIs in Lua that
allows you to do the same rewriting in your Lua script.
Example:
Use it in HAProxy config:
*http-response set-code 404*
Or use it in Lua script:
*txn.http:res_set_reason("Redirect")*
I dont take the full patch because the manipulation of the "reason" is useless.
standard reason are associated with each returned code, and unknown code can
take generic reason.
So, this patch can set the status code, and the reason is automatically adapted.
This was the first transparent proxy technology supported by haproxy
circa 2005 but it was obsoleted in 2007 by Tproxy 4.0 which removed a
lot of the earlier versions' shortcomings and was finally merged into
the kernel. Since nobody has been using cttproxy for many years now
and nobody has even just tried to compile the files, it's time to
remove it. The doc was updated as well.
This patch removes the special stick tables types names and
use the standard sample type names. This avoid the maintainance
of two types and remove the switch/case for matching a sample
type for each stick table type.
This patch is the first step for sample integration. Actually
the stick tables uses her own data type, and some converters
must be called to convert sample type to stick-tables types.
This patch removes the stick-table types and replace it by
the sample types. This prevent:
- Maintenance of two types of converters
- reduce the code using the samples converters
This patch normalize the return code of the configuration parsers. Before
these changes, the tcp action parser returned -1 if fail and 0 for the
succes. The http action returned 0 if fail and 1 if succes.
The normalisation does:
- ACT_RET_PRS_OK for succes
- ACT_RET_PRS_ERR for failure
Each (http|tcp)-(request|response) action use the same method
for looking up the action keyword during the cofiguration parsing.
This patch mutualize the code.
This patch merges the conguration keyword struct. Each declared configuration
keyword struct are similar with the others. This patch simplify the code.
Action function can return 3 status:
- error if the action encounter fatal error (like out of memory)
- yield if the action must terminate his work later
- continue in other cases
For performances considerations, some actions are not processed by remote
function. They are directly processed by the function. Some of these actions
does the same things but for different processing part (request / response).
This patch give the same name for the same actions, and change the normalization
of the other actions names.
This patch is ONLY a rename, it doesn't modify the code.
This patch group the action name in one file. Some action are called
many times and need an action embedded in the action caller. The main
goal is to have only one header file grouping all definitions.
This mark permit to detect if the action tag is over the allowed range.
- Normally, this case doesn't appear
- If it appears, it is processed by ded fault case of the switch
This patch removes the generic opaque type for storing the configuration of the
acion "set-src" (HTTP_REQ_ACT_SET_SRC), and use the dedicated type "struct expr"
The (http|tcp)-(request|response) action rules use common
opaque type. For the HAProxy embbedded feature, types are know,
it better to add this types in the action union and use it.
The (http|tcp)-(request|response) action rules use common
opaque type. For the HAProxy embbedded feature, types are know,
it better to add this types in the action union and use it.
This patch is the first of a serie which merge all the action structs. The
function "tcp-request content", "tcp-response-content", "http-request" and
"http-response" have the same values and the same process for some defined
actions, but the struct and the prototype of the declared function are
different.
This patch try to unify all of these entries.
A map can store and return various types as output. The only one example is the
IPv4 and IPv6 types. The previous patch remove the type from the sample storage
struct and use the conoverter output type, expecting that all entries of the
map have the same type.
This will be wrong when the maps will support both IPv4 and IPv6 as output.
The union name "data" is a little bit heavy while we read the source
code because we can read "data.data.sint". The rename from "data" to "u"
makes the read easiest like "data.u.sint".
This extract is not really required, but it maybe will be usefull later.
A comming soonpatch about simplification of stick table values will
use this union
This patch remove the struct information stored both in the struct
sample_data and in the striuct sample. Now, only thestruct sample_data
contains data, and the struct sample use the struct sample_data for storing
his own data.
This is useful to prevent cross includes. The header file sample.h
needs to include proto_http, stick_tables.h will need to include
sample.h and proto_http includes stick_tables.h.
I choose to move the known http method define because this enum is
mainly used in sample.h. This enum is used for the sample type method.
appsessions started to be deprecated with the introduction of stick
tables, and the latter are much more powerful and flexible, and in
addition they are replicated between nodes and maintained across
reloads. Let's now remove appsession completely.
Since sample fetches are not always available in the response phase,
this patch implements %HQ such that:
GET /foo?bar=baz HTTP/1.0
...would be logged as:
?bar=baz
These ones are considered safe as they have already been reused.
They will be useful in "aggressive" and "always" http-reuse modes
in order to place the first request of a connection with the least
risk.
For now it only supports "never", meaning that we never want to reuse a
shared connection, and "always", meaning that we can use any connection
that was not marked private. When "never" is set, this also implies that
no idle connection may become a shared one.
This flag is set on an outgoing connection when this connection gets
some properties that must not be shared with other connections, such
as dynamic transparent source binding, SNI or a proxy protocol header,
or an authentication challenge from the server. This will be needed
later to implement connection reuse.
This function is now dedicated to idle connections only, which means
that it must not be used without any endpoint nor anything not a
connection. The connection remains attached to the stream interface.
This list member will be used to attach a connection to a list of
idle, reusable or queued connections. It's unused for now. Given
that it's not expected to be used more than a few times per session,
the member was put after the target, in the area starting at the
second cache line of the structure.
For now it's not populated but we have the list entry. It will carry
all idle connections that sessions don't want to share. They may be
used later to reclaim connections upon socket shortage for example.
This function only detaches the endpoint from the stream-int and
optionally returns the original pointer. This will be needed to
steal idle connections from other connections.
Since we now always call this function with the reuse parameter cleared,
let's simplify the function's logic as it cannot return the existing
connection anymore. The savings on this inline function are appreciable
(240 bytes) :
$ size haproxy.old haproxy.new
text data bss dec hex filename
1020383 40816 36928 1098127 10c18f haproxy.old
1020143 40816 36928 1097887 10c09f haproxy.new
This patch removes the 32 bits unsigned integer and the 32 bit signed
integer. It replaces these types by a unique type 64 bit signed.
This makes easy the usage of integer and clarify signed and unsigned use.
With the previous version, signed and unsigned are used ones in place of
others, and sometimes the converter loose the sign. For example, divisions
are processed with "unsigned", if one entry is negative, the result is
wrong.
Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.
There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new more natural, unified "int()" function. These functions were only
introduced in the latest 1.6-dev2 so there's no impact on regular
deployments.
These are the 64-bit equivalent of htonl() and ntohl(). They're a bit
tricky in order to avoid expensive operations.
The principle consists in letting the compiler detect we're playing
with a union and simplify most or all operations. The asm-optimized
htonl() version involving bswap (x86) / rev (arm) / other is a single
operation on little endian, or a NOP on big-endian. In both cases,
this lets the compiler "see" that we're rebuilding a 64-bit word from
two 32-bit quantities that fit into a 32-bit register. In big endian,
the whole code is optimized out. In little endian, with a decent compiler,
a few bswap and 2 shifts are left, which is the minimum acceptable.