This one now migrates to the general purpose cli.p0 for the proxy pointer,
cli.p1 for the server pointer, and cli.i0 for the proxy's instance if only
one has to be dumped.
Most of the keywords don't need to have their own entry in the appctx
union, they just need to reuse some generic pointers like we've been
used to do in the appctx with st{0,1,2}. This patch adds p0, p1, i0, i1
and initializes them to zero before calling the parser. This way some
of the simplest existing keywords will be able to disappear from the
union.
It's worth noting that this is an extension to what was initially
attempted via the "private" member that I removed a few patches ago by
not understanding how it was supposed to be used. Here the fact that
we share the same union will force us to be stricter: the code either
uses the general purpose variables or it uses its own fields but not
both.
The appctx storage became a real mess along the years. It now contains
mostly CLI-specific parts that share the same storage as the "cli" part
which in fact only contains the fields needed to pass an error message
to the caller, and it also has room a few other regular applets which
may become more and more common.
This first patch moves the parts around in the union so that all
standard applet parts are grouped together and the CLI-specific ones
are grouped together. It also adds a few comments to indicate what
certain parts are used for since it's sometimes a bit confusing.
The struct hlua size is 128 bytes. The size is the biggest of all the elements
of the union embedded in the appctx struct. With HTTP2, it is possible that this
appctx struct will be use many times for each connection, so the 128 bytes are
a little bit heavy for the global memory consomation.
This patch replace the embbeded hlua struct by a pointer and an associated memory
pool. Now, the memory for lua is allocated only if it is required.
[wt: the appctx is now down to 160 bytes]
Just like previous patch, this was the only other user of the "private"
field of the applet. It used to store a copy of the keyword's action.
Let's just put it into ->table->action and use it from there. It also
slightly simplifies the code by removing a few pointer to integer casts.
We have very few users of the appctx's private field which was introduced
prior to the split of the CLI. Unfortunately it was not removed after the
end. This commit simply introduces hlua_cli->fcn which is the pointer to
the Lua function that the Lua code used to store in this private pointer.
This problem is already detected here:
8dc7316a6f
Another case raises. Now HAProxy sends a final message (typically
with "http-request deny"). Once the the message is sent, the response
channel flags are not modified.
HAProxy executes a Lua sample-fecthes for building logs, and the
result is ignored because the response flag remains set to the value
HTTP_MSG_RPBEFORE. So the Lua function hlua_check_proto() want to
guarantee the valid state of the buffer and ask for aborting the
request.
The function check_proto() is not the good way to ensure request
consistency. The real question is not "Are the message valid ?", but
"Are the validity of message unchanged ?"
This patch memorize the parser state before entering int the Lua
code, and perform a check when it go out of the Lua code. If the parser
state change for down, the request is aborted because the HTTP message
is degraded.
This patch should be backported in version 1.6 and 1.7
Fixing the build using LibreSSL as OpenSSL implementation.
Currently, LibreSSL 2.4.4 provides the same API of OpenSSL 1.0.1x,
but it redefine the OpenSSL version number as 2.0.x, breaking all
checks with OpenSSL 1.1.x.
The patch solves the issue checking the definition of the symbol
LIBRESSL_VERSION_NUMBER when Openssl 1.1.x features are requested.
When an entity tries to get a buffer, if it cannot be allocted, for example
because the number of buffers which may be allocated per process is limited,
this entity is added in a list (called <buffer_wq>) and wait for an available
buffer.
Historically, the <buffer_wq> list was logically attached to streams because it
were the only entities likely to be added in it. Now, applets can also be
waiting for a free buffer. And with filters, we could imagine to have more other
entities waiting for a buffer. So it make sense to have a generic list.
Anyway, with the current design there is a bug. When an applet failed to get a
buffer, it will wait. But we add the stream attached to the applet in
<buffer_wq>, instead of the applet itself. So when a buffer is available, we
wake up the stream and not the waiting applet. So, it is possible to have
waiting applets and never awakened.
So, now, <buffer_wq> is independant from streams. And we really add the waiting
entity in <buffer_wq>. To be generic, the entity is responsible to define the
callback used to awaken it.
In addition, applets will still request an input buffer when they become
active. But they will not be sleeped anymore if no buffer are available. So this
is the responsibility to the applet I/O handler to check if this buffer is
allocated or not. This way, an applet can decide if this buffer is required or
not and can do additional processing if not.
[wt: backport to 1.7 and 1.6]
A stream can be awakened for different reasons. During its processing, it can be
early stopped if no buffer is available. In this situation, the reason why the
stream was awakened is lost, because we rely on the task state, which is reset
after each processing loop.
In many cases, that's not a big deal. But it can be useful to accumulate the
task states if the stream processing is interrupted, especially if some filters
need to be called.
To be clearer, here is an simple example:
1) A stream is awakened with the reason TASK_WOKEN_MSG.
2) Because no buffer is available, the processing is interrupted, the stream
is back to sleep. And the task state is reset.
3) Some buffers become available, so the stream is awakened with the reason
TASK_WOKEN_RES. At this step, the previous reason (TASK_WOKEN_MSG) is lost.
Now, the task states are saved for a stream and reset only when the stream
processing is not interrupted. The correspoing bitfield represents the pending
events for a stream. And we use this one instead of the task state during the
stream processing.
Note that TASK_WOKEN_TIMER and TASK_WOKEN_RES are always removed because these
events are always handled during the stream processing.
[wt: backport to 1.7 and 1.6]
<run_queue> is used to track the number of task in the run queue and
<run_queue_cur> is a copy used for the reporting purpose. These counters has
been renamed, respectively, <tasks_run_queue> and <tasks_run_queue_cur>. So the
naming is consistent between tasks and applets.
[wt: needed for next fixes, backport to 1.7 and 1.6]
As for tasks, 2 counters has been added to track :
* the total number of applets : nb_applets
* the number of active applets : applets_active_queue
[wt: needed for next fixes, to backport to 1.7 and 1.6]
(http|tcp)-(request|response) action cannot take arguments from the
configuration file. Arguments are useful for executing the action with
a special context.
This patch adds the possibility of passing arguments to an action. It
runs exactly like sample fetches and other Lua wrappers.
Note that this patch implements a 'TODO'.
Commit 5fddab0 ("OPTIM: stream_interface: disable reading when
CF_READ_DONTWAIT is set") improved the connection layer's efficiency
back in 1.5-dev13 by avoiding successive read attempts on an active
FD. But by disabling this on a polled FD, it causes an unpleasant
side effect which is that the FD that was subscribed to polling is
suddenly stopped and may need to be re-enabled once the kernel
starts to slow down on data eviction (eg: saturated server at the
other end, bursty traffic caused by too large maxpollevents).
This behaviour is observable with persistent connections when there
is a large enough connection count so that there's no data in the
early connection and polling is required, because there are then
up to 4 epoll_ctl() calls per request. It's important that the
server is slower than haproxy to cause some delays when reading
response.
The current connection layer as designed in 1.6 with the FD cache
doesn't require this trick anymore, though it still benefits from
it when it saves an FD from being uselessly polled. But compared
to the increased cost of enabling and disabling poll all the time,
it's still better to disable it. In some cases it's possible to
observe a performance increase as high as 30% by avoiding this
epoll_ctl() dance.
In the end we only want to disable it when the FD is speculatively
read and not when it's polled. For this we introduce a new function
__conn_data_done_recv() which is used to indicate that we're done
with recv() and not interested in new attempts. If/when we later
support event-triggered epoll, this function will have to change
a bit to do the same even in the polled case.
A quick test with keep-alive requests run on a dual-core / dual-
thread Atom shows a significant improvement :
single process, 0 bytes :
before: Requests per second: 12243.20 [#/sec] (mean)
after: Requests per second: 13354.54 [#/sec] (mean)
single process, 4k :
before: Requests per second: 9639.81 [#/sec] (mean)
after: Requests per second: 10991.89 [#/sec] (mean)
dual process, 0 bytes (unstable) :
before: Requests per second: 16900-19800 ~ 17600 [#/sec] (mean)
after: Requests per second: 18600-21400 ~ 20500 [#/sec] (mean)
It already returns an empty string when the field is empty, but as a
preventive measure we should do the same when the string itself is a
NULL. While it is not supposed to happen, it will make the code more
resistant against failed allocations and unexpected results.
This fix should be backported to 1.7.
Historically we used to have the stick counters processing put into
session.c which became stream.c. But a big part of it is now in
stick-table.c (eg: converters) but despite this we still have all
the sample fetch functions in stream.c
These parts do not depend on the stream anymore, so let's move the
remaining chunks to stick-table.c and have cleaner files.
What remains in stream.c is everything needed to attach/detach
trackers to the stream and to update the counters while the stream
is being processed.
There's no more reason to keep tcp rules processing inside proto_tcp.c
given that there is nothing in common there except these 3 letters : tcp.
The tcp rules are in fact connection, session and content processing rules.
Let's move them to "tcp-rules" and let them live their life there.
We used to have 3 types of counters with a huge overlap :
- listener counters : stats collected for each bind line
- proxy counters : union of the frontend and backend counters
- server counters : stats collected per server
It happens that quite a good part was common between listeners and
proxies due to the frontend counters being updated at the two locations,
and that similarly the server and proxy counters were overlapping and
being updated together.
This patch cleans this up to propose only two types of counters :
- fe_counters: used by frontends and listeners, related to
incoming connections activity
- be_counters: used by backends and servers, related to outgoing
connections activity
This allowed to remove some non-sensical counters from both parts. For
frontends, the following entries were removed :
cum_lbconn, last_sess, nbpend_max, failed_conns, failed_resp,
retries, redispatches, q_time, c_time, d_time, t_time
For backends, this ones was removed : intercepted_req.
While doing this it was discovered that we used to incorrectly report
intercepted_req for backends in the HTML stats, which was always zero
since it's never updated.
Also it revealed a few inconsistencies (which were not fixed as they
are harmless). For example, backends count connections (cum_conn)
instead of sessions while servers count sessions and not connections.
Over the long term, some extra cleanups may be performed by having
some counters update functions touching both the server and backend
at the same time, as well as both the frontend and listener, to
ensure that all sides have all their stats properly filled. The stats
dump will also be able to factor the dump functions by counter types.
Reinhard Vicinus reported that the reported average response times cannot
be larger than 16s due to the double multiply being performed by
swrate_add() which causes an overflow very quickly. Indeed, with N=512,
the highest average value is 16448.
One solution proposed by Reinhard is to turn to long long, but this
involves 64x64 multiplies and 64->32 divides, which are extremely
expensive on 32-bit platforms.
There is in fact another way to avoid the overflow without using larger
integers, it consists in avoiding the multiply using the fact that
x*(n-1)/N = x-(x/N).
Now it becomes possible to store average values as large as 8.4 millions,
which is around 2h18mn.
Interestingly, this improvement also makes the code cheaper to execute
both on 32 and on 64 bit platforms :
Before :
00000000 <swrate_add>:
0: 8b 54 24 04 mov 0x4(%esp),%edx
4: 8b 0a mov (%edx),%ecx
6: 89 c8 mov %ecx,%eax
8: c1 e0 09 shl $0x9,%eax
b: 29 c8 sub %ecx,%eax
d: 8b 4c 24 0c mov 0xc(%esp),%ecx
11: c1 e8 09 shr $0x9,%eax
14: 01 c8 add %ecx,%eax
16: 89 02 mov %eax,(%edx)
After :
00000020 <swrate_add>:
20: 8b 4c 24 04 mov 0x4(%esp),%ecx
24: 8b 44 24 0c mov 0xc(%esp),%eax
28: 8b 11 mov (%ecx),%edx
2a: 01 d0 add %edx,%eax
2c: 81 c2 ff 01 00 00 add $0x1ff,%edx
32: c1 ea 09 shr $0x9,%edx
35: 29 d0 sub %edx,%eax
37: 89 01 mov %eax,(%ecx)
This fix may be backported to 1.6.
When dealing with many proxies, it's hard to spot response errors because
all internet-facing frontends constantly receive attacks. This patch now
makes it possible to demand that only request or response errors are dumped
by appending "request" or "reponse" to the show errors command.
The function log format emit its own error message using Alert(). This
patch replaces this behavior and uses the standard HAProxy error system
(with memprintf).
The benefits are:
- cleaning the log system
- the logformat can ignore the caller (actually the caller must set
a flag designing the caller function).
- Make the usage of the logformat function easy for future components.
Commit 1866d6d ("MEDIUM: ssl: Add support for OpenSSL 1.1.0")
introduced support for openssl 1.1.0 and temporarily broke 0.9.8.
In the end the port was not very hard given that the only cause of
build failures were functions supposedly absent from 0.9.8 that in
fact did exist.
Thus, adding a new #if to move these functions for versions older
than 0.9.8 was enough to fix the trouble. It received very light
testing, basically only an SSL bridge decrypting and re-encrypting
traffic, and checking that everything looks right. That said, the
functions specific to 0.9.8 here compared to 1.0.x are only
SSL_SESSION_set1_id_context(), EVP_PKEY_base_id(), and
X509_PUBKEY_get0_param().
Until now, the function parse_logformat_string() never fails. It
send warnings when it parses bad format, and returns expression in
best effort.
This patch replaces warnings by alert and returns a fail code.
Maybe the warning mode is designed for a compatibility with old
configuration versions. If it is the case, now this compatibility
is broken.
[wt: no, the reason is that an alert must cause a startup failure,
but this will be OK with next patch]
The log-format function parse_logformat_string() takes file and line
for building parsing logs. These two parameters are embedded in the
struct proxy curproxy, which is the current parsing context.
This patch removes these two unused arguments.
Remove export of the fucntion parse_logformat_var_args() and
parse_logformat_var(). These functions are a part of the
logformat parser, and this export is useless.
We get this when Lua is disabled, just a missing include.
In file included from src/queue.c:18:0:
include/proto/server.h:51:39: warning: 'struct appctx' declared inside parameter list [enabled by default]
This way we don't have any more state specific to a given yieldable
command. The other commands should be easier to move as they only
involve a parser.
It really belongs to proto_http.c since it's a dump for HTTP request
and response errors. Note that it's possible that some parts do not
need to be exported anymore since it really is the only place where
errors are manipulated.
The table dump code was a horrible mess, with common parts interleaved
all the way to deal with the various actions (set/clear/show). A few
error messages were still incorrect, as the "set" operation did not
update them so they would still report "unknown action" (now fixed).
The action was now passed as a private argument to the CLI keyword
which itself is copied into the appctx private field. It's just an
int cast to a pointer.
Some minor issues were noticed while doing this, for example when dumping
an entry by key, if the key doesn't exist, nothing is printed, not even
the table's header. It's unclear whether this was intentional but it
doesn't really match what is done for data-based dumps. It was left
unchanged for now so that a later fix can be backported if needed.
Enum entries STAT_CLI_O_TAB, STAT_CLI_O_CLR and STAT_CLI_O_SET were
removed.
Move the "show info" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_info_to_buffer() function
is now static again. Note, we don't need proto_ssl anymore in cli.c.
Move the "show stat" command to stats.c using the CLI keyword API
to register it on the CLI. The stats_dump_stat_to_buffer() function
is now static again.
Move 'show sess' CLI functions to stream.c and use the cli keyword API
to register it on the CLI.
[wt: the choice of stream vs session makes sense because since 1.6 these
really are streams that we're dumping and not sessions anymore]
Several CLI commands require a frontend, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Several CLI commands require a server, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
Move map and acl CLI functions to map.c and use the cli keyword API to
register actions on the CLI. Then remove the now unused individual
"add" and "del" keywords.
proto/dumpstats.h has been split in 4 files:
* proto/cli.h contains protypes for the CLI
* proto/stats.h contains prototypes for the stats
* types/cli.h contains definition for the CLI
* types/stats.h contains definition for the stats
These functions will be needed by "show sess" on the CLI, let's make them
globally available. It's important to note that due to the fact that we
still do not set the data and transport layers' names in the structures,
we still have to rely on some exports just to match the pointers. This is
ugly but is preferable to adding many includes since the short-term goal
is to get rid of these tests by having proper names in place.
uint16_t instead of u_int16_t
None ISO fields of struct tm are not present, but
by zeroyfing it, on GNU and BSD systems tm_gmtoff
field will be set.
[wt: moved the memset into each of the date functions]
Setting an FD to -1 when closed isn't the most easily noticeable thing
to do when we're chasing accidental reuse of a stale file descriptor.
Instead set it to that large a negative value that it will overflow the
fdtab and provide an analysable core at the moment the issue happens.
Care was taken to ensure it doesn't overflow nor change sign on 32-bit
machines when multiplied by fdtab, and that it also remains negative for
the various checks that exist. The value equals 0xFDDEADFD which happens
to be easily spotted in a debugger.
The bug described in commit 568743a ("BUG/MEDIUM: stream-int: completely
detach connection on connect error") was not a stream-interface layer bug
but a connection layer bug. There was exactly one place in the code where
we could change a file descriptor's status without first checking whether
it is valid or not, it was in conn_stop_polling(). This one is called when
the polling status is changed after an update, and calls fd_stop_both even
if we had already closed the file descriptor :
1479388298.484240 ->->->->-> conn_fd_handler > conn_cond_update_polling
1479388298.484240 ->->->->->-> conn_cond_update_polling > conn_stop_polling
1479388298.484241 ->->->->->->-> conn_stop_polling > conn_ctrl_ready
1479388298.484241 conn_stop_polling < conn_ctrl_ready
1479388298.484241 ->->->->->->-> conn_stop_polling > fd_stop_both
1479388298.484242 ->->->->->->->-> fd_stop_both > fd_update_cache
1479388298.484242 ->->->->->->->->-> fd_update_cache > fd_release_cache_entry
1479388298.484242 fd_update_cache < fd_release_cache_entry
1479388298.484243 fd_stop_both < fd_update_cache
1479388298.484243 conn_stop_polling < fd_stop_both
1479388298.484243 conn_cond_update_polling < conn_stop_polling
1479388298.484243 conn_fd_handler < conn_cond_update_polling
The problem with the previous fix above is that it break the http_proxy mode
and possibly even some Lua parts and peers to a certain extent ; all outgoing
connections where the target address is initially copied into the outgoing
connection which experience a retry would use a random outgoing address after
the retry because closing and detaching the connection causes the target
address to be lost. This was attempted to be addressed by commit 0857d7a
("BUG/MAJOR: stream: properly mark the server address as unset on connect
retry") but it used to only solve the most visible effect and not the root
cause.
Prior to this fix, it was possible to cause this config to keep CLOSE_WAIT
for as long as it takes to expire a client or server timeout (note the
missing client timeout) :
listen test
mode http
bind :8002
server s1 127.0.0.1:8001
$ tcploop 8001 L0 W N20 A R P100 S:"HTTP/1.1 200 OK\r\nContent-length: 0\r\n\r\n" &
$ tcploop 8002 N200 C T W S:"GET / HTTP/1.0\r\n\r\n" O P10000 K
With this patch, these CLOSE_WAIT properly vanish when both processes leave.
This commit reverts the two fixes above and replaces them with the proper
fix in connection.h. It must be backported to 1.6 and 1.5. Thanks to
Robson Roberto Souza Peixoto for providing very detailed traces showing
some obvious inconsistencies leading to finding this bug.
This pointer will be used for storing private context. With this,
the same executed function can handle more than one keyword. This
will be very useful for creation Lua cli bindings.
The release function is called when the command is terminated (give
back the hand to the prompt) or when the session is broken (timeout
or client closed).
Commit d7c9196 ("MAJOR: filters: Add filters support") removed sample.h
from proto_http.h, but it has become necessary as of commit fd7edd3
("MINOR: Move http method enum from proto_http to sample") in order
to have HTTP_METH_*. Due to this, the "debug/flags" utility doesn't
build anymore.
A new "option spop-check" statement has been added to enable server health
checks based on SPOP HELLO handshake. SPOP is the protocol used by SPOE filters
to talk to servers.
SPOE makes possible the communication with external components to retrieve some
info using an in-house binary protocol, the Stream Processing Offload Protocol
(SPOP). In the long term, its aim is to allow any kind of offloading on the
streams. This first version, besides being experimental, won't do lot of
things. The most important today is to validate the protocol design and lay the
foundations of what will, one day, be a full offload engine for the stream
processing.
So, for now, the SPOE can offload the stream processing before "tcp-request
content", "tcp-response content", "http-request" and "http-response" rules. And
it only supports variables creation/suppression. But, in spite of these limited
features, we can easily imagine to implement a SSO solution, an ip reputation
service or an ip geolocation service.
Internally, the SPOE is implemented as a filter. So, to use it, you must use
following line in a proxy proxy section:
frontend my-front
...
filter spoe [engine <name>] config <file>
...
It uses its own configuration file to keep the HAProxy configuration clean. It
is also a easy way to disable it by commenting out the filter line.
See "doc/SPOE.txt" for all details about the SPOE configuration.
It does the opposite of 'set-var' action/converter. It is really useful for
per-process variables. But, it can be used for any scope.
The lua function 'unset_var' has also been added.
Now it is possible to use variables attached to a process. The scope name is
'proc'. These variables are released only when HAProxy is stopped.
'tune.vars.proc-max-size' directive has been added to confiure the maximum
amount of memory used by "proc" variables. And because memory accounting is
hierachical for variables, memory for "proc" vars includes memory for "sess"
vars.
This function, unsurprisingly, sets a variable value only if it already
exists. In other words, this function will succeed only if the variable was
found somewhere in the configuration during HAProxy startup.
It will be used by SPOE filter. So an agent will be able to set a value only for
existing variables. This prevents an agent to create a very large number of
unused variables to flood HAProxy and exhaust the memory reserved to variables..
This code has been moved from haproxy.c to sample.c and the function
release_sample_expr can now be called from anywhere to release a sample
expression. This function will be used by the stream processing offload engine
(SPOE).
A scope is a section name between square bracket, alone on its line, ie:
[scope-name]
...
The spaces at the beginning and at the end of the line are skipped. Comments at
the end of the line are also skipped.
When a scope is parsed, its name is saved in the global variable
cfg_scope. Initially, cfg_scope is NULL and it remains NULL until a valid scope
line is parsed.
This feature remains unused in the HAProxy configuration file and
undocumented. However, it will be used during SPOE configuration parsing.
This feature will be used by the stream processing offload engine (SPOE) to
parse dedicated configuration files without mixing HAProxy sections with SPOE
sections.
So, here we can back up all sections known by HAProxy, unregister all of them
and add new ones, dedicted to the SPOE. Once the SPOE configuration file parsed,
we can roll back all changes by restoring HAProxy sections.
New callbacks have been added to handle creation and destruction of filter
instances:
* 'attach' callback is called after a filter instance creation, when it is
attached to a stream. This happens when the stream is started for filters
defined on the stream's frontend and when the backend is set for filters
declared on the stream's backend. It is possible to ignore the filter, if
needed, by returning 0. This could be useful to have conditional filtering.
* 'detach' callback is called when a filter instance is detached from a stream,
before its destruction. This happens when the stream is stopped for filters
defined on the stream's frontend and when the analyze ends for filters defined
on the stream's backend.
In addition, the callback 'stream_set_backend' has been added to know when a
backend is set for a stream. It is only called when the frontend and the backend
are not the same. And it is called for all filters attached to a stream
(frontend and backend).
Finally, the TRACE filter has been updated.
It is very common when validating a configuration out of production not to
have access to the same resolvers and to fail on server address resolution,
making it difficult to test a configuration. This option simply appends the
"none" method to the list of address resolution methods for all servers,
ensuring that even if the libc fails to resolve an address, the startup
sequence is not interrupted.
This will allow a server to automatically fall back to an explicit numeric
IP address when all other methods fail. The address is simply specified in
the address list.
This new setting supports a comma-delimited list of methods used to
resolve the server's FQDN to an IP address. Currently supported methods
are "libc" (use the regular libc's resolver) and "last" (use the last
known valid address found in the state file).
The list is implemented in a 32-bit integer, because each init-addr
method only requires 3 bits. The last one must always be SRV_IADDR_END
(0), allowing to store up to 10 methods in a single 32 bit integer.
Note: the doc is provided at the end of this series.
This adds new "hold" timers : nx, refused, timeout, other. This timers
will be used to tell HAProxy to keep an erroneous response as valid for
the corresponding period. For now they're only configured, not enforced.
It will be important to help debugging some DNS resolution issues to
know why a server was marked down, so let's make the function support
a 3rd argument with an indication of the reason. Passing NULL will keep
the message as-is.
This flag has to be set when an IP address resolution fails (either
using libc at start up or using HAProxy's runtime resolver). This will
automatically trigger the administrative status "MAINT", through the
global mask SRV_ADMF_MAINT.
Server addresses are not resolved anymore upon the first pass so that we
don't fail if an address cannot be resolved by the libc. Instead they are
processed all at once after the configuration is fully loaded, by the new
function srv_init_addr(). This function only acts on the server's address
if this address uses an FQDN, which appears in server->hostname.
For now the function does two things, to followup with HAProxy's historical
default behavior:
1. apply server IP address found in server-state file if runtime DNS
resolution is enabled for this server
2. use the DNS resolver provided by the libc
If none of the 2 options above can find an IP address, then an error is
returned.
All of this will be needed to support the new server parameter "init-addr".
For now, the biggest user-visible change is that all server resolution errors
are dumped at once instead of causing a startup failure one by one.
In the last release a lot of the structures have become opaque for an
end user. This means the code using these needs to be changed to use the
proper functions to interact with these structures instead of trying to
manipulate them directly.
This does not fix any deprecations yet that are part of 1.1.0, it only
ensures that it can be compiled against that version and is still
compatible with older ones.
[wt: openssl-0.9.8 doesn't build with it, there are conflicts on certain
function prototypes which we declare as inline here and which are
defined differently there. But openssl-0.9.8 is not supported anymore
so probably it's OK to go without it for now and we'll see later if
some users still need it. Emeric has reviewed this change and didn't
spot anything obvious which requires special care. Let's try it for
real now]
The only reason wurfl/wurfl.h was needed outside of wurfl.c was to expose
wurfl_handle which is a pointer to a structure, referenced by global.h.
By just storing a void* there instead, we can confine all wurfl code to
wurfl.c, which is really nice.
WURFL is a high-performance and low-memory footprint mobile device
detection software component that can quickly and accurately detect
over 500 capabilities of visiting devices. It can differentiate between
portable mobile devices, desktop devices, SmartTVs and any other types
of devices on which a web browser can be installed.
In order to add WURFL device detection support, you would need to
download Scientiamobile InFuze C API and install it on your system.
Refer to www.scientiamobile.com to obtain a valid InFuze license.
Any useful information on how to configure HAProxy working with WURFL
may be found in:
doc/WURFL-device-detection.txt
doc/configuration.txt
examples/wurfl-example.cfg
Please find more information about WURFL device detection API detection
at https://docs.scientiamobile.com/documentation/infuze/infuze-c-api-user-guide
Right now there is an issue with the way the maintenance flags are
propagated upon startup. They are not propagate, just copied from the
tracked server. This implies that depending on the server's order, some
tracking servers may not be marked down. For example this configuration
does not work as expected :
server s1 1.1.1.1:8000 track s2
server s2 1.1.1.1:8000 track s3
server s3 1.1.1.1:8000 track s4
server s4 wtap:8000 check inter 1s disabled
It results in s1/s2 being up, and s3/s4 being down, while all of them
should be down.
The only clean way to process this is to run through all "root" servers
(those not tracking any other server), and to propagate their state down
to all their trackers. This is the same algorithm used to propagate the
state changes. It has to be done both to compute the IDRAIN flag and the
IMAINT flag. However, doing so requires that tracking servers are not
marked as inherited maintenance anymore while parsing the configuration
(and given that it is wrong, better drop it).
This fix also addresses another side effect of the bug above which is
that the IDRAIN/IMAINT flags are stored in the state files, and if
restored while the tracked server doesn't have the equivalent flag,
the servers may end up in a situation where it's impossible to remove
these flags. For example in the configuration above, after removing
"disabled" on server s4, the other servers would have remained down,
and not anymore with this fix. Similarly, the combination of IMAINT
or IDRAIN with their respective forced modes was not accepted on
reload, which is wrong as well.
This bug has been present at least since 1.5, maybe even 1.4 (it came
with tracking support). The fix needs to be backported there, though
the srv-state parts are irrelevant.
This commit relies on previous patch to silence warnings on startup.
We used to have 7 different character classes, each was 256 bytes long,
resulting in almost 2kB being used in the L1 cache. It's as cheap to
test a bit than to check the byte is not null, so let's store a 7-bit
composite value and check for the respective bits there instead.
The executable is now 4 kB smaller and the performance on small
objects increased by about 1% to 222k requests/second with a config
involving 4 http-request rules including 1 header lookup, one header
replacement, and 2 variable assignments.
There's no reason to use the stream anymore, only the appctx should be
used by a peer. This was a leftover from the migration to appctx and it
caused some confusion, so let's totally drop it now. Note that half of
the patch are just comment updates.
For active servers, this is the sum of the eweights of all active
servers before this one in the backend, and
[srv->cumulative_weight .. srv_cumulative_weight + srv_eweight) is a
space occupied by this server in the range [0 .. lbprm.tot_wact), and
likewise for backup servers with tot_wbck. This allows choosing a
server or a range of servers proportional to their weight, by simple
integer comparison.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
0 will mean no balancing occurs; otherwise it represents the ratio
between the highest-loaded server and the average load, times 100 (i.e.
a value of 150 means a 1.5x ratio), assuming equal weights.
Signed-off-by: Andrew Rodland <andrewr@vimeo.com>
This commit introduces "tcp-request session" rules. These are very
much like "tcp-request connection" rules except that they're processed
after the handshake, so it is possible to consider SSL information and
addresses rewritten by the proxy protocol header in actions. This is
particularly useful to track proxied sources as this was not possible
before, given that tcp-request content rules are processed after each
HTTP request. Similarly it is possible to assign the proxied source
address or the client's cert to a variable.
This is in order to make integration of tcp-request-session cleaner :
- tcp_exec_req_rules() was renamed tcp_exec_l4_rules()
- LI_O_TCP_RULES was renamed LI_O_TCP_L4_RULES
(LI_O_*'s horrible indent was also fixed and a provision was left
for L5 rules).
These are denied conns. Strangely this wasn't emitted while it used to be
available for a while. It corresponds to the number of connections blocked
by "tcp-request connection reject".
To register a new cli keyword, you need to declare a cli_kw_list
structure in your source file:
static struct cli_kw_list cli_kws = {{ },{
{ { "test", "list", NULL }, "test list : do some tests on the cli", test_parsing, NULL },
{ { NULL }, NULL, NULL, NULL, NULL }
}};
And then register it:
cli_register_kw(&cli_kws);
The first field is an array of 5 elements, where you declare the
keywords combination which will match, it must be ended by a NULL
element.
The second field is used as a usage message, it will appear in the help
of the cli, you can set it to NULL if you don't want to show it, it's a
good idea if you want to overwrite some existing keywords.
The two last fields are callbacks.
The first one is used at parsing time, you can use it to parse the
arguments of your keywords and print small messages. The function must
return 1 in case of a failure, otherwise 0:
#include <proto/dumpstats.h>
static int test_parsing(char **args, struct appctx *appctx)
{
struct chunk out;
if (!*args[2]) {
appctx->ctx.cli.msg = "Error: the 3rd argument is mandatory !";
appctx->st0 = STAT_CLI_PRINT;
return 1;
}
chunk_reset(&trash);
chunk_printf(&trash, "arg[3]: %s\n", args[2]);
chunk_init(&out, NULL, 0);
chunk_dup(&out, &trash);
appctx->ctx.cli.err = out.str;
appctx->st0 = STAT_CLI_PRINT_FREE; /* print and free in the default cli_io_handler */
return 0;
}
The last field is the IO handler callback, it can be set to NULL if you
want to use the default cli_io_handler() otherwise you can write your
own. You can use the private pointer in the appctx if you need to store
a context or some data. stats_dump_sess_to_buffer() is a good example of
IO handler, IO handlers often use the appctx->st2 variable for the state
machine. The handler must return 0 in case it have to be recall later
otherwise 1.
During the stick-table teaching process which occurs at reloading/restart time,
expiration dates of stick-tables entries were not synchronized between peers.
This patch adds two new stick-table messages to provide such a synchronization feature.
As these new messages are not supported by older haproxy peers protocol versions,
this patch increments peers protol version, from 2.0 to 2.1, to help in detecting/supporting
such older peers protocol implementations so that new versions might still be able
to transparently communicate with a newer one.
[wt: technically speaking it would be nice to have this backported into 1.6
as some people who reload often are affected by this design limitation, but
it's not a totally transparent change that may make certain users feel
reluctant to upgrade older versions. Let's let it cook in 1.7 first and
decide later]
With Linux officially introducing SO_REUSEPORT support in 3.9 and
its mainstream adoption we have seen more people running into strange
SO_REUSEPORT related issues (a process management issue turning into
hard to diagnose problems because the kernel load-balances between the
new and an obsolete haproxy instance).
Also some people simply want the guarantee that the bind fails when
the old process is still bound.
This change makes SO_REUSEPORT configurable, introducing the command
line argument "-dR" and the noreuseport configuration directive.
A backport to 1.6 should be considered.
To avoid issues when porting code to some architecture, we need to know
the endianess the structures are currently used.
This patch simply had a short notice before those structures to report
endianess and ease contributor's job.
New DNS response parser function which turn the DNS response from a
network buffer into a DNS structure, much easier for later analysis
by upper layer.
Memory is pre-allocated at start-up in a chunk dedicated to DNS
response store.
New error code to report a wrong number of queries in a DNS response.
struct dns_query_item: describes a DNS query record
struct dns_answer_item: describes a DNS answer record
struct dns_response_packet: describes a DNS response packet
DNS_MIN_RECORD_SIZE: minimal size of a DNS record
DNS_MAX_QUERY_RECORDS: maximum number of query records we allow.
For now, we send one DNS query per request.
DNS_MAX_ANSWER_RECORDS: maximum number of records we may found in a
response
WIP dns: new MAX values
Current implementation of HAProxy's DNS resolution expect only 512 bytes
of data in the response.
Update DNS_MAX_UDP_MESSAGE to match this.
Backport: can be backported to 1.6
This function can replace update_server_addr() where the need to change the
server's port as well as the IP address is required.
It performs some validation before performing each type of change.
Introduction of 3 new server flags to remember if some parameters were set
during configuration parsing.
* SRV_F_CHECKADDR: this server has a check addr configured
* SRV_F_CHECKPORT: this server has a check port configured
* SRV_F_AGENTADDR: this server has a agent addr configured
HAProxy used to deduce port used for health checks when parsing configuration
at startup time.
Because of this way of working, it makes it complicated to change the port at
run time.
The current patch changes this behavior and makes HAProxy to choose the
port used for health checking when preparing the check task itself.
A new type of error is introduced and reported when no port can be found.
There won't be any impact on performance, since the process to find out the
port value is made of a few 'if' statements.
This patch also introduces a new check state CHK_ST_PORT_MISS: this flag is
used to report an error in the case when HAProxy needs to establish a TCP
connection to a server, to perform a health check but no TCP ports can be
found for it.
And last, it also introduces a new stream termination condition:
SF_ERR_CHK_PORT. Purpose of this flag is to report an error in the event when
HAProxy has to run a health check but no port can be found to perform it.
Trie now uses a dataset structure just like Pattern, so this has been
defined in includes/types/global.h for both Pattern and Trie where it
was just Pattern.
In src/51d.c all functions used by the Trie implementation which need a
dataset as an argument now use the global dataset. The
fiftyoneDegreesDestroy method has now been replaced with
fiftyoneDegreesDataSetFree which is common to Pattern and Trie. In
addition, two extra dataset init status' have been added to the switch
statement in init_51degrees.
A few log format fields were declared but never used, so let's drop
them, the whole list is confusing enough already :
LOG_FMT_VARIABLE, LOG_FMT_T, LOG_FMT_CONN, LOG_FMT_QUEUES.
Tq is the time between the instant the connection is accepted and a
complete valid request is received. This time includes the handshake
(SSL / Proxy-Protocol), the idle when the browser does preconnect and
the request reception.
This patch decomposes %Tq in 3 measurements names %Th, %Ti, and %TR
which returns respectively the handshake time, the idle time and the
duration of valid request reception. It also adds %Ta which reports
the request's active time, which is the total time without %Th nor %Ti.
It replaces %Tt as the total time, reporting accurate measurements for
HTTP persistent connections.
%Th is avalaible for TCP and HTTP sessions, %Ti, %TR and %Ta are only
avalaible for HTTP connections.
In addition to this, we have new timestamps %tr, %trg and %trl, which
log the date of start of receipt of the request, respectively in the
default format, in GMT time and in local time (by analogy with %t, %T
and %Tl). All of them are obviously only available for HTTP. These values
are more relevant as they more accurately represent the request date
without being skewed by a browser's preconnect nor a keep-alive idle
time.
The HTTP log format and the CLF log format have been modified to
use %tr, %TR, and %Ta respectively instead of %t, %Tq and %Tt. This
way the default log formats now produce the expected output for users
who don't want to manually fiddle with the log-format directive.
Example with the following log-format :
log-format "%ci:%cp [%tr] %ft %b/%s h=%Th/i=%Ti/R=%TR/w=%Tw/c=%Tc/r=%Tr/a=%Ta/t=%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
The request was sent by hand using "openssl s_client -connect" :
Aug 23 14:43:20 haproxy[25446]: 127.0.0.1:45636 [23/Aug/2016:14:43:20.221] test~ test/test h=6/i=2375/R=261/w=0/c=1/r=0/a=262/t=2643 200 145 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
=> 6 ms of SSL handshake, 2375 waiting before sending the first char (in
fact the time to type the first line), 261 ms before the end of the request,
no time spent in queue, 1 ms spend connecting to the server, immediate
response, total active time for this request = 262ms. Total time from accept
to close : 2643 ms.
The timing now decomposes like this :
first request 2nd request
|<-------------------------------->|<-------------- ...
t tr t tr ...
---|----|----|----|----|----|----|----|----|--
: Th Ti TR Tw Tc Tr Td : Ti ...
:<---- Tq ---->: :
:<-------------- Tt -------------->:
:<--------- Ta --------->:
The function ipcpy() simply duplicates the IP address found in one
struct sockaddr_storage into an other struct sockaddr_storage.
It also update the family on the destination structure.
Memory of destination structure must be allocated and cleared by the
caller.
FreeBSD prefers to use IPPROTO_TCP over SOL_TCP, just like it does
with their *_IP counterparts. It's worth noting that there are a few
inconsistencies between SOL_TCP and IPPROTO_TCP in the code, eg on
TCP_QUICKACK. The two values are the same but it's worth applying
what implementations recommend.
No backport is needed, this was uncovered by the recent tcp_info stuff.
Recent commit 93b227d ("MINOR: listener: add the "accept-netscaler-cip"
option to the "bind" keyword") introduced an include of netinet/ip.h
which requires in_systm.h on OpenBSD. No backport is needed.
It is sometimes needed in application server environments to easily tell
if a source is local to the machine or a remote one, without necessarily
knowing all the local addresses (dhcp, vrrp, etc). Similarly in transparent
proxy configurations it is sometimes desired to tell the difference between
local and remote destination addresses.
This patch adds two new sample fetch functions for this :
dst_is_local : boolean
Returns true if the destination address of the incoming connection is local
to the system, or false if the address doesn't exist on the system, meaning
that it was intercepted in transparent mode. It can be useful to apply
certain rules by default to forwarded traffic and other rules to the traffic
targetting the real address of the machine. For example the stats page could
be delivered only on this address, or SSH access could be locally redirected.
Please note that the check involves a few system calls, so it's better to do
it only once per connection.
src_is_local : boolean
Returns true if the source address of the incoming connection is local to the
system, or false if the address doesn't exist on the system, meaning that it
comes from a remote machine. Note that UNIX addresses are considered local.
It can be useful to apply certain access restrictions based on where the
client comes from (eg: require auth or https for remote machines). Please
note that the check involves a few system calls, so it's better to do it only
once per connection.
At some places, smp_dup() is inappropriately called to ensure a modification
is possible while in fact we only need to ensure the sample may be modified
in place. Let's provide smp_is_rw() to check for this capability and
smp_make_rw() to perform the smp_dup() when it is not the case.
Note that smp_is_rw() will also try to add the trailing zero on strings when
needed if possible, to avoid a useless duplication.
These functions ensure that the designated sample is "safe for use",
which means that its size is known, its length is correct regarding its
size, and that strings are properly zero-terminated.
smp_is_safe() only checks (and optionally sets the trailing zero when
needed and possible). smp_make_safe() will call smp_dup() after
smp_is_safe() fails.
Vedran Furac reported a strange problem where the "base" sample fetch
would not always work for tracking purposes.
In fact, it happens that commit bc8c404 ("MAJOR: stick-tables: use sample
types in place of dedicated types") merged in 1.6 exposed a fundamental
bug related to the way samples use chunks as strings. The problem is that
chunks convey a base pointer, a length and an optional size, which may be
zero when unknown or when the chunk is allocated from a read-only location.
The sole purpose of this size is to know whether or not the chunk may be
appended new data. This size cause some semantics issue in the sample,
which has its own SMP_F_CONST flag to indicate read-only contents.
The problem was emphasized by the commit above because it made use of new
calls to smp_dup() to convert a sample to a table key. And since smp_dup()
would only check the SMP_F_CONST flag, it would happily return read-write
samples indicating size=0.
So some tests were added upon smp_dup() return to ensure that the actual
length is smaller than size, but this in fact made things even worse. For
example, the "sni" server directive does some bad stuff on many occasions
because it limits len to size-1 and effectively sets it to -1 and writes
the zero byte before the beginning of the string!
It is therefore obvious that smp_dup() needs to be modified to take this
nature of the chunks into account. It's not enough but is needed. The core
of the problem comes from the fact that smp_dup() is called for 5 distinct
needs which are not always fulfilled :
1) duplicate a sample to keep a copy of it during some operations
2) ensure that the sample is rewritable for a converter like upper()
3) ensure that the sample is terminated with a \0
4) set a correct size on the sample
5) grow the sample in case it was extracted from a partial chunk
Case 1 is not used for now, so we can ignore it. Case 2 indicates the wish
to modify the sample, so its R/O status must be removed if any, but there's
no implied requirement that the chunk becomes larger. Case 3 is used when
the sample has to be made compatible with libc's str* functions. There's no
need to make it R/W nor to duplicate it if it is already correct. Case 4
can happen when the sample's size is required (eg: before performing some
changes that must fit in the buffer). Case 5 is more or less similar but
will happen when the sample by be grown but we want to ensure we're not
bound by the current small size.
So the proposal is to have different functions for various operations. One
will ensure a sample is safe for use with str* functions. Another one will
ensure it may be rewritten in place. And smp_dup() will have to perform an
inconditional duplication to guarantee at least #5 above, and implicitly
all other ones.
This patch only modifies smp_dup() to make the duplication inconditional. It
is enough to fix both the "base" sample fetch and the "sni" server directive,
and all use cases in general though not always optimally. More patches will
follow to address them more optimally and even better than the current
situation (eg: avoid a dup just to add a \0 when possible).
The bug comes from an ambiguous design, so its roots are old. 1.6 is affected
and a backport is needed. In 1.5, the function already existed but was only
used by two converters modifying the data in place, so the bug has no effect
there.
Similar to "escape_chunk", this function tries to prefix all characters
tagged in the <map> with the <escape> character. The specified <string>
contains the input to be escaped.
This enables tracking of sticky counters from current response. The only
difference from "http-request track-sc" is the <key> sample expression
can only make use of samples in response (eg. res.*, status etc.) and
samples below Layer 6.
If an action wrapper stops the processing of the transaction
with a txn_done() function, the return code of the action is
"continue". So the continue can implies the processing of other
like adding headers. However, the HTTP content is flushed and
a segfault occurs.
This patchs add a flag indicating that the Lua code want to
stop the processing, ths flags is forwarded to the haproxy core,
and other actions are ignored.
Must be backported in 1.6
The function txn_done() ends a transaction. It does not make
sense to call this function from a lua sample-fetch wrapper,
because the role of a sample-fetch is not to terminate a
transaction.
This patch modify the role of the fucntion txn_done() if it
is called from a sample-fetch wrapper, now it just ends the
execution of the Lua code like the done() function.
Must be backported in 1.6
Alexander Lebedev reported that the response bit is set on SPARC when
DNS queries are sent. This has been tracked to the endianess issue, so
this patch makes the code portable.
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
Alexander Lebedev reported that the DNS parser crashes in 1.6 with a bus
error on Sparc when it receives a response. This is obviously caused by
some alignment issues. The issue can also be reproduced on ARMv5 when
setting /proc/cpu/alignment to 4 (which helps debugging).
Two places cause this crash in turn, the first one is when the IP address
from the packet is compared to the current one, and the second place is
when the address is assigned because an unaligned address is passed to
update_server_addr().
This patch modifies these places to properly use memcpy() and memcmp()
to manipulate the unaligned data.
Nenad Merdanovic found another set of places specific to 1.7 in functions
in_net_ipv4() and in_net_ipv6(), which are used to compare networks. 1.6
has the functions but does not use them. There we perform a temporary copy
to a local variable to fix the problem. The type of the function's argument
is wrong since it's not necessarily aligned, so we change it for a const
void * instead.
This fix must be backported to 1.6. Note that in 1.6 the code is slightly
different, there's no rec[] array, the pointer is used directly from the
buffer.
Originally, tcphdr's source and dest from Linux were used to get the
source and port which led to a build issue on BSD oses.
To avoid side problems related to network then we just use an internal
struct as we need only those two fields.
When realloc fails to allocate memory, the original pointer is not
freed. Sometime people override the original pointer with the pointer
returned by realloc which is NULL in case of failure. This results
in a memory leak because the memory pointed by the original pointer
cannot be freed.
This configures the client-facing connection to receive a NetScaler
Client IP insertion protocol header before any byte is read from the
socket. This is equivalent to having the "accept-netscaler-cip" keyword
on the "bind" line, except that using the TCP rule allows the PROXY
protocol to be accepted only for certain IP address ranges using an ACL.
This is convenient when multiple layers of load balancers are passed
through by traffic coming from public hosts.
When NetScaler application switch is used as L3+ switch, informations
regarding the original IP and TCP headers are lost as a new TCP
connection is created between the NetScaler and the backend server.
NetScaler provides a feature to insert in the TCP data the original data
that can then be consumed by the backend server.
Specifications and documentations from NetScaler:
https://support.citrix.com/article/CTX205670https://www.citrix.com/blogs/2016/04/25/how-to-enable-client-ip-in-tcpip-option-of-netscaler/
When CIP is enabled on the NetScaler, then a TCP packet is inserted just after
the TCP handshake. This is composed as:
- CIP magic number : 4 bytes
Both sender and receiver have to agree on a magic number so that
they both handle the incoming data as a NetScaler Client IP insertion
packet.
- Header length : 4 bytes
Defines the length on the remaining data.
- IP header : >= 20 bytes if IPv4, 40 bytes if IPv6
Contains the header of the last IP packet sent by the client during TCP
handshake.
- TCP header : >= 20 bytes
Contains the header of the last TCP packet sent by the client during TCP
handshake.
The previous dump algorithm was not trying to yield when the buffer is
full, it's not a problem with the TLS_TICKETS_NO which is 3 by default
but it can become one if the buffer size is lowered and if the
TLS_TICKETS_NO is increased.
The index of the latest ticket dumped is now stored to ensure we can
resume the dump after a yield.
The function stats_tlskeys_list() can meet an undefined behavior when
called with appctx->st2 == STAT_ST_LIST, indeed the ref pointer is used
uninitialized.
However this function was using NULL in appctx->ctx.tlskeys.ref as a
flag to dump every tickets from every references. A real flag
appctx->ctx.tlskeys.dump_all is now used for this behavior.
This patch delete the 'ref' variable and use appctx->ctx.tlskeys.ref
directly.
The 'set-src' action was not available for tcp actions The action code
has been converted into a function in proto_tcp.c to be used for both
'http-request' and 'tcp-request connection' actions.
Both http and tcp keywords are registered in proto_tcp.c
htonll()/ntohll() already exist on Solaris 11 with a different declaration,
causing a build error as reported by Jonathan Fisher. They used to exist on
OSX with a #define which allowed us to detect them. It was a bad idea to give
these functions a name subject to conflicts like this. Simply rename them
my_htonll()/my_ntohll() to definitely get rid of the conflict.
This patch must be backported to 1.6.
DNS requests (using the internal resolver) are corrupted since commit
e2f8497716 ("BUG/MINOR: dns: fix DNS header definition").
Fix it by defining the struct in network byte order, while complying
with RFC 2535, section 6.1.
First reported by Eduard Vopicka on discourse.
This must be backported to 1.6 (1.6.5 is affected).
Commit 108b1dd ("MEDIUM: http: configurable http result codes for
http-request deny") introduced in 1.6-dev2 was incomplete. It introduced
a new field "rule_deny_status" into struct http_txn, which is filled only
by actions "http-request deny" and "http-request tarpit". It's then used
in the deny code path to emit the proper error message, but is used
uninitialized when the deny comes from a "reqdeny" rule, causing random
behaviours ranging from returning a 200, an empty response, or crashing
the process. Often upon startup only 200 was returned but after the fields
are used the crash happens. This can be sped up using -dM.
There's no need at all for storing this status in the http_txn struct
anyway since it's used immediately after being set. Let's store it in
a temporary variable instead which is passed as an argument to function
http_req_get_intercept_rule().
As an extra benefit, removing it from struct http_txn reduced the size
of this struct by 8 bytes.
This fix must be backported to 1.6 where the bug was detected. Special
thanks to Falco Schmutz for his detailed report including an exploitable
core and a reproducer.
When compiled with GCC 6, the IP address specified for a frontend was
ignored and HAProxy was listening on all addresses instead. This is
caused by an incomplete copy of a "struct sockaddr_storage".
With the GNU Libc, "struct sockaddr_storage" is defined as this:
struct sockaddr_storage
{
sa_family_t ss_family;
unsigned long int __ss_align;
char __ss_padding[(128 - (2 * sizeof (unsigned long int)))];
};
Doing an aggregate copy (ss1 = ss2) is different than using memcpy():
only members of the aggregate have to be copied. Notably, padding can be
or not be copied. In GCC 6, some optimizations use this fact and if a
"struct sockaddr_storage" contains a "struct sockaddr_in", the port and
the address are part of the padding (between sa_family and __ss_align)
and can be not copied over.
Therefore, we replace any aggregate copy by a memcpy(). There is another
place using the same pattern. We also fix a function receiving a "struct
sockaddr_storage" by copy instead of by reference. Since it only needs a
read-only copy, the function is converted to request a reference.
'channel_analyze' callback has been removed. Now, there are 2 callbacks to
surround calls to analyzers:
* channel_pre_analyze: Called BEFORE all filterable analyzers. it can be
called many times for the same analyzer, once at each loop until the
analyzer finishes its processing. This callback is resumable, it returns a
negative value if an error occurs, 0 if it needs to wait, any other value
otherwise.
* channel_post_analyze: Called AFTER all filterable analyzers. Here, AFTER
means when an analyzer finishes its processing. This callback is NOT
resumable, it returns a negative value if an error occurs, any other value
otherwise.
Pre and post analyzer callbacks are not automatically called. 'pre_analyzers'
and 'post_analyzers' bit fields in the filter structure must be set to the right
value using AN_* flags (see include/types/channel.h).
The flag AN_RES_ALL has been added (AN_REQ_ALL already exists) to ease the life
of filter developers. AN_REQ_ALL and AN_RES_ALL include all filterable
analyzers.
Now, to call an analyzer in 'process_stream' function, we should use
FLT_ANALAYZE or ANALYZE macros, depending if this is a filterable analyzer or
not.
Instead of calling 'channel_analyze' callback with the flag AN_FLT_HTTP_HDRS,
now we use the new callback 'http_headers'. This change is done because
'channel_analyze' callback will be removed in a next commit.
As suggested by Pavlos, it's too bad that we didn't have a %Td log
format tag given that there are a few mentions of Td corresponding
to the data transmission time already in the doc, so this is now done.
Just like the other specifiers, we report -1 if the connection failed
before reaching the data transmission state.
int list_append_word(struct list *li, const char *str, char **err)
Append a copy of string <str> (inside a wordlist) at the end of
the list <li>.
The caller is responsible for freeing the <err> and <str> copy memory
area using free().
On failure : return 0 and <err> filled with an error message.
Conforming to RFC 2535, section 6.1. This is not an important bug as
those fields don't seem to be set to something else than 0 and to be
checked on answers.
This is the same issue as "show servers state", where the result is incorrect
it the data can't fit in one buffer. The similar fix is applied, to restart
the data processing where it stopped as buffers are sent to the client.
This fix should be backported to haproxy 1.6
It was reported that the unix socket command "show servers state" returned an
empty response while "show servers state <backend>" worked.
In fact, both cases can reproduce the issue. It happens when the response can't
fit in one buffer.
The fix consists in processing the response in several steps, as it is done in
some others commands, by restarting where it was stopped after the buffer is
sent to the client.
This fix should be backported to haproxy 1.6
In 1.4-dev3, commit 31971e5 ("[MEDIUM] add support for infinite forwarding")
made it possible to configure the lower layer to forward data indefinitely
by setting the forward size to CHN_INFINITE_FORWARD (4GB-1). By then larger
chunk sizes were not supported so there was no confusion in the usage of the
function.
Since 1.5 we support 64-bit content-lengths and chunk sizes and the function
has grown to support 64-bit arguments, though it still limits a single pass
to 32-bit quantities (what fit in the channel's to_forward field). The issue
now becomes that a 4GB-1 content-length can be confused with infinite
forwarding (in fact it's 4GB-1+what was already in the buffer). It causes a
visible effect when transferring this exact size because the transfer rate
is lower than with other sizes due in part to the disabling of the Nagle
algorithm on the sendto() call.
In theory with keep-alive it should prevent a second request from being
processed after such a transfer, but since the analysers are still present,
the forwarding analyser properly counts down the remaining size to transfer
and ultimately the transaction gets correctly reset so there is no visible
effect.
Since the root cause of the issue is an API problem (lack of distinction
between a real valid length and a magic value), this patch modifies the API
to have a new dedicated function called channel_forward_forever() to program
a permanent forwarding. The existing function __channel_forward() was modified
to properly take care of the requested sizes and ensure it 1) never overflows
and 2) never reaches CHN_INFINITE_FORWARD by accident.
It is worth noting that the function used to have a bug causing a 2GB
forward to be scheduled if it was called with less data than what is present
in buf->i. Fortunately this bug couldn't be triggered with existing code.
This fix should be backported to 1.6 and 1.5. While it also theorically
affects 1.4, it's better not to backport it there, as the risk of breaking
large object transfers due to significant API differences is high, compared
to the fact that the largest supported objects (4GB-1) are just slower to
transfer.