In dumpstats.c we have get_conn_xprt_name() and get_conn_data_name() to
report the name of the data and transport layers used on a connection.
But when the name is not known, its pointer is reported instead. But the
static char used to report the pointer is too small as it doesn't leave
room for '0x'. Fortunately all subsystems are known so we never trigger
this case.
This fix needs to be backported to 1.6 and 1.5.
uint16_t instead of u_int16_t
None ISO fields of struct tm are not present, but
by zeroyfing it, on GNU and BSD systems tm_gmtoff
field will be set.
[wt: moved the memset into each of the date functions]
It defines the variable to set when an error occurred during an event
processing. It will only be set when an error occurred in the scope of the
transaction. As for all other variables define by the SPOE, it will be
prefixed. So, if your variable name is "error" and your prefix is "my_spoe_pfx",
the variable will be "txn.my_spoe_pfx.error".
When set, the variable is the boolean "true". Note that if "option
continue-on-error" is set, the variable is not automatically removed between
events processing.
"maxconnrate" is the maximum number of connections per second. The SPOE will
stop to open new connections if the maximum is reached and will wait to acquire
an existing one.
"maxerrrate" is the maximum number of errors per second. The SPOE will stop its
processing if the maximum is reached.
These options replace hardcoded macros MAX_NEW_SPOE_APPLETS and
MAX_NEW_SPOE_APPLET_ERRS. We use it to limit SPOE activity, especially when
servers are down..
By default, for a specific stream, when an abnormal/unexpected error occurs, the
SPOE is disabled for all the transaction. So if you have several events
configured, such error on an event will disabled all followings. For TCP
streams, this will disable the SPOE for the whole session. For HTTP streams,
this will disable it for the transaction (request and response).
To bypass this behaviour, you can set 'continue-on-error' option in 'spoe-agent'
section. With this option, only the current event will be ignored.
It is a way to set the maximum time to wait for a stream to process an event,
i.e to acquire a stream to talk with an agent, to encode all messages, to send
the NOTIFY frame, to receive the corrsponding acknowledgement and to process all
actions. It is applied on the stream that handle the client and the server
sessions.
When:
- A Lua action return data and close the channel. The request status
is set to HTTP_MSG_CLOSED for the request and HTTP_MSG_DONE for the
response.
- HAProxy sets the state HTTP_MSG_ERROR. I don't known why, because
there are many line which sets this state.
- A Lua sample-fetch is executed, typically for building the log
line.
- When the Lua sample fetch exits, a control of the data is
executed. If HAProxy is currently parsing the request, the request
is aborted in order to prevent a segfault or sending corrupted
data.
This ast control is executed comparing the state HTTP_MSG_BODY. When
this state is reached, the request is parsed and no error are
possible. When the state is < than HTTP_MSG_BODY, the parser is
running.
Unfortunately, the code HTTP_MSG_ERROR is just < HTTP_MSG_BODY. When
we are in error, we want to terminate the execution of Lua without
error.
This patch changes the comparaison level.
This patch must be backported in 1.6
Gernot Pörner reported some constant leak of ref counts for stick tables
entries. It happens that this leak was not at all in the regular traffic
path but on the "show table" path. An extra ref count was taken during
the dump if the output had to be paused, and it was released upon clean
termination or an error detected in the I/O handler. But the release
handler didn't do it, while it used to properly do it for the sessions
dump.
This fix needs to be backported to 1.6.
Commit ef8f4fe ("BUG/MINOR: stick-table: handle out-of-memory condition
gracefully") unfortunately got trapped by a pointer operation. Replacing
ts = poll_alloc() + size;
with :
ts = poll_alloc();
ts += size;
Doesn't give the same result because pool_alloc() is void while ts is a
struct stksess*. So now we don't access the same places, which is visible
in certain stick-table scenarios causing a crash.
This must be backported to 1.6 and 1.5.
Setting an FD to -1 when closed isn't the most easily noticeable thing
to do when we're chasing accidental reuse of a stale file descriptor.
Instead set it to that large a negative value that it will overflow the
fdtab and provide an analysable core at the moment the issue happens.
Care was taken to ensure it doesn't overflow nor change sign on 32-bit
machines when multiplied by fdtab, and that it also remains negative for
the various checks that exist. The value equals 0xFDDEADFD which happens
to be easily spotted in a debugger.
The bug described in commit 568743a ("BUG/MEDIUM: stream-int: completely
detach connection on connect error") was not a stream-interface layer bug
but a connection layer bug. There was exactly one place in the code where
we could change a file descriptor's status without first checking whether
it is valid or not, it was in conn_stop_polling(). This one is called when
the polling status is changed after an update, and calls fd_stop_both even
if we had already closed the file descriptor :
1479388298.484240 ->->->->-> conn_fd_handler > conn_cond_update_polling
1479388298.484240 ->->->->->-> conn_cond_update_polling > conn_stop_polling
1479388298.484241 ->->->->->->-> conn_stop_polling > conn_ctrl_ready
1479388298.484241 conn_stop_polling < conn_ctrl_ready
1479388298.484241 ->->->->->->-> conn_stop_polling > fd_stop_both
1479388298.484242 ->->->->->->->-> fd_stop_both > fd_update_cache
1479388298.484242 ->->->->->->->->-> fd_update_cache > fd_release_cache_entry
1479388298.484242 fd_update_cache < fd_release_cache_entry
1479388298.484243 fd_stop_both < fd_update_cache
1479388298.484243 conn_stop_polling < fd_stop_both
1479388298.484243 conn_cond_update_polling < conn_stop_polling
1479388298.484243 conn_fd_handler < conn_cond_update_polling
The problem with the previous fix above is that it break the http_proxy mode
and possibly even some Lua parts and peers to a certain extent ; all outgoing
connections where the target address is initially copied into the outgoing
connection which experience a retry would use a random outgoing address after
the retry because closing and detaching the connection causes the target
address to be lost. This was attempted to be addressed by commit 0857d7a
("BUG/MAJOR: stream: properly mark the server address as unset on connect
retry") but it used to only solve the most visible effect and not the root
cause.
Prior to this fix, it was possible to cause this config to keep CLOSE_WAIT
for as long as it takes to expire a client or server timeout (note the
missing client timeout) :
listen test
mode http
bind :8002
server s1 127.0.0.1:8001
$ tcploop 8001 L0 W N20 A R P100 S:"HTTP/1.1 200 OK\r\nContent-length: 0\r\n\r\n" &
$ tcploop 8002 N200 C T W S:"GET / HTTP/1.0\r\n\r\n" O P10000 K
With this patch, these CLOSE_WAIT properly vanish when both processes leave.
This commit reverts the two fixes above and replaces them with the proper
fix in connection.h. It must be backported to 1.6 and 1.5. Thanks to
Robson Roberto Souza Peixoto for providing very detailed traces showing
some obvious inconsistencies leading to finding this bug.
This pointer will be used for storing private context. With this,
the same executed function can handle more than one keyword. This
will be very useful for creation Lua cli bindings.
The release function is called when the command is terminated (give
back the hand to the prompt) or when the session is broken (timeout
or client closed).
In case `pool_alloc2()` returns NULL, propagate the condition to the
caller. This could happen when limiting the amount of memory available
for HAProxy with `-m`.
[wt: backport to 1.6 and 1.5 needed]
Before this change, trash is being used to create certificate filename
to read in care Mutli-Cert are in used. But then ssl_sock_load_ocsp()
modify trash leading to potential wrong information given in later error
message.
This also blocks any further use of certificate filename for other
usage, like ongoing patch to support Certificate Transparency handling
in Multi-Cert bundle.
The unlikely macro doesn't take in acount the condition, but only one
variable.
Must be backported in 1.6
[wt: with gcc 3.x, unlikely(x) is defined as __builtin_expect((x) != 0, 0)
so the condition is wrong for negative numbers, which correspond to the
case where bi_getblk_nc() has reached the end of the buffer and the
channel is already closed. With gcc 4.x, the output is cast to unsigned long
so the <=0 will not match negative values either. This is only used in Lua
for now so that may explain why it hasn't hit yet]
A new "option spop-check" statement has been added to enable server health
checks based on SPOP HELLO handshake. SPOP is the protocol used by SPOE filters
to talk to servers.
SPOE makes possible the communication with external components to retrieve some
info using an in-house binary protocol, the Stream Processing Offload Protocol
(SPOP). In the long term, its aim is to allow any kind of offloading on the
streams. This first version, besides being experimental, won't do lot of
things. The most important today is to validate the protocol design and lay the
foundations of what will, one day, be a full offload engine for the stream
processing.
So, for now, the SPOE can offload the stream processing before "tcp-request
content", "tcp-response content", "http-request" and "http-response" rules. And
it only supports variables creation/suppression. But, in spite of these limited
features, we can easily imagine to implement a SSO solution, an ip reputation
service or an ip geolocation service.
Internally, the SPOE is implemented as a filter. So, to use it, you must use
following line in a proxy proxy section:
frontend my-front
...
filter spoe [engine <name>] config <file>
...
It uses its own configuration file to keep the HAProxy configuration clean. It
is also a easy way to disable it by commenting out the filter line.
See "doc/SPOE.txt" for all details about the SPOE configuration.
It does the opposite of 'set-var' action/converter. It is really useful for
per-process variables. But, it can be used for any scope.
The lua function 'unset_var' has also been added.
Now it is possible to use variables attached to a process. The scope name is
'proc'. These variables are released only when HAProxy is stopped.
'tune.vars.proc-max-size' directive has been added to confiure the maximum
amount of memory used by "proc" variables. And because memory accounting is
hierachical for variables, memory for "proc" vars includes memory for "sess"
vars.
This function, unsurprisingly, sets a variable value only if it already
exists. In other words, this function will succeed only if the variable was
found somewhere in the configuration during HAProxy startup.
It will be used by SPOE filter. So an agent will be able to set a value only for
existing variables. This prevents an agent to create a very large number of
unused variables to flood HAProxy and exhaust the memory reserved to variables..
This code has been moved from haproxy.c to sample.c and the function
release_sample_expr can now be called from anywhere to release a sample
expression. This function will be used by the stream processing offload engine
(SPOE).
A scope is a section name between square bracket, alone on its line, ie:
[scope-name]
...
The spaces at the beginning and at the end of the line are skipped. Comments at
the end of the line are also skipped.
When a scope is parsed, its name is saved in the global variable
cfg_scope. Initially, cfg_scope is NULL and it remains NULL until a valid scope
line is parsed.
This feature remains unused in the HAProxy configuration file and
undocumented. However, it will be used during SPOE configuration parsing.
This feature will be used by the stream processing offload engine (SPOE) to
parse dedicated configuration files without mixing HAProxy sections with SPOE
sections.
So, here we can back up all sections known by HAProxy, unregister all of them
and add new ones, dedicted to the SPOE. Once the SPOE configuration file parsed,
we can roll back all changes by restoring HAProxy sections.
Now, for TCP streams, backend filters are released when the stream is
destroyed. But, for HTTP streams, these filters are released when the
transaction analyze ends, in flt_end_analyze callback.
New callbacks have been added to handle creation and destruction of filter
instances:
* 'attach' callback is called after a filter instance creation, when it is
attached to a stream. This happens when the stream is started for filters
defined on the stream's frontend and when the backend is set for filters
declared on the stream's backend. It is possible to ignore the filter, if
needed, by returning 0. This could be useful to have conditional filtering.
* 'detach' callback is called when a filter instance is detached from a stream,
before its destruction. This happens when the stream is stopped for filters
defined on the stream's frontend and when the analyze ends for filters defined
on the stream's backend.
In addition, the callback 'stream_set_backend' has been added to know when a
backend is set for a stream. It is only called when the frontend and the backend
are not the same. And it is called for all filters attached to a stream
(frontend and backend).
Finally, the TRACE filter has been updated.
The 'set-var' converter uses function smp_conv_store (vars.c). In this function,
we should use the first argument (index 0) to retrieve the variable name and its
scope. But because of a typo, we get the scope of the second argument (index
1). In this case, there is no second argument. So the scope used was always 0
(SCOPE_SESS), always setting the variable in the session scope.
So, due to this bug, this rules
tcp-request content accept if { src,set-var(txn.foo) -m found }
always set the variable 'sess.foo' instead of 'txn.foo'.
Now that it is possible to decide whether we prefer to use libc or the
state file to resolve the server's IP address and it is possible to change
a server's IP address at run time on the CLI, let's not restrict the reuse
of the address from the state file anymore to the DNS only.
The impact is that by default the state file will be considered first
(which matches its purpose) and only then the libc. This way any address
change performed at run time over the CLI will be preserved regardless
of DNS usage or not.
It is very common when validating a configuration out of production not to
have access to the same resolvers and to fail on server address resolution,
making it difficult to test a configuration. This option simply appends the
"none" method to the list of address resolution methods for all servers,
ensuring that even if the libc fails to resolve an address, the startup
sequence is not interrupted.
This will allow a server to automatically fall back to an explicit numeric
IP address when all other methods fail. The address is simply specified in
the address list.
Now that we have "init-addr none", it becomes possible to recover on
libc resolver's failures. Thus it's preferable not to alert nor fail
at the moment the libc is called, and instead process the failure at
the end of the list. This allows "none" to be set after libc to
provide a smooth fallback in case of resolver issues.
This new setting supports a comma-delimited list of methods used to
resolve the server's FQDN to an IP address. Currently supported methods
are "libc" (use the regular libc's resolver) and "last" (use the last
known valid address found in the state file).
The list is implemented in a 32-bit integer, because each init-addr
method only requires 3 bits. The last one must always be SRV_IADDR_END
(0), allowing to store up to 10 methods in a single 32 bit integer.
Note: the doc is provided at the end of this series.
The RMAINT state happens when a server doesn't get a valid DNS response
past the hold time. If the address is forced on the CLI, we must use it
and leave the RMAINT state.
WARNING: this is a MAJOR (and disruptive) change with previous HAProxy's
behavior: before, HAProxy never ever used to change a server administrative
status when the DNS resolution failed at run time.
This patch gives HAProxy the ability to change the administrative status
of a server to MAINT (RMAINT actually) when an error is encountered for
a period longer than its own allowed by the corresponding 'hold'
parameter.
IE if the configuration sets "hold nx 10s" and a server's hostname
points to a NX for more than 10s, then the server will be set to RMAINT,
hence in MAINTENANCE mode.