The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved in
http_ana.{c,h} files.
In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So the most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was lightened of all its useless information about the legacy HTTP. The
structure hdr_ctx was also removed because unused now, just like unused states
in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
Since the legacy HTTP mode is disabled and no multiplexer relies on it anymore,
there is no reason to have 2 multiplexer protocols for the HTTP. So the protocol
PROTO_MODE_HTX was removed and all HTTP multiplexers use now PROTO_MODE_HTTP.
As reported in GH issue #109 and in discourse issue
https://discourse.haproxy.org/t/haproxy-returns-408-or-504-error-when-timeout-client-value-is-every-25d
the time parser doesn't error on overflows nor underflows. This is a
recurring problem which additionally has the bad taste of taking a long
time before hitting the user.
This patch makes parse_time_err() return special error codes for overflows
and underflows, and adds the control in the call places to report suitable
errors depending on the requested unit. In practice, underflows are almost
never returned as the parsing function takes care of rounding values up,
so this might possibly happen on 64-bit overflows returning exactly zero
after rounding though. It is not really possible to cut the patch into
pieces as it changes the function's API, hence all callers.
Tests were run on about every relevant part (cookie maxlife/maxidle,
server inter, stats timeout, timeout*, cli's set timeout command,
tcp-request/response inspect-delay).
Commit fb55365f9 ("MINOR: server: increase the default pool-purge-delay
to 5 seconds") did this but the setting placed in new_server() was
overwritten by srv_settings_cpy() from the default-server values preset
in init_default_instance(). Now let's put it at the right place.
Make usage of the APIs implemented for dictionaries (dict.c) and their LRU caches (struct dcache)
so that to send/receive server names used for the server by name stickiness. These
names are sent over the network as follows:
- in every case we send the encode length of the data (STD_T_DICT), then
- if the server names is not present in the cache used upon transmission (struct dcache_tx)
we cache it and we the ID of this TX cache entry followed the encode length of the
server name, and finally the sever name itseft (non NULL terminated string).
- if the server name is present, we repead these operations but we only send the TX cache
entry ID.
Upon receipt, the couple of (cache IDs, server name) are stored the LRU cache used
only upon receipt (struct dcache_rx). As the peers protocol is symetrical, the fact
that the server name is present in the received data (resp. or not) denotes if
the entry is absent (resp. or not).
When parsing sticking rules, with this patch we reserve some room for the new
"server_name" stick-table data type, as this is already done for "server_id",
setting the offset and used space (in bytes) in the stick-table entry thanks
to stkable_alloc_data_type().
We've been emitting warnings for over 5 years (since 1.5-dev22) about
configs accidently carrying multiple servers with the same name in the
same backend, and this starts to cause some real trouble in dynamic
environments since it's still very difficult to accurately process
a state-file and we still can't transport a server's name over the
peers protocol because of this.
It's about time to force users to fix their configs if they still
hadn't given that there is zero technical justification for doing this,
beyond the "yyp" (or copy-paste accident) when editing the config.
The message remains as clear as before, indicating the file and lines
of the conflict so that the user can easily fix it.
We still have quite a number of build macros which are mapped 1:1 to a
USE_something setting in the makefile but which have a different name.
This patch cleans this up by renaming them to use the USE_something
one, allowing to clean up the makefile and make it more obvious when
reading the code what build option needs to be added.
The following renames were done :
ENABLE_POLL -> USE_POLL
ENABLE_EPOLL -> USE_EPOLL
ENABLE_KQUEUE -> USE_KQUEUE
ENABLE_EVPORTS -> USE_EVPORTS
TPROXY -> USE_TPROXY
NETFILTER -> USE_NETFILTER
NEED_CRYPT_H -> USE_CRYPT_H
CONFIG_HAP_CRYPT -> USE_LIBCRYPT
CONFIG_HAP_NS -> DUSE_NS
CONFIG_HAP_LINUX_SPLICE -> USE_LINUX_SPLICE
CONFIG_HAP_LINUX_TPROXY -> USE_LINUX_TPROXY
CONFIG_HAP_LINUX_VSYSCALL -> USE_LINUX_VSYSCALL
This directive never appeared in a stable release and instead was
introduced and deprecated within 1.8-dev. While it technically could
be outright removed we detect it and error out for good measure.
cfg_parse_peers previously leaked the contents of the `kws` string,
as it was unconditionally filled using bind_dump_kws, but only used
(and freed) within the error case.
Move the dumping into the error case to:
1. Ensure that the registered keywords are actually printed as least once.
2. The contents of kws are not leaked.
This move allows to narrow the scope of `kws`, so this is done as well.
This bug was found using valgrind:
==28217== 590 bytes in 1 blocks are definitely lost in loss record 51 of 71
==28217== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28217== by 0x4AD4C7: indent_msg (standard.c:3676)
==28217== by 0x47E962: cfg_parse_peers (cfgparse.c:700)
==28217== by 0x480273: readcfgfile (cfgparse.c:2147)
==28217== by 0x479D51: init (haproxy.c:1585)
==28217== by 0x404A02: main (haproxy.c:2585)
with this super simple configuration:
peers peers
bind :8081
server A
This bug exists since the introduction of cfg_parse_peers in commit
355b2033ec (which was introduced for HAProxy
2.0, but marked as backportable). It should be backported to all branches
containing that commit.
In commit 1b8e68e ("MEDIUM: stick-table: Stop handling stick-tables as
proxies."), the ->table member of proxy struct was replaced by a pointer
that is not always checked and in some situations can cause a segfault,
eg. during reload or while using "show table" on CLI socket.
No backport is needed.
This commit "MINOR: stick-table: Add prefixes to stick-table names"
prepended the "peers" section name to stick-table names declared in such "peers"
sections followed by a '/' character. This is not this name which must be sent
over the network to avoid collisions with stick-table name declared as backends.
As the '/' character is forbidden as first character of a backend name, we prefix
the stick-table names declared in peers sections only with a '/' character.
With such declarations:
peers mypeers
table t1
backend t1
stick-table ... peers mypeers
at peer protocol level, "t1" declared as stick-table in "mypeers" section is different
of "t1" stick-table declared as backend.
In src/peers.c, only two modifications were required: use ->nid stktable struct
member in place of ->id in peer_prepare_switchmsg() to prepare the stick-table
definition messages. Same thing in peer_treat_definemsg() to treat a stick-table
definition messages.
With this patch we add a prefix to stick-table names declared in "peers" sections
concatenating the "peers" section name followed by a '/' character with
the stick-table name. Consequently, "peers" sections have their own
namespace for their stick-tables. Obviously, these stick-table names are not the
ones which should be sent over the network. So these configurations must be
compatible and should make A and B peers communicate with peers protocol:
# haproxy A config, old way stick-table declerations
peers mypeers
peer A ...
peer B ...
backend t1
stick-table type string size 10m store gpc0 peers mypeers
# haproxy B config, new way stick-table declerations
peers mypeers
peer A ...
peer B ...
table t1 type string size store gpc0 10m
This "network" name is stored in ->nid new field of stktable struct. The "local"
stktable-name is still stored in ->id.
Add a list of proxies for all the stick-tables (->proxies_list struct stktable
member) so that to be able to compute the process bindings of the peers after having
parsed the configuration file.
The proxies are added to the stick-tables they reference when parsing
stick-tables lines in proxy sections, when checking the actions in
check_trk_action() and when resolving samples args for stick-tables
without checking is they are duplicates. We check only there is no loop.
Then, after having parsed everything, we add the proxy bindings to the
peers frontend bindings with stick-tables they reference.
This patch adds the support for the "table" line parsing in "peers" sections
to declare stick-table in such sections. This also prevents the user from having
to declare dummy backends sections with a unique stick-table inside.
Even if still supported, this usage will become deprecated.
To do so, the ->table member of proxy struct which is a stktable struct is replaced
by a pointer to a stktable struct allocated at parsing time in src/cfgparse-listen.c
for the dummy stick-table backends and in src/cfgparse.c for "peers" sections.
This has an impact on the code for stick-table sample converters and on the stickiness
rules parsers which first store the name of the dummy before resolving the rules.
This patch replaces proxy_tbl_by_name() calls by stktable_find_by_name() calls
to lookup for stick-tables stored in "stktable_by_name" ebtree at parsing time.
There is only one remaining place where proxy_tbl_by_name() is used: src/hlua.c.
At several places in the code we relied on the fact that ->size member of stick-table
was equal to zero to consider the stick-table was present by not configured,
this do not make sense anymore as ->table member of struct proxyis fow now on a pointer.
These tests are replaced by a test on ->table value itself.
In "peers" section we do not have to temporary store the name of the section the
stick-table are attached to because this name is obviously already known just after
having entered this "peers" section.
About the CLI stick-table I/O handler, the pointer to proxy struct is replaced by
a pointer to a stktable struct.
This implements support for the new API which relies on a call to
setsockopt().
On systems that support it (currently, only Linux >= 4.11), this enables
using TCP fast open when connecting to server.
Please note that you should use the retry-on "conn-failure", "empty-response"
and "response-timeout" keywords, or the request won't be able to be retried
on failure.
Co-authored-by: Olivier Houchard <ohouchard@haproxy.com>
When running in HTX mode, if we sent the request, but failed to get the
answer, either because the server just closed its socket, we hit a server
timeout, or we get a 404, 408, 425, 500, 501, 502, 503 or 504 error,
attempt to retry the request, exactly as if we just failed to connect to
the server.
To do so, add a new backend keyword, "retry-on".
It accepts a list of keywords, which can be "none" (never retry),
"conn-failure" (we failed to connect, or to do the SSL handshake),
"empty-response" (the server closed the connection without answering),
"response-timeout" (we timed out while waiting for the server response),
or "404", "408", "425", "500", "501", "502", "503" and "504".
The default is "conn-failure".
This patch only removes a useless calculation on listener->maxaccept when nbproc
is set to 1. Indeed, the following formula has no effet in such case:
listener->maxaccept = (listener->maxaccept + nbproc - 1) / nbproc;
This patch may be backported as far as 1.5.
When tests are performed on the HTX mode during HAProxy startup, only HTTP
proxies are considered. It is important because, since the commit 1d2b586cd
("MAJOR: htx: Enable the HTX mode by default for all proxies"), the HTX is
enabled on all proxies by default. But for TCP proxies, it is "deactivated".
This patch must be backported to 1.9.
The option http-tunnel disables any HTTP processing past the first
transaction. In HTX, it works for full h1 transactions. As for the legacy HTTP,
it is a workaround, but it works. But it is impossible to make it works with an
h2 connection. In such case, it has no effect, the stream is closed at the end
of the transaction. So to avoid any inconsistancies between h1 and h2
connections, this option is now always ignored when the HTX is enabled. It is
also a good opportinity to deprecate an old and ugly option. A warning is
emitted during HAProxy startup to encourage users to remove this option.
Note that in legacy HTTP, this option only works with full h1 transactions
too. If an h2 connection is established on a frontend with this option enabled,
it will have no effect at all. But we keep it for the legacy HTTP for
compatibility purpose. It will be removed with the legacy HTTP.
So to be short, if you have to really (REALLY) use it, it will only work for
legacy HTTP frontends with H1 clients.
The documentation has been updated accordingly.
This patch must be backported to 1.9. It is not strictly speaking required but
it will ease futur backports.
Now that the P2C algorithm for the accept queue is removed, we don't
need to map a number to a thread bit anymore, so let's remove all
these fields which are taking quite some space for no reason.
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.
There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.
This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.
This comes with two benefits :
1) a configuration missing "maxconn" in the defaults section will not
limit itself to a magically hardcoded value but will scale up to the
global maxconn ;
2) when the global maxconn is not set and memory limits are used instead,
the frontends' maxconn automatically adapts, and the backends' fullconn
as well.
It's pointless to always set and maintain l->maxconn because the accept
loop already enforces the frontend's limit anyway. Thus let's stop setting
this value by default and keep it to zero meaning "no limit". This way the
frontend's maxconn will be used by default. Of course if a value is set,
it will be enforced.
In an attempt to try to provide automatic maxconn settings, we need to
decorrelate a listner's backlog and maxconn so that these values can be
independent. This introduces a listener_backlog() function which retrieves
the backlog value from the listener's backlog, the frontend's, the
listener's maxconn, the frontend's or falls back to 1024. This
corresponds to what was done in cfgparse.c to force a value there except
the last fallback which was not set since the frontend's maxconn is always
known.
global.maxsock used to be augmented by the frontend's maxconn value
for each frontend listener, which is absurd when there are many
listeners in a frontend because the frontend's maxconn fixes an
upper limit to how many connections will be accepted on all of its
listeners anyway. What is needed instead is to add one to count the
listening socket.
In addition, the CLI's and peers' value was incremented twice, the
first time when creating the listener and the second time in the
main init code.
Let's now make sure we only increment global.maxsock by the required
amount of sockets. This means not adding maxconn for each listener,
and relying on the global values when they are correct.
Threads have long matured by now, still for most users their usage is
not trivial. It's about time to enable them by default on platforms
where we know the number of CPUs bound. This patch does this, it counts
the number of CPUs the process is bound to upon startup, and enables as
many threads by default. Of course, "nbthread" still overrides this, but
if it's not set the default behaviour is to start one thread per CPU.
The default number of threads is reported in "haproxy -vv". Simply using
"taskset -c" is now enough to adjust this number of threads so that there
is no more need for playing with cpu-map. And thanks to the previous
patches on the listener, the vast majority of configurations will not
need to duplicate "bind" lines with the "process x/y" statement anymore
either, so a simple config will automatically adapt to the number of
processors available.
In order to quickly pick a thread ID when accepting a connection, we'll
need to know certain pre-computed values derived from the thread mask,
which are counts of bits per position multiples of 1, 2, 4, 8, 16 and
32. In practice it is sufficient to compute only the 4 first ones and
store them in the bind_conf. We update the count every time the
bind_thread value is adjusted.
The fields in the bind_conf struct have been moved around a little bit
to make it easier to group all thread bit values into the same cache
line.
The function used to return a thread number is bind_map_thread_id(),
and it maps a number between 0 and 31/63 to a thread ID between 0 and
31/63, starting from the left.
Now that nbproc and nbthread are exclusive, we can still provide more
detailed explanations about what we've found in the config when a bind
line appears on multiple threads and processes at the same time, then
ignore the setting.
This patch reduces the listener's thread mask to a single mask instead
of an array of masks per process. Now we have only one thread mask and
one process mask per bind-conf. This removes ~504 bytes of RAM per
bind-conf and will simplify handling of thread masks.
If a "bind" line only refers to process numbers not found by its parent
frontend or not covered by the global nbproc directive, or to a thread
not covered by the global nbthread directive, a warning is emitted saying
what will be used instead.
When 1.8 was released, we wanted to support both nbthread and nbproc to
observe how things would go. Since then it appeared obvious that the two
are never used together because of the pain to configure affinity in this
case, and instead of bringing benefits, it brings the limitations of both
models, and causes multiple threads to compete for the same CPU. In
addition, it costs a lot to support both in parallel, so let's get rid
of this once for all.
When calling calloc(), cast global.nbthread to unsigned int, so that gcc
doesn't freak out, as it has no way of knowing global.nbthread can't be
negative.
Instead of having one task per thread and per server that does clean the
idling connections, have only one global task for every servers.
That tasks parses all the servers that currently have idling connections,
and remove half of them, to put them in a per-thread list of connections
to kill. For each thread that does have connections to kill, wake a task
to do so, so that the cleaning will be done in the context of said thread.
Add a per-thread counter of idling connections, and use it to determine
how many connections we should kill after the timeout, instead of using
the global counter, or we're likely to just kill most of the connections.
This should be backported to 1.9.
Initialize ->srv peer field for all the peers, the local peer included.
Indeed, a haproxy process needs to connect to the local peer of a remote
process. Furthermore, when a "peer" or "server" line is parsed by parse_server()
the address must be copied to ->addr field of the peer object only if this address
has been also parsed by parse_server(). This is not the case if this address belongs
to the local peer and is provided on a "server" line.
After having parsed the "peer" or "server" lines of a peer
sections, the ->srv part of all the peer must be initialized for SSL, if
enabled. Same thing for the binding part.
Revert 1417f0b commit which is no more required.
No backport is needed, this is purely 2.0.
Now, in the function parse_process_number(), when a process number or a set of
processes is parsed, an error is triggered if an invalid character is found. It
means following syntaxes are not forbidden and will emit an alert during the
HAProxy startup:
1a
1/2
1-2-3
This bug was reported on Github. See issue #36.
This patch may be backported to 1.9 and 1.8.
For some embedded systems, it's pointless to have 32- or even 64- large
arrays of processes when it's known that much fewer processes will be
used in the worst case. Let's introduce this MAX_PROCS define which
contains the highest number of processes allowed to run at once. It
still defaults to LONGBITS but may be lowered.
This also depends on the nbthread count, so it must only be performed after
parsing the whole config file. As a side effect, this removes some code
duplication between servers and server-templates.
This must be backported to 1.9.
The idle conns lists are sized according to the number of threads. As such
they cannot be initialized during the parsing since nbthread can be set
later, as revealed by this simple config which randomly crashes when used.
Let's do this at the end instead.
listen proxy
bind :4445
mode http
timeout client 10s
timeout server 10s
timeout connect 10s
http-reuse always
server s1 127.0.0.1:8000
global
nbthread 8
This fix must be backported to 1.9 and 1.8.
When commit 151e1ca98 ("BUG/MAJOR: config: verify that targets of track-sc
and stick rules are present") added a check for some process inconsistencies
between rules and their stick tables, some errors resulted in a "return 0"
statement, which is taken as "no error" in some cases. Let's fix this.
This must be backported to all versions using the above commit.
Stick and track-sc rules may optionally designate a table in a different
proxy. In this case, a number of verifications are made such as validating
that this proxy actually exists. However, in multi-process mode, the target
table might indeed exist but not be bound to the set of processes the rules
will execute on. This will definitely result in a random behaviour especially
if these tables do require peer synchronization, because some tasks will be
started to try to synchronize form uninitialized areas.
The typical issue looks like this :
peers my-peers
peer foo ...
listen proxy
bind-process 1
stick on src table ip
...
backend ip
bind-process 2
stick-table type ip size 1k peers my-peers
While it appears obvious that the example above will not work, there are
less obvious situations, such as having bind-process in a defaults section
and having a larger set of processes for the referencing proxy than the
referenced one.
The present patch adds checks for such situations by verifying that all
processes from the referencing proxy are present on the other one in all
track-sc* and stick-* rules, and in sample fetch / converters referencing
another table so that sc_inc_gpc0() and similar are safe as well.
This fix must be backported to all maintained versions. It may potentially
disrupt configurations which already randomly crash. There hardly is any
intermediary solution though, such configurations need to be fixed.
At a number of places we used to have null tests on bind_proc for
listeners and proxies. Let's simplify all these tests by always
having the proper bits reported via proc_mask().
When no nbproc is specified, a computation leads to reading bind_thread[-1]
before checking if the thread mask is valid for a bind conf. It may either
report a false warning and compute a wrong mask, or miss some incorrect
configs.
This must be backported to 1.9 and possibly 1.8.
This bug was introduced by 355b203 commit which prevented the peer
addresses to be parsed for the local peer of a "peers" section.
When adding "parse_addr" boolean parameter to parse_server(), this commit
missed the case where the syntax with "peer" keyword should still be
supported in addition to the new syntax with "server"+"bind" keyword.
May be backported as fas as 1.5.
Some servers may wish to limit the total number of requests they execute
over a connection because some of their components might leak resources.
In HTTP/1 it was easy, they just had to emit a "connection: close" header
field with the last response. In HTTP/2, it's less easy because the info
is not always shared with the component dealing with the H2 protocol and
it could be harder to advertise a GOAWAY with a stream limit.
This patch provides a solution to this by adding a new "max-reuse" parameter
to the server keyword. This parameter indicates how many times an idle
connection may be reused for new requests. The information is made available
and the underlying muxes will be able to use it at will.
This patch should be backported to 1.9.
Make "bind" keywork be supported in "peers" sections.
All "bind" settings are supported on this line.
Add "default-bind" option to parse the binding options excepted the bind address.
Do not parse anymore the bind address for local peers on "server" lines.
Do not use anymore list_for_each_entry() to set the "peers" section
listener parameters because there is only one listener by "peers" section.
May be backported to 1.5 and newer.