With this patch, it is possible to configure HAProxy to forge the SSL
certificate sent to a client using the SNI servername. We do it in the SNI
callback.
To enable this feature, you must pass following BIND options:
* ca-sign-file <FILE> : This is the PEM file containing the CA certitifacte and
the CA private key to create and sign server's certificates.
* (optionally) ca-sign-pass <PASS>: This is the CA private key passphrase, if
any.
* generate-certificates: Enable the dynamic generation of certificates for a
listener.
Because generating certificates is expensive, there is a LRU cache to store
them. Its size can be customized by setting the global parameter
'tune.ssl.ssl-ctx-cache-size'.
When using the "51d" converter without specifying the list of 51Degrees
properties to detect (see parameter "51degrees-property-name-list"), the
"global._51d_property_names" could be left uninitialized which will lead to
segfault during init.
Since all rules listed in p->block_rules have been moved to the beginning of
the http-request rules in check_config_validity(), there is no need to clean
p->block_rules in deinit().
Signed-off-by: Godbach <nylzhaowei@gmail.com>
An ifdef was missing to avoid declaring these variables :
src/haproxy.c: In function 'deinit':
src/haproxy.c:1253:47: warning: unused variable '_51d_prop_nameb' [-Wunused-variable]
src/haproxy.c:1253:30: warning: unused variable '_51d_prop_name' [-Wunused-variable]
This diff initialises few DeviceAtlas struct fields member with
their inherent default values.
Furthermore, the specific DeviceAtlas configuration keywords are
registered and the module is initialised and all necessary
resources are freed during the deinit phase.
These ones were already obsoleted in 1.4, marked for removal in 1.5,
and not documented anymore. They used to emit warnings, and do still
require quite some code to stay in place. Let's remove them now.
Until now, HAproxy needed to be restarted to change the TLS ticket
keys. With this patch, the TLS keys can be updated on a per-file
basis using the admin socket. Two new socket commands have been
introduced: "show tls-keys" and "set ssl tls-keys".
Signed-off-by: Nenad Merdanovic <nmerdan@anine.io>
This will prevent the peers section from remaining in listen state on
the incorrect process. The peers_fe pointer is set to NULL, which will
tell the peers task to commit suicide if it was already scheduled.
The principle of this cache is to have a global cache for all pattern
matching operations which rely on lists (reg, sub, dir, dom, ...). The
input data, the expression and a random seed are used as a hashing key.
The cached entries contains a pointer to the expression and a revision
number for that expression so that we don't accidently used obsolete
data after a pattern update or a very unlikely hash collision.
Regarding the risk of collisions, 10k entries at 10k req/s mean 1% risk
of a collision after 60 years, that's already much less than the memory's
reliability in most machines and more durable than most admin's life
expectancy. A collision will result in a valid result to be returned
for a different entry from the same list. If this is not acceptable,
the cache can be disabled using tune.pattern.cache-size.
A test on a file containing 10k small regex showed that the regex
matching was limited to 6k/s instead of 70k with regular strings.
When enabling the LRU cache, the performance was back to 70k/s.
The new function is called for each round of polling in order to call any
active appctx. For now we pick the stream interface from the appctx's
owner. At the moment there's no appctx queued yet, but we have everything
needed to queue them and remove them.
We have to allow 32 or 64 processes depending on the machine's word
size, and on 64-bit machines only the first 32 processes were properly
bound.
This fix should be backported to 1.5.
The poll() functions have become a bit dirty because they now check the
size of the signal queue, the FD cache and the number of tasks. It's not
their job, this must be moved to the caller. In the end it simplifies the
code because the expiration date is now set to now_ms if we must not wait,
and this achieves in exactly the same result and is cleaner. The change
looks large due to the change of indent for blocks which were inside an
"if" block.
This one will not necessarily be allocated for each stream, and we want
to use the fact that it equals null to know it's not present so that we
can always deduce its presence from the stream pointer.
This commit only creates the new pool.
There is now a pointer to the session in the stream, which is NULL
for now. The session pool is created as well. Some parts will move
from the stream to the session now.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
Thanks to MSIE/IIS, the "deflate" name is ambigous. According to the RFC
it's a zlib-wrapped deflate stream, but IIS used to send only a raw deflate
stream, which is the only format MSIE understands for "deflate". The other
widely used browsers do support both formats. For this reason some people
prefer to emit a raw deflate stream on "deflate" to serve more users even
it that means violating the standards. Haproxy only follows the standard,
so they cannot do this.
This patch makes it possible to have one algorithm name in the configuration
and another one in the protocol. This will make it possible to have a new
configuration token to add a different algorithm so that users can decide if
they want a raw deflate or the standard one.
Released version 1.6-dev1 with the following main changes :
- CLEANUP: extract temporary $CFG to eliminate duplication
- CLEANUP: extract temporary $BIN to eliminate duplication
- CLEANUP: extract temporary $PIDFILE to eliminate duplication
- CLEANUP: extract temporary $LOCKFILE to eliminate duplication
- CLEANUP: extract quiet_check() to avoid duplication
- BUG/MINOR: don't start haproxy on reload
- DOC: Address issue where documentation is excluded due to a gitignore rule.
- BUG/MEDIUM: systemd: set KillMode to 'mixed'
- BUILD: fix "make install" to support spaces in the install dirs
- BUG/MINOR: config: http-request replace-header arg typo
- BUG: config: error in http-response replace-header number of arguments
- DOC: missing track-sc* in http-request rules
- BUILD: lua: missing ifdef related to SSL when enabling LUA
- BUG/MEDIUM: regex: fix pcre_study error handling
- MEDIUM: regex: Use pcre_study always when PCRE is used, regardless of JIT
- BUG/MINOR: Fix search for -p argument in systemd wrapper.
- MEDIUM: Improve signal handling in systemd wrapper.
- DOC: fix typo in Unix Socket commands
- BUG/MEDIUM: checks: external checks can't change server status to UP
- BUG/MEDIUM: checks: segfault with external checks in a backend section
- BUG/MINOR: checks: external checks shouldn't wait for timeout to return the result
- BUG/MEDIUM: auth: fix segfault with http-auth and a configuration with an unknown encryption algorithm
- BUG/MEDIUM: config: userlists should ensure that encrypted passwords are supported
- BUG/MINOR: config: don't propagate process binding for dynamic use_backend
- BUG/MINOR: log: fix request flags when keep-alive is enabled
- BUG/MEDIUM: checks: fix conflicts between agent checks and ssl healthchecks
- MINOR: checks: allow external checks in backend sections
- MEDIUM: checks: provide environment variables to the external checks
- MINOR: checks: update dynamic environment variables in external checks
- DOC: checks: environment variables used by "external-check command"
- BUG/MEDIUM: backend: correctly detect the domain when use_domain_only is used
- MINOR: ssl: load certificates in alphabetical order
- BUG/MINOR: checks: prevent http keep-alive with http-check expect
- MINOR: lua: typo in an error message
- MINOR: report the Lua version in -vv
- MINOR: lua: add a compilation error message when compiled with an incompatible version
- BUG/MEDIUM: lua: segfault when calling haproxy sample fetches from lua
- BUILD: try to automatically detect the Lua library name
- BUILD/CLEANUP: systemd: avoid a warning due to mixed code and declaration
- BUG/MEDIUM: backend: Update hash to use unsigned int throughout
- BUG/MEDIUM: connection: fix memory corruption when building a proxy v2 header
- MEDIUM: connection: add new bit in Proxy Protocol V2
- BUG/MINOR: ssl: rejects OCSP response without nextupdate.
- BUG/MEDIUM: ssl: Fix to not serve expired OCSP responses.
- BUG/MINOR: ssl: Fix OCSP resp update fails with the same certificate configured twice.
- BUG/MINOR: ssl: Fix external function in order not to return a pointer on an internal trash buffer.
- MINOR: add fetchs 'ssl_c_der' and 'ssl_f_der' to return DER formatted certs
- MINOR: ssl: add statement to force some ssl options in global.
- BUG/MINOR: ssl: correctly initialize ssl ctx for invalid certificates
- BUG/MEDIUM: ssl: fix bad ssl context init can cause segfault in case of OOM.
- BUG/MINOR: samples: fix unnecessary memcopy converting binary to string.
- MINOR: samples: adds the bytes converter.
- MINOR: samples: adds the field converter.
- MINOR: samples: add the word converter.
- BUG/MINOR: server: move the directive #endif to the end of file
- BUG/MAJOR: buffer: check the space left is enough or not when input data in a buffer is wrapped
- DOC: fix a few typos
- CLEANUP: epoll: epoll_events should be allocated according to global.tune.maxpollevents
- BUG/MINOR: http: fix typo: "401 Unauthorized" => "407 Unauthorized"
- BUG/MINOR: parse: refer curproxy instead of proxy
- BUG/MINOR: parse: check the validity of size string in a more strict way
- BUILD: add new target 'make uninstall' to support uninstalling haproxy from OS
- DOC: expand the docs for the provided stats.
- BUG/MEDIUM: unix: do not unlink() abstract namespace sockets upon failure.
- MEDIUM: ssl: Certificate Transparency support
- MEDIUM: stats: proxied stats admin forms fix
- MEDIUM: http: Compress HTTP responses with status codes 201,202,203 in addition to 200
- BUG/MEDIUM: connection: sanitize PPv2 header length before parsing address information
- MAJOR: namespace: add Linux network namespace support
- MINOR: systemd: Check configuration before start
- BUILD: ssl: handle boringssl in openssl version detection
- BUILD: ssl: disable OCSP when using boringssl
- BUILD: ssl: don't call get_rfc2409_prime when using boringssl
- MINOR: ssl: don't use boringssl's cipher_list
- BUILD: ssl: use OPENSSL_NO_OCSP to detect OCSP support
- MINOR: stats: fix minor typo in HTML page
- MINOR: Also accept SIGHUP/SIGTERM in systemd-wrapper
- MEDIUM: Add support for configurable TLS ticket keys
- DOC: Document the new tls-ticket-keys bind keyword
- DOC: clearly state that the "show sess" output format is not fixed
- MINOR: stats: fix minor typo fix in stats_dump_errors_to_buffer()
- DOC: httplog does not support 'no'
- BUG/MEDIUM: ssl: Fix a memory leak in DHE key exchange
- MINOR: ssl: use SSL_get_ciphers() instead of directly accessing the cipher list.
- BUG/MEDIUM: Consistently use 'check' in process_chk
- MEDIUM: Add external check
- BUG/MEDIUM: Do not set agent health to zero if server is disabled in config
- MEDIUM/BUG: Only explicitly report "DOWN (agent)" if the agent health is zero
- MEDIUM: Remove connect_chk
- MEDIUM: Refactor init_check and move to checks.c
- MEDIUM: Add free_check() helper
- MEDIUM: Move proto and addr fields struct check
- MEDIUM: Attach tcpcheck_rules to check
- MEDIUM: Add parsing of mailers section
- MEDIUM: Allow configuration of email alerts
- MEDIUM: Support sending email alerts
- DOC: Document email alerts
- MINOR: Remove trailing '.' from email alert messages
- MEDIUM: Allow suppression of email alerts by log level
- BUG/MEDIUM: Do not consider an agent check as failed on L7 error
- MINOR: deinit: fix memory leak
- MINOR: http: export the function 'smp_fetch_base32'
- BUG/MEDIUM: http: tarpit timeout is reset
- MINOR: sample: add "json" converter
- BUG/MEDIUM: pattern: don't load more than once a pattern list.
- MINOR: map/acl/dumpstats: remove the "Done." message
- BUG/MAJOR: ns: HAProxy segfault if the cli_conn is not from a network connection
- BUG/MINOR: pattern: error message missing
- BUG/MEDIUM: pattern: some entries are not deleted with case insensitive match
- BUG/MINOR: ARG6 and ARG7 don't fit in a 32 bits word
- MAJOR: poll: only rely on wake_expired_tasks() to compute the wait delay
- MEDIUM: task: call session analyzers if the task is woken by a message.
- MEDIUM: protocol: automatically pick the proto associated to the connection.
- MEDIUM: channel: wake up any request analyzer on response activity
- MINOR: converters: add a "void *private" argument to converters
- MINOR: converters: give the session pointer as converter argument
- MINOR: sample: add private argument to the struct sample_fetch
- MINOR: global: export function and permits to not resolve DNS names
- MINOR: sample: add function for browsing samples.
- MINOR: global: export many symbols.
- MINOR: includes: fix a lot of missing or useless includes
- MEDIUM: tcp: add register keyword system.
- MEDIUM: buffer: make bo_putblk/bo_putstr/bo_putchk return the number of bytes copied.
- MEDIUM: http: change the code returned by the response processing rule functions
- MEDIUM: http/tcp: permit to resume http and tcp custom actions
- MINOR: channel: functions to get data from a buffer without copy
- MEDIUM: lua: lua integration in the build and init system.
- MINOR: lua: add ease functions
- MINOR: lua: add runtime execution context
- MEDIUM: lua: "com" signals
- MINOR: lua: add the configuration directive "lua-load"
- MINOR: lua: core: create "core" class and object
- MINOR: lua: post initialisation bindings
- MEDIUM: lua: add coroutine as tasks.
- MINOR: lua: add sample and args type converters
- MINOR: lua: txn: create class TXN associated with the transaction.
- MINOR: lua: add shared context in the lua stack
- MINOR: lua: txn: import existing sample-fetches in the class TXN
- MINOR: lua: txn: add lua function in TXN that returns an array of http headers
- MINOR: lua: register and execute sample-fetches in LUA
- MINOR: lua: register and execute converters in LUA
- MINOR: lua: add bindings for tcp and http actions
- MINOR: lua: core: add sleep functions
- MEDIUM: lua: socket: add "socket" class for TCP I/O
- MINOR: lua: core: pattern and acl manipulation
- MINOR: lua: channel: add "channel" class
- MINOR: lua: txn: object "txn" provides two objects "channel"
- MINOR: lua: core: can set the nice of the current task
- MINOR: lua: core: can yield an execution stack
- MINOR: lua: txn: add binding for closing the client connection.
- MEDIUM: lua: Lua initialisation "on demand"
- BUG/MAJOR: lua: send function fails and return bad bytes
- MINOR: remove unused declaration.
- MINOR: lua: remove some #define
- MINOR: lua: use bitfield and macro in place of integer and enum
- MINOR: lua: set skeleton for Lua execution expiration
- MEDIUM: lua: each yielding function returns a wake up time.
- MINOR: lua: adds "forced yield" flag
- MEDIUM: lua: interrupt the Lua execution for running other process
- MEDIUM: lua: change the sleep function core
- BUG/MEDIUM: lua: the execution timeout is ignored in yield case
- DOC: lua: Lua configuration documentation
- MINOR: lua: add the struct session in the lua channel struct
- BUG/MINOR: lua: set buffer if it is nnot avalaible.
- BUG/MEDIUM: lua: reset flags before resuming execution
- BUG/MEDIUM: lua: fix infinite loop about channel
- BUG/MEDIUM: lua: the Lua process is not waked up after sending data on requests side
- BUG/MEDIUM: lua: many errors when we try to send data with the channel API
- MEDIUM: lua: use the Lua-5.3 version of the library
- BUG/MAJOR: lua: some function are not yieldable, the forced yield causes errors
- BUG/MEDIUM: lua: can't handle the response bytes
- BUG/MEDIUM: lua: segfault with buffer_replace2
- BUG/MINOR: lua: check buffers before initializing socket
- BUG/MINOR: log: segfault if there are no proxy reference
- BUG/MEDIUM: lua: sockets don't have buffer to write data
- BUG/MEDIUM: lua: cannot connect socket
- BUG/MINOR: lua: sockets receive behavior doesn't follows the specs
- BUG/BUILD: lua: The strict Lua 5.3 version check is not done.
- BUG/MEDIUM: buffer: one byte miss in buffer free space check
- MEDIUM: lua: make the functions hlua_gethlua() and hlua_sethlua() faster
- MINOR: replace the Core object by a simple model.
- MEDIUM: lua: change the objects configuration
- MEDIUM: lua: create a namespace for the fetches
- MINOR: converters: add function to browse converters
- MINOR: lua: wrapper for converters
- MINOR: lua: replace function (req|get)_channel by a variable
- MINOR: lua: fetches and converters can return an empty string in place of nil
- DOC: lua api
- BUG/MEDIUM: sample: fix random number upper-bound
- BUG/MINOR: stats:Fix incorrect printf type.
- BUG/MAJOR: session: revert all the crappy client-side timeout changes
- BUG/MINOR: logs: properly initialize and count log sockets
- BUG/MEDIUM: http: fetch "base" is not compatible with set-header
- BUG/MINOR: counters: do not untrack counters before logging
- BUG/MAJOR: sample: correctly reinitialize sample fetch context before calling sample_process()
- MINOR: stick-table: make stktable_fetch_key() indicate why it failed
- BUG/MEDIUM: counters: fix track-sc* to wait on unstable contents
- BUILD: remove TODO from the spec file and add README
- MINOR: log: make MAX_SYSLOG_LEN overridable at build time
- MEDIUM: log: support a user-configurable max log line length
- DOC: provide an example of how to use ssl_c_sha1
- BUILD: checks: external checker needs signal.h
- BUILD: checks: kill a minor warning on Solaris in external checks
- BUILD: http: fix isdigit & isspace warnings on Solaris
- BUG/MINOR: listener: set the listener's fd to -1 after deletion
- BUG/MEDIUM: unix: failed abstract socket binding is retryable
- MEDIUM: listener: implement a per-protocol pause() function
- MEDIUM: listener: support rebinding during resume()
- BUG/MEDIUM: unix: completely unbind abstract sockets during a pause()
- DOC: explicitly mention the limits of abstract namespace sockets
- DOC: minor fix on {sc,src}_kbytes_{in,out}
- DOC: fix alphabetical sort of converters
- MEDIUM: stick-table: implement lookup from a sample fetch
- MEDIUM: stick-table: add new converters to fetch table data
- MINOR: samples: add two converters for the date format
- BUG/MAJOR: http: correctly rewind the request body after start of forwarding
- DOC: remove references to CPU=native in the README
- DOC: mention that "compression offload" is ignored in defaults section
- DOC: mention that Squid correctly responds 400 to PPv2 header
- BUILD: fix dependencies between config and compat.h
- MINOR: session: export the function 'smp_fetch_sc_stkctr'
- MEDIUM: stick-table: make it easier to register extra data types
- BUG/MINOR: http: base32+src should use the big endian version of base32
- MINOR: sample: allow IP address to cast to binary
- MINOR: sample: add new converters to hash input
- MINOR: sample: allow integers to cast to binary
- BUILD: report commit ID in git versions as well
- CLEANUP: session: move the stick counters declarations to stick_table.h
- MEDIUM: http: add the track-sc* actions to http-request rules
- BUG/MEDIUM: connection: fix proxy v2 header again!
- BUG/MAJOR: tcp: fix a possible busy spinning loop in content track-sc*
- OPTIM/MINOR: proxy: reduce struct proxy by 48 bytes on 64-bit archs
- MINOR: log: add a new field "%lc" to implement a per-frontend log counter
- BUG/MEDIUM: http: fix inverted condition in pat_match_meth()
- BUG/MEDIUM: http: fix improper parsing of HTTP methods for use with ACLs
- BUG/MINOR: pattern: remove useless allocation of unused trash in pat_parse_reg()
- BUG/MEDIUM: acl: correctly compute the output type when a converter is used
- CLEANUP: acl: cleanup some of the redundancy and spaghetti after last fix
- BUG/CRITICAL: http: don't update msg->sov once data start to leave the buffer
- MEDIUM: http: enable header manipulation for 101 responses
- BUG/MEDIUM: config: propagate frontend to backend process binding again.
- MEDIUM: config: properly propagate process binding between proxies
- MEDIUM: config: make the frontends automatically bind to the listeners' processes
- MEDIUM: config: compute the exact bind-process before listener's maxaccept
- MEDIUM: config: only warn if stats are attached to multi-process bind directives
- MEDIUM: config: report it when tcp-request rules are misplaced
- DOC: indicate in the doc that track-sc* can wait if data are missing
- MINOR: config: detect the case where a tcp-request content rule has no inspect-delay
- MEDIUM: systemd-wrapper: support multiple executable versions and names
- BUG/MEDIUM: remove debugging code from systemd-wrapper
- BUG/MEDIUM: http: adjust close mode when switching to backend
- BUG/MINOR: config: don't propagate process binding on fatal errors.
- BUG/MEDIUM: check: rule-less tcp-check must detect connect failures
- BUG/MINOR: tcp-check: report the correct failed step in the status
- DOC: indicate that weight zero is reported as DRAIN
- BUG/MEDIUM: config: avoid skipping disabled proxies
- BUG/MINOR: config: do not accept more track-sc than configured
- BUG/MEDIUM: backend: fix URI hash when a query string is present
- BUG/MEDIUM: http: don't dump debug headers on MSG_ERROR
- BUG/MAJOR: cli: explicitly call cli_release_handler() upon error
- BUG/MEDIUM: tcp: fix outgoing polling based on proxy protocol
- BUILD/MINOR: ssl: de-constify "ciphers" to avoid a warning on openssl-0.9.8
- BUG/MEDIUM: tcp: don't use SO_ORIGINAL_DST on non-AF_INET sockets
- BUG/BUILD: revert accidental change in the makefile from latest SSL fix
- BUG/MEDIUM: ssl: force a full GC in case of memory shortage
- MEDIUM: ssl: add support for smaller SSL records
- MINOR: session: release a few other pools when stopping
- MINOR: task: release the task pool when stopping
- BUG/MINOR: config: don't inherit the default balance algorithm in frontends
- BUG/MAJOR: frontend: initialize capture pointers earlier
- BUG/MINOR: stats: correctly set the request/response analysers
- MAJOR: polling: centralize calls to I/O callbacks
- DOC: fix typo in the body parser documentation for msg.sov
- BUG/MINOR: peers: the buffer size is global.tune.bufsize, not trash.size
- MINOR: sample: add a few basic internal fetches (nbproc, proc, stopping)
- DEBUG: pools: apply poisonning on every allocated pool
- BUG/MAJOR: sessions: unlink session from list on out of memory
- BUG/MEDIUM: patterns: previous fix was incomplete
- BUG/MEDIUM: payload: ensure that a request channel is available
- BUG/MINOR: tcp-check: don't condition data polling on check type
- BUG/MEDIUM: tcp-check: don't rely on random memory contents
- BUG/MEDIUM: tcp-checks: disable quick-ack unless next rule is an expect
- BUG/MINOR: config: fix typo in condition when propagating process binding
- BUG/MEDIUM: config: do not propagate processes between stopped processes
- BUG/MAJOR: stream-int: properly check the memory allocation return
- BUG/MEDIUM: memory: fix freeing logic in pool_gc2()
- BUG/MAJOR: namespaces: conn->target is not necessarily a server
- BUG/MEDIUM: compression: correctly report zlib_mem
- CLEANUP: lists: remove dead code
- CLEANUP: memory: remove dead code
- CLEANUP: memory: replace macros pool_alloc2/pool_free2 with functions
- MINOR: memory: cut pool allocator in 3 layers
- MEDIUM: memory: improve pool_refill_alloc() to pass a refill count
- MINOR: stream-int: retrieve session pointer from stream-int
- MINOR: buffer: reset a buffer in b_reset() and not channel_init()
- MEDIUM: buffer: use b_alloc() to allocate and initialize a buffer
- MINOR: buffer: move buffer initialization after channel initialization
- MINOR: buffer: only use b_free to release buffers
- MEDIUM: buffer: always assign a dummy empty buffer to channels
- MEDIUM: buffer: add a new buf_wanted dummy buffer to report failed allocations
- MEDIUM: channel: do not report full when buf_empty is present on a channel
- MINOR: session: group buffer allocations together
- MINOR: buffer: implement b_alloc_fast()
- MEDIUM: buffer: implement b_alloc_margin()
- MEDIUM: session: implement a basic atomic buffer allocator
- MAJOR: session: implement a wait-queue for sessions who need a buffer
- MAJOR: session: only allocate buffers when needed
- MINOR: stats: report a "waiting" flags for sessions
- MAJOR: session: only wake up as many sessions as available buffers permit
- MINOR: config: implement global setting tune.buffers.reserve
- MINOR: config: implement global setting tune.buffers.limit
- MEDIUM: channel: implement a zero-copy buffer transfer
- MEDIUM: stream-int: support splicing from applets
- OPTIM: stream-int: try to send pending spliced data
- CLEANUP: session: remove session_from_task()
- DOC: add missing entry for log-format and clarify the text
- MINOR: logs: add a new per-proxy "log-tag" directive
- BUG/MEDIUM: http: fix header removal when previous header ends with pure LF
- MINOR: config: extend the default max hostname length to 64 and beyond
- BUG/MEDIUM: channel: fix possible integer overflow on reserved size computation
- BUG/MINOR: channel: compare to_forward with buf->i, not buf->size
- MINOR: channel: add channel_in_transit()
- MEDIUM: channel: make buffer_reserved() use channel_in_transit()
- MEDIUM: channel: make bi_avail() use channel_in_transit()
- BUG/MEDIUM: channel: don't schedule data in transit for leaving until connected
- CLEANUP: channel: rename channel_reserved -> channel_is_rewritable
- MINOR: channel: rename channel_full() to !channel_may_recv()
- MINOR: channel: rename buffer_reserved() to channel_reserved()
- MINOR: channel: rename buffer_max_len() to channel_recv_limit()
- MINOR: channel: rename bi_avail() to channel_recv_max()
- MINOR: channel: rename bi_erase() to channel_truncate()
- BUG/MAJOR: log: don't try to emit a log if no logger is set
- MINOR: tools: add new round_2dig() function to round integers
- MINOR: global: always export some SSL-specific metrics
- MINOR: global: report information about the cost of SSL connections
- MAJOR: init: automatically set maxconn and/or maxsslconn when possible
- MINOR: http: add a new fetch "query" to extract the request's query string
- MINOR: hash: add new function hash_crc32
- MINOR: samples: provide a "crc32" converter
- MEDIUM: backend: add the crc32 hash algorithm for load balancing
- BUG/MINOR: args: add missing entry for ARGT_MAP in arg_type_names
- BUG/MEDIUM: http: make http-request set-header compute the string before removal
- MEDIUM: args: use #define to specify the number of bits used by arg types and counts
- MEDIUM: args: increase arg type to 5 bits and limit arg count to 5
- MINOR: args: add type-specific flags for each arg in a list
- MINOR: args: implement a new arg type for regex : ARGT_REG
- MEDIUM: regex: add support for passing regex flags to regex_exec_match()
- MEDIUM: samples: add a regsub converter to perform regex-based transformations
- BUG/MINOR: sample: fix case sensitivity for the regsub converter
- MEDIUM: http: implement http-request set-{method,path,query,uri}
- DOC: fix missing closing brackend on regsub
- MEDIUM: samples: provide basic arithmetic and bitwise operators
- MEDIUM: init: continue to enforce SYSTEM_MAXCONN with auto settings if set
- BUG/MINOR: http: fix incorrect header value offset in replace-hdr/replace-value
- BUG/MINOR: http: abort request processing on filter failure
- MEDIUM: tcp: implement tcp-ut bind option to set TCP_USER_TIMEOUT
- MINOR: ssl/server: add the "no-ssl-reuse" server option
- BUG/MAJOR: peers: initialize s->buffer_wait when creating the session
- MINOR: http: add a new function to iterate over each header line
- MINOR: http: add the new sample fetches req.hdr_names and res.hdr_names
- MEDIUM: task: always ensure that the run queue is consistent
- BUILD: Makefile: add -Wdeclaration-after-statement
- BUILD/CLEANUP: ssl: avoid a warning due to mixed code and declaration
- BUILD/CLEANUP: config: silent 3 warnings about mixed declarations with code
- MEDIUM: protocol: use a family array to index the protocol handlers
- BUILD: lua: cleanup many mixed occurrences declarations & code
- BUG/MEDIUM: task: fix recently introduced scheduler skew
- BUG/MINOR: lua: report the correct function name in an error message
- BUG/MAJOR: http: fix stats regression consecutive to HTTP_RULE_RES_YIELD
- Revert "BUG/MEDIUM: lua: can't handle the response bytes"
- MINOR: lua: convert IP addresses to type string
- CLEANUP: lua: use the same function names in C and Lua
- REORG/MAJOR: move session's req and resp channels back into the session
- CLEANUP: remove now unused channel pool
- REORG/MEDIUM: stream-int: introduce si_ic/si_oc to access channels
- MEDIUM: stream-int: add a flag indicating which side the SI is on
- MAJOR: stream-int: only rely on SI_FL_ISBACK to find the requested channel
- MEDIUM: stream-interface: remove now unused pointers to channels
- MEDIUM: stream-int: make si_sess() use the stream int's side
- MEDIUM: stream-int: use si_task() to retrieve the task from the stream int
- MEDIUM: stream-int: remove any reference to the owner
- CLEANUP: stream-int: add si_ib/si_ob to dereference the buffers
- CLEANUP: stream-int: add si_opposite() to find the other stream interface
- REORG/MEDIUM: channel: only use chn_prod / chn_cons to find stream-interfaces
- MEDIUM: channel: add a new flag "CF_ISRESP" for the response channel
- MAJOR: channel: only rely on the new CF_ISRESP flag to find the SI
- MEDIUM: channel: remove now unused ->prod and ->cons pointers
- CLEANUP: session: simplify references to chn_{prod,cons}(&s->{req,res})
- CLEANUP: session: use local variables to access channels / stream ints
- CLEANUP: session: don't needlessly pass a pointer to the stream-int
- CLEANUP: session: don't use si_{ic,oc} when we know the session.
- CLEANUP: stream-int: limit usage of si_ic/si_oc
- CLEANUP: lua: limit usage of si_ic/si_oc
- MINOR: channel: add chn_sess() helper to retrieve session from channel
- MEDIUM: session: simplify receive buffer allocator to only use the channel
- MEDIUM: lua: use CF_ISRESP to detect the channel's side
- CLEANUP: lua: remove the session pointer from hlua_channel
- CLEANUP: lua: hlua_channel_new() doesn't need the pointer to the session anymore
- MEDIUM: lua: remove struct hlua_channel
- MEDIUM: lua: remove hlua_sample_fetch
As of the other libraries used by haproxy, it can be useful to display the Lua
version used at compilation time.
A new line is added to "haproxy -vv", which shows if Lua is supported by the
binary, and with which version it was compiled.
This system permits to execute some lua function after than HAProxy
complete his initialisation. These functions are executed between
the end of the configuration parsing and check and the begin of the
scheduler.
This is the first step of the lua integration. We add the useful
files in the HAProxy project. These files contains the main
includes, the Makefile options and empty initialisation function.
Is is the LUA skeleton.
Actually, HAProxy uses the function "process_runnable_tasks" and
"wake_expired_tasks" to get the next task which can expires.
If a task is added with "task_schedule" or other method during
the execution of an other task, the expiration of this new task
is not taken into account, and the execution of this task can be
too late.
Actualy, HAProxy seems to be no sensitive to this bug.
This fix moves the call to process_runnable_tasks() before the timeout
calculation and ensures that all wakeups are processed together. Only
wake_expired_tasks() needs to return a timeout now.
Commit d025648 ("MAJOR: init: automatically set maxconn and/or maxsslconn
when possible") resulted in a case where if enough memory is available,
a maxconn value larger than SYSTEM_MAXCONN could be computed, resulting
in possibly overflowing other systems resources (eg: kernel socket buffers,
conntrack entries, etc). Let's bound any automatic maxconn to SYSTEM_MAXCONN
if it is defined. Note that the value is set to DEFAULT_MAXCONN since
SYSTEM_MAXCONN forces DEFAULT_MAXCONN, thus it is not an error.
This one will be used when a regex is expected. It is automatically
resolved after the parsing and compiled into a regex. Some optional
flags are supported in the type-specific flags that should be set by
the optional arg checker. One is used during the regex compilation :
ARGF_REG_ICASE to ignore case.
If a memory size limit is enforced using "-n" on the command line and
one or both of maxconn / maxsslconn are not set, instead of using the
build-time values, haproxy now computes the number of sessions that can
be allocated depending on a number of parameters among which :
- global.maxconn (if set)
- global.maxsslconn (if set)
- maxzlibmem
- tune.ssl.cachesize
- presence of SSL in at least one frontend (bind lines)
- presence of SSL in at least one backend (server lines)
- tune.bufsize
- tune.cookie_len
The purpose is to ensure that not haproxy will not run out of memory
when maxing out all parameters. If neither maxconn nor maxsslconn are
used, it will consider that 100% of the sessions involve SSL on sides
where it's supported. That means that it will typically optimize maxconn
for SSL offloading or SSL bridging on all connections. This generally
means that the simple act of enabling SSL in a frontend or in a backend
will significantly reduce the global maxconn but in exchange of that, it
will guarantee that it will not fail.
All metrics may be enforced using #defines to accomodate variations in
SSL libraries or various allocation sizes.
We've already experimented with three wake up algorithms when releasing
buffers : the first naive one used to wake up far too many sessions,
causing many of them not to get any buffer. The second approach which
was still in use prior to this patch consisted in waking up either 1
or 2 sessions depending on the number of FDs we had released. And this
was still inaccurate. The third one tried to cover the accuracy issues
of the second and took into consideration the number of FDs the sessions
would be willing to use, but most of the time we ended up waking up too
many of them for nothing, or deadlocking by lack of buffers.
This patch completely removes the need to allocate two buffers at once.
Instead it splits allocations into critical and non-critical ones and
implements a reserve in the pool for this. The deadlock situation happens
when all buffers are be allocated for requests pending in a maxconn-limited
server queue, because then there's no more way to allocate buffers for
responses, and these responses are critical to release the servers's
connection in order to release the pending requests. In fact maxconn on
a server creates a dependence between sessions and particularly between
oldest session's responses and latest session's requests. Thus, it is
mandatory to get a free buffer for a response in order to release a
server connection which will permit to release a request buffer.
Since we definitely have non-symmetrical buffers, we need to implement
this logic in the buffer allocation mechanism. What this commit does is
implement a reserve of buffers which can only be allocated for responses
and that will never be allocated for requests. This is made possible by
the requester indicating how much margin it wants to leave after the
allocation succeeds. Thus it is a cooperative allocation mechanism : the
requester (process_session() in general) prefers not to get a buffer in
order to respect other's need for response buffers. The session management
code always knows if a buffer will be used for requests or responses, so
that is not difficult :
- either there's an applet on the initiator side and we really need
the request buffer (since currently the applet is called in the
context of the session)
- or we have a connection and we really need the response buffer (in
order to support building and sending an error message back)
This reserve ensures that we don't take all allocatable buffers for
requests waiting in a queue. The downside is that all the extra buffers
are really allocated to ensure they can be allocated. But with small
values it is not an issue.
With this change, we don't observe any more deadlocks even when running
with maxconn 1 on a server under severely constrained memory conditions.
The code becomes a bit tricky, it relies on the scheduler's run queue to
estimate how many sessions are already expected to run so that it doesn't
wake up everyone with too few resources. A better solution would probably
consist in having two queues, one for urgent requests and one for normal
requests. A failed allocation for a session dealing with an error, a
connection event, or the need for a response (or request when there's an
applet on the left) would go to the urgent request queue, while other
requests would go to the other queue. Urgent requests would be served
from 1 entry in the pool, while the regular ones would be served only
according to the reserve. Despite not yet having this, it works
remarkably well.
This mechanism is quite efficient, we don't perform too many wake up calls
anymore. For 1 million sessions elapsed during massive memory contention,
we observe about 4.5M calls to process_session() compared to 4.0M without
memory constraints. Previously we used to observe up to 16M calls, which
rougly means 12M failures.
During a test run under high memory constraints (limit enforced to 27 MB
instead of the 58 MB normally needed), performance used to drop by 53% prior
to this patch. Now with this patch instead it *increases* by about 1.5%.
The best effect of this change is that by limiting the memory usage to about
2/3 to 3/4 of what is needed by default, it's possible to increase performance
by up to about 18% mainly due to the fact that pools are reused more often
and remain hot in the CPU cache (observed on regular HTTP traffic with 20k
objects, buffers.limit = maxconn/10, buffers.reserve = limit/2).
Below is an example of scenario which used to cause a deadlock previously :
- connection is received
- two buffers are allocated in process_session() then released
- one is allocated when receiving an HTTP request
- the second buffer is allocated then released in process_session()
for request parsing then connection establishment.
- poll() says we can send, so the request buffer is sent and released
- process session gets notified that the connection is now established
and allocates two buffers then releases them
- all other sessions do the same till one cannot get the request buffer
without hitting the margin
- and now the server responds. stream_interface allocates the response
buffer and manages to get it since it's higher priority being for a
response.
- but process_session() cannot allocate the request buffer anymore
=> We could end up with all buffers used by responses so that none may
be allocated for a request in process_session().
When the applet processing leaves the session context, the test will have
to be changed so that we always allocate a response buffer regardless of
the left side (eg: H2->H1 gateway). A final improvement would consists in
being able to only retry the failed I/O operation without waking up a
task, but to date all experiments to achieve this have proven not to be
reliable enough.
This patch makes it possible to create binds and servers in separate
namespaces. This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).
The setup is something like this:
net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/
The proxy is configured to make server connections through haproxy and sending
the expected source/target addresses to haproxy using the proxy protocol.
The network namespace setup on the haproxy node is something like this:
= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =
= 8< =
$ cat haproxy.cfg
frontend clients
bind 127.0.0.1:50022 namespace 1 transparent
default_backend scb
backend server
mode tcp
server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =
A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.
A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.
For more documentation please read the documentation included in the patch
itself.
Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
Google's boringssl doesn't have OPENSSL_VERSION_TEXT, SSLeay_version()
or SSLEAY_VERSION, in fact, it doesn't have any real versioning, its
just git-based.
So in case we build against boringssl, we can't access those values.
Instead, we just inform the user that HAProxy was build against
boringssl.
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
This patch remove all references of standard regex in haproxy. The last
remaining references are only in the regex.[ch] files.
In the file src/checks.c, the original function uses a "pmatch" array.
In fact this array is unused. This patch remove it.
This patch adds two new actions to http-request and http-response rulesets :
- replace-header : replace a whole header line, suited for headers
which might contain commas
- replace-value : replace a single header value, suited for headers
defined as lists.
The match consists in a regex, and the replacement string takes a log-format
and supports back-references.
When no static DH parameters are specified, this patch makes haproxy
use standardized (rfc 2409 / rfc 3526) DH parameters with prime lenghts
of 1024, 2048, 4096 or 8192 bits for DHE key exchange. The size of the
temporary/ephemeral DH key is computed as the minimum of the RSA/DSA server
key size and the value of a new option named tune.ssl.default-dh-param.
Using the systemd daemon mode the parent doesn't exits but waits for
his childs without closing its listening sockets.
As linux 3.9 introduced a SO_REUSEPORT option (always enabled in
haproxy if available) this will give unhandled connections problems
after an haproxy reload with open connections.
The problem is that when on reload a new parent is started (-Ds
$oldchildspids), in haproxy.c main there's a call to start_proxies
that, without SO_REUSEPORT, should fail (as the old processes are
already listening) and so a SIGTOU is sent to old processes. On this
signal the old childs will call (in pause_listener) a shutdown() on
the listening fd. From my tests (if I understand it correctly) this
affects the in kernel file (so the listen is really disabled for all
the processes, also the parent).
Instead, with SO_REUSEPORT, the call to start_proxies doesn't fail and
so SIGTOU is never sent. Only SIGUSR1 is sent and the listen isn't
disabled for the parent but only the childs will stop listening (with
a call to close())
So, with SO_REUSEPORT, the old childs will close their listening
sockets but will wait for the current connections to finish or
timeout, and, as their parent has its listening socket open, the
kernel will schedule some connections on it. These connections will
never be accepted by the parent as it's in the waitpid loop.
This fix will close all the listeners on the parent before entering the
waitpid loop.
Signed-off-by: Simone Gotti <simone.gotti@gmail.com>
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
Some consistency checks cannot be performed between frontends, backends
and peers at the moment because there is no way to check for intersection
between processes bound to some processes when the number of processes is
higher than the number of bits in a word.
So first, let's limit the number of processes to the machine's word size.
This means nbproc will be limited to 32 on 32-bit machines and 64 on 64-bit
machines. This is far more than enough considering that configs rarely go
above 16 processes due to scalability and management issues, so 32 or 64
should be fine.
This way we'll ensure we can always build a mask of all the processes a
section is bound to.
Since it became possible to use log-format expressions in use_backend,
having a mandatory condition becomes annoying because configurations
are full of "if TRUE". Let's relax the check to accept no condition
like many other keywords (eg: redirect).
When compiled with USE_GETADDRINFO, make sure we use getaddrinfo(3) to
perform name lookups. On default dual-stack setups this will change the
behavior of using IPv6 first. Global configuration option
'nogetaddrinfo' can be used to revert to deprecated gethostbyname(3).
The pattern reference are stored with two identifiers: the unique_id and
the reference.
The reference identify a file. Each file with the same name point to the
same reference. We can register many times one file. If the file is
modified, all his dependencies are also modified. The reference can be
used with map or acl.
The unique_id identify inline acl. The unique id is unique for each acl.
You cannot force the same id in the configuration file, because this
repport an error.
The format of the acl and map listing through the "socket" has changed
for displaying these new ids.
Sometimes it can be useful to generate a random value, at least
for debugging purposes, but also to take routing decisions or to
pass such a value to a backend server.
The ability to globally override the default client and server cipher
suites has been requested multiple times since the introduction of SSL.
This commit adds two new keywords to the global section for this :
- ssl-default-bind-ciphers
- ssl-default-server-ciphers
It is still possible to preset them at build time by setting the macros
LISTEN_DEFAULT_CIPHERS and CONNECT_DEFAULT_CIPHERS.
The new tune.idletimer value allows one to set a different value for
idle stream detection. The default value remains set to one second.
It is possible to disable it using zero, and to change the default
value at build time using DEFAULT_IDLE_TIMER.
Released version 1.5-dev22 with the following main changes :
- MEDIUM: tcp-check new feature: connect
- MEDIUM: ssl: Set verify 'required' as global default for servers side.
- MINOR: ssl: handshake optim for long certificate chains.
- BUG/MINOR: pattern: pattern comparison executed twice
- BUG/MEDIUM: map: segmentation fault with the stats's socket command "set map ..."
- BUG/MEDIUM: pattern: Segfault in binary parser
- MINOR: pattern: move functions for grouping pat_match_* and pat_parse_* and add documentation.
- MINOR: standard: The parse_binary() returns the length consumed and his documentation is updated
- BUG/MINOR: payload: the patterns of the acl "req.ssl_ver" are no parsed with the good function.
- BUG/MEDIUM: pattern: "pat_parse_dotted_ver()" set bad expect_type.
- BUG/MINOR: sample: The c_str2int converter does not fail if the entry is not an integer
- BUG/MEDIUM: http/auth: Sometimes the authentication credentials can be mix between two requests
- MINOR: doc: Bad cli function name.
- MINOR: http: smp_fetch_capture_header_* fetch captured headers
- BUILD: last release inadvertently prepended a "+" in front of the date
- BUG/MEDIUM: stream-int: fix the keep-alive idle connection handler
- BUG/MEDIUM: backend: do not re-initialize the connection's context upon reuse
- BUG: Revert "OPTIM/MEDIUM: epoll: fuse active events into polled ones during polling changes"
- BUG/MINOR: checks: successful check completion must not re-enable MAINT servers
- MINOR: http: try to stick to same server after status 401/407
- BUG/MINOR: http: always disable compression on HTTP/1.0
- OPTIM: poll: restore polling after a poll/stop/want sequence
- OPTIM: http: don't stop polling for read on the client side after a request
- BUG/MEDIUM: checks: unchecked servers could not be enabled anymore
- BUG/MEDIUM: stats: the web interface must check the tracked servers before enabling
- BUG/MINOR: channel: CHN_INFINITE_FORWARD must be unsigned
- BUG/MINOR: stream-int: do not clear the owner upon unregister
- MEDIUM: stats: add support for HTTP keep-alive on the stats page
- BUG/MEDIUM: stats: fix HTTP/1.0 breakage introduced in previous patch
- Revert "MEDIUM: stats: add support for HTTP keep-alive on the stats page"
- MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
- OPTIM: session: set the READ_DONTWAIT flag when connecting
- BUG/MINOR: http: don't clear the SI_FL_DONT_WAKE flag between requests
- MINOR: session: factor out the connect time measurement
- MEDIUM: session: prepare to support earlier transitions to the established state
- MEDIUM: stream-int: make si_connect() return an established state when possible
- MINOR: checks: use an inline function for health_adjust()
- OPTIM: session: put unlikely() around the freewheeling code
- MEDIUM: config: report a warning when multiple servers have the same name
- BUG: Revert "OPTIM: poll: restore polling after a poll/stop/want sequence"
- BUILD/MINOR: listener: remove a glibc warning on accept4()
- BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage
- BUILD: listener: fix recent accept4() again
- BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9
- BUG/MEDIUM: polling: ensure we update FD status when there's no more activity
- MEDIUM: listener: fix polling management in the accept loop
- MINOR: protocol: improve the proto->drain() API
- MINOR: connection: add a new conn_drain() function
- MEDIUM: tcp: report in tcp_drain() that lingering is already disabled on close
- MEDIUM: connection: update callers of ctrl->drain() to use conn_drain()
- MINOR: connection: add more error codes to report connection errors
- MEDIUM: tcp: report connection error at the connection level
- MEDIUM: checks: make use of chk_report_conn_err() for connection errors
- BUG/MEDIUM: unique_id: HTTP request counter is not stable
- DOC: fix misleading information about SIGQUIT
- BUG/MAJOR: fix freezes during compression
- BUG/MEDIUM: stream-interface: don't wake the task up before end of transfer
- BUILD: fix VERDATE exclusion regex
- CLEANUP: polling: rename "spec_e" to "state"
- DOC: add a diagram showing polling state transitions
- REORG: polling: rename "spec_e" to "state" and "spec_p" to "cache"
- REORG: polling: rename "fd_spec" to "fd_cache"
- REORG: polling: rename the cache allocation functions
- REORG: polling: rename "fd_process_spec_events()" to "fd_process_cached_events()"
- MAJOR: polling: rework the whole polling system
- MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
- MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
- MEDIUM: connection: add check for readiness in I/O handlers
- MEDIUM: stream-interface: the polling flags must always be updated in chk_snd_conn
- MINOR: stream-interface: no need to call fd_stop_both() on error
- MEDIUM: connection: no need to recheck FD state
- CLEANUP: connection: use conn_ctrl_ready() instead of checking the flag
- CLEANUP: connection: use conn_xprt_ready() instead of checking the flag
- CLEANUP: connection: fix comments in connection.h to reflect new behaviour.
- OPTIM: raw-sock: don't speculate after a short read if polling is enabled
- MEDIUM: polling: centralize polled events processing
- MINOR: polling: create function fd_compute_new_polled_status()
- MINOR: cli: add more information to the "show info" output
- MEDIUM: listener: add support for limiting the session rate in addition to the connection rate
- MEDIUM: listener: apply a limit on the session rate submitted to SSL
- REORG: stats: move the stats socket states to dumpstats.c
- MINOR: cli: add the new "show pools" command
- BUG/MEDIUM: counters: flush content counters after each request
- BUG/MEDIUM: counters: fix stick-table entry leak when using track-sc2 in connection
- MINOR: tools: add very basic support for composite pointers
- MEDIUM: counters: stop relying on session flags at all
- BUG/MINOR: cli: fix missing break in command line parser
- BUG/MINOR: config: correctly report when log-format headers require HTTP mode
- MAJOR: http: update connection mode configuration
- MEDIUM: http: make keep-alive + httpclose be passive mode
- MAJOR: http: switch to keep-alive mode by default
- BUG/MEDIUM: http: fix regression caused by recent switch to keep-alive by default
- BUG/MEDIUM: listener: improve detection of non-working accept4()
- BUILD: listener: add fcntl.h and unistd.h
- BUG/MINOR: raw_sock: correctly set the MSG_MORE flag
If no CA file specified on a server line, the config parser will show an error.
Adds an cmdline option '-dV' to re-set verify 'none' as global default on
servers side (previous behavior).
Also adds 'ssl-server-verify' global statement to set global default to
'none' or 'required'.
WARNING: this changes the default verify mode from "none" to "required" on
the server side, and it *will* break insecure setups.
It's becoming increasingly difficult to ignore unwanted function returns in
debug code with gcc. Now even when you try to work around it, it suggests a
way to write your code differently. For example :
src/frontend.c:187:65: warning: if statement has empty body [-Wempty-body]
if (write(1, trash.str, trash.len) < 0) /* shut gcc warning */;
^
src/frontend.c:187:65: note: put the semicolon on a separate line to silence this warning
1 warning generated.
This is totally unacceptable, this code already had to be written this way
to shut it up in earlier versions. And now it comments the form ? What's the
purpose of the C language if you can't write anymore the code that does what
you want ?
Emeric proposed to just keep a global variable to drain such useless results
so that gcc stops complaining all the time it believes people who write code
are monkeys. The solution is acceptable because the useless assignment is done
only in debug code so it will not impact performance. This patch implements
this, until gcc becomes even "smarter" to detect that we tried to cheat.
We handle "http-request redirect" with a log-format string now, but we
leave "redirect" unaffected.
Note that the control of the special "/" case is move from the runtime
execution to the configuration parsing. If the format rule list is
empty, the build_logline() function does nothing.
Allow an auxiliary agent check to be run independently of the
regular a regular health check. This is enabled by the agent-check
server setting.
The agent-port, which specifies the TCP port to use for the agent's
connections, is required.
The agent-inter, which specifies the interval between agent checks and
timeout of agent checks, is optional. If not set the value for regular
checks is used.
e.g.
server web1_1 127.0.0.1:80 check agent-port 10000
If either the health or agent check determines that a server is down
then it is marked as being down, otherwise it is marked as being up.
An agent health check performed by opening a TCP socket and reading an
ASCII string. The string should have one of the following forms:
* An ASCII representation of an positive integer percentage.
e.g. "75%"
Values in this format will set the weight proportional to the initial
weight of a server as configured when haproxy starts.
* The string "drain".
This will cause the weight of a server to be set to 0, and thus it
will not accept any new connections other than those that are
accepted via persistence.
* The string "down", optionally followed by a description string.
Mark the server as down and log the description string as the reason.
* The string "stopped", optionally followed by a description string.
This currently has the same behaviour as "down".
* The string "fail", optionally followed by a description string.
This currently has the same behaviour as "down".
Signed-off-by: Simon Horman <horms@verge.net.au>
Both static-rr and hash with type map-based call init_server_map() to allocate
server map, so the server map should be freed while doing cleanup if one of
the above load balance algorithms is used.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
[wt: removed the unneeded "if" before the free]
Both fdinfo and fdtab are allocated memory in init() while haproxy is starting,
but only fdtab is freed in deinit(), fdinfo should also be freed.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
FreeBSD uses (IPPROTO_IP, IP_BINDANY) and (IPPROTO_IPV6, IPV6_BINDANY)
to enable transparent proxy on a socket.
This patch adds support for the relevant setsockopt() calls.
This patch does not change the logic of the code, it only changes the
way OS-specific defines are tested.
At the moment the transparent proxy code heavily depends on Linux-specific
defines. This first patch introduces a new define "CONFIG_HAP_TRANSPARENT"
which is set every time the defines used by transparent proxy are present.
This also means that with an up-to-date libc, it should not be necessary
anymore to force CONFIG_HAP_LINUX_TPROXY during the build, as the flags
will automatically be detected.
The CTTPROXY flags still remain separate because this older API doesn't
work the same way.
A new line has been added in the version output for haproxy -vv to indicate
what transparent proxy support is available.
It happens that openssl's API can differ between versions, causing some
serious trouble if the version used at runtime is not the same as used
for building.
Now we report the two versions separately along with a warning if the
version differs (except the patch version).
Since ea68d36 we show whether JIT is enabled, based on the USE-flag
(USE_PCRE_JIT). This is too naive; libpcre may be built without JIT
support (which is the default).
Fix this by calling pcre_config(), which has the accurate information
we are looking for.
Example of a libpcre without JIT support after this patch:
> ./haproxy -vv | grep PCRE
> OPTIONS = USE_STATIC_PCRE=1 USE_PCRE_JIT=1
> Built with PCRE version : 8.32 2012-11-30
> PCRE library supports JIT : no (libpcre build without JIT?)
Commit a4312fa2 merged into dev18 improved log-format management by
processing "log-format" and "unique-id-format" where they were declared,
so that the faulty args could be reported with their correct line numbers.
Unfortunately, the log-format parser considers the proxy mode (TCP/HTTP)
and now if the directive is set before the "mode" statement, it can be
rejected and report warnings.
So we really need to parse these directives at the end of a section at
least. Right now we do not have an "end of section" event, so we need
to store the file name and line number for each of these directives,
and take care of them at the end.
One of the benefits is that now the line numbers can be inherited from
the line passing "option httplog" even if it's in a defaults section.
Future improvements should be performed to report line numbers in every
log-format processed by the parser.
haproxy -vv shows build informations about USE flags and lib versions.
This patch introduces informations about PCRE and the new JIT feature.
It also makes USE_PCRE_JIT=1 appear in the haproxy -vv "OPTIONS".
This is useful since with the introduction of JIT we will see libpcre
related issues.
Released version 1.5-dev18 with the following main changes :
- DOCS: Add explanation of intermediate certs to crt paramater
- DOC: typo and minor fixes in compression paragraph
- MINOR: config: http-request configuration error message misses new keywords
- DOC: minor typo fix in documentation
- BUG/MEDIUM: ssl: ECDHE ciphers not usable without named curve configured.
- MEDIUM: ssl: add bind-option "strict-sni"
- MEDIUM: ssl: add mapping from SNI to cert file using "crt-list"
- MEDIUM: regex: Use PCRE JIT in acl
- DOC: simplify bind option "interface" explanation
- DOC: tfo: bump required kernel to linux-3.7
- BUILD: add explicit support for TFO with USE_TFO
- MEDIUM: New cli option -Ds for systemd compatibility
- MEDIUM: add haproxy-systemd-wrapper
- MEDIUM: add systemd service
- BUG/MEDIUM: systemd-wrapper: don't leak zombie processes
- BUG/MEDIUM: remove supplementary groups when changing gid
- BUG/MEDIUM: config: fix parser crash with bad bind or server address
- BUG/MINOR: Correct logic in cut_crlf()
- CLEANUP: checks: Make desc argument to set_server_check_status const
- CLEANUP: dumpstats: Make cli_release_handler() static
- MEDIUM: server: Break out set weight processing code
- MEDIUM: server: Allow relative weights greater than 100%
- MEDIUM: server: Tighten up parsing of weight string
- MEDIUM: checks: Add agent health check
- BUG/MEDIUM: ssl: openssl 0.9.8 doesn't open /dev/random before chroot
- BUG/MINOR: time: frequency counters are not totally accurate
- BUG/MINOR: http: don't process abortonclose when request was sent
- BUG/MEDIUM: stream_interface: don't close outgoing connections on shutw()
- BUG/MEDIUM: checks: ignore late resets after valid responses
- DOC: fix bogus recommendation on usage of gpc0 counter
- BUG/MINOR: http-compression: lookup Cache-Control in the response, not the request
- MINOR: signal: don't block SIGPROF by default
- OPTIM: epoll: make use of EPOLLRDHUP
- OPTIM: splice: detect shutdowns and avoid splice() == 0
- OPTIM: splice: assume by default that splice is working correctly
- BUG/MINOR: log: temporary fix for lost SSL info in some situations
- BUG/MEDIUM: peers: only the last peers section was used by tables
- BUG/MEDIUM: config: verbosely reject peers sections with multiple local peers
- BUG/MINOR: epoll: use a fix maxevents argument in epoll_wait()
- BUG/MINOR: config: fix improper check for failed memory alloc in ACL parser
- BUG/MINOR: config: free peer's address when exiting upon parsing error
- BUG/MINOR: config: check the proper variable when parsing log minlvl
- BUG/MEDIUM: checks: ensure the health_status is always within bounds
- BUG/MINOR: cli: show sess should always validate s->listener
- BUG/MINOR: log: improper NULL return check on utoa_pad()
- CLEANUP: http: remove a useless null check
- CLEANUP: tcp/unix: remove useless NULL check in {tcp,unix}_bind_listener()
- BUG/MEDIUM: signal: signal handler does not properly check for signal bounds
- BUG/MEDIUM: tools: off-by-one in quote_arg()
- BUG/MEDIUM: uri_auth: missing NULL check and memory leak on memory shortage
- BUG/MINOR: unix: remove the 'level' field from the ux struct
- CLEANUP: http: don't try to deinitialize http compression if it fails before init
- CLEANUP: config: slowstart is never negative
- CLEANUP: config: maxcompcpuusage is never negative
- BUG/MEDIUM: log: emit '-' for empty fields again
- BUG/MEDIUM: checks: fix a race condition between checks and observe layer7
- BUILD: fix a warning emitted by isblank() on non-c99 compilers
- BUILD: improve the makefile's support for libpcre
- MEDIUM: halog: add support for counting per source address (-ic)
- MEDIUM: tools: make str2sa_range support all address syntaxes
- MEDIUM: config: make use of str2sa_range() instead of str2sa()
- MEDIUM: config: use str2sa_range() to parse server addresses
- MEDIUM: config: use str2sa_range() to parse peers addresses
- MINOR: tests: add a config file to ease address parsing tests.
- MINOR: ssl: add a global tunable for the max SSL/TLS record size
- BUG/MINOR: syscall: fix NR_accept4 system call on sparc/linux
- BUILD/MINOR: syscall: add definition of NR_accept4 for ARM
- MINOR: config: report missing peers section name
- BUG/MEDIUM: tools: fix bad character handling in str2sa_range()
- BUG/MEDIUM: stats: never apply "unix-bind prefix" to the global stats socket
- MINOR: tools: prepare str2sa_range() to return an error message
- BUG/MEDIUM: checks: don't call connect() on unsupported address families
- MINOR: tools: prepare str2sa_range() to accept a prefix
- MEDIUM: tools: make str2sa_range() parse unix addresses too
- MEDIUM: config: make str2listener() use str2sa_range() to parse unix addresses
- MEDIUM: config: use a single str2sa_range() call to parse bind addresses
- MEDIUM: config: use str2sa_range() to parse log addresses
- CLEANUP: tools: remove str2sun() which is not used anymore.
- MEDIUM: config: add complete support for str2sa_range() in dispatch
- MEDIUM: config: add complete support for str2sa_range() in server addr
- MEDIUM: config: add complete support for str2sa_range() in 'server'
- MEDIUM: config: add complete support for str2sa_range() in 'peer'
- MEDIUM: config: add complete support for str2sa_range() in 'source' and 'usesrc'
- CLEANUP: minor cleanup in str2sa_range() and str2ip()
- CLEANUP: config: do not use multiple errmsg at once
- MEDIUM: tools: support specifying explicit address families in str2sa_range()
- MAJOR: listener: support inheriting a listening fd from the parent
- MAJOR: tools: support environment variables in addresses
- BUG/MEDIUM: http: add-header should not emit "-" for empty fields
- BUG/MEDIUM: config: ACL compatibility check on "redirect" was wrong
- BUG/MEDIUM: http: fix another issue caused by http-send-name-header
- DOC: mention the new HTTP 307 and 308 redirect statues
- MEDIUM: poll: do not use FD_* macros anymore
- BUG/MAJOR: ev_select: disable the select() poller if maxsock > FD_SETSIZE
- BUG/MINOR: acl: ssl_fc_{alg,use}_keysize must parse integers, not strings
- BUG/MINOR: acl: ssl_c_used, ssl_fc{,_has_crt,_has_sni} take no pattern
- BUILD: fix usual isdigit() warning on solaris
- BUG/MEDIUM: tools: vsnprintf() is not always reliable on Solaris
- OPTIM: buffer: remove one jump in buffer_count()
- OPTIM: http: improve branching in chunk size parser
- OPTIM: http: optimize the response forward state machine
- BUILD: enable poll() by default in the makefile
- BUILD: add explicit support for Mac OS/X
- BUG/MAJOR: http: use a static storage for sample fetch context
- BUG/MEDIUM: ssl: improve error processing and reporting in ssl_sock_load_cert_list_file()
- BUG/MAJOR: http: fix regression introduced by commit a890d072
- BUG/MAJOR: http: fix regression introduced by commit d655ffe
- BUG/CRITICAL: using HTTP information in tcp-request content may crash the process
- MEDIUM: acl: remove flag ACL_MAY_LOOKUP which is improperly used
- MEDIUM: samples: use new flags to describe compatibility between fetches and their usages
- MINOR: log: indicate it when some unreliable sample fetches are logged
- MEDIUM: samples: move payload-based fetches and ACLs to their own file
- MINOR: backend: rename sample fetch functions and declare the sample keywords
- MINOR: frontend: rename sample fetch functions and declare the sample keywords
- MINOR: listener: rename sample fetch functions and declare the sample keywords
- MEDIUM: http: unify acl and sample fetch functions
- MINOR: session: rename sample fetch functions and declare the sample keywords
- MAJOR: acl: make all ACLs reference the fetch function via a sample.
- MAJOR: acl: remove the arg_mask from the ACL definition and use the sample fetch's
- MAJOR: acl: remove fetch argument validation from the ACL struct
- MINOR: http: add new direction-explicit sample fetches for headers and cookies
- MINOR: payload: add new direction-explicit sample fetches
- CLEANUP: acl: remove ACL hooks which were never used
- MEDIUM: proxy: remove acl_requires and just keep a flag "http_needed"
- MINOR: sample: provide a function to report the name of a sample check point
- MAJOR: acl: convert all ACL requires to SMP use+val instead of ->requires
- CLEANUP: acl: remove unused references to ACL_USE_*
- MINOR: http: replace acl_parse_ver with acl_parse_str
- MEDIUM: acl: move the ->parse, ->match and ->smp fields to acl_expr
- MAJOR: acl: add option -m to change the pattern matching method
- MINOR: acl: remove the use_count in acl keywords
- MEDIUM: acl: have a pointer to the keyword name in acl_expr
- MEDIUM: acl: support using sample fetches directly in ACLs
- MEDIUM: http: remove val_usr() to validate user_lists
- MAJOR: sample: maintain a per-proxy list of the fetch args to resolve
- MINOR: ssl: add support for the "alpn" bind keyword
- MINOR: http: status code 303 is HTTP/1.1 only
- MEDIUM: http: implement redirect 307 and 308
- MINOR: http: status 301 should not be marked non-cacheable
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
Some recent glibc updates have added controls on FD_SET/FD_CLR/FD_ISSET
that crash the program if it tries to use a file descriptor larger than
FD_SETSIZE.
For this reason, we now control the compatibility between global.maxsock
and FD_SETSIZE, and refuse to use select() if there too many FDs are
expected to be used. Note that on Solaris, FD_SETSIZE is already forced
to 65536, and that FreeBSD and OpenBSD allow it to be redefined, though
this is not needed thanks to kqueue which is much more efficient.
In practice, since poll() is enabled on all targets, it should not cause
any problem, unless it is explicitly disabled.
This change must be backported to 1.4 because the crashes caused by glibc
have already been reported on this version.
This patch adds a new option "-Ds" which is exactly like "-D", but instead of
forking n times to get n jobs running and then exiting, prefers to wait for all the
children it just created. With this done, haproxy becomes more systemd-compliant,
without changing anything for other systems.
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
Without it, haproxy will retain the group membership of root, which may
give more access than intended to the process. For example, haproxy would
still be in the wheel group on Fedora 18, as seen with :
# haproxy -f /etc/haproxy/haproxy.cfg
# ps a -o pid,user,group,command | grep hapr
3545 haproxy haproxy haproxy -f /etc/haproxy/haproxy.cfg
4356 root root grep --color=auto hapr
# grep Group /proc/3545/status
Groups: 0 1 2 3 4 6 10
# getent group wheel
wheel❌10:root,misc
[WT: The issue has been investigated by independent security research team
and realized by itself not being able to allow security exploitation.
Additionally, dropping groups is not allowed to unprivileged users,
though this mode of deployment is quite common. Thus a warning is
emitted in this case to inform the user. The fix could be backported
into all supported versions as the issue has always been there. ]
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.
Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
global.tune.maxaccept was used for all listeners. This becomes really not
convenient when some listeners are bound to a single process and other ones
are bound to many processes.
Now we change the principle : we count the number of processes a listener
is bound to, and apply the maxaccept either entirely if there is a single
process, or divided by twice the number of processes in order to maintain
fairness.
The default limit has also been increased from 32 to 64 as it appeared that
on small machines, 32 was too low to achieve high connection rates.
The new "cpu-map" directive allows one to assign the CPU sets that
a process is allowed to bind to. This is useful in combination with
the "nbproc" and "bind-process" directives.
The support is implicit on Linux 2.6.28 and above.
Having nbproc preinitialized to zero is really annoying as it prevents
some checks from being correctly performed. Also the check to prevent
nbproc from being redefined is totally useless, so let's preset it to
1 and remove the test.
This is done by passing the default value to SSLCACHESIZE in sessions.
User can use tune.sslcachesize to change this value.
By default, it is set to 20000 sessions as openssl internal cache size.
Currently, a session entry size is between 592 and 616 bytes depending on the arch.
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
Compression algorithms are not always supported depending on build options.
"haproxy -vv" now reports if zlib is supported and lists compression algorithms
also supported.
This patch adds input and output rate calcutation on the HTTP compresion
feature.
Compression can be limited with a maximum rate value in kilobytes per
second. The rate is set with the global 'maxcomprate' option. You can
change this value dynamicaly with 'set rate-limit http-compression
global' on the UNIX socket.
With the global maxzlibmem option, you are able ton control the maximum
amount of RAM usable for HTTP compression.
A test is done before each zlib allocation, if the there isn't available
memory, the test fail and so the zlib initialization, so data won't be
compressed.
The window size and the memlevel of the zlib are now configurable using
global options tune.zlib.memlevel and tune.zlib.windowsize.
It affects the memory consumption of the zlib.
Keys are copied from samples to stick_table_key. If a key is larger
than the stick_table_key, we have an overflow. In pratice it does not
happen because it requires :
1) a configuration with tune.bufsize larger than BUFSIZE (common)
2) a stick-table configured with keys strictly larger than buffers
3) extraction of data larger than BUFSIZE (eg: using payload())
Points 2 and 3 don't make any sense for a real world configuration. That
said the issue needs be fixed. The solution consists in allocating it the
same size as the global buffer size, just like the samples. This fixes the
issue.
Sample conversions rely on two alternative buffers which were previously
allocated as static bufs of size BUFSIZE. Now they're initialized to the
global buffer size. It was the same for HTTP authentication. Note that it
seems that none of them was prone to any mistake when dealing with the
buffer size, but better stay on the safe side by maintaining the old
assumption that a trash buffer is always "large enough".
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
From the beginning it has been said that -D must always be used on the
command line from startup scripts so that haproxy does not accidentally
stay in foreground when loaded from init script... Except that this has
not been true for a long time now.
The fix is easy and must be backported to 1.4 too which is affected.
ACL and sample fetches use args list and it is really not convenient to
check for null args everywhere. Now for empty args we pass a constant
list of end of lists. It will allow us to remove many useless checks.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
These ones are used to set the default ciphers suite on "bind" lines and
"server" lines respectively, instead of using OpenSSL's defaults. These
are probably mainly useful for distro packagers.
Till now the request was made in the trash and sent to the network at
once, and the response was read into a preallocated char[]. Now we
allocate a full buffer for both the request and the response, and make
use of it.
Some of the operations will probably be replaced later with buffer macros
but the point was to ensure we could migrate to use the data layers soon.
One nice improvement caused by this change is that requests are now formed
at the beginning of the check and may safely be sent in multiple chunks if
needed.
The health checks in the servers are becoming a real mess, move them
into their own subsection. We'll soon need to have a struct buffer to
replace the char * as well as check-specific protocol and transport
layers.
Each proxy contains a reference to the original config file and line
number where it was declared. The pointer used is just a reference to
the one passed to the function instead of being duplicated. The effect
is that it is not valid anymore at the end of the parsing and that all
proxies will be enumerated as coming from the same file on some late
configuration errors. This may happen for exmaple when reporting SSL
certificate issues.
By copying using strdup(), we avoid this issue.
1.4 has the same issue, though no report of the proxy file name is done
out of the config section. Anyway a backport is recommended to ease
post-mortem analysis.
Add keyword 'verify' on bind:
'verify none': authentication disabled (default)
'verify optional': accept connection without certificate
and process a verify if the client sent a certificate
'verify required': reject connection without certificate
and process a verify if the client send a certificate
Add keyword 'cafile' on bind:
'cafile <path>' path to a client CA file used to verify.
'crlfile <path>' path to a client CRL file used to verify.
Unix permissions are per-bind configuration line and not per listener,
so let's concretize this in the way the config is stored. This avoids
some unneeded loops to set permissions on all listeners.
The access level is not part of the unix perms so it has been moved
away. Once we can use str2listener() to set all listener addresses,
we'll have a bind keyword parser for this one.
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
- frontends have a dual-linked list of bind_conf
- frontends have a dual-linked list of listeners
- bind_conf have a dual-linked list of listeners
- listeners have a pointer to their bind_conf
This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
Some settings need to be merged per-bind config line and are not necessarily
SSL-specific. It becomes quite inconvenient to have this ssl_conf SSL-specific,
so let's replace it with something more generic.
Since it's common enough to discover that some config options are not
supported due to some openssl version or build options, we report the
relevant ones in "haproxy -vv".
A side effect of this change is that the "ssl" keyword on "bind" lines is now
just a boolean and that "crt" is needed to designate certificate files or
directories.
Note that much refcounting was needed to have the free() work correctly due to
the number of cert aliases which can make a context be shared by multiple names.
SSL config holds many parameters which are per bind line and not per
listener. Let's use a per-bind line config instead of having it
replicated for each listener.
At the moment we only do this for the SSL part but this should probably
evolved to handle more of the configuration and maybe even the state per
bind line.
SSL connections take a huge amount of memory, and unfortunately openssl
does not check malloc() returns and easily segfaults when too many
connections are used.
The only solution against this is to provide a global maxsslconn setting
to reject SSL connections above the limit in order to avoid reaching
unsafe limits.
Thomas Heil reported that when using nbproc > 1, his pidfiles were
regularly truncated. The issue could be tracked down to the presence
of a call to lseek(pidfile, 0, SEEK_SET) just before the close() call
in the children, resulting in the file being truncated by the children
while the parent was feeding it. This unexpected lseek() is transparently
performed by fclose().
Since there is no way to have the file automatically closed during the
fork, the only solution is to bypass the libc and use open/write/close
instead of fprintf() and fclose().
The issue was observed on eglibc 2.15.
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.
In concerns the following ones (new names) :
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)
Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.
The "buffer" memory pool was also renamed "channel".
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.
The #include directive was also removed from a number of files which do
not need it anymore.
In an attempt to get rid of fdtab[].state, and to move the relevant
parts to the connection struct, we remove the FD_STCLOSE state which
can easily be deduced from the <owner> pointer as there is a 1:1 match.
When passing arguments to ACLs and samples, some types are stored as
strings then resolved later after config parsing is done. Upon exit,
the arguments need to be freed only if the string was not resolved
yet. At the moment we can encounter double free during deinit()
because some arguments (eg: userlists) are freed once as their own
type and once as a string.
The solution consists in adding an "unresolved" flag to the args to
say whether the value is still held in the <str> part or is final.
This could be debugged thanks to a useful bug report from Sander Klein.
Option httplog needs to be checked only once the proxy has been validated,
so that its final mode (tcp/http) can be used. Also we need to check for
httplog before checking the log format, so that we can report a warning
about this specific option and not about the format it implies.
Before it was possible to resize the buffers using global.tune.bufsize,
the trash has always been the size of a buffer by design. Unfortunately,
the recent buffer sizing at runtime forgot to adjust the trash, resulting
in it being too short for content rewriting if buffers were enlarged from
the default value.
The bug was encountered in 1.4 so the fix must be backported there.
We'll soon have an SSL socket layer, and in order to ease the difference
between the two, we use the name "sock_raw" to designate the one which
directly talks to the sockets without any conversion.
From time to time, some bugs are discovered that are caused by non-initialized
memory areas. It happens that most platforms return a zero-filled area upon
first malloc() thus hiding potential bugs. This patch also replaces malloc()
in pools with calloc() to ensure that all platforms exhibit the same behaviour
upon startup. In order to catch these bugs more easily, add a -dM command line
flag to enable memory poisonning. Optionally, passing -dM<byte> forces the
poisonning byte to <byte>.
This is mainly a massive renaming in the code to get it in line with the
calling convention. Next patch will rename a few files to complete this
operation.
arg_i was almost unused, and since we migrated to use struct arg everywhere,
the rare cases where arg_i was needed could be replaced by switching to
arg->type = ARGT_STOP.
There were a few unchecked write() calls in the debug code that cause
gcc 4.x to emit warnings on recent libc. We don't want to check them
as we can't make anything from the result, let's simply surround them
with an empty if statement.
Note that one of the warnings was for chdir("/") which normally cannot
fail since it follows a successful chroot (which means the perms are
necessarily there). Anyway let's move the call uppe to protect it too.
%Fi: Frontend IP
%Fp: Frontend Port
%Si: Server IP
%Sp: Server Port
%Ts: Timestamp
%rt: HTTP request counter
%H: hostname
%pid: PID
+X: Hexadecimal represenation
The +X mode in logformat displays hexadecimal for the following flags
%Ci %Cp %Fi %Fp %Bi %Bp %Si %Sp %Ts %ct %pid
rename logformat_write_string() to lf_text()
Optimize size computation
Sometimes it is desirable to forward a particular request to a specific
server without having to declare a dedicated backend for this server. This
can be achieved using the "use-server" rules. These rules are evaluated after
the "redirect" rules and before evaluating cookies, and they have precedence
on them. There may be as many "use-server" rules as desired. All of these
rules are evaluated in their declaration order, and the first one which
matches will assign the server.
Released version 1.5-dev8 with the following main changes :
- MINOR: patch for minor typo (ressources/resources)
- MEDIUM: http: add support for sending the server's name in the outgoing request
- DOC: mention that default checks are TCP connections
- BUG/MINOR: fix options forwardfor if-none when an alternative header name is specified
- CLEANUP: Make check_statuses, analyze_statuses and process_chk static
- CLEANUP: Fix HCHK spelling errors
- BUG/MINOR: fix typo in processing of http-send-name-header
- MEDIUM: log: Use linked lists for loggers
- BUILD: fix declaration inside a scope block
- REORG: log: split send_log function
- MINOR: config: Parse the string of the log-format config keyword
- MINOR: add ultoa, ulltoa, ltoa, lltoa implementations
- MINOR: Date and time fonctions that don't use snprintf
- MEDIUM: log: make http_sess_log use log_format
- DOC: log-format documentation
- MEDIUM: log: use log_format for mode tcplog
- MEDIUM: log-format: backend source address %Bi %Bp
- BUG/MINOR: log-format: fix %o flag
- BUG/MEDIUM: bad length in log_format and __send_log
- MINOR: logformat %st is signed
- BUILD/MINOR: fix the source URL in the spec file
- DOC: acl is http_first_req, not http_req_first
- BUG/MEDIUM: don't trim last spaces from headers consisting only of spaces
- MINOR: acl: add new matches for header/path/url length
- BUILD: halog: make halog build on solaris
- BUG/MINOR: don't use a wrong port when connecting to a server with mapped ports
- MINOR: remove the client/server side distinction in SI addresses
- MINOR: halog: add support for matching queued requests
- DOC: indicate that cookie "prefix" and "indirect" should not be mixed
- OPTIM/MINOR: move struct sockaddr_storage to the tail of structs
- OPTIM/MINOR: make it possible to change pipe size (tune.pipesize)
- BUILD/MINOR: silent a build warning in src/pipe.c (fcntl)
- OPTIM/MINOR: move the hdr_idx pools out of the proxy struct
- MEDIUM: tune.http.maxhdr makes it possible to configure the maximum number of HTTP headers
- BUG/MINOR: fix a segfault when parsing a config with undeclared peers
- CLEANUP: rename possibly confusing struct field "tracked"
- BUG/MEDIUM: checks: fix slowstart behaviour when server tracking is in use
- MINOR: config: tolerate server "cookie" setting in non-HTTP mode
- MEDIUM: buffers: add some new primitives and rework existing ones
- BUG: buffers: don't return a negative value on buffer_total_space_res()
- MINOR: buffers: make buffer_pointer() support negative pointers too
- CLEANUP: kill buffer_replace() and use an inline instead
- BUG: tcp: option nolinger does not work on backends
- CLEANUP: ebtree: remove a few annoying signedness warnings
- CLEANUP: ebtree: clarify licence and update to 6.0.6
- CLEANUP: ebtree: remove 4-year old harmless typo in duplicates insertion code
- CLEANUP: ebtree: remove another typo, a wrong initialization in insertion code
- BUG: ebtree: ebst_lookup() could return the wrong entry
- OPTIM: stream_sock: reduce the amount of in-flight spliced data
- OPTIM: stream_sock: save a failed recv syscall when splice returns EAGAIN
- MINOR: acl: add support for TLS server name matching using SNI
- BUG: http: re-enable TCP quick-ack upon incomplete HTTP requests
- BUG: proto_tcp: don't try to bind to a foreign address if sin_family is unknown
- MINOR: pattern: export the global temporary pattern
- CLEANUP: patterns: get rid of pattern_data_setstring()
- MEDIUM: acl: use temp_pattern to store fetched information in the "method" match
- MINOR: acl: include pattern.h to make pattern migration more transparent
- MEDIUM: pattern: change the pattern data integer from unsigned to signed
- MEDIUM: acl: use temp_pattern to store any integer-type information
- MEDIUM: acl: use temp_pattern to store any address-type information
- CLEANUP: acl: integer part of acl_test is not used anymore
- MEDIUM: acl: use temp_pattern to store any string-type information
- CLEANUP: acl: remove last data fields from the acl_test struct
- MEDIUM: http: replace get_ip_from_hdr2() with http_get_hdr()
- MEDIUM: patterns: the hdr() pattern is now of type string
- DOC: add minimal documentation on how ACLs work internally
- DOC: add a coding-style file
- OPTIM: halog: keep a fast path for the lines-count only
- CLEANUP: silence a warning when building on sparc
- BUG: http: tighten the list of allowed characters in a URI
- MEDIUM: http: block non-ASCII characters in URIs by default
- DOC: add some documentation from RFC3986 about URI format
- BUG/MINOR: cli: correctly remove the whole table on "clear table"
- BUG/MEDIUM: correctly disable servers tracking another disabled servers.
- BUG/MEDIUM: zero-weight servers must not dequeue requests from the backend
- MINOR: halog: add some help on the command line
- BUILD: fix build error on FreeBSD
- BUG: fix double free in peers config error path
- MEDIUM: improve config check return codes
- BUILD: make it possible to look for pcre in the default system paths
- MINOR: config: emit a warning when 'default_backend' masks servers
- MINOR: backend: rework the LC definition to support other connection-based algos
- MEDIUM: backend: add the 'first' balancing algorithm
- BUG: fix httplog trailing LF
- MEDIUM: increase chunk-size limit to 2GB-1
- BUG: queue: fix dequeueing sequence on HTTP keep-alive sessions
- BUG: http: disable TCP delayed ACKs when forwarding content-length data
- BUG: checks: fix server maintenance exit sequence
- BUG/MINOR: stream_sock: don't remove BF_EXPECT_MORE and BF_SEND_DONTWAIT on partial writes
- DOC: enumerate valid status codes for "observe layer7"
- MINOR: buffer: switch a number of buffer args to const
- CLEANUP: silence signedness warning in acl.c
- BUG: stream_sock: si->release was not called upon shutw()
- MINOR: log: use "%ts" to log term status only and "%tsc" to log with cookie
- BUG/CRITICAL: log: fix risk of crash in development snapshot
- BUG/MAJOR: possible crash when using capture headers on TCP frontends
- MINOR: config: disable header captures in TCP mode and complain
parse_logformat_string: parse the string, detect the type: text,
separator or variable
parse_logformat_var: dectect variable name
parse_logformat_var_args: parse arguments and flags
add_to_logformat_list: add to the logformat linked list
When checking a configuration file using "-c -f xxx", sometimes it is
reported that a config is valid while it will later fail (eg: no enabled
listener). Instead, let's improve the return values :
- return 0 if config is 100% OK
- return 1 if config has errors
- return 2 if config is OK but no listener nor peer is enabled
This patch settles the 2 loggers limitation.
Loggers are now stored in linked lists.
Using "global log", the global loggers list content is added at the end
of the current proxy list. Each "log" entries are added at the end of
the proxy list.
"no log" flush a logger list.
Ludovic Levesque reported and diagnosed an annoying bug. When a server is
configured to track another one and has a slowstart interval set, it's
assigned a minimal weight when the tracked server goes back up but keeps
this weight forever.
This is because the throttling during the warmup phase is only computed
in the health checking function.
After several attempts to resolve the issue, the only real solution is to
split the check processing task in two tasks, one for the checks and one
for the warmup. Each server with a slowstart setting has a warmum task
which is responsible for updating the server's weight after a down to up
transition. The task does not run in othe situations.
In the end, the fix is neither complex nor long and should be backported
to 1.4 since the issue was detected there first.
It makes no sense to have one pointer to the hdr_idx pool in each proxy
struct since these pools do not depend on the proxy. Let's have a common
pool instead as it is already the case for other types.
Passing -C <dir> causes haproxy to chdir to <dir> before loading
any file. The argument may be passed anywhere on the command line.
A typical use case is :
$ haproxy -C /etc/haproxy -f global.cfg -f haproxy.cfg
The way the unix socket is initialized is awkward. Some of the settings are put
in the sockets itself, other ones in the backend. And more importantly the
global.maxsock value is adjusted so that the stats socket evades the global
maxconn value. This complexifies maxsock computations for nothing, since the
stats socket is not supposed to receive hundreds of concurrent connections when
the global maxconn is very low. What is needed however is to ensure that there
are always connections left for the stats socket even when traffic sockets are
saturated, but this guarantee is not offered anymore by current code.
So as of now, the stats socket is subject to the global maxconn limitation just
as any other socket until a reservation mechanism is implemented.
This global task is used to periodically check for end of resource shortage
and to try to enable queued listeners again. This is important in case some
temporary system-wide shortage is encountered, so that we don't have to wait
for an existing connection to be released before checking the queue again.
For situations where listeners are queued due to the global maxconn being
reached, the task is woken up at least every second. For situations where
a system resource shortage is detected (memory, sockets, ...) the task is
woken up at least every 100 ms. That way, recovery from severe events can
still be achieved under acceptable conditions.
This function is finally not needed anymore, as it has been replaced with
a per-proxy task that is scheduled when some limits are encountered on
incoming connections or when the process is stopping. The savings should
be noticeable on configs with a large number of proxies. The most important
point is that the rate limiting is now enforced in a clean and solid way.
When an accept() fails because of a connection limit or a memory shortage,
we now disable it and queue it so that it's dequeued only when a connection
is released. This has improved the behaviour of the process near the fd limit
as now a listener with a no connection (eg: stats) will not loop forever
trying to get its connection accepted.
The solution is still not 100% perfect, as we'd like to have this used when
proxy limits are reached (use a per-proxy list) and for safety, we'd need
to have dedicated tasks to periodically re-enable them (eg: to overcome
temporary system-wide resource limitations when no connection is released).
Managing listeners state is difficult because they have their own state
and can at the same time have theirs dictated by their proxy. The pause
is not done properly, as the proxy code is fiddling with sockets. By
introducing new functions such as pause_listener()/resume_listener(), we
make it a bit more obvious how/when they're supposed to be used. The
listen_proxies() function was also renamed to resume_proxies() since
it's only used for pause/resume.
This patch is the first in a series aiming at getting rid of the maintain_proxies
mess. In the end, proxies should not call enable_listener()/disable_listener()
anymore.
By default on a single process, we accept 100 connections at once. This is too
much on recent CPUs where the cache is constantly thrashing, because we visit
all those connections several times. We should batch the processing slightly
less so that all the accepted session may remain in cache during their initial
processing.
Lowering the batch size from 100 to 32 has changed the connection rate for
concurrencies between 5-10k from 67 kcps to 94 kcps on a Core i5 660 (4M L3),
and forward rates from 30k to 39.5k.
Tests on this hardware show that values between 10 and 30 seem to do the job fine.
The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()
Discovered using valgrind.
The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()
Discovered using valgrind.
The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()
Discovered using valgrind.
The motivation for this is that when soft-restart is merged
it will be come more important to free all relevant memory in deinit()
Discovered using valgrind.
And also rename "req_acl_rule" "http_req_rule". At the beginning that
was a bit confusing to me, especially the "req_acl" list which in fact
holds what we call rules. After some digging, it appeared that some
part of the code is 100% HTTP and not just related to authentication
anymore, so let's move that part to HTTP and keep the auth-only code
in auth.c.
It's very annoying that frontend and backend stats are merged because we
don't know what we're observing. For instance, if a "listen" instance
makes use of a distinct backend, it's impossible to know what the bytes_out
means.
Some points take care of not updating counters twice if the backend points
to the frontend, indicating a "listen" instance. The thing becomes more
complex when we try to add support for server side keep-alive, because we
have to maintain a pointer to the backend used for last request, and to
update its stats. But we can't perform such comparisons anymore because
the counters will not match anymore.
So in order to get rid of this situation, let's have both frontend AND
backend stats in the "struct proxy". We simply update the relevant ones
during activity. Some of them are only accounted for in the backend,
while others are just for frontend. Maybe we can improve a bit on that
later, but the essential part is that those counters now reflect what
they really mean.
As reported by the Loadbalancer.org team, it was not possible to bind
more than 1024 ports. This is because the process' limits were set after
trying to bind the sockets, which defeats their purpose.
This fix must be backported to 1.4 and 1.3.
One of the requirements we have is to run multiple instances of haproxy on a
single host; this is so that we can split the responsibilities (and change
permissions) between product teams. An issue we ran up against is how we
would distinguish between the logs generated by each instance. The solution
we came up with (please let me know if there is a better way) is to override
the application tag written to syslog. We can then configure syslog to write
these to different files.
I have attached a patch adding a global option 'log-tag' to override the
default syslog tag 'haproxy' (actually defaults to argv[0]).
Haproxy does not include the hostname rather the IP of the machine in
the syslog headers it sends. Unfortunately this means that for each log
line rsyslog does a reverse dns on the client IP and in the case of
non-routable IPs one gets the public hostname not the internal one.
While this is valid according to RFC3164 as one might imagine this is
troublsome if you have some machines with public IPs, internal IPs, no
reverse DNS entries, etc and you want a standardized hostname based log
directory structure. The rfc says the preferred value is the hostname.
This patch adds a global "log-send-hostname" statement which accepts an
optional string to force the host name. If unset, the local host name
is used.
HTTP content-based health checks will be involved in searching text in pages.
Some pages may not fit in the default buffer (16kB) and sometimes it might be
desired to have larger buffers in order to find patterns. Running checks on
smaller URIs is always preferred of course.
(cherry picked from commit 043f44aeb835f3d0b57626c4276581a73600b6b1)
In deinit(), it is possible that we first free the listeners, then
unbind them all. Right now this situation can't happen because the
only way to call deinit() is to pass via a soft-stop which will
already unbind all protocols. But later this might become a problem.
This counter is incremented for each incoming connection and each active
listener, and is used to prevent haproxy from stopping upon SIGUSR1. It
will thus be possible for some tasks in increment this counter in order
to prevent haproxy from dying until they have completed their job.
Signal zero is never delivered by the system. However having a signal to
which functions and tasks can subscribe to be notified of a stopping event
is useful. So this patch does two things :
1) allow signal zero to be delivered from any function of signal handler
2) make soft_stop() deliver this signal so that tasks can be notified of
a stopping condition.
The two new functions below make it possible to register any number
of functions or tasks to a system signal. They will be called in the
registration order when the signal is received.
struct sig_handler *signal_register_fct(int sig, void (*fct)(struct sig_handler *), int arg);
struct sig_handler *signal_register_task(int sig, struct task *task, int reason);
In case of binding failure during startup, we wait for some time sending
signals to old pids so that they release the ports we need. But if there
aren't any old pids anymore, it's useless to wait, we prefer to fail fast.
Along with this change, we now have the number of old pids really found
in the nb_oldpids variable.
The 'client.c' file now only contained frontend-specific functions,
so it has naturally be renamed 'frontend.c'. Same for client.h. This
has also been an opportunity to remove some cross references from
files that should not have depended on it.
In the end, this file should contain a protocol-agnostic accept()
code, which would initialize a session, task, etc... based on an
accept() from a lower layer. Right now there are still references
to TCP.
Apparently some systems define MSG_NOSIGNAL but do not necessarily
check it (or maybe binaries are built somewhere and used on older
versions). There were reports of very recent FreeBSD setups causing
SIGPIPEs, while older ones catch the signal. Recent FreeBSD manpages
indeed define MSG_NOSIGNAL.
So let's now unconditionnaly catch the signal. It's useless not to do
it for the rare cases where it's not needed (linux 2.4 and below).
We are seeing both real servers repeatedly going on- and off-line with
a period of tens of seconds. Packet tracing, stracing, and adding
debug code to HAProxy itself has revealed that the real servers are
always responding correctly, but HAProxy is sometimes receiving only
part of the response.
It appears that the real servers are sending the test page as three
separate packets. HAProxy receives the contents of one, two, or three
packets, apparently randomly. Naturally, the health check only
succeeds when all three packets' data are seen by HAProxy. If HAProxy
and the real servers are modified to use a plain HTML page for the
health check, the response is in the form of a single packet and the
checks do not fail.
(...)
I've added buffer and length variables to struct server, and allocated
space with the rest of the server initialisation.
(...)
It seems to be working fine in my tests, and handles check responses
that are bigger than the buffer.
Marcello Gorlani reported that at least on FreeBSD, a long hostname
was reported with garbage on the stats page. POSIX does not make it
mandatory for gethostname() to NULL-terminate the string in case of
truncation, and at least FreeBSD appears not to do it. So let's
force null-termination to keep safe.
The bounce realign function was algorithmically good but as expected
it was not cache-friendly. Using it with large requests caused so many
cache thrashing that the function itself could drain 70% of the total
CPU time for only 0.5% of the calls !
Revert back to a standard memcpy() using a specially allocated swap
buffer. We're now back to 2M req/s on pipelined requests.
Thich patch fixes cfgparser not to leak memory on each
default server statement and adds several missing free
calls in deinit():
- free(l->name)
- free(l->counters)
- free(p->desc);
- free(p->fwdfor_hdr_name);
None of them are critical, hopefully.
Released version 1.4-rc1 with the following main changes :
- [MEDIUM] add a maintenance mode to servers
- [MINOR] http-auth: last fix was wrong
- [CONTRIB] add base64rev-gen.c that was used to generate the base64rev table.
- [MINOR] Base64 decode
- [MINOR] generic auth support with groups and encrypted passwords
- [MINOR] add ACL_TEST_F_NULL_MATCH
- [MINOR] http-request: allow/deny/auth support for frontend/backend/listen
- [MINOR] acl: add http_auth and http_auth_group
- [MAJOR] use the new auth framework for http stats
- [DOC] add info about userlists, http-request and http_auth/http_auth_group acls
- [STATS] make it possible to change a CLI connection timeout
- [BUG] patterns: copy-paste typo in type conversion arguments
- [MINOR] pattern: make the converter more flexible by supporting void* and int args
- [MINOR] standard: str2mask: string to netmask converter
- [MINOR] pattern: add support for argument parsers for converters
- [MINOR] pattern: add the "ipmask()" converting function
- [MINOR] config: off-by-one in "stick-table" after list of converters
- [CLEANUP] acl, patterns: make use of my_strndup() instead of malloc+memcpy
- [BUG] restore accidentely removed line in last patch !
- [MINOR] checks: make the HTTP check code add the CRLF itself
- [MINOR] checks: add the server's status in the checks
- [BUILD] halog: make without arch-specific optimizations
- [BUG] halog: fix segfault in case of empty log in PCT mode (cherry picked from commit fe362fe476)
- [MINOR] http: disable keep-alive when process is going down
- [MINOR] acl: add build_acl_cond() to make it easier to add ACLs in config
- [CLEANUP] config: use build_acl_cond() instead of parse_acl_cond()
- [CLEANUP] config: use warnif_cond_requires_resp() to check for bad ACLs
- [MINOR] prepare req_*/rsp_* to receive a condition
- [CLEANUP] config: specify correct const char types to warnif_* functions
- [MEDIUM] config: factor out the parsing of 20 req*/rsp* keywords
- [MEDIUM] http: make the request filter loop check for optional conditions
- [MEDIUM] http: add support for conditional request filter execution
- [DOC] add some build info about the AIX platform (cherry picked from commit e41914c77e)
- [MEDIUM] http: add support for conditional request header addition
- [MEDIUM] http: add support for conditional response header rewriting
- [DOC] add some missing ACLs about response header matching
- [MEDIUM] http: add support for proxy authentication
- [MINOR] http-auth: make the 'unless' keyword work as expected
- [CLEANUP] config: use build_acl_cond() to simplify http-request ACL parsing
- [MEDIUM] add support for anonymous ACLs
- [MEDIUM] http: switch to tunnel mode after status 101 responses
- [MEDIUM] http: stricter processing of the CONNECT method
- [BUG] config: reset check request to avoid double free when switching to ssl/sql
- [MINOR] config: fix too large ssl-hello-check message.
- [BUG] fix error response in case of server error
Support the new syntax (http-request allow/deny/auth) in
http stats.
Now it is possible to use the same syntax is the same like in
the frontend/backend http-request access control:
acl src_nagios src 192.168.66.66
acl stats_auth_ok http_auth(L1)
stats http-request allow if src_nagios
stats http-request allow if stats_auth_ok
stats http-request auth realm LB
The old syntax is still supported, but now it is emulated
via private acls and an aditional userlist.
Add generic authentication & authorization support.
Groups are implemented as bitmaps so the count is limited to
sizeof(int)*8 == 32.
Encrypted passwords are supported with libcrypt and crypt(3), so it is
possible to use any method supported by your system. For example modern
Linux/glibc instalations support MD5/SHA-256/SHA-512 and of course classic,
DES-based encryption.
Cyril Bont found that when an error is detected in one config file, it
is also reported in all other ones, which is wrong. The fix obviously
consists in checking the return code from readcfgfile() and not the
accumulator.
Cameron Simpson reported an annoying case where haproxy simply reports
"Error(s) found in configuration file" when the file is not found or
not readable.
Fortunately the parsing function still returns -1 in case of open
error, so we're able to detect the issue from the caller and report
the corresponding errno message.
Some rarely information are stored in fdtab, making it larger for no
reason (source port ranges, remote address, ...). Such information
lie there because the checks can't find them anywhere else. The goal
will be to move these information to the stream interface once the
checks make use of it.
For now, we move them to an fdinfo array. This simple change might
have improved the cache hit ratio a little bit because a 0.5% of
performance increase has measured.
During troubleshooting, it's often useful to get the list of supported
pollers but until now it was required to have a working configuration
first. Since the pollers are known before main() is called, let's list
them with the build options.
This patch implements "description" (proxy and global) and "node" (global)
options, removes "node-name" and adds "show-node" & "show-desc" options
for "stats". It also changes the way the header lines (with proxy name) and
the statistics are displayed, so stats no longer look so clumsy with very
long names.
Instead of "node-name" it is possible to use show-node/show-desc with
an optional parameter that overrides a default node/description.
backend cust-0045
# report specific values for this customer
stats show-node Europe
stats show-desc Master node for Europe, Asia, Africa
Commit 27a674efb8 introduced the ability
to configure buffer sizes. Unfortunately, the pool was created before
the conf was read, so that is was always set to the default size.
In order to fix that, we delay the call to init_buffer(), which is not
a problem since nothing uses it during the initialization.
The new tune.bufsize and tune.maxrewrite global directives allow one to
change the buffer size and the maxrewrite size. Right now, setting bufsize
too low will block stats sockets which will not be able to write at all.
An error checking must be added to buffer_write_chunk() so that if it
cannot write its message to an empty buffer, it causes the caller to abort.
Creating a frontend for the global stats socket will help merge
unix sockets management with the other socket management. Since
frontends are huge structs, we only allocate it if required.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
Do not exit early at the first error found while checking configuration
validity. This particularly helps spotting multiple wrong tracked server
names at once.
Try not to immediately exit on non-fatal errors while parsing the
global section, so that the user has a chance to get most of the
errors at once, which is quite convenient especially during config
checks with the -c argument. Some other errors such as unresolved
server names also don't make the parser exit too early.
We now support up to 10 distinct configuration files. They are
all loaded in the order defined by -f <file1> -f <file2> ...
This can be useful in order to store global, private, public,
etc... configurations in distinct files.
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
When a new process fails to grab some ports, it sends a signal to
the old process in order to release them. Then it tries to bind
again. If it still fails (eg: one of the ports is bound to a
completely different process), it must send the continue signal
to the old process so that this one re-binds to the ports. This
is correctly done, but the newly bound ports are not released
first, which sometimes causes the old process to remain running
with no port bound. The fix simply consists in unbinding all
ports before sending the signal to the old process.
It is recommended to have -D in init scripts, but -D also implies
quiet mode, which hides warning messages, and both options are now
completely unrelated. Remove the implication to get warnings with
-D.
The small list of signals currently handled by haproxy were processed
as soon as they were received. This has caused trouble with calls to
pool_gc2() occuring in the middle of libc's memory management functions
seldom causing deadlocks preventing the old process from leaving.
Now these signals use the new async signal framework and are called
asynchronously, when there is no risk of recursion. This ensures more
reliable operation, especially for sensible processing such as memory
management.
The byte counters have long been 64-bit to avoid overflows. But with
several sites nowadays, we see session counters wrap around every 10-days
or so. So it was the moment to switch counters to 64-bit, including
error and warning counters which can theorically rise as fast as session
counters even if in practice there is very low risk.
The performance impact should not be noticeable since those counters are
only updated once per session. The stats output have been carefully checked
for proper types on both 32- and 64-bit platforms.
If we get very large data at once, it's almost certain that it's
worthless trying to read again, because we got everything we could
get.
Doing this has made all -EAGAIN disappear from splice reads. The
threshold has been put in the global tunable structures so that if
we one day want to make it accessible from user config, it will be
easy to do so.
On overloaded systems, it sometimes happens that hundreds or thousands
of incoming connections are queued in the system's backlog, and all get
dequeued at once. The problem is that when haproxy processes them and
does not apply any limit, this can take some time and the internal date
does not progress, resulting in wrong timer measures for all sessions.
The most common effect of this is that all of these sessions report a
large request time (around several hundreds of ms) which is in fact
caused by the time spent accepting other connections. This might happen
on shared systems when the machine swaps.
For this reason, we finally apply a reasonable limit even in mono-process
mode. Accepting 100 connections at once is fast enough for extreme cases
and will not cause that much of a trouble when the system is saturated.
The "bind-process" keyword lets the admin select which instances may
run on which process (in multi-process mode). It makes it easier to
more evenly distribute the load across multiple processes by avoiding
having too many listen to the same IP:ports.
Setting "nosplice" in the global section will disable the use of TCP
splicing (both tcpsplice and linux 2.6 splice). The same will be
achieved using the "-dS" parameter on the command line.
The global tuning options right now only concern the polling mechanisms,
and they are not in the global struct itself. It's not very practical to
add other options so let's move them to the global struct and remove
types/polling.h which was not used for anything else.
global.maxconn/4 seems to be a good hint for global.maxpipes when that
one must be guessed. If the limit is reached, it's still possible to
set it manually in the configuration.
Using pipe pools makes pipe management a lot easier. It also allows to
remove quite a bunch of #ifdefs in areas which depended on the presence
or not of support for kernel splicing.
The buffer now holds a pointer to a pipe structure which is always NULL
except if there are still data in the pipe. When it needs to use that
pipe, it dynamically allocates it from the pipe pool. When the data is
consumed, the pipe is immediately released.
That way, there is no need anymore to care about pipe closure upon
session termination, nor about pipe creation when trying to use
splice().
Another immediate advantage of this method is that it considerably
reduces the number of pipes needed to use splice(). Tests have shown
that even with 0.2 pipe per connection, almost all sessions can use
splice(), because the same pipe may be used by several consecutive
calls to splice().
If splicing is enabled in a backend, we need to guess how many
pipes will be needed. We used to rely on fullconn, but this leads
to non-working splicing when fullconn is not specified. So we now
fallback to global.maxconn.
This code provides support for linux 2.6 kernel splicing. This feature
appeared in kernel 2.6.25, but initial implementations were awkward and
buggy. A kernel >= 2.6.29-rc1 is recommended, as well as some optimization
patches.
Using pipes, this code is able to pass network data directly between
sockets. The pipes are a bit annoying to manage (fd creation, release,
...) but finally work quite well.
Preliminary tests show that on high bandwidths, there's a substantial
gain (approx +50%, only +20% with kernel workarounds for corruption
bugs). With 2000 concurrent connections, with Myricom NICs, haproxy
now more easily achieves 4.5 Gbps for 1 process and 6 Gbps for two
processes buffers. 8-9 Gbps are easily reached with smaller numbers
of connections.
We also try to splice out immediately after a splice in by making
profit from the new ability for a data producer to notify the
consumer that data are available. Doing this ensures that the
data are immediately transferred between sockets without latency,
and without having to re-poll. Performance on small packets has
considerably increased due to this method.
Earlier kernels return only one TCP segment at a time in non-blocking
splice-in mode, while newer return as many segments as may fit in the
pipe. To work around this limitation without hurting more recent kernels,
we try to collect as much data as possible, but we stop when we believe
we have read 16 segments, then we forward everything at once. It also
ensures that even upon shutdown or EAGAIN the data will be forwarded.
Some tricks were necessary because the splice() syscall does not make
a difference between missing data and a pipe full, it always returns
EAGAIN. The trick consists in stop polling in case of EAGAIN and a non
empty pipe.
The receiver waits for the buffer to be empty before using the pipe.
This is in order to avoid confusion between buffer data and pipe data.
The BF_EMPTY flag now covers the pipe too.
Right now the code is disabled by default. It needs to be built with
CONFIG_HAP_LINUX_SPLICE, and the instances intented to use splice()
must have "option splice-response" (or option splice-request) enabled.
It is probably desirable to keep a pool of pre-allocated pipes to
avoid having to create them for every session. This will be worked
on later.
Preliminary tests show very good results, even with the kernel
workaround causing one memcpy(). At 3000 connections, performance
has moved from 3.2 Gbps to 4.7 Gbps.
Three new options have been added when CONFIG_HAP_LINUX_SPLICE is
set :
- splice-request
- splice-response
- splice-auto
They are used to enable splicing per frontend/backend. They are also
supported in defaults sections. The "splice-auto" option is meant to
automatically turn splice on for buffers marked as fast streamers.
This should save quite a bunch of file descriptors.
It was required to add a new "options2" field to the proxy structure
because the original "options" is full.
When global.maxpipes is not set, it is automatically adjusted to
the max of the sums of all frontend's and backend's maxconns for
those which have at least one splice option enabled.
Josh Goebel reported that haproxy silently dies when it fails to
chroot. In fact, it does so when in daemon mode, because daemon
mode has been disabling output for ages.
Since the code has been reworked, this could have been changed
because there is no reason for this anymore, hence this patch.
(cherry picked from commit 304d6fb00f)
(cherry picked from commit 50b7f7f12c67322c793f50a6be009f0fd0eec1bb)
The INTBITS macro was found to be already defined on some platforms,
and to equal 32 (while INTBITS was 5 here). Due to pure luck, there
was no declaration conflict, but it's nonetheless a problem to fix.
Looking at the code showed that this macro was only used for left
shifts and nothing else anymore. So the replacement is obvious. The
new macro, BITS_PER_INT is more obviously correct.
It should be stated as a rule that a C file should never
include types/xxx.h when proto/xxx.h exists, as it gives
less exposure to declaration conflicts (one of which was
caught and fixed here) and it complicates the file headers
for nothing.
Only types/global.h, types/capture.h and types/polling.h
have been found to be valid includes from C files.
This is the first attempt at moving all internal parts from
using struct timeval to integer ticks. Those provides simpler
and faster code due to simplified operations, and this change
also saved about 64 bytes per session.
A new header file has been added : include/common/ticks.h.
It is possible that some functions should finally not be inlined
because they're used quite a lot (eg: tick_first, tick_add_ifset
and tick_is_expired). More measurements are required in order to
decide whether this is interesting or not.
Some function and variable names are still subject to change for
a better overall logics.
We now insert tasks in a certain sequence in the run queue.
The sorting key currently is the arrival order. It will now
be possible to apply a "nice" value to any task so that it
goes forwards or backwards in the run queue.
The calls to wake_expired_tasks() and maintain_proxies()
have been moved to the main run_poll_loop(), because they
had nothing to do in process_runnable_tasks().
The task_wakeup() function is not inlined anymore, as it was
only used at one place.
The qlist member of the task structure has been removed now.
The run_queue list has been replaced for an integer indicating
the number of tasks in the run queue.
The following config makes haproxy segfault on exit :
defaults
mode http
balance roundrobin
listen no-stats
bind :8001
listen stats
bind :8002
stats uri /stats
The simple fix is to ensure that p->uri_auth is not NULL
before dereferencing it.
The ultree code has been removed in favor of a simpler and
cleaner ebtree implementation. The eternity queue does not
need to exist anymore, and the pool_tree64 has been removed.
The ebtree node is stored in the task itself. The qlist list
header is still used by the run-queue, but will be able to
disappear once the run-queue uses ebtree too.
The first implementation of the monotonic clock did not verify
forward jumps. The consequence is that a fast changing time may
expire a lot of tasks. While it does seem minor, in fact it is
problematic because most machines which boot with a wrong date
are in the past and suddenly see their time jump by several
years in the future.
The solution is to check if we spent more apparent time in
a poller than allowed (with a margin applied). The margin
is currently set to 1000 ms. It should be large enough for
any poll() to complete.
Tests with randomly jumping clock show that the result is quite
accurate (error less than 1 second at every change of more than
one second).
If the system date is set backwards while haproxy is running,
some scheduled events are delayed by the amount of time the
clock went backwards. This is particularly problematic on
systems where the date is set at boot, because it seldom
happens that health-checks do not get sent for a few hours.
Before switching to use clock_gettime() on systems which
provide it, we can at least ensure that the clock is not
going backwards and maintain two clocks : the "date" which
represents what the user wants to see (mostly for logs),
and an internal date stored in "now", used for scheduled
events.
The dequeuing logic was completely wrong. First, a task was assigned
to all servers to process the queue, but this task was never scheduled
and was only woken up on session free. Second, there was no reservation
of server entries when a task was assigned a server. This means that
as long as the task was not connected to the server, its presence was
not accounted for. This was causing trouble when detecting whether or
not a server had reached maxconn. Third, during a redispatch, a session
could lose its place at the server's and get blocked because another
session at the same moment would have stolen the entry. Fourth, the
redispatch option did not work when maxqueue was reached for a server,
and it was not possible to do so without indefinitely hanging a session.
The root cause of all those problems was the lack of pre-reservation of
connections at the server's, and the lack of tracking of servers during
a redispatch. Everything relied on combinations of flags which could
appear similarly in quite distinct situations.
This patch is a major rework but there was no other solution, as the
internal logic was deeply flawed. The resulting code is cleaner, more
understandable, uses less magics and is overall more robust.
As an added bonus, "option redispatch" now works when maxqueue has
been reached on a server.
A new "redirect" keyword adds the ability to send an HTTP 301/302/303
redirection to either an absolute location or to a prefix followed by
the original URI. The redirection is conditionned by ACL rules, so it
becomes very easy to move parts of a site to another site using this.
This work was almost entirely done at Exceliance by Emeric Brun.
A test-case has been added in the tests/ directory.
- free oldpids
- call free(exp->preg), not only regfree(exp->preg): req_exp, rsp_exp
- build a list of unique uri_auths and eventually free it
- prune_acl_cond/free for switching_rules
- add a callback pointer to free ptr from acl_pattern (used for regexs) and execute it
==1180== malloc/free: in use at exit: 0 bytes in 0 blocks.
==1180== malloc/free: 5,599 allocs, 5,599 frees, 4,220,556 bytes allocated.
==1180== All heap blocks were freed -- no leaks are possible.
New functions implemented:
- deinit_pollers: called at the end of deinit())
- prune_acl: called via list_for_each_entry_safe
Add missing pool_destroy2 calls:
- p->hdr_idx_pool
- pool2_tree64
Implement all task stopping:
- health-check: needs new "struct task" in the struct server
- queue processing: queue_mgt
- appsess_refresh: appsession_refresh
before (idle system):
==6079== LEAK SUMMARY:
==6079== definitely lost: 1,112 bytes in 75 blocks.
==6079== indirectly lost: 53,356 bytes in 2,090 blocks.
==6079== possibly lost: 52 bytes in 1 blocks.
==6079== still reachable: 150,996 bytes in 504 blocks.
==6079== suppressed: 0 bytes in 0 blocks.
after (idle system):
==6945== LEAK SUMMARY:
==6945== definitely lost: 7,644 bytes in 137 blocks.
==6945== indirectly lost: 9,913 bytes in 587 blocks.
==6945== possibly lost: 0 bytes in 0 blocks.
==6945== still reachable: 0 bytes in 0 blocks.
==6945== suppressed: 0 bytes in 0 blocks.
before (running system for ~2m):
==9343== LEAK SUMMARY:
==9343== definitely lost: 1,112 bytes in 75 blocks.
==9343== indirectly lost: 54,199 bytes in 2,122 blocks.
==9343== possibly lost: 52 bytes in 1 blocks.
==9343== still reachable: 151,128 bytes in 509 blocks.
==9343== suppressed: 0 bytes in 0 blocks.
after (running system for ~2m):
==11616== LEAK SUMMARY:
==11616== definitely lost: 7,644 bytes in 137 blocks.
==11616== indirectly lost: 9,981 bytes in 591 blocks.
==11616== possibly lost: 0 bytes in 0 blocks.
==11616== still reachable: 4 bytes in 1 blocks.
==11616== suppressed: 0 bytes in 0 blocks.
Still not perfect but significant improvement.
Released version 1.3.15 with the following main changes :
- [BUILD] Added support for 'make install'
- [BUILD] Added 'install-man' make target for installing the man page
- [BUILD] Added 'install-bin' make target
- [BUILD] Added 'install-doc' make target
- [BUILD] Removed "/" after '$(DESTDIR)' in install targets
- [BUILD] Changed 'install' target to install the binaries first
- [BUILD] Replace hardcoded 'LD = gcc' with 'LD = $(CC)'
- [MEDIUM]: Inversion for options
- [MEDIUM]: Count retries and redispatches also for servers, fix redistribute_pending, extend logs, %d->%u cleanup
- [BUG]: Restore clearing t->logs.bytes
- [MEDIUM]: rework checks handling
- [DOC] Update a "contrib" file with a hint about a scheme used for formathing subjects
- [MEDIUM] Implement "track [<backend>/]<server>"
- [MINOR] Implement persistent id for proxies and servers
- [BUG] Don't increment server connections too much + fix retries
- [MEDIUM]: Prevent redispatcher from selecting the same server, version #3
- [MAJOR] proto_uxst rework -> SNMP support
- [BUG] appsession lookup in URL does not work
- [BUG] transparent proxy address was ignored in backend
- [BUG] hot reconfiguration failed because of a wrong error check
- [DOC] big update to the configuration manual
- [DOC] large update to the configuration manual
- [DOC] document more options
- [BUILD] major rework of the GNU Makefile
- [STATS] add support for "show info" on the unix socket
- [DOC] document options forwardfor to logasap
- [MINOR] add support for the "backlog" parameter
- [OPTIM] introduce global parameter "tune.maxaccept"
- [MEDIUM] introduce "timeout http-request" in frontends
- [MINOR] tarpit timeout is also allowed in backends
- [BUG] increment server connections for each connect()
- [MEDIUM] add a turn-around state of one second after a connection failure
- [BUG] fix typo in redispatched connection
- [DOC] document options nolinger to ssl-hello-chk
- [DOC] added documentation for "option tcplog" to "use_backend"
- [BUG] connect_server: server might not exist when sending error report
- [MEDIUM] support fully transparent proxy on Linux (USE_LINUX_TPROXY)
- [MEDIUM] add non-local bind to connect() on Linux
- [MINOR] add transparent proxy support for balabit's Tproxy v4
- [BUG] use backend's source and not server's source with tproxy
- [BUG] fix overlapping server flags
- [MEDIUM] fix server health checks source address selection
- [BUG] build failed on CONFIG_HAP_LINUX_TPROXY without CONFIG_HAP_CTTPROXY
- [DOC] added "server", "source" and "stats" keywords
- [DOC] all server parameters have been documented
- [DOC] document all req* and rsp* keywords.
- [DOC] added documentation about HTTP header manipulations
- [BUG] log response byte count, not request
- [BUILD] code did not build in full debug mode
- [BUG] fix truncated responses with sepoll
- [MINOR] use s->frt_addr as the server's address in transparent proxy
- [MINOR] fix configuration hint about timeouts
- [DOC] minor cleanup of the doc and notice to contributors
- [MINOR] report correct section type for unknown keywords.
- [BUILD] update MacOS Makefile to build on newer versions
- [DOC] fix erroneous "useallbackups" option in the doc
- [DOC] applied small fixes from early readers
- [MINOR] add configuration support for "redir" server keyword
- [MEDIUM] completely implement the server redirection method
- [TESTS] add a test case for the server redirection mechanism
- [DOC] add a configuration entry for "server ... redir <prefix>"
- [BUILD] backend.c and checks.c did not build without tproxy !
- Revert "[BUILD] backend.c and checks.c did not build without tproxy !"
- [BUILD] backend.c and checks.c did not build without tproxy !
- [OPTIM] used unsigned ints for HTTP state and message offsets
- [OPTIM] GCC4's builtin_expect() is suboptimal
- [BUG] failed conns were sometimes incremented in the frontend!
- [BUG] timeout.check was not pre-set to eternity
- [TESTS] add test-pollers.cfg to easily report pollers in use
- [BUG] do not apply timeout.connect in checks if unset
- [BUILD] ensure that makefile understands USE_DLMALLOC=1
- [MINOR] silent gcc for a wrong warning
- [CLEANUP] update .gitignore to ignore more temporary files
- [CLEANUP] report dlmalloc's source path only if explictly specified
- [BUG] str2sun could leak a small buffer in case of error during parsing
- [BUG] option allbackups was not working anymore in roundrobin mode
- [MAJOR] implementation of the "leastconn" load balancing algorithm
- [BUILD] ensure that users don't build without setting the target anymore.
- [DOC] document the leastconn LB algo
- [MEDIUM] fix stats socket limitation to 16 kB
- [DOC] fix unescaped space in httpchk example.
- [BUG] fix double-decrement of server connections
- [TESTS] add a test case for port mapping
- [TESTS] add a benchmark for integer hashing
- [TESTS] add new methods in ip-hash test file
- [MAJOR] implement parameter hashing for POST requests
This new parameter makes it possible to override the default
number of consecutive incoming connections which can be
accepted on a socket. By default it is not limited on single
process mode, and limited to 8 in multi-process mode.
The build process was getting annoying under some conditions,
especially on platforms which are used to set CFLAGS, as well
as those which set a lot of complex defines. The new Makefile
takes care of this situation by not mixing TARGET, CPU and user
values, and by making privileging the pre-setting of common
variables with the ability to override them.
Now CFLAGS and LDFLAGS are set by default and may be overridden
without the risk of breaking useful defines. Options are better
dealt with, and as a bonus, it was possible to merge the FreeBSD
and OpenBSD targets into the common GNU Makefile.
The report of build options by "haproxy -vv" has been slightly
adapted to the new mode. Options implied by architecture are not
reported, only user-specified options are. It is also possible to
add options which will not be reported in order not to mangle the
output when specifying dirty informations such as URLs...
The Makefile was copiously documented and it should be easier to
build for any target now. Backwards compatibility with older
build processes was kept, and warnings are emitted for deprecated
build options.
The error check in return of start_proxies checked for exact ERR_RETRYABLE
but did not consider the return as a bit field. The function returned both
ERR_RETRYABLE and ERR_ALERT, hence the problem.
Sometimes it is useful to find out how a given binary version was
built. The build compiler and options are now provided for this,
and it's possible to get them with the -vv option.
Under certain circumstances, it is very useful to be able to fail some
monitor requests. One specific case is when the number of servers in
the backend falls below a certain level. The new "monitor fail" construct
followed by either "if"/"unless" <condition> makes it possible to specify
ACL-based conditions which will make the monitor return 503 instead of
200. Any number of conditions can be passed. Another use may be to limit
the requests to local networks only.
It is very convenient for SNMP monitoring to have unique process ID,
proxy ID and server ID. Those have been added to the CSV outputs.
The numbers start at 1. 0 is reserved. For servers, 0 means that the
reported name is not a server name but half a proxy (FRONTEND/BACKEND).
A remaining hidden "-" in the CSV output has been eliminated too.
Some applications do not have a strict persistence requirement, yet
it is still desirable for performance considerations, due to local
caches on the servers. For some reasons, there are some applications
which cannot rely on cookies, and for which the last resort is to use
a parameter passed in the URL.
The new 'url_param' balance method is there to solve this issue. It
accepts a parameter name which is looked up from the URL and which
is then hashed to select a server. If the parameter is not found,
then the round robin algorithm is used in order to provide a normal
load balancing across the servers for the first requests. It would
have been possible to use a source IP hash instead, but since such
applications are generally buried behind multiple levels of
reverse-proxies, it would not provide a good balance.
The doc has been updated, and two regression testing configurations
have been added.
localtime() was called with pointers to tv_sec, which is time_t on
some platforms and long on others. A problem was encountered on
Sparc64 under OpenBSD where tv_sec is long (64 bits) and time_t is
32 bits. Since this architecture is big-endian, it exhibited the
bug because localtime() always worked with the high part of the
value which is always zero. This problem was identified and debugged
by Thierry Fournier.
The correct solution is to pass the date by value and not by pointer,
through an intermediate function. The use of localtime_r() instead of
localtime() also made it possible to get rid of the first call to
localtime() since it does not need to allocate memory anymore.
Removed old unused MODE_LOG and MODE_STATS, and replaced the "stats"
keyword in the global section. The new "stats" keyword in the global
section is used to create a UNIX socket on which the statistics will
be accessed. The client must issue a "show stat\n" command in order
to get a CSV-formated output similar to the output on the HTTP socket
in CSV mode.
A new generic protocol mechanism has been added. It provides
an easy method to implement new protocols with different
listeners (eg: unix sockets).
The listeners are automatically started at the right moment
and enabled after the possible fork().
This must come from a copy-paste typo: in the unlikely event that
fork() would fail, the parent process would only exit(1) if there
were old pids. That's non-sense.
When one server appears at the same position in multiple backends, it
receives all the checks from all the backends exactly at the same time
because the health-checks are only spread within a backend but not
globally.
Attached patch implements per-server start delay in a different way.
Checks are now spread globally - not locally to one backend. It also makes
them start faster - IMHO there is no need to add a 'server->inter' when
calculating first execution. Calculation were moved from cfgparse.c to
checks.c. There is a new function start_checks() and now it is not called
when haproxy is started in MODE_CHECK.
With this patch it is also possible to set a global 'spread-checks'
parameter. It takes a percentage value (1..50, probably something near
5..10 is a good idea) so haproxy adds or removes that many percent to the
original interval after each check. My test shows that with 18 backends,
54 servers total and 10000ms/5% it takes about 45m to mix them completely.
I decided to use rand/srand pseudo-random number generator. I am aware it
is not recommend for a good randomness but a) we do not need a good random
generator here b) it is probably the most portable one.
The following patch will give the ability to tweak socket linger mode.
You can use this option with "option nolinger" inside fronted or backend
configuration declaration.
This will help in environments where lots of FIN_WAIT sockets are
encountered.
This patch fixes a nasty bug raported by both glibc and valgrind, which
leads into a problem that haproxy does not exit when a new instace
starts ap (-sf/-st).
==9299== Invalid free() / delete / delete[]
==9299== at 0x401D095: free (in
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9299== by 0x804A377: deinit (haproxy.c:721)
==9299== by 0x804A883: main (haproxy.c:1014)
==9299== Address 0x41859E0 is 0 bytes inside a block of size 21 free'd
==9299== at 0x401D095: free (in
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9299== by 0x804A84B: main (haproxy.c:985)
==9299==
6542 open("/dev/tty", O_RDWR|O_NONBLOCK|O_NOCTTY) = -1 ENOENT (No such file
or directory)
6542 writev(2, [{"*** glibc detected *** ", 23}, {"corrupted double-linked
list", 28}, {": 0x", 4}, {"6ff91878", 8}, {" ***\n", 5}], 5) = -1 EBADF (Bad
file descriptor)
I found this bug trying to find why, after one week with many restarts, I
finished with >100 haproxy process running. ;)
The deinit() function is specialized in memory area freeing.
There were a ton of information that were not released at the
exit time, which made valgrind complain. Now, most of the entries
are freed. However, it seems like regfree() does not completely
free a regex (12 bytes lost per regex).
By default, epoll/kqueue used to return as many events as possible.
This could sometimes cause huge latencies (latencies of up to 400 ms
have been observed with many thousands of fds at once). Limiting the
number of events returned also reduces the latency by avoiding too
many blind processing. The value is set to 200 by default and can be
changed in the global section using the tune.maxpollevents parameter.
When we're interrupted by another instance, it is very likely
that the other one will need some memory. Now we know how to
free what is not used, so let's do it.
Also only free non-null pointers. Previously, pool_destroy()
did implicitly check for this case which was incidentely
needed.
Also during this process, a bug was found in appsession_refresh().
It would not automatically requeue the task in the queue, so the
old sessions would not vanish.
The timeout functions were difficult to manipulate because they were
rounding results to the millisecond. Thus, it was difficult to compare
and to check what expired and what did not. Also, the comparison
functions were heavy with multiplies and divides by 1000. Now, all
timeouts are stored in timevals, reducing the number of operations
for updates and leading to cleaner and more efficient code.
The rbtree-based wait queue consumes a lot of CPU. Use the ul2tree
instead. Lots of cleanups and code reorganizations made it possible
to reduce the task struct and simplify the code a bit.
The principle behind speculative I/O is to speculatively try to
perform I/O before registering the events in the system. This
considerably reduces the number of calls to epoll_ctl() and
sometimes even epoll_wait(), and manages to increase overall
performance by about 10%.
The new poller has been called "sepoll". It is used by default
on Linux when it works. A corresponding option "nosepoll" and
the command line argument "-ds" allow to disable it.
Gcc provides __attribute__((constructor)) which is very convenient
to execute functions at startup right before main(). All the pollers
have been converted to have their register() function declared like
this, so that it is not necessary anymore to call them from a centralized
file.
Some pollers such as kqueue lose their FD across fork(), meaning that
the registered file descriptors are lost too. Now when the proxies are
started by start_proxies(), the file descriptors are not registered yet,
leaving enough time for the fork() to take place and to get a new pollfd.
It will be the first call to maintain_proxies that will register them.
select, poll and epoll now have their dedicated functions and have
been split into distinct files. Several FD manipulation primitives
have been provided with each poller.
The rest of the code needs to be cleaned to remove traces of
StaticReadEvent/StaticWriteEvent. A trick involving a macro has
temporarily been used right now. Some work needs to be done to
factorize tests and sets everywhere.
logs are handled better with dedicated functions. The HTTP implementation
moved to proto_http.c. It has been cleaned up a bit. Now a frontend with
option httplog and no log will not call the function anymore.
Previously, use of the "usesrc" keyword could silently fail if
either the module was not loaded, or the user did not have enough
permissions. Now the errors are better diagnosed and more appropriate
advices are given.
- stats now support the HEAD method too
- extracted http request from the session
- huge rework of the HTTP parser which is now a 28-state FSM.
- linux-style likely/unlikely macros for optimization hints
- do not create a server socket when there's no server
This patch from Sin Yu makes use of an rbtree for the wait queue,
which will solve the slowdown problem encountered when timeouts
are heterogenous in the configuration. The next step will be to
turn maintain_proxies() into a per-proxy task so that we won't
have to scan them all after each poll() loop.
The tcp-splicing code has been merged, and a doc has been written.
A configuration example has been derived from the previous content
switching sample.
It is now possible to define an errorloc in the backend as well as
in the frontend. The backend's will be used first, and if undefined,
then the frontend's will be used instead. If none is used, then the
original error messages will be used.
The nbconn attribute in the proxies was not relevant anymore because
a frontend A may use backend B and both of them must account for their
respective connections. For this reason, there now are two separate
counters for frontend and backend connections.
The stats page has been updated to reflect the backend, but a separate
line entry for the frontend with error counts would be good.
Note that as of now, beconn may be higher than maxconn, because maxconn
applies to the frontend, while beconn may be increased due to sessions
passed from another frontend.
The files are now stored under :
- include/haproxy for the generic includes
- include/types.h for the structures needed within prototypes
- include/proto.h for function prototypes and inline functions
- src/*.c for the C files
Most include files are now covered by LGPL. A last move still needs
to be done to put inline functions under GPL and not LGPL.
Version has been set to 1.3.0 in the code but some control still
needs to be done before releasing.