Commit Graph

702 Commits

Author SHA1 Message Date
Willy Tarreau
b1beaa302c BUG/MINOR: init: make the automatic maxconn consider the max of soft/hard limits
James Stroehmann reported something working as documented but that can
be considered as a regression in the way the automatic maxconn is
calculated from the process' limits :

  https://www.mail-archive.com/haproxy@formilux.org/msg36523.html

The purpose of the changes in 2.0 was to have maxconn default to the
highest possible value permitted to the user based on the ulimit -n
setting, however the calculation starts from the soft limit, which
can be lower than what users were allowed to with previous versions
where the default value of 2000 would force a higher ulimit -n as
long as it fitted in the hard limit.

Usually this is not noticeable if the user changes the limits, because
quite commonly setting a new value restricts both the soft and hard
values.

Let's instead always use the max between the hard and soft limits, as
we know these values are permitted. This was tried on the following
setup:

  $ cat ulimit-n.cfg
  global
    stats socket /tmp/sock1 level admin
  $ ulimit -n
  1024

Before the change the limits would show like this:

  $ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
  Maxsock: 1023
  Maxconn: 489

After the change the limits are now much better and more in line with
the default settings in earlier versions:

  $ socat - /tmp/sock1 <<< "show info" | grep -im2 ^Max
  Maxsock: 4095
  Maxconn: 2025

The difference becomes even more obvious when running moderately large
configs with hundreds of checked servers and hundreds of listeners:

  $ cat ulimit-n.cfg
  global
    stats socket /tmp/sock1 level admin

  listen l
    bind :10000-10300
    server-template srv- 300 0.0.0.0 check disabled

          Before   After
  Maxsock  1024    4096
  Maxconn  189     1725

This issue is tagged as minor since a trivial config change fixes it,
but it would help new users to have it backported as far as 2.0.
2020-03-06 10:49:55 +01:00
Carl Henrik Lunde
f91ac19299 OPTIM: startup: fast unique_id allocation for acl.
pattern_finalize_config() uses an inefficient algorithm which is a
problem with very large configuration files. This affects startup, and
therefore reload time. When haproxy is deployed as a router in a
Kubernetes cluster the generated configuration file may be large and
reloads are frequently occuring, which makes this a significant issue.

The old algorithm is O(n^2)
* allocate missing uids - O(n^2)
* sort linked list - O(n^2)

The new algorithm is O(n log n):
* find the user allocated uids - O(n)
* store them for efficient lookup - O(n log n)
* allocate missing uids - n times O(log n)
* sort all uids - O(n log n)
* convert back to linked list - O(n)

Performance examples, startup time in seconds:

    pat_refs old     new
    1000      0.02   0.01
    10000     2.1    0.04
    20000    12.3    0.07
    30000    27.9    0.10
    40000    52.5    0.14
    50000    77.5    0.17

Please backport to 1.8, 2.0 and 2.1.
2020-03-06 08:11:58 +01:00
Willy Tarreau
3ebd55ee51 MINOR: haproxy: export run_poll_loop
This will help refine debug traces.
2020-03-03 15:26:10 +01:00
Willy Tarreau
908071171b BUILD: general: always pass unsigned chars to is* functions
The isalnum(), isalpha(), isdigit() etc functions from ctype.h are
supposed to take an int in argument which must either reflect an
unsigned char or EOF. In practice on some platforms they're implemented
as macros referencing an array, and when passed a char, they either cause
a warning "array subscript has type 'char'" when lucky, or cause random
segfaults when unlucky. It's quite unconvenient by the way since none of
them may return true for negative values. The recent introduction of
cygwin to the list of regularly tested build platforms revealed a lot
of breakage there due to the same issues again.

So this patch addresses the problem all over the code at once. It adds
unsigned char casts to every valid use case, and also drops the unneeded
double cast to int that was sometimes added on top of it.

It may be backported by dropping irrelevant changes if that helps better
support uncommon platforms. It's unlikely to fix bugs on platforms which
would already not emit any warning though.
2020-02-25 08:16:33 +01:00
Christopher Faulet
6d0c3dfac6 MEDIUM: http: Add a ruleset evaluated on all responses just before forwarding
This patch introduces the 'http-after-response' rules. These rules are evaluated
at the end of the response analysis, just before the data forwarding, on ALL
HTTP responses, the server ones but also all responses generated by
HAProxy. Thanks to this ruleset, it is now possible for instance to add some
headers to the responses generated by the stats applet. Following actions are
supported :

   * allow
   * add-header
   * del-header
   * replace-header
   * replace-value
   * set-header
   * set-status
   * set-var
   * strict-mode
   * unset-var
2020-02-06 14:55:34 +01:00
Christopher Faulet
546c4696bb MINOR: global: Set default tune.maxrewrite value during global structure init
When the global structure is initialized, instead of setting tune.maxrewrite to
-1, its default value can be immediately set. This way, it is always defined
during the configuration validity check. Otherwise, the only way to have it at
this stage, it is to explicity set it in the global section.
2020-02-06 09:36:36 +01:00
Willy Tarreau
71f95fa20e [RELEASE] Released version 2.2-dev1
Released version 2.2-dev1 with the following main changes :
    - DOC: this is development again
    - MINOR: version: this is development again, update the status
    - SCRIPTS: update create-release to fix the changelog on new branches
    - CLEANUP: ssl: Clean up error handling
    - BUG/MINOR: contrib/prometheus-exporter: decode parameter and value only
    - BUG/MINOR: h1: Don't test the host header during response parsing
    - BUILD/MINOR: trace: fix use of long type in a few printf format strings
    - DOC: Clarify behavior of server maxconn in HTTP mode
    - MINOR: ssl: deduplicate ca-file
    - MINOR: ssl: compute ca-list from deduplicate ca-file
    - MINOR: ssl: deduplicate crl-file
    - CLEANUP: dns: resolution can never be null
    - BUG/MINOR: http-htx: Don't make http_find_header() fail if the value is empty
    - DOC: ssl/cli: set/commit/abort ssl cert
    - BUG/MINOR: ssl: fix SSL_CTX_set1_chain compatibility for openssl < 1.0.2
    - BUG/MINOR: fcgi-app: Make the directive pass-header case insensitive
    - BUG/MINOR: stats: Fix HTML output for the frontends heading
    - BUG/MINOR: ssl: fix X509 compatibility for openssl < 1.1.0
    - DOC: clarify matching strings on binary fetches
    - DOC: Fix ordered list in summary
    - DOC: move the "group" keyword at the right place
    - MEDIUM: init: prevent process and thread creation at runtime
    - BUG/MINOR: ssl/cli: 'ssl cert' cmd only usable w/ admin rights
    - BUG/MEDIUM: stream-int: don't subscribed for recv when we're trying to flush data
    - BUG/MINOR: stream-int: avoid calling rcv_buf() when splicing is still possible
    - BUG/MINOR: ssl/cli: don't overwrite the filters variable
    - BUG/MEDIUM: listener/thread: fix a race when pausing a listener
    - BUG/MINOR: ssl: certificate choice can be unexpected with openssl >= 1.1.1
    - BUG/MEDIUM: mux-h1: Never reuse H1 connection if a shutw is pending
    - BUG/MINOR: mux-h1: Don't rely on CO_FL_SOCK_RD_SH to set H1C_F_CS_SHUTDOWN
    - BUG/MINOR: mux-h1: Fix conditions to know whether or not we may receive data
    - BUG/MEDIUM: tasks: Make sure we switch wait queues in task_set_affinity().
    - BUG/MEDIUM: checks: Make sure we set the task affinity just before connecting.
    - MINOR: debug: replace popen() with pipe+fork() in "debug dev exec"
    - MEDIUM: init: set NO_NEW_PRIVS by default when supported
    - BUG/MINOR: mux-h1: Be sure to set CS_FL_WANT_ROOM when EOM can't be added
    - BUG/MEDIUM: mux-fcgi: Handle cases where the HTX EOM block cannot be inserted
    - BUG/MINOR: proxy: make soft_stop() also close FDs in LI_PAUSED state
    - BUG/MINOR: listener/threads: always use atomic ops to clear the FD events
    - BUG/MINOR: listener: also clear the error flag on a paused listener
    - BUG/MEDIUM: listener/threads: fix a remaining race in the listener's accept()
    - MINOR: listener: make the wait paths cleaner and more reliable
    - MINOR: listener: split dequeue_all_listener() in two
    - REORG: listener: move the global listener queue code to listener.c
    - DOC: document the listener state transitions
    - BUG/MEDIUM: kqueue: Make sure we report read events even when no data.
    - BUG/MAJOR: dns: add minimalist error processing on the Rx path
    - BUG/MEDIUM: proto_udp/threads: recv() and send() must not be exclusive.
    - DOC: listeners: add a few missing transitions
    - BUG/MINOR: tasks: only requeue a task if it was already in the queue
    - MINOR: tasks: split wake_expired_tasks() in two parts to avoid useless wakeups
    - DOC: proxies: HAProxy only supports 3 connection modes
    - DOC: remove references to the outdated architecture.txt
    - BUG/MINOR: log: fix minor resource leaks on logformat error path
    - BUG/MINOR: mworker: properly pass SIGTTOU/SIGTTIN to workers
    - BUG/MINOR: listener: do not immediately resume on transient error
    - BUG/MINOR: server: make "agent-addr" work on default-server line
    - BUG/MINOR: listener: fix off-by-one in state name check
    - BUILD/MINOR: unix sockets: silence an absurd gcc warning about strncpy()
    - MEDIUM: h1-htx: Add HTX EOM block when the message is in H1_MSG_DONE state
    - MINOR: http-htx: Add some htx sample fetches for debugging purpose
    - REGTEST: Add an HTX reg-test to check an edge case
    - DOC: clarify the fact that replace-uri works on a full URI
    - BUG/MINOR: sample: fix the closing bracket and LF in the debug converter
    - BUG/MINOR: sample: always check converters' arguments
    - MINOR: sample: Validate the number of bits for the sha2 converter
    - BUG/MEDIUM: ssl: Don't set the max early data we can receive too early.
    - MINOR: ssl/cli: 'show ssl cert' give information on the certificates
    - BUG/MINOR: ssl/cli: fix build for openssl < 1.0.2
    - MINOR: debug: support logging to various sinks
    - MINOR: http: add a new "replace-path" action
    - REGTEST: ssl: test the "set ssl cert" CLI command
    - REGTEST: run-regtests: implement #REQUIRE_BINARIES
    - MINOR: task: only check TASK_WOKEN_ANY to decide to requeue a task
    - BUG/MAJOR: task: add a new TASK_SHARED_WQ flag to fix foreing requeuing
    - BUG/MEDIUM: ssl: Revamp the way early data are handled.
    - MINOR: fd/threads: make _GET_NEXT()/_GET_PREV() use the volatile attribute
    - BUG/MEDIUM: fd/threads: fix a concurrency issue between add and rm on the same fd
    - REGTEST: make the "set ssl cert" require version 2.1
    - BUG/MINOR: ssl: openssl-compat: Fix getm_ defines
    - BUG/MEDIUM: state-file: do not allocate a full buffer for each server entry
    - BUG/MINOR: state-file: do not store duplicates in the global tree
    - BUG/MINOR: state-file: do not leak memory on parse errors
    - BUG/MAJOR: mux-h1: Don't pretend the input channel's buffer is full if empty
    - BUG/MEDIUM: stream: Be sure to never assign a TCP backend to an HTX stream
    - BUILD: ssl: improve SSL_CTX_set_ecdh_auto compatibility
    - BUILD: travis-ci: link with ssl libraries using rpath instead of LD_LIBRARY_PATH/DYLD_LIBRARY_PATH
    - BUILD: travis-ci: reenable address sanitizer for clang builds
    - BUG/MINOR: checks: refine which errno values are really errors.
    - BUG/MINOR: connection: only wake send/recv callbacks if the FD is active
    - CLEANUP: connection: conn->xprt is never NULL
    - MINOR: pollers: add a new flag to indicate pollers reporting ERR & HUP
    - MEDIUM: tcp: make tcp_connect_probe() consider ERR/HUP
    - REORG: connection: move tcp_connect_probe() to conn_fd_check()
    - MINOR: connection: check for connection validation earlier
    - MINOR: connection: remove the double test on xprt_done_cb()
    - CLEANUP: connection: merge CO_FL_NOTIFY_DATA and CO_FL_NOTIFY_DONE
    - MINOR: poller: do not call the IO handler if the FD is not active
    - OPTIM: epoll: always poll for recv if neither active nor ready
    - OPTIM: polling: do not create update entries for FD removal
    - BUG/MEDIUM: checks: Only attempt to do handshakes if the connection is ready.
    - BUG/MEDIUM: connections: Hold the lock when wanting to kill a connection.
    - BUILD: CI: modernize cirrus-ci
    - MINOR: config: disable busy polling on old processes
    - MINOR: ssl: Remove unused variable "need_out".
    - BUG/MINOR: h1: Report the right error position when a header value is invalid
    - BUG/MINOR: proxy: Fix input data copy when an error is captured
    - BUG/MEDIUM: http-ana: Truncate the response when a redirect rule is applied
    - BUG/MINOR: channel: inject output data at the end of output
    - BUG/MEDIUM: session: do not report a failure when rejecting a session
    - MEDIUM: dns: implement synchronous send
    - MINOR: raw_sock: make sure to disable polling once everything is sent
    - MINOR: http: Add 410 to http-request deny
    - MINOR: http: Add 404 to http-request deny
    - CLEANUP: mux-h2: remove unused goto "out_free_h2s"
    - BUILD: cirrus-ci: choose proper openssl package name
    - BUG/MAJOR: listener: do not schedule a task-less proxy
    - CLEANUP: server: remove unused err section in server_finalize_init
    - REGTEST: set_ssl_cert.vtc: replace "echo" with "printf"
    - BUG/MINOR: stream-int: Don't trigger L7 retry if max retries is already reached
    - BUG/MEDIUM: tasks: Use the MT macros in tasklet_free().
    - BUG/MINOR: mux-h2: use a safe list_for_each_entry in h2_send()
    - BUG/MEDIUM: mux-h2: fix missing test on sending_list in previous patch
    - CLEANUP: ssl: remove opendir call in ssl_sock_load_cert
    - MEDIUM: lua: don't call the GC as often when dealing with outgoing connections
    - BUG/MEDIUM: mux-h2: don't stop sending when crossing a buffer boundary
    - BUG/MINOR: cli/mworker: can't start haproxy with 2 programs
    - REGTEST: mcli/mcli_start_progs: start 2 programs
    - BUG/MEDIUM: mworker: remain in mworker mode during reload
    - DOC: clarify crt-base usage
    - CLEANUP: compression: remove unused deinit_comp_ctx section
    - BUG/MEDIUM: mux_h1: Don't call h1_send if we subscribed().
    - BUG/MEDIUM: raw_sock: Make sur the fd and conn are sync.
    - CLEANUP: proxy: simplify proxy_parse_rate_limit proxy checks
    - BUG/MAJOR: hashes: fix the signedness of the hash inputs
    - REGTEST: add sample_fetches/hashes.vtc to validate hashes
    - BUG/MEDIUM: cli: _getsocks must send the peers sockets
    - CLEANUP: cli: deduplicate the code in _getsocks
    - BUG/MINOR: stream: don't mistake match rules for store-request rules
    - BUG/MEDIUM: connection: add a mux flag to indicate splice usability
    - BUG/MINOR: pattern: handle errors from fgets when trying to load patterns
    - MINOR: connection: move the CO_FL_WAIT_ROOM cleanup to the reader only
    - MINOR: stream-int: remove dependency on CO_FL_WAIT_ROOM for rcv_buf()
    - MEDIUM: connection: get rid of CO_FL_CURR_* flags
    - BUILD: pattern: include errno.h
    - MEDIUM: mux-h2: do not try to stop sending streams on blocked mux
    - MEDIUM: mux-fcgi: do not try to stop sending streams on blocked mux
    - MEDIUM: mux-h2: do not make an h2s subscribe to itself on deferred shut
    - MEDIUM: mux-fcgi: do not make an fstrm subscribe to itself on deferred shut
    - REORG: stream/backend: move backend-specific stuff to backend.c
    - MEDIUM: backend: move the connection finalization step to back_handle_st_con()
    - MEDIUM: connection: merge the send_wait and recv_wait entries
    - MEDIUM: xprt: merge recv_wait and send_wait in xprt_handshake
    - MEDIUM: ssl: merge recv_wait and send_wait in ssl_sock
    - MEDIUM: mux-h1: merge recv_wait and send_wait
    - MEDIUM: mux-h2: merge recv_wait and send_wait event notifications
    - MEDIUM: mux-fcgi: merge recv_wait and send_wait event notifications
    - MINOR: connection: make the last arg of subscribe() a struct wait_event*
    - MINOR: ssl: Add support for returning the dn samples from ssl_(c|f)_(i|s)_dn in LDAP v3 (RFC2253) format.
    - DOC: Fix copy and paste mistake in http-response replace-value doc
    - BUG/MINOR: cache: Fix leak of cache name in error path
    - BUG/MINOR: dns: Make dns_query_id_seed unsigned
    - BUG/MINOR: 51d: Fix bug when HTX is enabled
    - MINOR: http-htx: Move htx sample fetches in the scope "internal"
    - MINOR: http-htx: Rename 'internal.htx_blk.val' to 'internal.htx_blk.data'
    - MINOR: http-htx: Make 'internal.htx_blk_data' return a binary string
    - DOC: Add a section to document the internal sample fetches
    - MINOR: mux-h1: Inherit send flags from the upper layer
    - MINOR: contrib/prometheus-exporter: Add heathcheck status/code in server metrics
    - BUG/MINOR: http-ana/filters: Wait end of the http_end callback for all filters
    - BUG/MINOR: http-rules: Remove buggy deinit functions for HTTP rules
    - BUG/MINOR: stick-table: Use MAX_SESS_STKCTR as the max track ID during parsing
    - MEDIUM: http-rules: Register an action keyword for all http rules
    - MINOR: tcp-rules: Always set from which ruleset a rule comes from
    - MINOR: actions: Use ACT_RET_CONT code to ignore an error from a custom action
    - MINOR: tcp-rules: Kill connections when custom actions return ACT_RET_ERR
    - MINOR: http-rules: Return an error when custom actions return ACT_RET_ERR
    - MINOR: counters: Add a counter to report internal processing errors
    - MEDIUM: http-ana: Properly handle internal processing errors
    - MINOR: http-rules: Add a rule result to report internal error
    - MINOR: http-rules: Handle internal errors during HTTP rules evaluation
    - MINOR: http-rules: Add more return codes to let custom actions act as normal ones
    - MINOR: tcp-rules: Handle denied/aborted/invalid connections from TCP rules
    - MINOR: http-rules: Handle denied/aborted/invalid connections from HTTP rules
    - MINOR: stats: Report internal errors in the proxies/listeners/servers stats
    - MINOR: contrib/prometheus-exporter: Export internal errors per proxy/server
    - MINOR: counters: Remove failed_secu counter and use denied_resp instead
    - MINOR: counters: Review conditions to increment counters from analysers
    - MINOR: http-ana: Add a txn flag to support soft/strict message rewrites
    - MINOR: http-rules: Handle all message rewrites the same way
    - MINOR: http-rules: Add a rule to enable or disable the strict rewriting mode
    - MEDIUM: http-rules: Enable the strict rewriting mode by default
    - REGTEST: Fix format of set-uri HTTP request rule in h1or2_to_h1c.vtc
    - MINOR: actions: Add a function pointer to release args used by actions
    - MINOR: actions: Regroup some info about HTTP rules in the same struct
    - MINOR: http-rules/tcp-rules: Call the defined action function first if defined
    - MINOR: actions: Rename the act_flag enum into act_opt
    - MINOR: actions: Add flags to configure the action behaviour
    - MINOR: actions: Use an integer to set the action type
    - MINOR: http-rules: Use a specific action type for some custom HTTP actions
    - MINOR: http-rules: Make replace-header and replace-value custom actions
    - MINOR: http-rules: Make set-header and add-header custom actions
    - MINOR: http-rules: Make set/del-map and add/del-acl custom actions
    - MINOR: http-rules: Group all processing of early-hint rule in its case clause
    - MEDIUM: http-rules: Make early-hint custom actions
    - MINOR: http-rule/tcp-rules: Make track-sc* custom actions
    - MINOR: tcp-rules: Make tcp-request capture a custom action
    - MINOR: http-rules: Add release functions for existing HTTP actions
    - BUG/MINOR: http-rules: Fix memory releases on error path during action parsing
    - MINOR: tcp-rules: Add release functions for existing TCP actions
    - BUG/MINOR: tcp-rules: Fix memory releases on error path during action parsing
    - MINOR: http-htx: Add functions to read a raw error file and convert it in HTX
    - MINOR: http-htx: Add functions to create HTX redirect message
    - MINOR: config: Use dedicated function to parse proxy's errorfiles
    - MINOR: config: Use dedicated function to parse proxy's errorloc
    - MEDIUM: http-htx/proxy: Use a global and centralized storage for HTTP error messages
    - MINOR: proxy: Register keywords to parse errorfile and errorloc directives
    - MINOR: http-htx: Add a new section to create groups of custom HTTP errors
    - MEDIUM: proxy: Add a directive to reference an http-errors section in a proxy
    - MINOR: http-rules: Update txn flags and status when a deny rule is executed
    - MINOR: http-rules: Support an optional status on deny rules for http reponses
    - MINOR: http-rules: Use same function to parse request and response deny actions
    - MINOR: http-ana: Add an error message in the txn and send it when defined
    - MEDIUM: http-rules: Support an optional error message in http deny rules
    - REGTEST: Add a strict rewriting mode reg test
    - REGEST: Add reg tests about error files
    - MINOR: ssl: accept 'verify' bind option with 'set ssl cert'
    - BUG/MINOR: ssl: ssl_sock_load_ocsp_response_from_file memory leak
    - BUG/MINOR: ssl: ssl_sock_load_issuer_file_into_ckch memory leak
    - BUG/MINOR: ssl: ssl_sock_load_sctl_from_file memory leak
    - BUG/MINOR: http_htx: Fix some leaks on error path when error files are loaded
    - CLEANUP: http-ana: Remove useless test on txn when the error message is retrieved
    - BUILD: CI: introduce ARM64 builds
    - BUILD: ssl: more elegant anti-replay feature presence check
    - MINOR: proxy/http-ana: Add support of extra attributes for the cookie directive
    - MEDIUM: dns: use Additional records from SRV responses
    - CLEANUP: Consistently `unsigned int` for bitfields
    - CLEANUP: pattern: remove the pat_time definition
    - BUG/MINOR: http_act: don't check capture id in backend
    - BUG/MINOR: ssl: fix build on development versions of openssl-1.1.x
2020-01-22 10:34:58 +01:00
Christopher Faulet
2f5339079b MINOR: proxy/http-ana: Add support of extra attributes for the cookie directive
It is now possible to insert any attribute when a cookie is inserted by
HAProxy. Any value may be set, no check is performed except the syntax validity
(CTRL chars and ';' are forbidden). For instance, it may be used to add the
SameSite attribute:

    cookie SRV insert attr "SameSite=Strict"

The attr option may be repeated to add several attributes.

This patch should fix the issue #361.
2020-01-22 07:18:31 +01:00
Christopher Faulet
5885775de1 MEDIUM: http-htx/proxy: Use a global and centralized storage for HTTP error messages
All custom HTTP errors are now stored in a global tree. Proxies use a references
on these messages. The key used for errorfile directives is the file name as
specified in the configuration. For errorloc directives, a key is created using
the redirect code and the url. This means that the same custom error message is
now stored only once. It may be used in several proxies or for several status
code, it is only parsed and stored once.
2020-01-20 15:18:46 +01:00
Christopher Faulet
58b3564fde MINOR: actions: Add a function pointer to release args used by actions
Arguments used by actions are never released during HAProxy deinit. Now, it is
possible to specify a function to do so. ".release_ptr" field in the act_rule
structure may be set during the configuration parsing to a specific deinit
function depending on the action type.
2020-01-20 15:18:45 +01:00
Christopher Faulet
cb5501327c BUG/MINOR: http-rules: Remove buggy deinit functions for HTTP rules
Functions to deinitialize the HTTP rules are buggy. These functions does not
check the action name to release the right part in the arg union. Only few info
are released. For auth rules, the realm is released and there is no problem
here. But the regex <arg.hdr_add.re> is always unconditionally released. So it
is easy to make these functions crash. For instance, with the following rule
HAProxy crashes during the deinit :

      http-request set-map(/path/to/map) %[src] %[req.hdr(X-Value)]

For now, These functions are simply removed and we rely on the deinit function
used for TCP rules (renamed as deinit_act_rules()). This patch fixes the
bug. But arguments used by actions are not released at all, this part will be
addressed later.

This patch must be backported to all stable versions.
2020-01-20 15:18:45 +01:00
William Lallemand
24c928c8bd BUG/MEDIUM: mworker: remain in mworker mode during reload
If you reload an haproxy started in master-worker mode with
"master-worker" in the configuration, and no "-W" argument,
the new process lost the fact that is was in master-worker mode
resulting in weird behaviors.

The bigest problem is that if it is reloaded with an bad configuration,
the master will exits instead of remaining in waitpid mode.

This problem was discovered in bug #443.

Should be backported in every version using the master-worker mode.
(as far as 1.8)
2020-01-14 18:10:29 +01:00
Willy Tarreau
719e07c989 BUILD/MINOR: unix sockets: silence an absurd gcc warning about strncpy()
Apparently gcc developers decided that strncpy() semantics are no longer
valid and now deserve a warning, especially if used exactly as designed.
This results in issue #304. Let's just remove one to the target size to
please her majesty gcc, the God of C Compilers, who tries hard to make
users completely eliminate any use of string.h and reimplement it by
themselves at much higher risks. Pfff....

This can be backported to stable version, the fix is harmless since it
ignores the last zero that is already set on next line.
2019-12-11 16:29:10 +01:00
Willy Tarreau
d26c9f9465 BUG/MINOR: mworker: properly pass SIGTTOU/SIGTTIN to workers
If a new process is started with -sf and it fails to bind, it may send
a SIGTTOU to the master process in hope that it will temporarily unbind.
Unfortunately this one doesn't catch it and stops to background instead
of forwarding the signal to the workers. The same is true for SIGTTIN.

This commit simply implements an extra signal handler for the master to
deal with such signals that must be passed down to the workers. It must
be backported as far as 1.8, though there the code differs in that it's
entirely in haproxy.c and doesn't require an extra sig handler.
2019-12-11 14:26:53 +01:00
Willy Tarreau
c49ba52524 MINOR: tasks: split wake_expired_tasks() in two parts to avoid useless wakeups
We used to have wake_expired_tasks() wake up tasks and return the next
expiration delay. The problem this causes is that we have to call it just
before poll() in order to consider latest timers, but this also means that
we don't wake up all newly expired tasks upon return from poll(), which
thus systematically requires a second poll() round.

This is visible when running any scheduled task like a health check, as there
are systematically two poll() calls, one with the interval, nothing is done
after it, and another one with a zero delay, and the task is called:

  listen test
    bind *:8001
    server s1 127.0.0.1:1111 check

  09:37:38.200959 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8696843}) = 0
  09:37:38.200967 epoll_wait(3, [], 200, 1000) = 0
  09:37:39.202459 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8712467}) = 0
>> nothing run here, as the expired task was not woken up yet.
  09:37:39.202497 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8715766}) = 0
  09:37:39.202505 epoll_wait(3, [], 200, 0) = 0
  09:37:39.202513 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8719064}) = 0
>> now the expired task was woken up
  09:37:39.202522 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
  09:37:39.202537 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
  09:37:39.202565 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  09:37:39.202577 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
  09:37:39.202585 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
  09:37:39.202659 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
  09:37:39.202673 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8814713}) = 0
  09:37:39.202683 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
  09:37:39.202693 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=8818617}) = 0
  09:37:39.202701 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
  09:37:39.202715 close(7)                = 0

Let's instead split the function in two parts:
  - the first part, wake_expired_tasks(), called just before
    process_runnable_tasks(), wakes up all expired tasks; it doesn't
    compute any timeout.
  - the second part, next_timer_expiry(), called just before poll(),
    only computes the next timeout for the current thread.

Thanks to this, all expired tasks are properly woken up when leaving
poll, and each poll call's timeout remains up to date:

  09:41:16.270449 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10223556}) = 0
  09:41:16.270457 epoll_wait(3, [], 200, 999) = 0
  09:41:17.270130 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10238572}) = 0
  09:41:17.270157 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
  09:41:17.270194 fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
  09:41:17.270204 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  09:41:17.270216 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
  09:41:17.270224 connect(7, {sa_family=AF_INET, sin_port=htons(1111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
  09:41:17.270299 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
  09:41:17.270314 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10337841}) = 0
  09:41:17.270323 epoll_wait(3, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}], 200, 1000) = 1
  09:41:17.270332 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, tv_nsec=10341860}) = 0
  09:41:17.270340 getsockopt(7, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
  09:41:17.270367 close(7)                = 0

This may be backported to 2.1 and 2.0 though it's unlikely to bring any
user-visible improvement except to clarify debugging.
2019-12-11 09:42:58 +01:00
Willy Tarreau
a1d97f88e0 REORG: listener: move the global listener queue code to listener.c
The global listener queue code and declarations were still lying in
haproxy.c while not needed there anymore at all. This complicates
the code for no reason. As a result, the global_listener_queue_task
and the global_listener_queue were made static.
2019-12-10 14:16:03 +01:00
Willy Tarreau
241797a3fc MINOR: listener: split dequeue_all_listener() in two
We use it half times for the global_listener_queue and half times
for a proxy's queue and this requires the callers to take care of
these. Let's split it in two versions, the current one working only
on the global queue and another one dedicated to proxies for the
per-proxy queues. This cleans up quite a bit of code.
2019-12-10 14:14:09 +01:00
Willy Tarreau
a45a8b5171 MEDIUM: init: set NO_NEW_PRIVS by default when supported
HAProxy doesn't need to call executables at run time (except when using
external checks which are strongly recommended against), and is even expected
to isolate itself into an empty chroot. As such, there basically is no valid
reason to allow a setuid executable to be called without the user being fully
aware of the risks. In a situation where haproxy would need to call external
checks and/or disable chroot, exploiting a vulnerability in a library or in
haproxy itself could lead to the execution of an external program. On Linux
it is possible to lock the process so that any setuid bit present on such an
executable is ignored. This significantly reduces the risk of privilege
escalation in such a situation. This is what haproxy does by default. In case
this causes a problem to an external check (for example one which would need
the "ping" command), then it is possible to disable this protection by
explicitly adding this directive in the global section. If enabled, it is
possible to turn it back off by prefixing it with the "no" keyword.

Before the option:

  $ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
  uid=0(root) gid=0(root) groups=0(root

After the option:
  $ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
  sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the
        'nosuid' option set or an NFS file system without root privileges?
2019-12-06 17:20:26 +01:00
Willy Tarreau
d96f1126fe MEDIUM: init: prevent process and thread creation at runtime
Some concerns are regularly raised about the risk to inherit some Lua
files which make use of a fork (e.g. via os.execute()) as well as
whether or not some of bugs we fix might or not be exploitable to run
some code. Given that haproxy is event-driven, any foreground activity
completely stops processing and is easy to detect, but background
activity is a different story. A Lua script could very well discretely
fork a sub-process connecting to a remote location and taking commands,
and some injected code could also try to hide its activity by creating
a process or a thread without blocking the rest of the processing. While
such activities should be extremely limited when run in an empty chroot
without any permission, it would be better to get a higher assurance
they cannot happen.

This patch introduces something very simple: it limits the number of
processes and threads to zero in the workers after the last thread was
created. By doing so, it effectively instructs the system to fail on
any fork() or clone() syscall. Thus any undesired activity has to happen
in the foreground and is way easier to detect.

This will obviously break external checks (whose concept is already
totally insecure), and for this reason a new option
"insecure-fork-wanted" was added to disable this protection, and it
is suggested in the fork() error report from the checks. It is
obviously recommended not to use it and to reconsider the reasons
leading to it being enabled in the first place.

If for any reason we fail to disable forks, we still start because it
could be imaginable that some operating systems refuse to set this
limit to zero, but in this case we emit a warning, that may or may not
be reported since we're after the fork point. Ideally over the long
term it should be conditionned by strict-limits and cause a hard fail.
2019-12-03 11:49:00 +01:00
Willy Tarreau
47479eb0e7 MINOR: version: emit the link to the known bugs in output of "haproxy -v"
The link to the known bugs page for the current version is built and
reported there. When it is a development version (less than 2 dots),
instead a link to github open issues is reported as there's no way to
be sure about the current situation in this case and it's better that
users report their trouble there.
2019-11-21 18:48:20 +01:00
Willy Tarreau
08dd202d73 MINOR: version: report the version status in "haproxy -v"
As discussed on Discourse here:

    https://discourse.haproxy.org/t/haproxy-branch-support-lifetime/4466

it's not always easy for end users to know the lifecycle of the version
they are using. This patch introduces a "Status" line in the output of
"haproxy -vv" indicating whether it's a development, stable, long-term
supported version, possibly with an estimated end of life for the branch
when it can be anticipated (e.g. for stable versions). This field should
be adjusted when creating a major release to reflect the new status.

It may make sense to backport this to other branches to clarify the
situation.
2019-11-21 18:47:54 +01:00
William Lallemand
677e2f2c35 BUG/MEDIUM: mworker: don't fill the -sf argument with -1 during the reexec
Upon a reexec_on_failure, if the process tried to exit after the
initialization of the process structure but before it was filled with a
PID, the PID in the mworker_proc structure is set to -1.

In this particular case the -sf argument is filled with -1 and haproxy
will exit with the usage message because of that argument.

Should be backported in 2.0.
2019-11-19 17:30:34 +01:00
William Dauchy
f9af9d7f3c MINOR: init: avoid code duplication while setting identify
since the introduction of mworker, the setuid/setgid was duplicated in
two places; try to improve that by creating a dedicated function.
this patch does not introduce any functional change.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
2019-11-17 16:55:50 +01:00
William Dauchy
e039f26ba4 BUG/MINOR: init: fix set-dumpable when using uid/gid
in mworker mode used with uid/gid settings, it was not possible to get
a coredump despite the set-dumpable option.
indeed prctl(2) manual page specifies the dumpable attribute is reverted
to `/proc/sys/fs/suid_dumpable` in a few conditions such as process
effective user and group are changed.

this patch moves the whole set-dumpable logic before the polling code in
order to catch all possible cases where we could have changed the
uid/gid. It however does not cover the possible segfault at startup.

this patch should be backported in 2.0.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
2019-11-17 16:55:24 +01:00
William Dauchy
0fec3ab7bf MINOR: init: always fail when setrlimit fails
this patch introduces a strict-limits parameter which enforces the
setrlimit setting instead of a warning. This option can be forcingly
disable with the "no" keyword.
The general aim of this patch is to avoid bad surprises on a production
environment where you change the maxconn for example, a new fd limit is
calculated, but cannot be set because of sysfs setting. In that case you
might want to have an explicit failure to be aware of it before seeing
your traffic going down. During a global rollout it is also useful to
explictly fail as most progressive rollout would simply check the
general health check of the process.

As discussed, plan to use the strict by default mode starting from v2.3.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
2019-10-29 17:42:27 +01:00
William Lallemand
5fdb5b36e1 BUG/MINOR: mworker/ssl: close openssl FDs unconditionally
Patch 56996da ("BUG/MINOR: mworker/ssl: close OpenSSL FDs on reload")
fixes a issue where the /dev/random FD was leaked by OpenSSL upon a
reload in master worker mode. Indeed the FD was not flagged with
CLOEXEC.

The fix was checking if ssl_used_frontend or ssl_used_backend were set
to close the FD. This is wrong, indeed the lua init code creates an SSL
server without increasing the backend value, so the deinit is never
done when you don't use SSL in your configuration.

To reproduce the problem you just need to build haproxy with openssl and
lua with an openssl which does not use the getrandom() syscall.  No
openssl nor lua configuration are required for haproxy.

This patch must be backported as far as 1.8.

Fix issue #314.
2019-10-17 11:36:22 +02:00
David Carlier
5e4c8e2a67 BUILD/MEDIUM: threads: enable cpu_affinity on osx
Enable it but on a per thread basis only using Darwin native API.
2019-10-17 07:20:58 +02:00
David Carlier
a92c5cec2d BUILD/MEDIUM: threads: rename thread_info struct to ha_thread_info
On Darwin, the thread_info name exists as a standard function thus
we need to rename our array to ha_thread_info to fix this conflict.
2019-10-17 07:15:17 +02:00
Olivier Houchard
bba1a263c5 BUG/MEDIUM: tasklets: Make sure we're waking the target thread if it sleeps.
Now that we can wake tasklet for other threads, make sure that if the thread
is sleeping, we wake it up, or the tasklet won't be executed until it's
done sleeping.
That also means that, before going to sleep, and after we put our bit
in sleeping_thread_mask, we have to check that nobody added a tasklet for
us, just checking for global_tasks_mask isn't enough anymore.
2019-09-24 14:58:45 +02:00
Willy Tarreau
d022e9c98b MINOR: task: introduce a thread-local "sched" variable for local scheduler stuff
The aim is to rassemble all scheduler information related to the current
thread. It simply points to task_per_thread[tid] without having to perform
the operation at each time. We save around 1.2 kB of code on performance
sensitive paths and increase the request rate by almost 1%.
2019-09-24 11:23:30 +02:00
Olivier Houchard
859dc80f94 MEDIUM: list: Separate "locked" list from regular list.
Instead of using the same type for regular linked lists and "autolocked"
linked lists, use a separate type, "struct mt_list", for the autolocked one,
and introduce a set of macros, similar to the LIST_* macros, with the
MT_ prefix.
When we use the same entry for both regular list and autolocked list, as
is done for the "list" field in struct connection, we know have to explicitely
cast it to struct mt_list when using MT_ macros.
2019-09-23 18:16:08 +02:00
Christopher Faulet
c16929658f MINOR: config: Support per-proxy and per-server post-check functions callbacks
Most of times, when a keyword is added in proxy section or on the server line,
we need to have a post-parser callback to check the config validity for the
proxy or the server which uses this keyword.

It is possible to register a global post-parser callback. But all these
callbacks need to loop on the proxies and servers to do their job. It is neither
handy nor efficient. Instead, it is now possible to register per-proxy and
per-server post-check callbacks.
2019-09-17 10:18:54 +02:00
Christopher Faulet
3ea5cbe6a4 MINOR: config: Support per-proxy and per-server deinit functions callbacks
Most of times, when any allocation is done during configuration parsing because
of a new keyword in proxy section or on the server line, we must add a call in
the deinit() function to release allocated ressources. It is now possible to
register a post-deinit callback because, at this stage, the proxies and the
servers are already releases.

Now, it is possible to register deinit callbacks per-proxy or per-server. These
callbacks will be called for each proxy and server before releasing them.
2019-09-17 10:18:54 +02:00
Willy Tarreau
e0d86e2c1c BUG/MINOR: mworker: disable SIGPROF on re-exec
If haproxy is built with profiling enabled with -pg, it is possible to
see the master quit during a reload while it's re-executing itself with
error code 155 (signal 27) saying "Profile timer expired)". This happens
if the SIGPROF signal is delivered during the execve() call while the
handler was already unregistered. The issue itself is not directly inside
haproxy but it's easy to address. This patch disables this signal before
calling execvp() during a master reload. A simple test for this consists
in running this little script with haproxy started in master-worker mode :

     $ while usleep 50000; do killall -USR2 haproxy; done

This fix should be backported to all versions using the master-worker
model.
2019-08-26 10:44:48 +02:00
Olivier Houchard
305d5ab469 MAJOR: fd: Get rid of the fd cache.
Now that the architecture was changed so that attempts to receive/send data
always come from the upper layers, instead of them only trying to do so when
the lower layer let them know they could try, we can finally get rid of the
fd cache. We don't really need it anymore, and removing it gives us a small
performance boost.
2019-07-31 14:12:55 +02:00
Christopher Faulet
f734638976 MINOR: http: Don't store raw HTTP errors in chunks anymore
Default HTTP error messages are stored in an array of chunks. And since the HTX
was added, these messages are also converted in HTX and stored in another
array. But now, the first array is not used anymore because the legacy HTTP mode
was removed.

So now, only the array with the HTX messages are kept. The other one was
removed.
2019-07-19 09:46:23 +02:00
Christopher Faulet
41ba36f8b2 MINOR: global: Preset tune.max_http_hdr to its default value
By default, this tune parameter is set to MAX_HTTP_HDR. This assignment is done
after the configuration parsing, when we check the configuration validity. So
during the configuration parsing, its value is 0. Now, it is set to MAX_HTTP_HDR
from the start. So, it is possible to rely on it during the configuration
parsing.
2019-07-19 09:46:23 +02:00
Christopher Faulet
1b6adb4a51 MINOR: proxy/http_ana: Remove unused req_exp/rsp_exp and req_add/rsp_add lists
The keywords req* and rsp* are now unsupported. So the corresponding lists are
now unused. It is safe to remove them from the structure proxy.

As a result, the code dealing with these rules in HTTP analyzers was also
removed.
2019-07-19 09:24:12 +02:00
Christopher Faulet
fc9cfe4006 REORG: proto_htx: Move HTX analyzers & co to http_ana.{c,h} files
The old module proto_http does not exist anymore. All code dedicated to the HTTP
analysis is now grouped in the file proto_htx.c. So, to finish the polishing
after removing the legacy HTTP code, proto_htx.{c,h} files have been moved in
http_ana.{c,h} files.

In addition, all HTX analyzers and related functions prefixed with "htx_" have
been renamed to start with "http_" instead.
2019-07-19 09:24:12 +02:00
Christopher Faulet
711ed6ae4a MAJOR: http: Remove the HTTP legacy code
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So the most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was lightened of all its useless information about the legacy HTTP. The
structure hdr_ctx was also removed because unused now, just like unused states
in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
2019-07-19 09:24:12 +02:00
Willy Tarreau
7764a57d32 BUG/MEDIUM: threads: cpu-map designating a single thread/process are ignored
Since commit 81492c989 ("MINOR: threads: flatten the per-thread cpu-map"),
we don't keep the proc*thread matrix anymore to represent the full binding
possibilities, but only the proc and thread ones. The problem is that the
per-process binding is not the same for each thread and for the process,
and the proc[] array was assumed to store the per-proc first thread value
when doing this change. Worse, the logic present there tries to deal with
thread ranges and process ranges in a way which automatically exclused the
other possibility (since ranges cannot be used on both) but as such fails
to apply changes if neither the process nor the thread is expressed as a
range.

The real problem comes from the fact that specifying cpu-map 1/1 doesn't
yet reveal if the per-process mask or the per-thread mask needs to be
updated. In practice it's the thread one but then the current storage
doesn't allow to store the binding of the first thread of each other
process in nbproc>1 configurations.

When removing the proc*thread matrix, what ought to have been kept was
both the thread column for process 1 and the process line for threads 1,
but instead only the thread column was kept. This patch reintroduces the
storage of the configuration for the first thread of each process so that
it is again possible to store either the per-thread or per-process
configuration.

As a partial workaround for existing configurations, it is possible to
systematically indicate at least two processes or two threads at once
and map them by pairs or more so that at least two values are present
in the range. E.g :

  # set processes 1-4 to cpus 0-3 :

     cpu-map auto:1-4/1 0 1 2 3
  # or:
     cpu-map 1-2/1 0 1
     cpu-map 2-3/1 2 3

  # set threads 1-4 to cpus 0-3 :

     cpu-map auto:1/1-4 0 1 2 3
  # or :
     cpu-map 1/1-2 0 1
     cpu-map 3/3-4 2 3

This fix must be backported to 2.0.
2019-07-16 15:23:09 +02:00
William Lallemand
16866670dd BUG/MEDIUM: mworker: don't call the thread and fdtab deinit
Before switching to wait mode, the per thread deinit should not be
called, because we didn't initiate threads and fdtab.

The problem is that the master could crash if we try to reload HAProxy

The commit 944e619 ("MEDIUM: mworker: wait mode use standard init code
path") removed the deinit code by accident, but its fix 7c756a8
("BUG/MEDIUM: mworker: fix FD leak upon reload") was incomplete and did
not took care of the WAIT_MODE.

This fix must be backported in 1.9 and 2.0
2019-06-24 17:54:05 +02:00
Willy Tarreau
76a80c710c BUILD: mworker: silence two printf format warnings around getpid()
getpid() is documented as returning a pit pid_t result, not
necessarily an int. This causes a build warning on Solaris 10
because of '%d' or '%u' are used in the format passed to snprintf().
Let's just cast the result as an int (respectively unsigned int).

This can be backported to 2.0 and possibly older versions though
it really has no impact.
2019-06-22 07:57:56 +02:00
Willy Tarreau
3c39a7d889 CLEANUP: connection: rename the wait_event.task field to .tasklet
It's really confusing to call it a task because it's a tasklet and used
in places where tasks and tasklets are used together. Let's rename it
to tasklet to remove this confusion.
2019-06-14 14:42:29 +02:00
William Lallemand
63329e36ab MINOR: doc: update the manpage and usage message about -S
Add -S in the manpage, and update the usage message.

Should be backported to 1.9.
2019-06-13 17:09:27 +02:00
William Lallemand
1dc6963086 MINOR: mworker: add the HAProxy version in "show proc"
Displays the HAProxy version so you can compare the version of old
processes and new ones.
2019-06-12 19:19:57 +02:00
Willy Tarreau
34a150ccf5 MEDIUM: init/threads: don't use spinlocks during the init phase
PiBa-NL found some pathological cases where starting threads can hinder
each other and cause a measurable slow down. This problem is reproducible
with the following config (haproxy must be built with -DDEBUG_DEV) :

    global
	stats socket /tmp/sock1 mode 666 level admin
	nbthread 64

    backend stopme
	timeout server  1s
	option tcp-check
	tcp-check send "debug dev exit\n"
	server cli unix@/tmp/sock1 check

This will cause the process to be stopped once the checks are ready to
start. Binding all these to just a few cores magnifies the problem.
Starting them in loops shows a significant time difference among the
commits :

  # before startup serialization
  $ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e186161 -db -f slow-init.cfg >/dev/null 2>&1; done

  real    0m1.581s
  user    0m0.621s
  sys     0m5.339s

  # after startup serialization
  $ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy-e4d7c9dd -db -f slow-init.cfg >/dev/null 2>&1; done

  real    0m2.366s
  user    0m0.894s
  sys     0m8.238s

In order to address this, let's use plain mutexes and cond_wait during
the init phase. With this done, waiting threads now sleep and the problem
completely disappeared :

  $ time for i in {1..20}; do taskset -c 0,1,2,3 ./haproxy -db -f slow-init.cfg >/dev/null 2>&1; done

  real    0m0.161s
  user    0m0.079s
  sys     0m0.149s
2019-06-11 11:30:26 +02:00
Willy Tarreau
e4d7c9dd65 OPTIM/MINOR: init/threads: only call protocol_enable_all() on first thread
There's no point in calling this on each and every thread since the first
thread passing there will enable the listeners, and the next ones will
simply scan all of them in turn to discover that they are already
initialized. Let's only initilize them on the first thread. This could
slightly speed up start up on very large configurations, eventhough most
of the time is still spent in the main thread binding the sockets.

A few measurements have constantly shown that this decreases the startup
time by ~0.1s for 150k listeners. Starting all of them in parallel doesn't
provide better results and can still expose some undesired races.
2019-06-10 10:53:59 +02:00
Willy Tarreau
7109282577 BUG/MEDIUM: init/threads: prevent initialized threads from starting before others
Since commit 6ec902a ("MINOR: threads: serialize threads initialization")
we now serialize threads initialization. But doing so has emphasized another
race which is that some threads may actually start the loop before others
are done initializing.

As soon as all threads enter the first thread_release() call, their rdv
bit is cleared and they're all waiting for all others' rdv to be cleared
as well, with their harmless bit set. The first one to notice the cleared
mask will progress through thread_isolate(), take rdv again preventing
most others from noticing its short pass to zero, and this first one will
be able to run all the way through the initialization till the last call
to thread_release() which it happily crosses, being the only one with the
rdv bit, leaving the room for one or a few others to do the same. This
results in some threads entering the loop before others are done with
their initialization, which is particularly bad. PiBa-NL reported that
some regtests fail for him due to this (which was impossible to reproduce
here, but races are racy by definition). However placing some printf()
in the initialization code definitely shows this unsychronized startup.

This patch takes a different approach in three steps :
  - first, we don't start with thread_release() anymore and we don't
    set the rdv mask anymore in the main call. This was initially done
    to let all threads start toghether, which we don't want. Instead
    we just start with thread_isolate(). Since all threads are harmful
    by default, they all wait for each other's readiness before starting.

  - second, we don't release with thread_release() but with
    thread_sync_release(), meaning that we don't leave the function until
    other ones have reached the point in the function where they decide
    to leave it as well.

  - third, it makes sure we don't start the listeners using
    protocol_enable_all() before all threads have allocated their local
    FD tables or have initialized their pollers, otherwise startup could
    be racy as well. It's worth noting that it is even possible to limit
    this call to thread #0 as it only needs to be performed once.

This now guarantees that all thread init calls start only after all threads
are ready, and that no thread enters the polling loop before all others have
completed their initialization.

Please check GH issues #111 and #117 for more context.

No backport is needed, though if some new init races are reported in
1.9 (or even 1.8) which do not affect 2.0, then it may make sense to
carefully backport this small series.
2019-06-10 10:53:52 +02:00
Willy Tarreau
6ec902a659 MINOR: threads: serialize threads initialization
There is no point in initializing threads in parallel when we know that
it's the moment where some global variables are turned to thread-local
ones, and/or that some global variables are updated (like global_now or
trash_size). Some FDs might be created/destroyed/reallocated and could
be tricky to follow as well (think about epoll_fd for example).

Instead of having to be extremely careful about all these, and to trigger
false positives in thread sanitizers, let's simply initialize one thread
at a time. The init step is very fast so nobody should even notice, and
we won't have any more doubts about what might have happened when
analysing a dump.

See GH issues #111 and #117 for some background on this.
2019-06-07 15:37:47 +02:00