2069 Commits

Author SHA1 Message Date
Willy Tarreau
2086365f51 CLEANUP: pattern: remove the pat_time definition
It was inherited from acl_time, introduced in 1.3.10 by commit a84d374367
("[MAJOR] new framework for generic ACL support") and was never ever used.
Let's simply drop it now.
2020-01-22 07:44:36 +01:00
Tim Duesterhus
6a0dd73390 CLEANUP: Consistently unsigned int for bitfields
Signed bitfields of size `1` hold the values `0` and `-1`, but are
usually assigned `1`, possibly leading to subtle bugs when the value
is explicitely compared against `1`.
2020-01-22 07:28:39 +01:00
Baptiste Assmann
13a9232ebc MEDIUM: dns: use Additional records from SRV responses
Most DNS servers provide A/AAAA records in the Additional section of a
response, which correspond to the SRV records from the Answer section:

  ;; QUESTION SECTION:
  ;_http._tcp.be1.domain.tld.     IN      SRV

  ;; ANSWER SECTION:
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A1.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A8.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A5.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A6.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A4.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A3.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A2.domain.tld.
  _http._tcp.be1.domain.tld. 3600 IN      SRV     5 500 80 A7.domain.tld.

  ;; ADDITIONAL SECTION:
  A1.domain.tld.          3600    IN      A       192.168.0.1
  A8.domain.tld.          3600    IN      A       192.168.0.8
  A5.domain.tld.          3600    IN      A       192.168.0.5
  A6.domain.tld.          3600    IN      A       192.168.0.6
  A4.domain.tld.          3600    IN      A       192.168.0.4
  A3.domain.tld.          3600    IN      A       192.168.0.3
  A2.domain.tld.          3600    IN      A       192.168.0.2
  A7.domain.tld.          3600    IN      A       192.168.0.7

SRV record support was introduced in HAProxy 1.8 and the first design
did not take into account the records from the Additional section.
Instead, a new resolution is associated to each server with its relevant
FQDN.
This behavior generates a lot of DNS requests (1 SRV + 1 per server
associated).

This patch aims at fixing this by:
- when a DNS response is validated, we associate A/AAAA records to
  relevant SRV ones
- set a flag on associated servers to prevent them from running a DNS
  resolution for said FADN
- update server IP address with information found in the Additional
  section

If no relevant record can be found in the Additional section, then
HAProxy will failback to running a dedicated resolution for this server,
as it used to do.
This behavior is the one described in RFC 2782.
2020-01-22 07:19:54 +01:00
Christopher Faulet
2f5339079b MINOR: proxy/http-ana: Add support of extra attributes for the cookie directive
It is now possible to insert any attribute when a cookie is inserted by
HAProxy. Any value may be set, no check is performed except the syntax validity
(CTRL chars and ';' are forbidden). For instance, it may be used to add the
SameSite attribute:

    cookie SRV insert attr "SameSite=Strict"

The attr option may be repeated to add several attributes.

This patch should fix the issue #361.
2020-01-22 07:18:31 +01:00
Christopher Faulet
554c0ebffd MEDIUM: http-rules: Support an optional error message in http deny rules
It is now possible to set the error message to use when a deny rule is
executed. It may be a specific error file, adding "errorfile <file>" :

  http-request deny deny_status 400 errorfile /etc/haproxy/errorfiles/400badreq.http

It may also be an error file from an http-errors section, adding "errorfiles
<name>" :

  http-request deny errorfiles my-errors  # use 403 error from "my-errors" section

When defined, this error message is set in the HTTP transaction. The tarpit rule
is also concerned by this change.
2020-01-20 15:18:46 +01:00
Christopher Faulet
473e880a25 MINOR: http-ana: Add an error message in the txn and send it when defined
It is now possible to set the error message to return to client in the HTTP
transaction. If it is defined, this error message is used instead of proxy's
errors or default errors.
2020-01-20 15:18:46 +01:00
Christopher Faulet
76edc0f29c MEDIUM: proxy: Add a directive to reference an http-errors section in a proxy
It is now possible to import in a proxy, fully or partially, error files
declared in an http-errors section. It may be done using the "errorfiles"
directive, followed by a name and optionally a list of status code. If there is
no status code specified, all error files of the http-errors section are
imported. Otherwise, only error files associated to the listed status code are
imported. For instance :

  http-errors my-errors
      errorfile 400 ...
      errorfile 403 ...
      errorfile 404 ...

  frontend frt
      errorfiles my-errors 403 404  # ==> error 400 not imported
2020-01-20 15:18:46 +01:00
Christopher Faulet
35cd81d363 MINOR: http-htx: Add a new section to create groups of custom HTTP errors
A new section may now be declared in the configuration to create global groups
of HTTP errors. These groups are not linked to a proxy and are referenced by
name. The section must be declared using the keyword "http-errors" followed by
the group name. This name must be unique. A list of "errorfile" directives may
be declared in such section. For instance:

    http-errors website-1
        errorfile 400 /path/to/site1/400.http
        errorfile 404 /path/to/site1/404.http

    http-errors website-2
        errorfile 400 /path/to/site2/400.http
        errorfile 404 /path/to/site2/404.http

For now, it is just possible to create "http-errors" sections. There is no
documentation because these groups are not used yet.
2020-01-20 15:18:46 +01:00
Christopher Faulet
5885775de1 MEDIUM: http-htx/proxy: Use a global and centralized storage for HTTP error messages
All custom HTTP errors are now stored in a global tree. Proxies use a references
on these messages. The key used for errorfile directives is the file name as
specified in the configuration. For errorloc directives, a key is created using
the redirect code and the url. This means that the same custom error message is
now stored only once. It may be used in several proxies or for several status
code, it is only parsed and stored once.
2020-01-20 15:18:46 +01:00
Christopher Faulet
d73b96d48c MINOR: tcp-rules: Make tcp-request capture a custom action
Now, this action is use its own dedicated function and is no longer handled "in
place" during the TCP rules evaluation. Thus the action name ACT_TCP_CAPTURE is
removed. The action type is set to ACT_CUSTOM and a check function is used to
know if the rule depends on request contents while there is no inspect-delay.
2020-01-20 15:18:45 +01:00
Christopher Faulet
ac98d81f46 MINOR: http-rule/tcp-rules: Make track-sc* custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the TCP/HTTP rules evaluation. Thus the action names
ACT_ACTION_TRK_SC0 and ACT_ACTION_TRK_SCMAX are removed. The action type is now
the tracking index. Thus the function trk_idx() is no longer needed.
2020-01-20 15:18:45 +01:00
Christopher Faulet
91b3ec13c6 MEDIUM: http-rules: Make early-hint custom actions
Now, the early-hint action uses its own dedicated action and is no longer
handled "in place" during the HTTP rules evaluation. Thus the action name
ACT_HTTP_EARLY_HINT is removed. In additionn, http_add_early_hint_header() and
http_reply_103_early_hints() are also removed. This part is now handled in the
new action_ptr callback function.
2020-01-20 15:18:45 +01:00
Christopher Faulet
046cf44f6c MINOR: http-rules: Make set/del-map and add/del-acl custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_*_ACL and ACT_HTTP_*_MAP are removed. The action type is now mapped as
following: 0 = add-acl, 1 = set-map, 2 = del-acl and 3 = del-map.
2020-01-20 15:18:45 +01:00
Christopher Faulet
d1f27e3394 MINOR: http-rules: Make set-header and add-header custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_SET_HDR and ACT_HTTP_ADD_VAL are removed. The action type is now set to
0 to set a header (so remove existing ones if any and add a new one) or to 1 to
add a header (add without remove).
2020-01-20 15:18:45 +01:00
Christopher Faulet
92d34fe38d MINOR: http-rules: Make replace-header and replace-value custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_REPLACE_HDR and ACT_HTTP_REPLACE_VAL are removed. The action type is
now set to 0 to evaluate the whole header or to 1 to evaluate every
comma-delimited values.

The function http_transform_header_str() is renamed to http_replace_hdrs() to be
more explicit and the function http_transform_header() is removed. In fact, this
last one is now more or less the new action function.

The lua code has been updated accordingly to use http_replace_hdrs().
2020-01-20 15:18:45 +01:00
Christopher Faulet
006f6507d7 MINOR: actions: Use an integer to set the action type
<action> field in the act_rule structure is now an integer. The act_name values
are used for all actions without action function (but it is not a pre-requisit
though) or the action will have no effect. But for all other actions, any
integer value may used, only the action function will take care of it. The
default for such actions is ACT_CUSTOM.
2020-01-20 15:18:45 +01:00
Christopher Faulet
245cf795c1 MINOR: actions: Add flags to configure the action behaviour
Some flags can now be set on an action when it is registered. The flags are
defined in the act_flag enum. For now, only ACT_FLAG_FINAL may be set on an
action to specify if it stops the rules evaluation. It is set on
ACT_ACTION_ALLOW, ACT_ACTION_DENY, ACT_HTTP_REQ_TARPIT, ACT_HTTP_REQ_AUTH,
ACT_HTTP_REDIR and ACT_TCP_CLOSE actions. But, when required, it may also be set
on custom actions.

Consequently, this flag is checked instead of the action type during the
configuration parsing to trigger a warning when a rule inhibits all the
following ones.
2020-01-20 15:18:45 +01:00
Christopher Faulet
105ba6cc54 MINOR: actions: Rename the act_flag enum into act_opt
The flags in the act_flag enum have been renamed act_opt. It means ACT_OPT
prefix is used instead of ACT_FLAG. The purpose of this patch is to reserve the
action flags for the actions configuration.
2020-01-20 15:18:45 +01:00
Christopher Faulet
cd26e8a2ec MINOR: http-rules/tcp-rules: Call the defined action function first if defined
When TCP and HTTP rules are evaluated, if an action function (action_ptr field
in the act_rule structure) is defined for a given action, it is now always
called in priority over the test on the action type. Concretly, for now, only
custom actions define it. Thus there is no change. It just let us the choice to
extend the action type beyond the existing ones in the enum.
2020-01-20 15:18:45 +01:00
Christopher Faulet
96bff76087 MINOR: actions: Regroup some info about HTTP rules in the same struct
Info used by HTTP rules manipulating the message itself are splitted in several
structures in the arg union. But it is possible to group all of them in a unique
struct. Now, <arg.http> is used by most of these rules, which contains:

  * <arg.http.i>   : an integer used as status code, nice/tos/mark/loglevel or
                     action id.
  * <arg.http.str> : an IST used as header name, reason string or auth realm.
  * <arg.http.fmt> : a log-format compatible expression
  * <arg.http.re>  : a regular expression used by replace rules
2020-01-20 15:18:45 +01:00
Christopher Faulet
58b3564fde MINOR: actions: Add a function pointer to release args used by actions
Arguments used by actions are never released during HAProxy deinit. Now, it is
possible to specify a function to do so. ".release_ptr" field in the act_rule
structure may be set during the configuration parsing to a specific deinit
function depending on the action type.
2020-01-20 15:18:45 +01:00
Christopher Faulet
a00071e2e5 MINOR: http-ana: Add a txn flag to support soft/strict message rewrites
the HTTP_MSGF_SOFT_RW flag must now be set on the HTTP transaction to ignore
rewrite errors on a message, from HTTP rules. The mode is called the soft
rewrites. If thes flag is not set, strict rewrites are performed. In this mode,
if a rewrite error occurred, an internal error is reported.

For now, HTTP_MSGF_SOFT_RW is always set and there is no way to switch a
transaction in strict mode.
2020-01-20 15:18:45 +01:00
Christopher Faulet
a08546bb5a MINOR: counters: Remove failed_secu counter and use denied_resp instead
The failed_secu counter is only used for the servers stats. It is used to report
the number of denied responses. On proxies, the same info is stored in the
denied_resp counter. So, it is more consistent to use the same field for
servers.
2020-01-20 15:18:45 +01:00
Christopher Faulet
0159ee4032 MINOR: stats: Report internal errors in the proxies/listeners/servers stats
The stats field ST_F_EINT has been added to report internal errors encountered
per proxy, per listener and per server. It appears in the CLI export and on the
HTML stats page.
2020-01-20 15:18:45 +01:00
Christopher Faulet
30a2a3724b MINOR: http-rules: Add more return codes to let custom actions act as normal ones
When HTTP/TCP rules are evaluated, especially HTTP ones, some results are
possible for normal actions and not for custom ones. So missing return codes
(ACT_RET_) have been added to let custom actions act as normal ones. Concretely
following codes have been added:

 * ACT_RET_DENY : deny the request/response. It must be handled by the caller
 * ACT_RET_ABRT : abort the request/response, handled by action itsleft.
 * ACT_RET_INV  : invalid request/response
2020-01-20 15:18:45 +01:00
Christopher Faulet
4d90db5f4c MINOR: http-rules: Add a rule result to report internal error
Now, when HTTP rules are evaluated, HTTP_RULE_RES_ERROR must be returned when an
internal error is catched. It is a way to make the difference between a bad
request or a bad response and an error during its processing.
2020-01-20 15:18:45 +01:00
Christopher Faulet
d4ce6c2957 MINOR: counters: Add a counter to report internal processing errors
This counter, named 'internal_errors', has been added in frontend and backend
counters. It should be used when a internal error is encountered, instead for
failed_req or failed_resp.
2020-01-20 15:18:45 +01:00
Willy Tarreau
ee1a6fc943 MINOR: connection: make the last arg of subscribe() a struct wait_event*
The subscriber used to be passed as a "void *param" that was systematically
cast to a struct wait_event*. By now it appears clear that the subscribe()
call at every layer is well defined and always takes a pointer to an event
subscriber of type wait_event, so let's enforce this in the functions'
prototypes, remove the intermediary variables used to cast it and clean up
the comments to clarify what all these functions do in their context.
2020-01-17 18:30:37 +01:00
Willy Tarreau
7872d1fc15 MEDIUM: connection: merge the send_wait and recv_wait entries
In practice all callers use the same wait_event notification for any I/O
so instead of keeping specific code to handle them separately, let's merge
them and it will allow us to create new events later.
2020-01-17 18:30:36 +01:00
Willy Tarreau
3381bf89e3 MEDIUM: connection: get rid of CO_FL_CURR_* flags
These ones used to serve as a set of switches between CO_FL_SOCK_* and
CO_FL_XPRT_*, and now that the SOCK layer is gone, they're always a
copy of the last know CO_FL_XPRT_* ones that is resynchronized before
I/O events by calling conn_refresh_polling_flags(), and that are pushed
back to FDs when detecting changes with conn_xprt_polling_changes().

While these functions are not particularly heavy, what they do is
totally redundant by now because the fd_want_*/fd_stop_*() actions
already perform test-and-set operations to decide to create an entry
or not, so they do the exact same thing that is done by
conn_xprt_polling_changes(). As such it is pointless to call that
one, and given that the only reason to keep CO_FL_CURR_* is to detect
changes there, we can now remove them.

Even if this does only save very few cycles, this removes a significant
complexity that has been responsible for many bugs in the past, including
the last one affecting FreeBSD.

All tests look good, and no performance regressions were observed.
2020-01-17 17:45:12 +01:00
Willy Tarreau
17ccd1a356 BUG/MEDIUM: connection: add a mux flag to indicate splice usability
Commit c640ef1a7d ("BUG/MINOR: stream-int: avoid calling rcv_buf() when
splicing is still possible") fixed splicing in TCP and legacy mode but
broke it badly in HTX mode.

What happens in HTX mode is that the channel's to_forward value remains
set to CHN_INFINITE_FORWARD during the whole transfer, and as such it is
not a reliable signal anymore to indicate whether more data are expected
or not. Thus, when data are spliced out of the mux using rcv_pipe(), even
when the end is reached (that only the mux knows about), the call to
rcv_buf() to get the final HTX blocks completing the message were skipped
and there was often no new event to wake this up, resulting in transfer
timeouts at the end of large objects.

All this goes down to the fact that the channel has no more information
about whether it can splice or not despite being the one having to take
the decision to call rcv_pipe() or not. And we cannot afford to call
rcv_buf() inconditionally because, as the commit above showed, this
reduces the forwarding performance by 2 to 3 in TCP and legacy modes
due to data lying in the buffer preventing splicing from being used
later.

The approach taken by this patch consists in offering the muxes the ability
to report a bit more information to the upper layers via the conn_stream.
This information could simply be to indicate that more data are awaited
but the real need being to distinguish splicing and receiving, here
instead we clearly report the mux's willingness to be called for splicing
or not. Hence the flag's name, CS_FL_MAY_SPLICE.

The mux sets this flag when it knows that its buffer is empty and that
data waiting past what is currently known may be spliced, and clears it
when it knows there's no more data or that the caller must fall back to
rcv_buf() instead.

The stream-int code now uses this to determine if splicing may be used
or not instead of looking at the rcv_pipe() callbacks through the whole
chain. And after the rcv_pipe() call, it checks the flag again to decide
whether it may safely skip rcv_buf() or not.

All this bitfield dance remains a bit complex and it starts to appear
obvious that splicing vs reading should be a decision of the mux based
on permission granted by the data layer. This would however increase
the API's complexity but definitely need to be thought about, and should
even significantly simplify the data processing layer.

The way it was integrated in mux-h1 will also result in no more calls
to rcv_pipe() on chunked encoded data, since these ones are currently
disabled at the mux level. However once the issue with chunks+splice
is fixed, it will be important to explicitly check for curr_len|CHNK
to set MAY_SPLICE, so that we don't call rcv_buf() after each chunk.

This fix must be backported to 2.1 and 2.0.
2020-01-17 17:00:12 +01:00
Willy Tarreau
f31af9367e MEDIUM: lua: don't call the GC as often when dealing with outgoing connections
In order to properly close connections established from Lua in case
a Lua context dies, the context currently automatically gets a flag
HLUA_MUST_GC set whenever an outgoing connection is used. This causes
the GC to be enforced on the context's death as well as on yield. First,
it does not appear necessary to do it when yielding, since if the
connections die they are already cleaned up. Second, the problem with
the flag is that even if a connection gets properly closed, the flag is
not removed and the GC continues to be called on the Lua context.

The impact on performance looks quite significant, as noticed and
diagnosed by Sadasiva Gujjarlapudi in the following thread:

  https://www.mail-archive.com/haproxy@formilux.org/msg35810.html

This patch changes the flag for a counter so that each created
connection increments it and each cleanly closed connection decrements
it. That way we know we have to call the GC on the context's death only
if the count is non-null. As reported in the thread above, the Lua
performance gain is now over 20% by doing this.

Thanks to Sada and Thierry for the design discussion and tests that
led to this solution.
2020-01-14 10:12:31 +01:00
Willy Tarreau
0fbc318e24 CLEANUP: connection: merge CO_FL_NOTIFY_DATA and CO_FL_NOTIFY_DONE
Both flags became equal in commit 82967bf9 ("MINOR: connection: adjust
CO_FL_NOTIFY_DATA after removal of flags"), which already predicted the
overlap between xprt_done_cb() and wake() after the removal of the DATA
specific flags in 1.8. Let's simply remove CO_FL_NOTIFY_DATA since the
"_DONE" version already covers everything and explains the intent well
enough.
2019-12-27 16:38:47 +01:00
Willy Tarreau
11ef0837af MINOR: pollers: add a new flag to indicate pollers reporting ERR & HUP
In practice it's all pollers except select(). It turns out that we're
keeping some legacy code only for select and enforcing it on all
pollers, let's offer the pollers the ability to declare that they
do not need that.
2019-12-27 14:04:33 +01:00
Willy Tarreau
dd0e89a084 BUG/MAJOR: task: add a new TASK_SHARED_WQ flag to fix foreing requeuing
Since 1.9 with commit b20aa9eef3 ("MAJOR: tasks: create per-thread wait
queues") a task bound to a single thread will not use locks when being
queued or dequeued because the wait queue is assumed to be the owner
thread's.

But there exists a rare situation where this is not true: the health
check tasks may be running on one thread waiting for a response, and
may in parallel be requeued by another thread calling health_adjust()
after a detecting a response error in traffic when "observe l7" is set,
and "fastinter" is lower than "inter", requiring to shorten the running
check's timeout. In this case, the task being requeued was present in
another thread's wait queue, thus opening a race during task_unlink_wq(),
and gets requeued into the calling thread's wait queue instead of the
running one's, opening a second race here.

This patch aims at protecting against the risk of calling task_unlink_wq()
from one thread while the task is queued on another thread, hence unlocked,
by introducing a new TASK_SHARED_WQ flag.

This new flag indicates that a task's position in the wait queue may be
adjusted by other threads than then one currently executing it. This means
that such WQ manipulations must be performed under a lock. There are two
types of such tasks:
  - the global ones, using the global wait queue (technically speaking,
    those whose thread_mask has at least 2 bits set).
  - some local ones, which for now will be placed into the global wait
    queue as well in order to benefit from its lock.

The flag is automatically set on initialization if the task's thread mask
indicates more than one thread. The caller must also set it if it intends
to let other threads update the task's expiration delay (e.g. delegated
I/Os), or if it intends to change the task's affinity over time as this
could lead to the same situation.

Right now only the situation described above seems to be affected by this
issue, and it is very difficult to trigger, and even then, will often have
no visible effect beyond stopping the checks for example once the race is
met. On my laptop it is feasible with the following config, chained to
httpterm:

    global
        maxconn 400 # provoke FD errors, calling health_adjust()

    defaults
        mode http
        timeout client 10s
        timeout server 10s
        timeout connect 10s

    listen px
        bind :8001
        option httpchk /?t=50
        server sback 127.0.0.1:8000 backup
        server-template s 0-999 127.0.0.1:8000 check port 8001 inter 100 fastinter 10 observe layer7

This patch will automatically address the case for the checks because
check tasks are created with multiple threads bound and will get the
TASK_SHARED_WQ flag set.

If in the future more tasks need to rely on this (multi-threaded muxes
for example) and the use of the global wait queue becomes a bottleneck
again, then it should not be too difficult to place locks on the local
wait queues and queue the task on its bound thread.

This patch needs to be backported to 2.1, 2.0 and 1.9. It depends on
previous patch "MINOR: task: only check TASK_WOKEN_ANY to decide to
requeue a task".

Many thanks to William Dauchy for providing detailed traces allowing to
spot the problem.
2019-12-19 14:42:22 +01:00
Willy Tarreau
a1d97f88e0 REORG: listener: move the global listener queue code to listener.c
The global listener queue code and declarations were still lying in
haproxy.c while not needed there anymore at all. This complicates
the code for no reason. As a result, the global_listener_queue_task
and the global_listener_queue were made static.
2019-12-10 14:16:03 +01:00
Willy Tarreau
a45a8b5171 MEDIUM: init: set NO_NEW_PRIVS by default when supported
HAProxy doesn't need to call executables at run time (except when using
external checks which are strongly recommended against), and is even expected
to isolate itself into an empty chroot. As such, there basically is no valid
reason to allow a setuid executable to be called without the user being fully
aware of the risks. In a situation where haproxy would need to call external
checks and/or disable chroot, exploiting a vulnerability in a library or in
haproxy itself could lead to the execution of an external program. On Linux
it is possible to lock the process so that any setuid bit present on such an
executable is ignored. This significantly reduces the risk of privilege
escalation in such a situation. This is what haproxy does by default. In case
this causes a problem to an external check (for example one which would need
the "ping" command), then it is possible to disable this protection by
explicitly adding this directive in the global section. If enabled, it is
possible to turn it back off by prefixing it with the "no" keyword.

Before the option:

  $ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
  uid=0(root) gid=0(root) groups=0(root

After the option:
  $ socat - /tmp/sock1 <<< "expert-mode on; debug dev exec sudo /bin/id"
  sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the
        'nosuid' option set or an NFS file system without root privileges?
2019-12-06 17:20:26 +01:00
Willy Tarreau
d96f1126fe MEDIUM: init: prevent process and thread creation at runtime
Some concerns are regularly raised about the risk to inherit some Lua
files which make use of a fork (e.g. via os.execute()) as well as
whether or not some of bugs we fix might or not be exploitable to run
some code. Given that haproxy is event-driven, any foreground activity
completely stops processing and is easy to detect, but background
activity is a different story. A Lua script could very well discretely
fork a sub-process connecting to a remote location and taking commands,
and some injected code could also try to hide its activity by creating
a process or a thread without blocking the rest of the processing. While
such activities should be extremely limited when run in an empty chroot
without any permission, it would be better to get a higher assurance
they cannot happen.

This patch introduces something very simple: it limits the number of
processes and threads to zero in the workers after the last thread was
created. By doing so, it effectively instructs the system to fail on
any fork() or clone() syscall. Thus any undesired activity has to happen
in the foreground and is way easier to detect.

This will obviously break external checks (whose concept is already
totally insecure), and for this reason a new option
"insecure-fork-wanted" was added to disable this protection, and it
is suggested in the fork() error report from the checks. It is
obviously recommended not to use it and to reconsider the reasons
leading to it being enabled in the first place.

If for any reason we fail to disable forks, we still start because it
could be imaginable that some operating systems refuse to set this
limit to zero, but in this case we emit a warning, that may or may not
be reported since we're after the fork point. Ideally over the long
term it should be conditionned by strict-limits and cause a hard fail.
2019-12-03 11:49:00 +01:00
Daniel Corbett
f8716914c7 MEDIUM: dns: Add resolve-opts "ignore-weight"
It was noted in #48 that there are times when a configuration
may use the server-template directive with SRV records and
simultaneously want to control weights using an agent-check or
through the runtime api.  This patch adds a new option
"ignore-weight" to the "resolve-opts" directive.

When specified, any weight indicated within an SRV record will
be ignored.  This is for both initial resolution and ongoing
resolution.
2019-11-21 17:25:31 +01:00
Frédéric Lécaille
ec1c10b839 MINOR: peers: Add debugging information to "show peers".
This patch adds three counters to help in debugging peers protocol issues
to "peer" struct:
	->no_hbt counts the number of reconnection period without receiving heartbeat
	->new_conn counts the number of reconnections after ->reconnect timeout expirations.
	->proto_err counts the number of protocol errors.
2019-11-19 14:48:28 +01:00
Frédéric Lécaille
33cab3c0eb MINOR: peers: Add TX/RX heartbeat counters.
Add RX/TX heartbeat counters to "peer" struct to have an idead about which
peer is alive or not.
Dump these counters values on the CLI via "show peers" command.
2019-11-19 14:48:25 +01:00
Cédric Dufour
0d7712dff0 MINOR: stick-table: allow sc-set-gpt0 to set value from an expression
Allow the sc-set-gpt0 action to set GPT0 to a value dynamically evaluated from
its <expr> argument (in addition to the existing static <int> alternative).
2019-11-15 18:24:19 +01:00
Willy Tarreau
869efd5eeb BUG/MINOR: log: make "show startup-log" use a ring buffer instead
The copy of the startup logs used to rely on a re-allocated memory area
on the fly, that would attempt to be delivered at once over the CLI. But
if it's too large (too many warnings) it will take time to start up, and
may not even show up on the CLI as it doesn't fit in a buffer.

The ring buffer infrastructure solves all this with no more code, let's
switch to this instead. It simply requires a parsing function to attach
the ring via ring_attach_cli() and all the rest is automatically handled.

Initially this was imagined as a code cleanup, until a test with a config
involving 100k backends and just one occurrence of
"load-server-state-from-file global" in the defaults section took approx
20 minutes to parse due to the O(N^2) cost of concatenating the warnings
resulting in ~1 TB of data to be copied, while it took only 0.57s with
the ring.

Ideally this patch should be backported to 2.0 and 1.9, though it relies
on the ring infrastructure which will then also need to be backported.
Configs able to trigger the bug are uncommon, so another workaround for
older versions without backporting the rings would consist in simply
limiting the size of the error message in print_message() to something
always printable, which will only return the first errors.
2019-11-15 15:50:16 +01:00
Christopher Faulet
0d1c2a65e8 MINOR: stats: Report max times in addition of the averages for sessions
Now, for the sessions, the maximum times (queue, connect, response, total) are
reported in addition of the averages over the last 1024 connections. These
values are called qtime_max, ctime_max, rtime_max and ttime_max.

This patch is related to #272.
2019-11-15 14:23:54 +01:00
Christopher Faulet
efb41f0d8d MINOR: counters: Add fields to store the max observed for {q,c,d,t}_time
For backends and servers, some average times for last 1024 connections are
already calculated. For the moment, the averages for the time passed in the
queue, the connect time, the response time (for HTTP session only) and the total
time are calculated. Now, in addition, the maximum time observed for these
values are also stored.

In addition, These new counters are cleared as all other max values with the CLI
command "clear counters".

This patch is related to #272.
2019-11-15 14:23:21 +01:00
Christopher Faulet
b2e58492b1 MEDIUM: filters: Adapt filters API to allow again TCP filtering on HTX streams
This change make the payload filtering uniform between TCP and HTTP
filters. Now, in TCP, like in HTTP, there is only one callback responsible to
forward data. Thus, old callbacks, tcp_data() and tcp_forward_data(), are
replaced by a single callback function, tcp_payload(). This new callback gets
the offset in the payload to (re)start the filtering and the maximum amount of
data it can forward. It is the filter's responsibility to be compatible with HTX
streams. If not, it must not set the flag FLT_CFG_FL_HTX.

Because of this change, nxt and fwd offsets are no longer needed. Thus they are
removed from the filter structure with their update functions,
flt_change_next_size() and flt_change_forward_size(). Moreover, the trace filter
has been updated accordingly.

This patch breaks the compatibility with the old API. Thus it should probably
not be backported. But, AFAIK, there is no TCP filter, thus the breakage is very
limited.
2019-11-15 13:43:08 +01:00
William Lallemand
21724f0807 MINOR: ssl/cli: replace the default_ctx during 'commit ssl cert'
If the SSL_CTX of a previous instance (ckch_inst) was used as a
default_ctx, replace the default_ctx of the bind_conf by the first
SSL_CTX inserted in the SNI tree.

Use the RWLOCK of the sni tree to handle the change of the default_ctx.
2019-11-04 18:16:53 +01:00
William Lallemand
beea2a476e CLEANUP: ssl/cli: remove leftovers of bundle/certs (it < 2)
Remove the leftovers of the certificate + bundle updating in 'ssl set
cert' and 'commit ssl cert'.

* Remove the it variable in appctx.ctx.ssl.
* Stop doing everything twice.
* Indent
2019-10-30 17:52:34 +01:00
William Lallemand
bc6ca7ccaa MINOR: ssl/cli: rework 'set ssl cert' as 'set/commit'
This patch splits the 'set ssl cert' CLI command into 2 commands.

The previous way of updating the certificate on the CLI was limited with
the bundles. It was only able to apply one of the tree part of the
certificate during an update, which mean that we needed 3 updates to
update a full 3 certs bundle.

It was also not possible to apply atomically several part of a
certificate with the ability to rollback on error. (For example applying
a .pem, then a .ocsp, then a .sctl)

The command 'set ssl cert' will now duplicate the certificate (or
bundle) and update it in a temporary transaction..

The second command 'commit ssl cert' will commit all the changes made
during the transaction for the certificate.

This commit breaks the ability to update a certificate which was used as
a unique file and as a bundle in the HAProxy configuration. This way of
using the certificates wasn't making any sense.

Example:

	// For a bundle:

	$ echo -e "set ssl cert localhost.pem.rsa <<\n$(cat kikyo.pem.rsa)\n" | socat /tmp/sock1 -
	Transaction created for certificate localhost.pem!

	$ echo -e "set ssl cert localhost.pem.dsa <<\n$(cat kikyo.pem.dsa)\n" | socat /tmp/sock1 -
	Transaction updated for certificate localhost.pem!

	$ echo -e "set ssl cert localhost.pem.ecdsa <<\n$(cat kikyo.pem.ecdsa)\n" | socat /tmp/sock1 -
	Transaction updated for certificate localhost.pem!

	$ echo "commit ssl cert localhost.pem" | socat /tmp/sock1 -
	Committing localhost.pem.
	Success!
2019-10-30 17:01:07 +01:00
William Dauchy
0fec3ab7bf MINOR: init: always fail when setrlimit fails
this patch introduces a strict-limits parameter which enforces the
setrlimit setting instead of a warning. This option can be forcingly
disable with the "no" keyword.
The general aim of this patch is to avoid bad surprises on a production
environment where you change the maxconn for example, a new fd limit is
calculated, but cannot be set because of sysfs setting. In that case you
might want to have an explicit failure to be aware of it before seeing
your traffic going down. During a global rollout it is also useful to
explictly fail as most progressive rollout would simply check the
general health check of the process.

As discussed, plan to use the strict by default mode starting from v2.3.

Signed-off-by: William Dauchy <w.dauchy@criteo.com>
2019-10-29 17:42:27 +01:00