Commit Graph

7859 Commits

Author SHA1 Message Date
Aurelien DARRAGON
68cfb222b5 BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to
crash) can be experienced if the same pattern file (and match method) is
used in two default sections and the first one is not referenced later in
the config. In this case, the first default section will be cleaned up.
However, due to an unhandled case in the above optimization, the original
expr which the second default section relies on is mistakenly freed.

This issue was discovered while trying to reproduce GH #2708. The issue
was particularly tricky to reproduce given the config and sequence
required to make the UAF happen. Fortunately, GitHub user @asmnek not only
provided useful information, but since he was able to consistently
trigger the crash in his environment, he was able to narrow it down to
the use of a pattern file involved with 2 named default sections. Big thanks
to him.

To fix the issue, let's push the logic from c5959fd a bit further. Instead
of relying on "do_free" variable to know if the expression should be freed
or not (which proved to be insufficient in our case), let's switch to a
simple refcounting logic. This way, no matter who owns the expression, the
last one attempting to free it will be responsible for freeing it.
Refcount is implemented using a 32bit value which fills a previous 4 bytes
structure gap:

        int                        mflags;               /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        long unsigned int          lock;                 /*    88     8 */
(output from pahole)
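
The refcounting idea described above can be sketched as follows. This is a
minimal, single-threaded illustration and not the code from the patch: the
struct expr type and the expr_new(), expr_take() and expr_drop() helpers are
hypothetical names, and only the mflags and lock members come from the pahole
output.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared pattern expression. */
struct expr {
    int mflags;
    uint32_t refcount;  /* 32-bit counter placed in the former 4-byte hole */
    unsigned long lock;
};

/* Allocate an expression owned by exactly one user. */
struct expr *expr_new(void)
{
    struct expr *e = calloc(1, sizeof(*e));

    if (e)
        e->refcount = 1;
    return e;
}

/* Register one more owner of the same expression. */
struct expr *expr_take(struct expr *e)
{
    e->refcount++;
    return e;
}

/* Drop one owner. Returns 1 if this call freed the expression. */
int expr_drop(struct expr *e)
{
    if (--e->refcount > 0)
        return 0;
    free(e);
    return 1;
}
```

With two owners, the first expr_drop() only decrements the counter and the
second one releases the memory, regardless of which owner goes first.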

Even though it was not reproduced in 2.6 or below by @asmnek (the bug was
revealed thanks to another bugfix), this issue theoretically affects all
stable versions (up to c5959fd), thus it should be backported to all
stable versions.
2024-09-09 16:07:05 +02:00
Aaron Kuehler
50322dff81 MEDIUM: server: add init-state
Allow the user to set the "initial state" of a server.

Context:

Servers are always set in an UP status by default. In
some cases, further checks are required to determine if the server is
ready to receive client traffic.

This introduces the "init-state {fully-up|up|down|fully-down}"
configuration parameter to the server.

- when set to 'fully-up', the server is considered immediately available
  and can turn to the DOWN state when ALL health checks fail.
- when set to 'up' (the default), the server is considered immediately
  available and will initiate a health check that can turn it to the DOWN
  state immediately if it fails.
- when set to 'down', the server initially is considered unavailable and
  will initiate a health check that can turn it to the UP state immediately
  if it succeeds.
- when set to 'fully-down', the server is initially considered unavailable
  and can turn to the UP state when ALL health checks succeed.

The server's init-state is considered when the HAProxy instance
is (re)started, a new server is detected (for example via service
discovery / DNS resolution), a server exits maintenance, etc.
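
The four values listed above can be summarized with a small sketch. It rests
on an assumption: "ALL health checks" is taken to mean the server's usual
consecutive-check thresholds (called rise and fall here). The enum, struct and
function names are hypothetical and do not come from the patch.

```c
/* Hypothetical encoding of the four init-state values. */
enum init_state { INIT_FULLY_UP, INIT_UP, INIT_DOWN, INIT_FULLY_DOWN };

struct srv_start {
    int up;             /* 1 if the server starts as available */
    int checks_to_flip; /* consecutive opposite check results needed */
};

/* Map an init-state to the starting status and the number of check
 * results required before the status changes (assumed semantics).
 */
struct srv_start srv_start_for(enum init_state st, int rise, int fall)
{
    struct srv_start s;

    switch (st) {
    case INIT_FULLY_UP:   s.up = 1; s.checks_to_flip = fall; break;
    case INIT_DOWN:       s.up = 0; s.checks_to_flip = 1;    break;
    case INIT_FULLY_DOWN: s.up = 0; s.checks_to_flip = rise; break;
    case INIT_UP:
    default:              s.up = 1; s.checks_to_flip = 1;    break;
    }
    return s;
}
```

For example, with rise 2 and fall 3, a 'down' server would become UP after a
single successful check, while a 'fully-down' one would need two.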

Link: https://github.com/haproxy/haproxy/issues/51
2024-09-05 11:13:10 +02:00
Ilya Shipitsin
1f6e5f7a61 CLEANUP: assorted typo fixes in the code and comments
This is 43rd iteration of typo fixes
2024-09-03 17:49:21 +02:00
Christopher Faulet
a7f6b0ac03 MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to in/out bytes rates to work around the 32-bits limit of the
frequency counters. Thanks to this factor, it is possible to have byte
rates beyond 4GB. Instead of counting each byte, we count blocks
of bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure shared limits exceeding 4GB/s.

For now, this parameter must be in the range ]0-1024].
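
To illustrate why counting blocks lifts the 32-bit limit, here is a rough
sketch. The helper names are hypothetical, and the real implementation may
round or carry remainders differently; this version simply truncates and
saturates.

```c
#include <stdint.h>

/* Convert a byte count into 32-bit counter units of <factor> bytes each.
 * The result saturates at UINT32_MAX; the remainder is dropped.
 */
uint32_t bytes_to_units(uint64_t bytes, uint32_t factor)
{
    uint64_t units = bytes / factor;

    return units > UINT32_MAX ? UINT32_MAX : (uint32_t)units;
}

/* Convert counter units back into an approximate byte count. */
uint64_t units_to_bytes(uint32_t units, uint32_t factor)
{
    return (uint64_t)units * factor;
}
```

With a factor of 1, a rate of 8 GiB saturates the counter, whereas a factor of
1024 stores it as 8388608 units and converts back without loss.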
2024-09-02 15:50:25 +02:00
Aperence
20efb856e1 MEDIUM: protocol: add MPTCP per address support
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.

Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.

To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.

This attached patch adds MPTCP per address support, to be used with:

  mptcp{,4,6}@<address>[:port1[-port2]]
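
As a rough illustration of this address syntax, the sketch below maps the
three prefixes to an address family and a socket protocol. It is not the
parser from the patch: struct addr_proto and parse_mptcp_prefix() are
hypothetical names. IPPROTO_MPTCP is the Linux protocol number passed to
socket(), defined here only when the libc headers lack it.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux value, for libc headers that predate it */
#endif

struct addr_proto {
    int family;       /* AF_UNSPEC, AF_INET or AF_INET6 */
    int proto;        /* IPPROTO_TCP or IPPROTO_MPTCP */
    const char *rest; /* address part following the prefix */
};

/* Returns 1 when an MPTCP prefix was recognized, 0 otherwise. */
int parse_mptcp_prefix(const char *s, struct addr_proto *out)
{
    static const struct { const char *pfx; int family; } tab[] = {
        { "mptcp4@", AF_INET },
        { "mptcp6@", AF_INET6 },
        { "mptcp@",  AF_UNSPEC },
    };
    size_t i;

    for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
        size_t len = strlen(tab[i].pfx);

        if (strncmp(s, tab[i].pfx, len) == 0) {
            out->family = tab[i].family;
            out->proto = IPPROTO_MPTCP;
            out->rest = s + len;
            return 1;
        }
    }
    /* No prefix: plain TCP, family left to the address resolver. */
    out->family = AF_UNSPEC;
    out->proto = IPPROTO_TCP;
    out->rest = s;
    return 0;
}
```

Once the family is known, the resulting protocol would be used as the third
argument of socket(family, SOCK_STREAM, proto), which is how an application
explicitly enables MPTCP on Linux.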

MPTCP v4 and v6 protocols have been added: they are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.

These protocols are stored in __protocol_by_family, as an alternative to
TCP, similar to what has been done with QUIC. By doing that, the size of
__protocol_by_family has not been increased, and it behaves like TCP.

MPTCP is both supported for the frontend and backend sides.

Also added an example of configuration using mptcp along with a backend
allowing to experiment with it.

Note that this is a re-implementation of Björn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with
this, causing his work to be left pending for a while.

Currently, the TCP_MAXSEG socket option doesn't seem to be supported
with MPTCP [5]. This results in a warning when trying to set the MSS of
sockets in proto_tcp:tcp_bind_listener.

This can be resolved by adding two new variables:
sock_inet(6)_mptcp_maxseg_default that will hold the default
value of the TCP_MAXSEG option. Note that for the moment, this
will always be -1 as the option isn't supported. However, in the
future, when the support for this option will be added, it should
contain the correct value for the MSS, allowing to correctly
set the TCP_MAXSEG option.

Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5]

Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
2024-08-30 18:53:49 +02:00
Aperence
38618822e1 MINOR: server: add a alt_proto field for server
Add a new field alt_proto to the server structures that
specifies if an alternate protocol should be used for this server.

This field can be transparently passed to protocol_lookup to get
an appropriate protocol structure.

This change thus allows creating servers with different protocols,
and not only TCP anymore.
2024-08-30 18:53:49 +02:00
Aperence
a7b04e383a MINOR: tools: extend str2sa_range to add an alt parameter
Add a new parameter "alt" that will store whether this configuration
uses an alternate protocol.

This alt pointer will contain a value that can be transparently
passed to protocol_lookup to obtain an appropriate protocol structure.

This change is needed to allow, for example, servers to know whether they
need to use an alternate protocol or not.
2024-08-30 18:53:49 +02:00
Frederic Lecaille
f627b9272b BUG/MEDIUM: quic: always validate sender address on 0-RTT
Wedl Michael, a student at the University of Applied Sciences St. Poelten,
reported a potential vulnerability in haproxy, described below.

An attacker could have obtained a TLS session ticket after having established
a connection to an haproxy QUIC listener, using its real IP address. The
attacker does not even have to send an application-level request (HTTP3). Then
the attacker could open a 0-RTT session with a spoofed IP address
trusted by the QUIC listener to bypass IP allow/block lists and send HTTP3 requests.

To mitigate this vulnerability, it was decided to use a token which can be provided
to the client each time it successfully manages to connect to haproxy. These
tokens may be reused for future connections to validate the address/path of the
remote peer, as is done with the Retry token, which is used for the current
connection, not the next one. Such tokens are transported by NEW_TOKEN frames,
which were not used by haproxy until now.

So, each time a client connects to an haproxy QUIC listener with 0-RTT
enabled, it is provided with such a token which can be reused for the
next 0-RTT session. If no such token is presented by the client,
haproxy checks if the session is a 0-RTT one, i.e. with early-data presented
by the client. Contrary to the Retry token, the decision to refuse the
connection is made only when the TLS stack has been provided with
enough early-data from the Initial ClientHello TLS message and when
these data have been accepted. Fortunately, this event arrives fast enough
to allow haproxy to kill the connection if some early-data have been accepted
without a token presented by the client.

quic_build_post_handshake_frames() has been modified to build a NEW_TOKEN
frame with this newly implemented token to be transported inside.

quic_tls_derive_retry_token_secret() was renamed to quic_do_tls_derive_token_secret()
and modified to be reused and derive the secret for the new token implementation.

quic_token_validate() has been implemented to validate both the Retry and
the new token implemented by this patch. When this is a non-retry token
which could not be validated, the datagram received is marked as requiring
a Retry packet to be sent, and no connection is created.

When the Initial packet does not embed any non-retry token and 0-RTT is enabled,
the connection is marked with this new flag: QUIC_FL_CONN_NO_TOKEN_RCVD. As soon
as the TLS stack detects that some early-data have been provided by the client
and accepted, the connection is marked to be killed (QUIC_FL_CONN_TO_KILL) from
ha_quic_add_handshake_data(). This is done by calling the new
qc_ssl_eary_data_accepted() function. The TLS handshake is interrupted as soon
as possible by returning 0 from ha_quic_add_handshake_data(). The connection is
also marked as requiring a Retry packet to be sent (QUIC_FL_CONN_SEND_RETRY) from
ha_quic_add_handshake_data(). Then the handshake I/O handler (quic_conn_io_cb())
knows how to behave: kill the connection after having sent a Retry packet.
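
The flag handling described in the paragraph above can be modeled roughly as
follows. The flag names come from the commit message, but their bit values and
the two helper functions are assumptions made for illustration. The real logic
lives in the packet parser and in ha_quic_add_handshake_data().

```c
/* Flag names from the commit message; bit positions are illustrative. */
#define QUIC_FL_CONN_NO_TOKEN_RCVD (1U << 0)
#define QUIC_FL_CONN_TO_KILL       (1U << 1)
#define QUIC_FL_CONN_SEND_RETRY    (1U << 2)

/* On Initial packet: remember that 0-RTT is possible but no token came. */
unsigned int on_initial(unsigned int flags, int zero_rtt_enabled,
                        int has_token)
{
    if (zero_rtt_enabled && !has_token)
        flags |= QUIC_FL_CONN_NO_TOKEN_RCVD;
    return flags;
}

/* On handshake data: returns 0 to interrupt the handshake, 1 to go on. */
int on_handshake_data(unsigned int *flags, int early_data_accepted)
{
    if ((*flags & QUIC_FL_CONN_NO_TOKEN_RCVD) && early_data_accepted) {
        *flags |= QUIC_FL_CONN_TO_KILL | QUIC_FL_CONN_SEND_RETRY;
        return 0;
    }
    return 1;
}
```

In this model, a connection that presented a token, or that carries no
accepted early-data, proceeds with the handshake untouched.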

About TLS stack compatibility, this patch is supported by aws-lc. Thanks
to HAVE_SSL_0RTT_QUIC, it is disabled for wolfssl, which does not support
0-RTT at this time.

This patch depends on these commits:

     MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
     MINOR: quic: Implement qc_ssl_eary_data_accepted().
     MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
     BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
     MINOR: quic: Token for future connections implementation.
     MINOR: quic: Implement quic_tls_derive_token_secret().
     MINOR: tools: Implement ipaddrcpy().

Must be backported as far as 2.6.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
609b124561 MINOR: quic: Implement qc_ssl_eary_data_accepted().
This function is a wrapper around SSL_get_early_data_status() for
OpenSSL-derived stacks and SSL_early_data_accepted() for BoringSSL-derived
stacks like AWS-LC. It returns true for a TLS server if it has
accepted the early data received from a client.

Also implement quic_ssl_early_data_status_str() which is dedicated to be used
for debugging purposes (traces). This function converts the enum returned
by the two functions mentioned above to a human-readable string.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
e926378375 MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
Modify qf_new_token structure to use a static buffer with QUIC_TOKEN_LEN
as size as defined by the token for future connections (quic_token.c).
Modify consequently the NEW_TOKEN frame parser (see quic_parse_new_token_frame()).
Also add comments to denote that the NEW_TOKEN parser function is used only by
clients and that its builder is used only by servers.
2024-08-30 17:04:09 +02:00
Frederic Lecaille
f5b09dc452 MINOR: quic: Token for future connections implementation.
There exist two sorts of tokens used by QUIC. They are both used to validate
the peer address (path validation). Retry tokens are used for the current
connection the client wants to open. This patch implements the other
sort of token which, after having been received from a connection, may
be provided for the next connection from the same IP address to validate
it (or validate the network path between the client and the server).

The token generation is implemented by quic_generate_token(), and
the token validation by quic_token_check(). The same method
is used as for Retry tokens to build such tokens to be reused for
future connections. The format is very simple: one byte for the format
identifier to distinguish these new tokens from the Retry token, followed
by a 32-bit timestamp. As this part is ciphered with AEAD as cryptographic
algorithm, 16 bytes are needed for the AEAD tag. 16 more random bytes
are added to this token as a salt to derive the AEAD secret used
to cipher the token. In addition to this salt, the client IP address
is also used as AAD to derive the AEAD secret. So, the length of
the token is fixed: 37 bytes.
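
The fixed length can be checked by adding up the parts named above. The
constant and function names below are hypothetical, and the field order
(format, timestamp, tag, salt) is an assumption based on the order in which
the message lists them.

```c
#include <stddef.h>

/* Sizes of the token parts, as listed in the commit message. */
enum {
    TOK_FORMAT_LEN    = 1,  /* format identifier */
    TOK_TIMESTAMP_LEN = 4,  /* 32-bit timestamp */
    TOK_AEAD_TAG_LEN  = 16, /* AEAD authentication tag */
    TOK_SALT_LEN      = 16  /* random bytes used as salt */
};

/* Total token length in bytes. */
size_t token_len(void)
{
    return TOK_FORMAT_LEN + TOK_TIMESTAMP_LEN +
           TOK_AEAD_TAG_LEN + TOK_SALT_LEN;
}

/* Offset of the salt, assuming it is stored last. */
size_t token_salt_offset(void)
{
    return token_len() - TOK_SALT_LEN;
}
```

The sum is 1 + 4 + 16 + 16 = 37 bytes, matching the length stated in the
message.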
2024-08-30 17:04:09 +02:00
Frederic Lecaille
74caa0eece MINOR: quic: Implement quic_tls_derive_token_secret().
This function is similar to quic_tls_derive_retry_token_secret().
Its aim is to derive the secret used to cipher the token to be used
for future connections.

This patch renames quic_tls_derive_retry_token_secret() and reuses its
code to produce a more generic function: quic_do_tls_derive_token_secret().
Two arguments are added to the latter so that both
quic_tls_derive_retry_token_secret() and the new
quic_tls_derive_token_secret() function can call
quic_do_tls_derive_token_secret().
2024-08-30 17:04:09 +02:00
Frederic Lecaille
fb7a092203 MINOR: tools: Implement ipaddrcpy().
Implement ipaddrcpy() new function to copy only the IP address from
a sockaddr_storage struct object into a buffer.
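
A helper of this kind could look like the sketch below. The name
ip_addr_copy() and its signature are assumptions and may differ from the
actual ipaddrcpy() in the patch. The destination buffer is expected to hold
at least 16 bytes.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Copy only the raw IP address bytes from <ss> into <buf>.
 * Returns the number of bytes written (4 or 16), or 0 if the
 * address family is neither AF_INET nor AF_INET6.
 */
size_t ip_addr_copy(unsigned char *buf, const struct sockaddr_storage *ss)
{
    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)ss;

        memcpy(buf, &in->sin_addr, sizeof(in->sin_addr));
        return sizeof(in->sin_addr);
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;

        memcpy(buf, &in6->sin6_addr, sizeof(in6->sin6_addr));
        return sizeof(in6->sin6_addr);
    }
    return 0;
}
```

Port and scope information are deliberately left out, so the same client
address yields the same bytes whatever its source port.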
2024-08-30 17:04:09 +02:00
Nicolas CARPi
a33407b499 CLEANUP: mqtt: fix typo in MQTT_REMAINING_LENGHT_MAX_SIZE
There was a typo in the macro name, where LENGTH was incorrectly
written. This didn't cause any issue because the typo appeared in all
occurrences in the codebase.
2024-08-30 14:58:59 +02:00
Christopher Faulet
62c9d51ca4 BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
Support for 429 was recently added to L7 retries (0d142e075 "MINOR: proxy:
Add support of 429-Too-Many-Requests in retry-on status"). But the
l7_status_match() function was not properly updated. The switch statement
must match the 429 status to be able to perform a L7 retry.
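
The kind of omission fixed here can be illustrated with a reduced switch. The
RE_* flag names and values below are hypothetical stand-ins for the PR_RE_*
flags, and status_match() is not the real l7_status_match().

```c
/* Hypothetical retry-on flags; the real ones are the PR_RE_* flags. */
#define RE_429 (1U << 0)
#define RE_503 (1U << 1)

/* Returns 1 if <status> is enabled for L7 retries in <retry_flags>. */
int status_match(unsigned int retry_flags, int status)
{
    switch (status) {
    case 429:
        /* Without this case, a configured 429 would never match. */
        return !!(retry_flags & RE_429);
    case 503:
        return !!(retry_flags & RE_503);
    default:
        return 0;
    }
}
```

Accepting the keyword at configuration time is not enough: each status also
needs its own case in the matching function.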

This patch must be backported if the commit above is backported. It is
related to #2687.
2024-08-30 12:13:32 +02:00
Christopher Faulet
0d142e0756 MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
The "429" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remain sorted.

This patch should fix the issue #2687. It is quite simple so it may safely
be backported to 3.0 if necessary.
2024-08-28 10:05:34 +02:00
William Lallemand
e8fecef0ff MEDIUM: ssl: capture the signature_algorithms extension from Client Hello
Activate the capture of the TLS signature_algorithms extension from the
Client Hello. This list is stored in the ssl_capture buffer when the
global option "tune.ssl.capture-cipherlist-size" is enabled.
2024-08-26 15:17:40 +02:00
William Lallemand
ce7fb6628e MEDIUM: ssl: capture the supported_versions extension from Client Hello
Activate the capture of the TLS supported_versions extension from the
Client Hello. This list is stored in the ssl_capture buffer when the
global option "tune.ssl.capture-cipherlist-size" is enabled.
2024-08-26 15:12:42 +02:00
Valentine Krasnobaeva
7b78e1571b MINOR: mworker: restore initial env before wait mode
This patch is the follow-up of 1811d2a6ba (MINOR: tools: add helpers to
backup/clean/restore env).

In order to avoid unexpected behaviour in master-worker mode during the process
reload with a new configuration, when the old one has contained '*env' keywords,
let's backup its initial environment before calling parse_cfg() and let's clean
and restore it in the context of the master process, just before it enters a
wait polling loop.

This will guarantee that new workers will have a new updated environment and not
the previous one inherited from the master, which does not read the configuration
when it's in wait mode.
2024-08-23 17:06:59 +02:00
Valentine Krasnobaeva
1811d2a6ba MINOR: tools: add helpers to backup/clean/restore env
'setenv', 'presetenv', 'unsetenv', 'resetenv' keywords in configuration could
modify the process runtime environment. In case of master-worker mode this
creates a problem, as the configuration is read only once before forking a
worker and then the master process does the reexec without reading any config
files, just to free the memory. So, during the reload a new worker process will
be created, but it will inherit the previous unchanged environment from the
master in wait mode, thus it won't benefit from the changes in configuration
related to '*env' keywords. This may cause unexpected behavior or some parser
errors in master-worker mode.

So, let's add a helper to backup all process env variables just before it will
read its configuration. And let's also add helpers to clean up the current
runtime environment and to restore it to its initial state (as it was before
parsing the config).
2024-08-23 17:06:33 +02:00
Willy Tarreau
2a799b64b0 MINOR: protocol: add the real address family to the protocol
For custom families, there's sometimes an underlying real address and
it would be nice to be able to directly use the real family in calls
to bind() and connect() without having to add explicit checks for
exceptions everywhere.

Let's add a .real_family field to struct proto_fam for this. For now
it's always equal to the family except for non-transferable ones such
as rhttp where it's equal to the custom one (anything else could fit).
2024-08-21 17:37:46 +02:00
Willy Tarreau
ba4a416c66 MINOR: protocol: add a family lookup
At plenty of places we have access to an address family which may
include some custom addresses but we cannot simply convert them to
the real families without performing some random protocol lookups.

Let's simply add a proto_fam table like we have for the protocols.
The protocols could even be indexed there, but for now it's not worth
it.
2024-08-21 16:46:15 +02:00
Willy Tarreau
732913f848 MINOR: protocol: properly assign the sock_domain and sock_family
When we finally split sock_domain from sock_family in 2.3, something
was not cleanly finished. The family is what should be stored in the
address while the domain is what is supposed to be passed to socket().
But for the custom addresses, we did the opposite, just because the
protocol_lookup() function was acting on the domain, not the family
(both of which are equal for non-custom addresses).

This is an API bug but there's no point backporting it since it does
not have visible effects. It was visible in the code since a few places
were using PF_UNIX while others were comparing the domain against AF_MAX
instead of comparing the family.

This patch clarifies this in the comments on top of proto_fam, addresses
the indexing issue and properly reconfigures the two custom families.
2024-08-21 16:46:15 +02:00
Willy Tarreau
67bf1d6c9e MINOR: quic: support a tolerance for spurious losses
Tests performed between a 1 Gbps connected server and a 100 Mbps client,
distant by 95ms showed that:

  - we need 1.1 MB in flight to fill the link
  - rare but inevitable losses are sufficient to make cubic's window
    collapse fast and long to recover
  - a 100 MB object takes 69s to download
  - tolerance for 1 loss between two ACKs suffices to shrink the download
    time to 20-22s
  - 2 losses go to 17-20s
  - 4 losses reach 14-17s

At 100 concurrent connections that fill the server's link:
  - 0 loss tolerance shows 2-3% losses
  - 1 loss tolerance shows 3-5% losses
  - 2 loss tolerance shows 10-13% losses
  - 4 loss tolerance shows 23-29% losses

As such while there can be a significant gain sometimes in setting this
tolerance above zero, it can also significantly waste bandwidth by sending
far more than can be received. While it's probably not a solution to real
world problems, it repeatedly proved to be a very effective troubleshooting
tool helping to figure different root causes of low transfer speeds. In
spirit it is comparable to the no-cc congestion algorithm, i.e. it must
not be used except for experimentation.
2024-08-21 08:34:30 +02:00
Willy Tarreau
fab0e99aa1 MINOR: quic: store the lost packets counter in the quic_cc_event element
Upon loss detection, qc_release_lost_pkts() notifies congestion
controllers about the event and its final time. However it does not
pass the number of lost packets, that can provide useful hints for
some controllers. Let's just pass this option.
2024-08-21 08:02:44 +02:00
Amaury Denoyelle
0d6112b40b MINOR: mux-quic: retry after small buf alloc failure
The previous commit switched to small buffers for HTTP/3 HEADERS emission.
This ensures that several parallel streams can allocate their own buffer
without hitting the connection buffer limit based now on the congestion
window size.

However, this prevents the transmission of responses with uncommonly
large headers. Indeed, if all headers cannot be encoded in a single
buffer, an error is reported which causes the whole connection closure.

Adjust this by implementing a realloc API exposed by QUIC MUX. This
allows application layer to switch from a small to a default buffer and
restart its processing. This guarantees that again headers not longer
than bufsize can be properly transferred.
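
The fallback order described above can be reduced to the following decision.
This is only a model under assumed names: the real change exposes a realloc
API in QUIC MUX and restarts the encoding, whereas this sketch just picks the
buffer kind from a known encoded length.

```c
#include <stddef.h>

enum buf_kind { BUF_NONE, BUF_SMALL, BUF_DEFAULT };

/* Pick the buffer able to hold <encoded_len> bytes of headers:
 * try the small one first, then the default one, else give up.
 */
enum buf_kind pick_headers_buf(size_t encoded_len, size_t small_size,
                               size_t bufsize)
{
    if (encoded_len <= small_size)
        return BUF_SMALL;
    if (encoded_len <= bufsize)
        return BUF_DEFAULT;
    return BUF_NONE; /* headers longer than bufsize are still an error */
}
```

Typical responses stay in small buffers, and only uncommonly large header
blocks pay for a default-sized one.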
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
885e4c5cf8 MINOR: quic: support sbuf allocation in quic_stream
This patch extends qc_stream_desc API to be able to allocate small
buffers. QUIC MUX API is similarly updated as ultimately each application
protocol is responsible to choose between a default or a smaller buffer.

Internally, the type of allocated buffer is remembered via qc_stream_buf
instance. This is mandatory to ensure that the buffer is released in the
correct pool, in particular as small and standard buffers can be
configured with the same size.

This commit is purely an API change. For the moment, small buffers are
not used. This will change in a dedicated patch.
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
d0d8e57d47 MINOR: quic: define sbuf pool
Define a new buffer pool reserved to allocate smaller memory area. For
the moment, its usage will be restricted to QUIC, as such it is declared
in quic_stream module.

Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
2024-08-20 18:12:27 +02:00
Amaury Denoyelle
1de5f718cf MINOR: quic/config: adapt settings to new conn buffer limit
QUIC MUX buffer allocation limit is now directly based on the underlying
congestion window size. The previous static limit based on conn-tx-buffers
is now unused. As such, this commit adds a warning to inform users
that it is now obsolete.

Secondly, update max-window-size setting. It is now the main entrypoint
to limit both the maximum congestion window size and the number of QUIC
MUX allocated buffer on emission. Remove its special value '0' which was
used to automatically adjust it on now unused conn-tx-buffers.
2024-08-20 17:59:35 +02:00
Amaury Denoyelle
aeb8c1ddc3 MAJOR: mux-quic: allocate Tx buffers based on congestion window
Each QUIC MUX may allocate buffers for MUX stream emission. These
buffers are then shared with quic_conn to handle ACK reception and
retransmission. A limit on the number of concurrent buffers used per
connection has been defined statically and can be updated via a
configuration option. This commit replaces the limit to instead use the
current underlying congestion window size.

The purpose of this change is to remove the artificial static buffer
count limit, which may be difficult to choose. Indeed, if a connection
performs with minimal loss rate, the buffer count would limit severely
its throughput. It could be increased to fix this, but it also impacts
other connections, even with less optimal performance, causing too much
extra data buffering on the MUX layer. By using the dynamic congestion
window size, haproxy ensures that MUX buffering corresponds roughly to
the network conditions.

Using QCC <buf_in_flight>, a new buffer can be allocated if it is less
than the current window size. If not, QCS emission is interrupted and
haproxy stream layer will subscribe until a new buffer is ready.
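
The allocation rule stated above can be modeled in a few lines. The structure
and function names are hypothetical; only the buf_in_flight counter and the
comparison against the congestion window come from the commit message.

```c
#include <stdint.h>

/* Hypothetical reduction of the connection state to two counters. */
struct qcc_sketch {
    uint64_t buf_in_flight; /* sum of allocated Tx stream buffer sizes */
    uint64_t cwnd;          /* current congestion window size */
};

/* Try to account a new Tx buffer. Returns 1 on success, 0 if the
 * caller must wait until a buffer is released or the window grows.
 */
int txbuf_alloc(struct qcc_sketch *q, uint64_t bufsize)
{
    if (q->buf_in_flight >= q->cwnd)
        return 0;
    q->buf_in_flight += bufsize;
    return 1;
}

/* Account the release of a Tx buffer, e.g. after ACK reception. */
void txbuf_release(struct qcc_sketch *q, uint64_t bufsize)
{
    q->buf_in_flight -= bufsize;
}
```

Note that the check happens before the addition, so the last buffer allowed
may push the total slightly above the window. That matches the "roughly
corresponds" wording of the message.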

One of the critical parts is to ensure that MUX layer previously
blocked on buffer allocation is properly woken up when sending can be
retried. This occurs on two occasions:

* after an already used Tx buffer is cleared on ACK reception. This case
  is already handled by qcc_notify_buf() via quic_stream layer.

* on congestion window increase. A new qcc_notify_buf() invocation is
  added into qc_notify_send().

Finally, remove <avail_bufs> QCC field which is now unused.

This commit is labelled MAJOR as it may have unexpected effect and could
cause significant behavior change. For example, in previous
implementation QUIC MUX would be able to buffer more data even if the
congestion window is small. With this patch, data cannot be transferred
from the stream layer which may cause more streams to be shut down on
client timeout. Another effect may be more CPU consumption as the
connection limit would be hit more often, causing more streams to be
interrupted and woken up in cycle.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
000976af58 MINOR: mux-quic: define buf_in_flight
Define a new QCC counter named <buf_in_flight>. Its purpose is to
account the current sum of all allocated stream buffer size used on
emission.

For the moment, this counter is updated on buffer allocation and
deallocation. It will be used to replace <avail_bufs> once congestion
window is used as limit for buffer allocation in a future commit.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
4c4bf26f44 MEDIUM: mux-quic: implement API to ignore txbuf limit for some streams
Define a new qc_stream_desc flag QC_SD_FL_OOB_BUF. This is to mark
streams which are not subject to the connection limit on allocated MUX
stream buffer.

The purpose is to simplify handling of QUIC MUX streams which do not
transfer data and as such are not driven by haproxy layer, for example
HTTP/3 control stream. These streams interact synchronously with QUIC
MUX and cannot retry emission in case of temporary failure.

This commit will be useful once connection buffer allocation limit is
reimplemented to directly rely on the congestion window size. This will
probably cause the buffer limit to be reached more frequently, maybe
even on QUIC MUX initialization. As such, it will be possible to mark
control streams and prevent them to be subject to the buffer limit.

QUIC MUX exposes a new function qcs_send_metadata(). It can be used by an
application protocol to specify which streams are used for control
exchanges. For the moment, no such stream uses this mechanism.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
f4d1bd0b76 MINOR: mux-quic: account stream txbuf in QCC
A limit per connection is put on the number of buffers allocated by QUIC
MUX for emission across all its streams. This ensures memory
consumption remains under control. This limit is simply explained as a
count of buffers which can be concurrently allocated for each
connection.

As such, quic_conn structure was used to account currently allocated
buffers. However, a quic_conn never allocates new stream buffers. This
is only done at QUIC MUX layer. As such, this commit moves buffer
accounting inside QCC structure. This simplifies the API, most notably
qc_stream_buf_alloc() usage.

Note that this commit inverts the accounting. Previously, it was
initially set to 0 and incremented for each allocated buffer. Now, it is
set to the maximum value and decremented for each buf usage. This is
considered clearer to use.
2024-08-20 17:17:17 +02:00
Amaury Denoyelle
c24c8667b2 MINOR: quic: define max-window-size config setting
Define a new global keyword tune.quic.frontend.max-window-size. This
allows to set globally the maximum congestion window size for each QUIC
frontend connection.

The default value is 0. It is a special value which automatically derives
the size from the configured QUIC connection buffer limit. This is
similar to the previous "quic-cc-algo" behavior, which can be used to
override the maximum window size per bind line.
2024-08-20 17:02:29 +02:00
Valentine Krasnobaeva
8b1dfa9def MINOR: cfgparse: limit file size loaded via /dev/stdin
load_cfg_in_mem() can continuously reallocate memory in order to load an
extremely large input from /dev/stdin, until it fails with ENOMEM, which means
that process has consumed all available RAM. In case of containers and
virtualized environments it's not very good.

So, in order to prevent this, let's introduce MAX_CFG_SIZE as 10MB, which will
limit the size of input supplied via /dev/stdin.
2024-08-20 14:28:34 +02:00
Nathan Wehrman
fd48b28315 MINOR: Implements new log format of option tcplog clf
Some systems require log formats in the CLF format and that meant that I
could not send my logs for proxies in mode tcp to those servers.  This
implements a format that uses log variables that are compatible with TCP
mode frontends and replaces traditional HTTP values in the CLF format
to make them stand out. Instead of logging method and URI like this
"GET /example HTTP/1.1" it will log "TCP " and for a response code I
used "000" so it would be easy to separate from legitimate HTTP
traffic. Now your log servers that require a CLF format can see the
timings for TCP traffic as well as HTTP.
2024-08-20 07:46:34 +02:00
Aurelien DARRAGON
f8299bc5ea MINOR: log: "drop" support for log-profile steps
It is now possible to use "drop" keyword for "on" lines under a
log-profile section to specify that no log at all should be emitted for
the specified step (setting an empty format was not sufficient to do so
because only the log payload would be empty, not the log header, thus the
log would still be emitted).

It may be useful to selectively disable logging at specific steps for a
given log target (since the log profile may be set on log directives):

log-profile myprof
  on request format "blabla" sd "custom sd"
  on response drop

New testcase was added to reg-tests/log/log_profiles.vtc
2024-08-19 18:53:01 +02:00
William Lallemand
b2a8e8731d MINOR: channel: implement ci_insert() function
ci_insert() is a function which allows to insert a string <str> of size
<len> at <pos> of the input buffer. This is the equivalent of
ci_insert_line2() but without inserting '\r\n'
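
The core of such an insertion is a shift followed by a copy. The sketch below
works on a flat (non-wrapping) buffer and uses a hypothetical buf_insert()
helper. The real ci_insert() operates on a channel's input buffer and its
signature may differ.

```c
#include <string.h>

/* Insert <len> bytes of <str> at offset <pos> of a flat buffer <area>
 * of capacity <size> currently holding <*data> bytes (*data <= size).
 * Returns <len> on success, or 0 if <pos> is out of range or the
 * buffer lacks room.
 */
size_t buf_insert(char *area, size_t size, size_t *data, size_t pos,
                  const char *str, size_t len)
{
    if (pos > *data || len > size - *data)
        return 0;
    /* Shift the tail to the right, then copy the new bytes in place. */
    memmove(area + pos + len, area + pos, *data - pos);
    memcpy(area + pos, str, len);
    *data += len;
    return len;
}
```

Unlike a line-oriented insert, nothing is appended to the string, so the
caller controls any line terminator itself.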
2024-08-08 17:29:37 +02:00
Valentine Krasnobaeva
c6cfa7cb4a MINOR: startup: rename readcfgfile in parse_cfg
As readcfgfile no longer opens configuration files and reads them with fgets,
but performs only the parsing of provided data, let's rename it to parse_cfg by
analogy with read_cfg in haproxy.c.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
5b52df4c4d MEDIUM: startup: load and parse configs from memory
Let's call load_cfg_in_ram() helper for each configuration file to load its
content into some area in memory. Adapt the readcfgfile() parser function
accordingly. In order to limit changes in its scope, we pass it as an argument
a cfgfile structure, already filled in init_args() and in load_cfg_in_ram()
with file metadata and content.

The parser function (readcfgfile()) now uses fgets_from_mem() instead of the
standard fgets from libc implementations.

SPOE filter parses its own configuration file, pointed to by the 'config'
keyword in the configuration already loaded in memory. So, let's allocate and
fill a supplementary cfgfile structure for it, which is not referenced in the
cfg_cfgfiles list. This structure and the memory holding the SPOE filter
configuration are freed immediately in parse_spoe_flt(), when readcfgfile()
returns.

HAProxy OpenTracing filter also uses its own configuration file. So, let's
follow the same logic as we do for SPOE filter.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
007f7f2f02 MINOR: tools: add fgets_from_mem
Add fgets_from_mem() helper to read lines from configuration files, which are
now stored as memory chunks. In order to limit changes in the first-level
parser code (readcfgfile()), it is better to reimplement the standard fgets,
i.e. to have an fgets which can read the serialized data line by line from a
memory area instead of a file stream, and which keeps the same behaviour as
the libc implementations of fgets.
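
An fgets-like reader over a memory area can be sketched as follows. This is a
minimal illustration under assumptions: mem_fgets() and its parameters are
hypothetical and may not match the signature or the corner cases of the
actual fgets_from_mem().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most <size>-1 bytes from *<pos> into <buf>, stopping after the
 * first '\n' or at <end>, and NUL-terminate <buf>. *<pos> is advanced past
 * the copied bytes. Returns <buf>, or NULL when no data is left, which
 * mirrors fgets() returning NULL at end of file.
 */
static char *mem_fgets(char *buf, int size, const char **pos, const char *end)
{
	size_t n;
	const char *nl;

	assert(buf && pos && *pos && end);
	if (size <= 0 || *pos >= end)
		return NULL;

	n = (size_t)(end - *pos);
	if (n > (size_t)size - 1)
		n = (size_t)size - 1;       /* keep room for the trailing NUL */
	nl = memchr(*pos, '\n', n);
	if (nl)
		n = (size_t)(nl - *pos) + 1; /* the newline is part of the line */
	memcpy(buf, *pos, n);
	buf[n] = '\0';
	*pos += n;
	return buf;
}
```

Keeping the same calling pattern as fgets (loop until NULL) is what lets a
parser switch from a file stream to a memory area with few changes.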
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
5b9ed6e4be MINOR: cfgparse: add load_cfg_in_mem
Add load_cfg_in_mem() helper, which stores the content of a given file in
memory.
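
Loading a whole file into a heap buffer can be sketched like this. The
function name load_file_in_mem(), its return convention and the 4096-byte
initial size are assumptions made for the illustration, not the actual
load_cfg_in_mem() implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the whole file at <path> into a malloc()ed buffer stored in *<out>.
 * Returns the number of bytes read, or -1 on error (in which case *<out>
 * is left untouched). The caller must free() the buffer.
 */
static long load_file_in_mem(const char *path, char **out)
{
	FILE *f;
	char *buf = NULL;
	size_t len = 0, cap = 0, n;

	assert(path && out);
	f = fopen(path, "rb");
	if (!f)
		return -1;

	do {
		if (len == cap) {
			/* grow geometrically so large files need few reallocs */
			size_t ncap = cap ? cap * 2 : 4096;
			char *nbuf = realloc(buf, ncap);

			if (!nbuf) {
				free(buf);
				fclose(f);
				return -1;
			}
			buf = nbuf;
			cap = ncap;
		}
		n = fread(buf + len, 1, cap - len, f);
		len += n;
	} while (n > 0);

	if (ferror(f)) {
		free(buf);
		fclose(f);
		return -1;
	}
	fclose(f);
	*out = buf;
	return (long)len;
}
```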
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
bafb0ce272 MINOR: startup: adapt list_append_word to use cfgfile
list_append_word() helper was previously used only to chain configuration file
names in a list. As we now start to use the cfgfile structure, which represents
an entire file in memory along with its metadata, let's adapt this helper to
use this structure and let's rename it to list_append_cfgfile().

Adapt functions, which process configuration files and directories to use
cfgfile structure and list_append_cfgfile() instead of wordlist.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
39f2a19620 REORG: tools: move list_append_word to cfgparse
Let's move list_append_word to cfgparse.c as it is used only to fill
cfg_cfgfiles list with configuration file names.
2024-08-07 18:41:41 +02:00
Valentine Krasnobaeva
70b842e847 MINOR: cfgparse: add struct cfgfile to represent config in memory
This and the following commits prepare loading configuration files in
memory before parsing them, as we may need to parse some parts of the
configuration at different moments of the startup sequence. This is the case
for the new master-worker initialization process. There we need to read at
first only the global and the program sections, and only after some steps
(forking the worker, etc) the rest of the configuration.

Add a new structure cfgfile to keep configuration files metadata and content,
loaded somewhere in memory. Instances of filled cfgfile structures can be
chained in a list, as the order in which they were loaded is important.
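
A structure of this kind, chained so that load order is preserved, could look
like the sketch below. The field names and the cfg_list_append() helper are
hypothetical and only illustrate the idea; the real cfgfile structure relies
on HAProxy's own list type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory representation of one configuration file. */
struct cfg_entry {
	const char       *filename; /* file metadata */
	const char       *content;  /* file content loaded in memory */
	size_t            size;     /* content length in bytes */
	struct cfg_entry *next;     /* next file, in load order */
};

/* Singly linked list with a tail pointer for O(1) ordered append. */
struct cfg_list {
	struct cfg_entry *head;
	struct cfg_entry *tail;
};

/* Append <e> at the end of <l> so that iteration follows load order. */
static void cfg_list_append(struct cfg_list *l, struct cfg_entry *e)
{
	assert(l && e);
	e->next = NULL;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
}
```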
2024-08-07 18:41:41 +02:00
Willy Tarreau
10c8baca44 MINOR: trace: add a per-source helper to pre-fill the context
Now sources which want to do it can provide a helper that can pre-fill
some fields in the context based on their knowledge (e.g. mux streams).
2024-08-07 16:02:59 +02:00
Willy Tarreau
7d55a70f5a MINOR: trace: move the known trace context into a dedicated struct
We now have a trace_ctx to hold the sess, conn, qc, stream and so on.
This will allow us to pass it across layers so that other helpers can
help fill them.

Ideally it should be passed as an argument to __trace_enabled() by
__trace() so that it can be passed back to the trace callback. But
it seems that trace callbacks are smart enough to figure out all their
info when they need it.
2024-08-07 16:02:59 +02:00
Willy Tarreau
d465610ec3 MEDIUM: trace: implement a "follow" mechanism
With "follow" from one source to another, it becomes possible for a
source to automatically follow another source's tracked pointer. The
best example is the session:
  - the "session" source is enabled and has a "lockon session"
    -> its lockon_ptr is equal to the session when valid
  - other sources (h1,h2,h3 etc) are configured for "follow session"
    and will then automatically check if session's lockon_ptr matches
    its own session, in which case tracing will be enabled for that
    trace (no state change).

It's not necessary to start/pause/stop traces when using this, only
"follow" followed by a source with lockon enabled is needed. Some
combinations might work better than others. At the moment the session
is almost never known from the backend, but this may improve.

The meta-source "all" is supported for the follower so that all sources
will follow the tracked one.
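
The matching rule described above can be sketched as a pointer comparison.
The struct src type and follow_matches() below are hypothetical and much
simpler than the real trace sources; they only show the "follower checks the
followed source's lockon_ptr" idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified trace source. */
struct src {
	const void       *lockon_ptr; /* pointer tracked via "lockon", or NULL */
	const struct src *follow;     /* source being followed, or NULL */
};

/* Returns 1 if a trace emitted by <s> for session <sess> should be enabled
 * because the followed source currently tracks that same session, else 0.
 * The follower's own state is never modified.
 */
static int follow_matches(const struct src *s, const void *sess)
{
	assert(s);
	if (!s->follow || !s->follow->lockon_ptr || !sess)
		return 0;
	return s->follow->lockon_ptr == sess;
}
```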
2024-08-07 16:02:59 +02:00
Amaury Denoyelle
9f829ea3f3 MINOR: mux-quic: measure QCS lifetime and its blocking state
Reuse newly defined tot_time structure to measure various values related
to a QCS lifetime.

First, a timer is used to account for the total QCS lifetime. Then, two
other timers are used to account for the total time during which Tx from
stream layer to MUX is blocked, either on lack of buffer or due to
flow-control.

These three timers are reported in qmux_dump_qcs_info(). Thus, they are
available in traces and for QUIC MUX debug string sample.
2024-08-07 15:40:52 +02:00
Amaury Denoyelle
a6e2523ca1 MINOR: time: define tot_time structure
Define a new utility type tot_time. Its purpose is to be able to account
elapsed time across multiple periods. Functions are defined to easily
start and stop measures, and return the current value.
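
Accumulating elapsed time across several periods can be sketched as below.
The struct elapsed_acc type, its fields and the acc_*() helpers are
hypothetical, and the timestamp is passed explicitly instead of being read
from a global clock; the real tot_time API may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator of elapsed milliseconds over multiple periods. */
struct elapsed_acc {
	uint32_t started_at; /* start of the running period, if any */
	uint32_t total;      /* sum of all completed periods */
	int      running;    /* non-zero while a period is being measured */
};

/* Start a new period at <now_ms>; does nothing if one is already running. */
static void acc_start(struct elapsed_acc *a, uint32_t now_ms)
{
	assert(a);
	if (!a->running) {
		a->started_at = now_ms;
		a->running = 1;
	}
}

/* Stop the running period and add its duration to the total. Unsigned
 * subtraction keeps the result correct if the ms counter wraps once.
 */
static void acc_stop(struct elapsed_acc *a, uint32_t now_ms)
{
	assert(a);
	if (a->running) {
		a->total += now_ms - a->started_at;
		a->running = 0;
	}
}

/* Return the total so far, including the running period if any. */
static uint32_t acc_read(const struct elapsed_acc *a, uint32_t now_ms)
{
	assert(a);
	return a->total + (a->running ? now_ms - a->started_at : 0);
}
```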
2024-08-07 15:40:52 +02:00