In file included from include/haproxy/jwt.h:25:
include/haproxy/jwt-t.h:66:2: error: unknown type name 'EVP_PKEY'
EVP_PKEY *pkey;
^
1 error generated.
Fix this compilation issue by inserting openssl-compat.h in jwt-t.h
This new converter takes a JSON Web Token, an algorithm (among the ones
specified for JWS tokens in RFC 7518) and a public key or a secret, and
it returns a verdict about the signature contained in the token. It does
not simply return a boolean because some specific error cases cas be
specified by returning an integer instead, such as unmanaged algorithms
or invalid tokens. This enables to distinguich malformed tokens from
tampered ones, that would be valid format-wise but would have a bad
signature.
This converter does not perform a full JWT validation as decribed in
section 7.2 of RFC 7519. For instance it does not ensure that the header
and payload parts of the token are completely valid JSON objects because
it would need a complete JSON parser. It only focuses on the signature
and checks that it matches the token's contents.
A JWT signed with the RSXXX or ESXXX algorithm (RSA or ECDSA) requires a
public certificate to be verified and to ensure it is valid. Those
certificates must not be read on disk at runtime so we need a caching
mechanism into which those certificates will be loaded during init.
This is done through a dedicated ebtree that is filled during
configuration parsing. The path to the public certificates will need to
be explicitely mentioned in the configuration so that certificates can
be loaded as early as possible.
This tree is different from the ckch one because ckch entries are much
bigger than the public certificates used in JWT validation process.
This helper function splits a JWT under Compact Serialization format
(dot-separated base64-url encoded strings) into its different sub
strings. Since we do not want to manage more than JWS for now, which can
only have at most three subparts, any JWT that has strictly more than
two dots is considered invalid.
The full list of possible algorithms used to create a JWS signature is
defined in section 3.1 of RFC7518. This patch adds a helper function
that converts the "alg" strings into an enum member.
This fetch can be used to retrieve the data contained in an HTTP
Authorization header when the Bearer scheme is used. This is used when
transmitting JSON Web Tokens for instance.
On receiving CONNECTION_CLOSE frame, the mux is flagged for immediate
connection close. A stream is closed even if there is data not ACKed
left if CONNECTION_CLOSE has been received.
The mux tx buffers have been rewritten with buffers attached to qcs
instances. qc_buf_available and qc_get_buf functions are updated to
manipulates qcs. All occurences of the unused qcc ring buffer are
removed to ease the code maintenance.
Defer the shutting of a qcs if there is still data in its tx buffers. In
this case, the conn_stream is closed but the qcs is kept with a new flag
QC_SF_DETACH.
On ACK reception, the xprt wake up the shut_tl tasklet if the stream is
flagged with QC_SF_DETACH. This tasklet is responsible to free the qcs
and possibly the qcc when all bidirectional streams are removed.
Remove the tx mux ring buffers in qcs, which should be in the qcc. For
the moment, use a simple architecture with 2 simple tx buffers in the
qcs.
The first buffer is used by the h3 layer to prepare the data. The mux
send operation transfer these into the 2nd buffer named xprt_buf. This
buffer is only freed when an ACK has been received.
This architecture is functional but not optimal for two reasons :
- it won't limit the buffer usage by connection
- each transfer on a new stream requires an allocation
This new ssllib_name_startswith precondition check can be used to
distinguish application linked with OpenSSL from the ones linked with
other SSL libraries (LibreSSL or BoringSSL namely). This check takes a
string as input and returns 1 when the SSL library's name starts with
the given string. It is based on the OpenSSL_version function which
returns the same output as the "openssl version" command.
These ones are passed on rule creation for the sole purpose of being
reported in "show sess", which is not done yet. For now the entries
are allocated upon rule creation and freed in free_act_rules().
Rules are currently allocated using calloc() by their caller, which does
not make it very convenient to pass more information such as the file
name and line number.
This patch introduces new_act_rule() which performs the malloc() and
already takes in argument the ruleset (ACT_F_*), the file name and the
line number. This saves the caller from having to assing ->from, and
will allow to improve the internal storage with more info.
Rename __GLOBL and __GLOBL1 to __HA_GLOBL and __HA_GLOBL1, as the former are
already defined on FreeBSD.
This should be backported to 2.4, 2.3 and 2.2.
There have been a large number of issues reported with conn_cur
synchronization because the concept is wrong. In an active-passive
setup, pushing the local connections count from the active node to
the passive one will result in the passive node to have a higher
counter than the real number of connections. Due to this, after a
switchover, it will never be able to close enough connections to
go down to zero. The same commonly happens on reloads since the new
process preloads its values from the old process, and if no connection
happens for a key after the value is learned, it is impossible to reset
the previous ones. In active-active setups it's a bit different, as the
number of connections reflects the number on the peer that pushed last.
This patch solves this by marking the "conn_cur" local and preventing
it from being learned from peers. It is still pushed, however, so that
any monitoring system that collects values from the peers will still
see it.
The patch is tiny and trivially backportable. While a change of behavior
in stable branches is never welcome, it remains possible to fix issues
if reports become frequent.
In the configuration sometimes we'll omit a thread group number to designate
a global thread number range, and sometimes we'll mention the group and
designate IDs within that group. The operation is more complex than it
seems due to the need to check for ranges spanning between multiple groups
and determining groups from threads from bit masks and remapping bit masks
between local/global.
This patch adds a function to perform this operation, it takes a group and
mask on input and updates them on output. It's designed to be used by "bind"
lines but will likely be usable at other places if needed.
For situations where specified threads do not exist in the group, we have
the choice in the code between silently fixing the thread set or failing
with a message. For now the better option seems to return an error, but if
it turns out to be an issue we can easily change that in the future. Note
that it should only happen with "x/even" when group x only has one thread.
This extends the "thread" statement of bind lines to support an optional
thread group number. When unspecified (0) it's an absolute thread range,
and when specified it's one relative to the thread group. Masks are still
used so no more than 64 threads may be specified at once, and a single
group is possible. The directive is not used for now.
This is the equivalent of "tid" for ease of access. In the future if we
make th_cfg a pure thread-local array (not a pointer), it may make sense
to move it there.
ha_set_tid() was randomly used either to explicitly set thread 0 or to
set any possibly incomplete thread during boot. Let's replace it with
a pointer to a valid thread or NULL for any thread. This allows us to
check that the designated threads are always valid, and to ignore the
thread 0's mapping when setting it to NULL, and always use group 0 with
it during boot.
The initialization code is also cleaner, as we don't pass ugly casts
of a thread ID to a pointer anymore.
This will be a convenient way to communicate the thread ID and its
local ID in the group, as well as their respective bits when creating
the threads or when only a pointer is given.
This will ease the reporting of the current thread group ID when coming
from the thread itself, especially since it returns the visible ID,
starting at 1.
This takes care of unassigned threads groups and places unassigned
threads there, in a more or less balanced way. Too sparse allocations
may still fail though. For now with a maximum group number fixed to 1
nothing can really fail.
A the "tg" thread-local variable now always points to the current
thread group. It's pre-initializd to the first one during boot and is
set to point to the thread's one by ha_set_tid(). This last one takes
care of checking whether the thread group was assigned or not because
it may be called during boot before threads are initialized.
This registers a mapping of threads to groups by enumerating for each thread
what group it belongs to, and marking the group as assigned. It takes care of
checking for redefinitions, overlaps, and holes. It supports both individual
numbers and ranges. The thread group is referenced from the thread config.
This creates a struct tgroup_info which knows the thread ID of the first
thread in a group, and the number of threads in it. For now there's only
one thread group supported in the configuration, but it may be forced to
other values for development purposes by defining MAX_TGROUPS, and it's
enabled even when threads are disabled and will need to remain accessible
during boot to keep a simple enough internal API.
For the purpose of easing the configurations which do not specify a thread
group, we're starting group numbering at 1 so that thread group 0 can be
"undefined" (i.e. for "bind" lines or when binding tasks).
The goal will be to later move there some global items that must be
made per-group.
We want to make sure that the current thread_info accessed via "ti" will
remain constant, so that we don't accidentally place new variable parts
there and so that the compiler knows that info retrieved from there is
not expected to have changed between two function calls.
Only a few init locations had to be adjusted to use the array and the
rest is unaffected.
The last 3 fields were 3 list heads that are per-thread, and which are:
- the pool's LRU head
- the buffer_wq
- the streams list head
Moving them into thread_ctx completes the removal of dynamic elements
from the struct thread_info. Now all these dynamic elements are packed
together at a single place for a thread.
The TI_FL_STUCK flag is manipulated by the watchdog and scheduler
and describes the apparent life/death of a thread so it changes
all the time and it makes sense to move it to the thread's context
for an active thread.
The "thread_info" name was initially chosen to store all info about
threads but since we now have a separate per-thread context, there is
no point keeping some of its elements in the thread_info struct.
As such, this patch moves prev_cpu_time, prev_mono_time and idle_pct to
thread_ctx, into the thread context, with the scheduler parts. Instead
of accessing them via "ti->" we now access them via "th_ctx->", which
makes more sense as they're totally dynamic, and will be required for
future evolutions. There's no room problem for now, the structure still
has 84 bytes available at the end.
The scheduler contains a lot of stuff that is thread-local and not
exclusively tied to the scheduler. Other parts (namely thread_info)
contain similar thread-local context that ought to be merged with
it but that is even less related to the scheduler. However moving
more data into this structure isn't possible since task.h is high
level and cannot be included everywhere (e.g. activity) without
causing include loops.
In the end, it appears that the task_per_thread represents most of
the per-thread context defined with generic types and should simply
move to tinfo.h so that everyone can use them.
The struct was renamed to thread_ctx and the variable "sched" was
renamed to "th_ctx". "sched" used to be initialized manually from
run_thread_poll_loop(), now it's initialized by ha_set_tid() just
like ti, tid, tid_bit.
The memset() in init_task() was removed in favor of a bss initialization
of the array, so that other subsystems can put their stuff in this array.
Since the tasklet array has TL_CLASSES elements, the TL_* definitions
was moved there as well, but it's not a problem.
The vast majority of the change in this patch is caused by the
renaming of the structures.
We used to remap SI_TKILL to SI_LWP when SI_TKILL was not available
(e.g. FreeBSD) but that's ugly and since we need this only in a single
switch/case block in wdt.c it's even simpler and cleaner to perform the
two tests there, so let's do this.
The watchdog timer had no more reason for being shared with the struct
thread_info since the watchdog is the only user now. Let's remove it
from the struct and move it to a static array in wdt.c. This removes
some ifdefs and the need for the ugly mapping to empty_t that might be
subject to a cast to a long when compared to TIMER_INVALID. Now timer_t
is not known outside of wdt.c and clock.c anymore.
This removes the knowledge of clockid_t from anywhere but clock.c, thus
eliminating a source of includes burden. The unused clock_id field was
removed from thread_info, and the definition setting of clockid_t was
removed from compat.h. The most visible change is that the function
now_cpu_time_thread() now takes the thread number instead of a tinfo
pointer.
The code that deals with timer creation for the WDT was moved to clock.c
and is called with the few relevant arguments. This removes the need for
awareness of clock_id from wdt.c and as such saves us from having to
share it outside. The timer_t is also known only from both ends but not
from the public API so that we don't have to create a fake timer_t
anymore on systems which do not support it (e.g. macos).
This was previously open-coded in run_thread_poll_loop(). Now that
we have clock.c dedicated to such stuff, let's move the code there
so that we don't need to keep such ifdefs nor to depend on the
clock_id.
Instead of fiddling with before_poll and after_poll in
activity_count_runtime(), the function is now called by
clock_entering_poll() which passes it the number of microseconds
spent working. This allows to remove all calls to
activity_count_runtime() from the pollers.
The entering_poll/leaving_poll/measure_idle functions that were hard
to classify and used to move to various locations have now been placed
into clock.c since it's precisely about time-keeping. The functions
were renamed to clock_*. The samp_time and idle_time values are now
static since there is no reason for them to be read from outside.
There is currently a problem related to time keeping. We're mixing
the functions to perform calculations with the os-dependent code
needed to retrieve and adjust the local time.
This patch extracts from time.{c,h} the parts that are solely dedicated
to time keeping. These are the "now" or "before_poll" variables for
example, as well as the various now_*() functions that make use of
gettimeofday() and clock_gettime() to retrieve the current time.
The "tv_*" functions moved there were also more appropriately renamed
to "clock_*".
Other parts used to compute stolen time are in other files, they will
have to be picked next.
It was brough by an unneeded addition of a local variable after a test
in commit f7f53afcf ("BUILD/MEDIUM: tcp: set-mark setting support for
FreeBSD."). No backport needed.
Remove the quic_conn from the receiver connection_ids tree on
quic_conn_free. This fixes a crash due to dangling references in the
tree after a quic connection release.
This operation must be conducted under the listener lock. For this
reason, the quic_conn now contains a reference to its attached listener.
Following include reorganzation, there is some missing include files for
task.h when compiling with DEBUG_TASK :
- activity.h for task_profiling_mask
- time.h for now_mono_time()
This is present since the following commit
d8b325c748
REORG: task: uninline the loop time measurement code
No need to backport this.
These ones are rarely used or only to waste CPU cycles waiting, and are
the last ones requiring system includes in thread.h. Let's uninline them
and move them to thread.c.
This removes the thread identifiers from struct thread_info and moves
them only in static array in thread.c since it's now the only file that
needs to touch it. It's also the only file that needs to include
pthread.h, beyond haproxy.c which needs it to start the poll loop. As
a result, much less system includes are needed and the LoC reduced by
around 3%.
haproxy.c still has to deal with pthread-specific low-level stuff that
is OS-dependent. We should not have to deal with this there, and we do
not need to access pthread anywhere else.
Let's move these 3 functions to thread.c and keep empty inline ones for
when threads are disabled.
It's not needed to inline it at all (one call per loop) and it introduces
dependencies, let's move it to fd.c.
Removing the few remaining includes that came with it further reduced
by ~0.2% the LoC and the build time is now below 6s.
TV_ETERNITY, TV_ETERNITY_MS and MAX_DELAY_MS may be configured and
ought to be in defaults.h so that they can be inherited from everywhere
without including time.h and could also be redefined if neede
(particularly for MAX_DELAY_MS).
It's pointless to inline this, it's called exactly once per poll loop,
and it depends on time.h which is quite deep. Let's move that to task.c
along with sched_report_idle().
The remaining large functions are those allocating/initializing and
occasionally freeing connections, conn_streams and sockaddr. Let's
move them to connection.c. In fact, cs_free() is the only one-liner
but let's move it along with the other ones since a call will be
small compared to the rest of the work done there.
The following inlined functions are particularly large (and probably not
inlined at all by the compiler), and together represent roughly half of
the file, while they're used at most once per connection. They were moved
to connection.c.
conn_upgrade_mux_fe, conn_install_mux_fe, conn_install_mux_be,
conn_install_mux_chk, conn_delete_from_tree, conn_init, conn_new,
conn_free
No need to include the full tree management code, type files only
need the definitions. Doing so reduces the whole code size by around
3.6% and the build time is down to just 6s.
ebtree is one piece using a lot of inlines and each tree root or node
definition needed by many of our structures requires to parse and
compile all these includes, which is large and painfully slow. Let's
move the very basic definitions to their own file and include it from
ebtree.h.
The following functions are quite heavy and have no reason to be kept
inlined:
srv_release_conn, srv_lookup_conn, srv_lookup_conn_next,
srv_add_to_idle_list
They were moved to server.c. It's worth noting that they're a bit
at the edge between server and connection and that maybe we could
create an idle-conn file for these in the near future.
We do not really need to have them inlined, and having xxhash.h included
by connection.h results in this 4700-lines file being processed 101 times
over the whole project, which accounts for 13.5% of the total size!
Additionally, half of the functions are only needed from connection.c.
Let's move the functions there and get rid of the painful include.
The build time is now down to 6.2s just due to this.
The hash type stored everywhere is XXH64_hash_t, which annoyingly forces
everyone to include the huge xxhash file. We know it's an uint64_t because
that's its purpose and the type is only made to abstract it on machines
where uint64_t is not availble. Let's switch the type to uint64_t
everywhere and avoid including xxhash from the type file.
Plenty of includes were present there only for struct pointers resulting
in them being used from many other places. The LoC reduced again by more
than 1% by cleaning this.
This one is expensive in code size because it comes with xxhash.h at a
low level of dependency that's inherited at plenty of places, and for
a function does doesn't benefit from inlining and could possibly even
benefit from not being inline given that it's large and called from the
scheduler.
Moving it to activity.c reduces the LoC by 1.2% and the binary size by
~1kB.
This function has no reason for being inlined, it's called from non
critical places (once in pollers), is quite large and comes with
dependencies (time and freq_ctr). Let's move it to acitvity.c. That's
another 0.4% less LoC to build.
These are ticks, not timeval, and they're a cause for plenty of files
including time.h just to access now_ms that's only used with ticks
functions. Let's move them over there.
The idle time calculation stuff was moved to task.h by commit 6dfab112e
("REORG: sched: move idle time calculation from time.h to task.h") but
these two variables that are only maintained by task.{c,h} were still
left in time.{c,h}. They have to move as well.
These two counters were the only ones not in the global struct, while
the SSL freq counters or the req counts are already in it, this forces
stats.c to include ssl_sock just to know about them. Let's move them
over there with their friends. This reduces from 408 to 384 the number
of includes of opensslconf.h.
This one has nothing to do with ssl_sock as it manipulates the struct
server only. Let's move it to server.c and remove unneeded dependencies
on ssl_sock.h. This further reduces by 10% the number of includes of
opensslconf.h and by 0.5% the number of compiled lines.
This one doesn't use anything from an SSL context, it only checks the
type of the transport layer of a connection, thus it belongs to
connection.h. This is particularly visible due to all the ifdefs
around it in various call places.
This is exactly the same as for listeners, servers only include
openssl-compat to provide the SSL_CTX type to use as two pointers to
contexts, and to detect if NPN, ALPN, and cipher suites are supported,
and save up to 5 pointers in the ssl_ctx struct if not supported. This
is pointless, as these ones have all been supported for about a decade,
and including this file comes with a long dependency chain that impacts
lots of other files. The ctx was made a void*.
Now the build time was significantly reduced, from 9.2 to 8.1 seconds,
thanks to opensslconf.h being included "only" 456 times instead of 2424
previously!
The total number of lines of code compiled was reduced by 15%.
Listeners only include openssl-compat to provide the SSL_CTX type to
use as two pointers to contexts, and to detect if NPN, ALPN, and cipher
suites are supported, and save up to 5 pointers in the ssl_bind_conf
struct if not supported. This is pointless, as these ones have all been
supported for about a decade, and including this file comes with a long
dependency chain that impacts lots of other files. The initial_ctx and
default_ctx can perfectly remain void* instead of SSL_CTX*.
These functions have no reason for being inlined, and they require some
includes with long dependencies. Let's move them to listener.c and trim
unused includes in listener.h.
This file includes streams, proxies, Lua just for some definitions of
structures for which we only have a pointer. Let's drop this. That's
responsible for 0.2% of all the lines of code.
The lock-debugging code in thread.h has no reason to be inlined. the
functions are quite fat and perform a lot of operations so there's no
saving keeping them inlined. Worse, most of them are in fact not
inlined, resulting in a significantly bigger executable.
This patch moves all this part from thread.h to thread.c. The functions
are still exported in thread.h of course. This results in ~166kB less
code:
text data bss dec hex filename
3165938 99424 897376 4162738 3f84b2 haproxy-before
2991987 99424 897376 3988787 3cdd33 haproxy-after
In addition the build time with thread debugging enabled has shrunk
from 19.2 to 17.7s thanks to much less code to be parsed in thread.h
that is included virtually everywhere.
pool-os.h relies on a number of includes solely because the
pool_alloc_area() function was inlined, and this only because we want
the normal version to be inlined so that we can track the calling
places for the memory profiler. It's worth noting that it already
does not work at -O0, and that when UAF is enabled we don't care a
dime about profiling.
This patch does two things at once:
- force-inline the functions so that pool_alloc_area() is still
inlined at -O0 to help track malloc() users ;
- uninline the UAF version of these (that rely on mmap/munmap)
and move them to pools.c so that we can remove all unneeded
includes.
Doing so reduces by ~270kB or 0.15% the total build size.
These ones are called from a few places in the code and are only provided
by ebtree.h, which is not normal given that some callers do not even use
ebtree.
channel, stream_interface, appctx, buffer, proxy and htx ones are used
in function arguments and most of them are not defined but were inherited
from intermediary inclues. Let's define them here and drop the unneeded
includes.
Some structures are inherited via intermediary includes (e.g. dns_counters
comes from a long path). Let's define the missing ones and includes vars-t
that is needed in the structure.
We're using variable-to-sample conversion at least 4 times in the code,
two of which are bogus. Let's introduce a generic conversion function
that performs the required checks.
This one only handles integers, contrary to its sibling with the suffix
_str that only handles strings. Let's rename it and uninline it since it
may well be used from outside.
The SSL stuff in struct server takes less than 3% of it and requires
lots of annoying ifdefs in the code just to take care of the cases
where the field is absent. Let's get rid of this and stop including
openssl-compat from server.c to detect NPN and ALPN capabilities.
This reduces the total LoC by another 0.4%.
During httpclient_destroy, add a condition in the BUG_ON which checks
that the client was started before it has ended. A httpclient structure
could have been created without being started.
httpclient_stop_and_destroy() tries to destroy the httpclient structure
if the client was stopped.
In the case the client wasn't stopped, it ask the client to stop itself
and to destroy the httpclient structure itself during the release of the
applet.
That's where that code initially was but it had been moved to
activity_count_runtime() for pure reasons of dependency loops. These
ones are no longer true so we can move that code back to the scheduler
and keep it where the information are updated and checked.
time.h is a horrible place to put activity calculation, it's a
historical mistake because the functions were there. We already have
most of the parts in sched.{c,h} and these ones make an exception in
the middle, forcing time.h to include some thread stuff and to access
the before/after_poll and idle_pct values.
Let's move these 3 functions to task.h with the other ones. They were
prefixed with "sched_" instead of the historical "tv_" which already
made no sense anymore.
I don't know why I inlined this one, this makes no sense given that it's
only used for stats, and it starts a circular dependency on tinfo.h which
can be problematic in the future. In addition, all the stuff related to
idle time calculation should be with the rest of the scheduler, which
currently is in task.{c,h}, so let's move it there.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
Work lists were a mechanism introduced in 1.8 to asynchronously delegate
some work to be performed on another thread via a dedicated task.
The only user was the listeners, to deal with the queue. Nowadays
the tasklets have made this much more convenient, and have replaced
work_lists in the listeners. It seems there will be no valid use case
of work lists anymore, so better get rid of them entirely and keep the
scheduler code cleaner.
__task_queue() must absolutely not be called with TICK_ETERNITY or it
will place a never-expiring node upfront in the timers queue, preventing
any timer from expiring until the process is restarted. Code was found
to cause this using "task_schedule(task, now_ms)" which does this one
millisecond every 49.7 days, so let's add a condition against this. It
must never trigger since any process susceptible to trigger it would
already accumulate tasks until it dies.
An extra test was added in wake_expired_tasks() to detect tasks whose
timeout would have been changed after being queued.
An improvement over this could be in the future to use a non-scalar
type (union/struct) for expiration dates so as to avoid the risk of
using them directly like this. But now_ms is already such a valid
time and this specific construct would still not be caught.
This could even be backported to stable versions to help detect other
occurrences if any.
The ssl_bc_hsk_err sample fetch will need to raise more errors than only
handshake related ones hence its renaming to a more generic ssl_bc_err.
This patch is required because some handshake failures that should have
been caught by this fetch (verify error on the server side for instance)
were missed. This is caused by a change in TLS1.3 in which the
'Finished' state on the client is reached before its certificate is sent
(and verified) on the server side (see the "Protocol Overview" part of
RFC 8446).
This means that the SSL_do_handshake call is finished long before the
server can verify and potentially reject the client certificate.
The ssl_bc_hsk_err will then need to be expanded to catch other types of
errors.
This change is also applied to the frontend fetches (ssl_fc_hsk_err
becomes ssl_fc_err) and to their string counterparts.
In case of a connection error happening after the SSL handshake is
completed, the error code stored in the connection structure would not
always be set, hence having some connection failures being described as
successful in the fc_conn_err or bc_conn_err sample fetches.
The most common case in which it could happen is when the SSL server
rejects the client's certificate. The SSL_do_handshake call on the
client side would be sucessful because the client effectively sent its
client hello and certificate information to the server, but the next
call to SSL_read on the client side would raise an SSL_ERROR_SSL code
(through the SSL_get_error function) which is decribed in OpenSSL
documentation as a non-recoverable and fatal SSL error.
This patch ensures that in such a case, the connection's error code is
set to a special CO_ERR_SSL_FATAL value.
There's no reason CONFIG_HAP_POOLS and its opposite are located into
pools-t.h, it forces those that depend on them to inlcude the file.
Other similar options are normally dealt with in defaults.h, which is
part of the default API, so let's do that.
According to the RFC7230, "chunked" encoding must not be applied more than
once to a message body. To handle this case, h1_parse_xfer_enc_header() is
now responsible to fail when a parsing error is found. It also fails if the
"chunked" encoding is not the last one for a request.
To help the parsing, two H1 parser flags have been added: H1_MF_TE_CHUNKED
and H1_MF_TE_OTHER. These flags are set, respectively, when "chunked"
encoding and any other encoding are found. H1_MF_CHNK flag is used when
"chunked" encoding is the last one.
This commit provides an hlua_httpclient object which is a bridge between
the httpclient and the lua API.
The HTTPClient is callable in lua this way:
local httpclient = core.httpclient()
local response = httpclient:get("http://127.0.0.1:9000/?s=9999")
core.Debug("Status: ".. res.status .. ", Reason : " .. res.reason .. ", Len:" .. string.len(res.body) .. "\n")
The resulting response object will provide a "status" field which
contains the status code, a "reason" string which contains the reason
string, and a "body" field which contains the response body.
The implementation uses the httpclient callback to wake up the lua task
which yield each time it pushes some data. The httpclient works in the
same thread as the lua task.
The transient flag CO_RFL_BUF_NOT_STUCK should now be set when the mux's
rcv_buf() function is called, in si_cs_recv(), to be sure the mux is able to
perform some optimisation during data copy. This flag is set when we are
sure the channel buffer is not stuck. Concretely, it happens when there are
data scheduled to be sent.
It is not a fix and this flag is not used for now. But it makes sense to have
this info to be sure to be able to do some optimisations if necessary.
This patch is related to the issue #1362. It may be backported to 2.4 to
ease future backports.
HTX_FL_FRAGMENTED flag is now set on an HTX message when it is
fragmented. It happens when an HTX block is removed in the middle of the
message and flagged as unused. HTX_FL_FRAGMENTED flag is removed when all
data are removed from the message or when the message is defragmented.
Note that some optimisations are still possible because the flag can be
avoided in other situations. For instance when the last header of a bodyless
message is removed.
If the stream-interface is waiting for more buffer room to store incoming
data, it is important at the stream level to stop to wait for more data to
continue. Thanks to the previous patch ("BUG/MEDIUM: stream-int: Notify
stream that the mux wants more room to xfer data"), the stream is woken up
when this happens. In this patch, we take care to interrupt the
corresponding tcp-content ruleset or to stop waiting for the HTTP message
payload.
To ease detection of the state, si_rx_blocked_room() helper function has
been added. It returns non-zero if the stream interface's Rx path is blocked
because of lack of room in the input buffer.
This patch is part of a series related to the issue #1362. It should be
backported as ar as 2.0, probably with some adaptations. So be careful
during backports.
When the mux failed to transfer data to the upper layer because of a lack of
room, it is important to wake the stream up to let it handle this
event. Otherwise, if the stream is waiting for more data, both the stream
and the mux reamin blocked waiting for each other.
When this happens, the mux set the CS_FL_WANT_ROOM flag on the
conn-stream. Thus, in si_cs_recv() we are able to detect this event. Today,
the stream-interface is blocked. But, it is not enough to wake the stream
up. To fix the bug, CF_READ_PARTIAL flag is extended to also handle cases
where a read exception occurred. This flag should idealy be renamed. But for
now, it is good enough. By setting this flag, we are sure the stream will be
woken up.
This patch is part of a series related to the issue #1362. It should be
backported as far as 2.0, probably with some adaptations. So be careful
during backports.
When a message is parsed and copied into the channel buffer, in
h1_process_demux(), more space is requested if some pending data remain
after the parsing while the channel buffer is not empty. To do so,
CS_FL_WANT_ROOM flag is set. It means the H1 parser needs more space in the
channel buffer to continue. In the stream-interface, when this flag is set,
the SI is considered as blocked on the RX path. It is only unblocked when
some data are sent.
However, it is not accurrate because the parsing may be stopped because
there is not enough data to continue. For instance in the middle of a chunk
size. In this case, some data may have been already copied but the parser is
blocked because it must receive more data to continue. If the calling SI is
blocked on RX at this stage when the stream is waiting for the payload
(because http-buffer-request is set for instance), the stream remains stuck
infinitely.
To fix the bug, we must request more space to the app layer only when it is
not possible to copied more data. Actually, this happens when data remain in
the input buffer while the H1 parser is in states MSG_DATA or MSG_TUNNEL, or
when we are unable to copy headers or trailers into a non-empty buffer.
The first condition is quite easy to handle. The second one requires an API
refactoring. h1_parse_msg_hdrs() and h1_parse_msg_tlrs() fnuctions have been
updated. Now it is possible to know when we need more space in the buffer to
copy headers or trailers (-2 is returned). In the H1 mux, a new H1S flag
(H1S_F_RX_CONGESTED) is used to track this state inside h1_process_demux().
This patch is part of a series related to the issue #1362. It should be
backported as far as 2.0, probably with some adaptations. So be careful
during backports.
During the packet loss detection we must treat the paquet number
in this order Initial -> Handshake -> O1RTT. This was not the case
due to the chosen order to implement the array of packet number space
which was there before the packet loss detection implementation.
The STREAM data to send coming from the upper layer must be stored until
having being acked by the peer. To do so, we store them in buffer structs,
one by stream (see qcs.tx.buf). Each time a STREAM is built by quic_push_frame(),
its offset must match the offset of the first byte added to the buffer (modulo
the size of the buffer) by the frame. As they are not always acknowledged in
order, they may be stored in eb_trees ordered by their offset to be sure
to sequentially delete the STREAM data from their buffer, in the order they
have been added to it.
This function does exactly the same thing as b_xfer() which transfers
data from a struct buffer to another one but without zero copy when
the destination buffer is empty. This is at least useful to transfer
h3 data to the QUIC mux from buffer with garbage medata which have
been used to build h3 frames without too much memcopy()/memmove().
The peer transport parameter values were not initialized with
the default ones (when absent), especially the
"active_connection_id_limit" parameter with 2 as default value
when absent from received remote transport parameters. This
had as side effect to send too much NEW_CONNECTION_ID frames.
This was the case for curl which does not announce any
"active_connection_id_limit" parameter.
Also rename ->idle_timeout to ->max_idle_timeout to reflect the RFC9000.
These salts are used to derive initial secrets to decrypt the first Initial packet.
We support draft-29 and v1 QUIC version initial salts.
Add parameters to our QUIC-TLS API functions used to derive these secret for
these salts.
Make our xprt_quic use the correct initial salt upon QUIC version field found in
the first paquet. Useful to support connections with curl which use draft-29
QUIC version.
Move the "ACK required" bit from the packet number space to the connection level.
Force the "ACK required" option when acknowlegding Handshake or Initial packet.
A client may send three packets with a different encryption level for each. So,
this patch modifies qc_treat_rx_pkts() to consider two encryption level passed
as parameters, in place of only one.
Make qc_conn_io_cb() restart its process after the handshake has succeeded
so that to process any Application level packets which have already been received
in the same datagram as the last CRYPTO frames in Handshake packets.
Make qc_prep_hdshk_pkts() and qui_conn_io_cb() handle the case
where we enter them with QUIC_HS_ST_COMPLETE or QUIC_HS_ST_CONFIRMED
as connection state with QUIC_TLS_ENC_LEVEL_APP and QUIC_TLS_ENC_LEVEL_NONE
to consider to prepare packets.
quic_get_tls_enc_levels() is modified to return QUIC_TLS_ENC_LEVEL_APP
and QUIC_TLS_ENC_LEVEL_NONE as levels to consider when coalescing
packets in the same datagram.
qc_build_pkt() has recently been modified to support any type of
supported frame at any encryption level (assuming that an encryption level does
not support any type of frame) but quic_tls_level_pkt_type()
prevented it from building application level packet type because it was written
only for the handshake.
This patch simply adds the remaining encryption level QUIC_TLS_ENC_LEVEL_APP
which must be supported by quic_tls_level_pkt_type().
This should be used by the function which build packets to prevent
it from failing. This is important when the packet numbers are consumed
by several threads. The packet number is used to build and encrypt packets
and must be incremented only and only if the packet it refers to has been
successfully built.
These structures are similar. quic_tx_frm was there to try to reduce the
size of such objects which embed a union for all the QUIC frames.
Furtheremore this patch fixes the issue where quic_tx_frm objects were freed
from the pool for quic_frame.
Make quic_rx_packet_ref(inc|dec)() functions be thread safe.
Make use of ->rx.crypto.frms_rwlock RW lock when manipulating RX frames
from qc_treat_rx_crypto_frms().
Modify atomically several variables attached to RX part of quic_enc_level struct.
->rx.crypto member of quic_enc_level struct was not initialized as
this was done for all other members of this structure. This patch
fixes this.
Also adds a RW lock for the frame of this member.
Add two functions to encrement or decrement a referenc counter
attached to TX packet structure (struct quic_tx_packet). The packet are freed
when their counters reach the null value.
We use only ring buffers (struct qring) to prepare and send QUIC datagrams.
We can safely remove the old buffering implementation which was not thread safe.
We modify the functions responsible of building packets to put these latters
in ring buffers (qc_build_hdshk_pkt() during the handshake step, and
qc_build_phdshk_apkt() during the post-handshake step). These functions
remove a ring buffer from its list to build as much as possible datagrams.
Eache datagram is prepended of two field: the datagram length and the
first packet in the datagram. We chain the packets belonging to the same datagram
in a singly linked list to reach them from the first one: indeed we must
modify some members of each packet when we really send them from send_ppkts().
This function is also modified to retrieved the datagram from ring buffers.
We initialize the pointer to the listener TX ring buffer list.
Note that this is not done for QUIC clients as we do not fully support them:
we only have to allocate the list and attach it to server struct I guess.
We allocate an array of QUIC ring buffer, one by thread, and arranges them in a
MT_LIST. Everything is allocated or nothing: we do not want to usse an incomplete
array of ring buffers to ensure that each thread may safely acquire one of these
buffers.
This implementation is inspired from Linux kernel circular buffer implementation
(see include/linux/circ-buf.h). Such buffers may be used at the same time both
by writer and reader (lock-free).
Modify the I/O dgram handler principal function used to parse QUIC packets
be thread safe. Its role is at least to create new incoming connections
add to two trees protected by the same RW lock. The packets are for now on
fully parsed before possibly creating new connections.
Move the connection state from quic_conn_ctx struct to quic_conn struct which
is the structure which is used to store the QUIC connection part information.
This structure is initialized by the I/O dgram handler for each new connection
to QUIC listeners. This is needed for the multithread support so that to not
to have to depend on the connection context potentially initialized by another
thread.
We must protect from concurrent the tree which stores the QUIC packets received
by the dgram I/O handler, these packets being also parsed by the xprt task.
Make depends qc_new_isecs() only on quic_conn struct initialization only (no more
dependency on connection struct initialization) to be able to run it as soon as
the quic_conn struct is initialized (from the I/O handler) before running ->accept()
quic proto callback.
Move the QUIC conn (struct quic_conn) initialization from quic_sock_accept_conn()
to qc_lstnr_pkt_rcv() as this is done for the server part.
Move the timer initialization to ->start xprt callback to ensure the connection
context is done : it is initialized by the ->accept callback which may be run
by another thread than the one for the I/O handler which also run ->start.
The name the maximum packet size transport parameter was ambiguous and replaced
by maximum UDP payload size. Our code would be also ambiguous if it does not
reflect this change.
This function calls quic_mux_transport_params_update() to update the related
streams transport parameter of the mux. It is there only so that not to have
to include mux_quic.h to update these parameters.
Add a new structure to store enough information about STREAM frames which
must be stored before being delivered to the application layer, for any
reason.
The flow control at stream level is organized by types (client bidi, server bidi,
client uni, server uni). Adds at least callback to retrieve the number
of available streams by direction.
This file has been derived from mux_h2.c removing all h2 parts. At
QUIC mux layer, there must not be any reference to http. This will be the
responsability of the application layer (h3) to open streams handled by the mux.
We move ->params transport parameters to ->rx.params. They are the
transport parameters which will be sent to the peer, and used for
the endpoint flow control. So, they will be used to received packets
from the peer (RX part).
Also move ->rx_tps transport parameters to ->tx.params. They are the
transport parameter which are sent by the peer, and used to respect
its flow control limits. So, they will be used when sending packets
to the peer (TX part).
appctx_new() is exclusively called with tid_bit and it only uses the
mask to pass it to the accompanying task. There is no point requiring
the caller to know about a mask there, nor is there any point in
creating an applet outside of the context of its own thread anyway.
Let's drop this and pass tid_bit to task_new() directly.
A new warning is reported by gcc11 when using a pointer to uninitialized
memory block for a function with a const pointer argument. The warning
is triggered for istalloc, used by http_client.c / proxy.c / tcpcheck.c.
This warning is reported because the uninitialized memory block
allocated by malloc should not be passed to a const argument as in ist2.
See https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wmaybe-uninitialized
This should be backported up to 2.2.
Ilya reported in issue #1391 a build warning on Fedora about mallinfo()
being deprecated in favor of mallinfo2() since glibc-2.33. Let's add
support for it. This should be backported where the following commit is
also backported: 157e39303 ("MINOR: pools: automatically disable
malloc_trim() with external allocators").
"f(void)" is the correct and preferred form for a function taking no
argument, while some places use the older "f()". These were reported
by clang's -Wmissing-prototypes, for example:
src/cpuset.c:111:5: warning: no previous prototype for function 'ha_cpuset_size' [-Wmissing-prototypes]
int ha_cpuset_size()
include/haproxy/cpuset.h:42:5: note: this declaration is not a prototype; add 'void' to make it a prototype for a zero-parameter function
int ha_cpuset_size();
^
void
This aggregate patch fixes this for the following functions:
ha_backtrace_to_stderr(), ha_cpuset_size(), ha_panic(), ha_random64(),
ha_thread_dump_all_to_trash(), get_exec_path(), check_config_validity(),
mworker_child_nb(), mworker_cli_proxy_(create|stop)(),
mworker_cleantasks(), mworker_cleanlisteners(), mworker_ext_launch_all(),
mworker_reload(), mworker_(env|proc_list)_to_(proc_list|env)(),
mworker_(un|)block_signals(), proxy_adjust_all_maxconn(),
proxy_destroy_all_defaults(), get_tainted(),
pool_total_(allocated|used)(), thread_isolate(_full|)(),
thread(_sync|)_release(), thread_harmless_till_end(),
thread_cpu_mask_forced(), dequeue_all_listeners(), next_timer_expiry(),
wake_expired_tasks(), process_runnable_tasks(), init_acl(),
init_buffer(), (de|)init_log_buffers(), (de|)init_pollers(),
fork_poller(), pool_destroy_all(), pool_evict_from_local_caches(),
pool_total_failures(), dump_pools_to_trash(), cfg_run_diagnostics(),
tv_init_(process|thread)_date(), __signal_process_queue(),
deinit_signals(), haproxy_unblock_signals()
-Wundef triggered on a MIPS-based musl build on __WORDSIZE that's used
in ultoa_o() and some Lua initialization. The former will fail to convert
integers larger to 1 billion to proper string in this case. Let's make
sure this macro is defined and fall back to values determined from
__SIZEOF_LONG__ otherwise. A cleaner long-term approach would consist
in removing all remaining occurrences of this macro.
This can be backported to all versions.
Building with an old musl-based toolchain reported this warning:
include/haproxy/thread.h: In function 'ha_thread_relax':
include/haproxy/thread.h:256:5: warning: "_POSIX_PRIORITY_SCHEDULING" is not defined [-Wundef]
#if _POSIX_PRIORITY_SCHEDULING
^
There were indeed two "#if" insteadd of #ifdef" for this macro, let's
fix them.
When the header list is added, after the message parsing, headers with no
value are now ignored. It is not the same than headers with empty value
fields. Only headers with a NULL pointer as value are skipped. This only
happens if the header value is removed during the message
parsing. Concretly, such headers are now ignored when htx_add_all_headers()
is called. However, htx_add_header() is not affected by this change.
Symetrically, the same is true for trailers. It may be backported to 2.4
because of the previous fix ("BUG/MEDIUM: mux-h1: Remove "Upgrade:" header
for requests with payload").
There's no point taking the variables locks for sess/txn/req/res
contexts since these ones always run inside the same thread anyway.
This patch conditions the lock on the variable's scope to avoid
flushing cache lines when not needed.
This showed an improvement of ~5% on a 16-thread machine with 12
variables.
The global table of known variables names can only grow and was designed
for static names that are registered at boot. Nowadays it's possible to
set dynamic variable names from Lua or from the CLI, which causes a real
problem that was partially addressed in 2.2 with commit 4e172c93f
("MEDIUM: lua: Add `ifexist` parameter to `set_var`"). Please see github
issue #624 for more context.
This patch simplifies all this by removing the need for a central
registry of known names, and storing 64-bit hashes instead. This is
highly sufficient given the low number of variables in each context.
The hash is calculated using XXH64() which is bijective over the 64-bit
space thus is guaranteed collision-free for 1..8 chars. Above that the
risk remains around 1/2^64 per extra 8 chars so in practice this is
highly sufficient for our usage. A random seed is used at boot to seed
the hash so that it's not attackable from Lua for example.
There's one particular nit though. The "ifexist" hack mentioned above
is now limited to variables of scope "proc" only, and will only match
variables that were already created or declared, but will now verify
the scope as well. This may affect some bogus Lua scripts and SPOE
agents which used to accidentally work because a similarly named
variable used to exist in a different scope. These ones may need to be
fixed to comply with the doc.
Now we can sum up the situation as this one:
- ephemeral variables (scopes sess, txn, req, res) will always be
usable, regardless of any prior declaration. This effectively
addresses the most problematic change from the commit above that
in order to work well could have required some script auditing ;
- process-wide variables (scope proc) that are mentioned in the
configuration, referenced in a "register-var-names" SPOE directive,
or created via "set-var" in the global section or the CLI, are
permanent and will always accept to be set, with or without the
"ifexist" restriction (SPOE uses this internally as well).
- process-wide variables (scope proc) that are only created via a
set-var() tcp/http action, via Lua's set_var() calls, or via an
SPOE with the "force-set-var" directive), will not be permanent
but will always accept to be replaced once they are created, even
if "ifexist" is present
- process-wide variables (scope proc) that do not exist will only
support being created via the set-var() tcp/http action, Lua's
set_var() calls without "ifexist", or an SPOE declared with
"force-set-var".
This means that non-proc variables do not care about "ifexist" nor
prior declaration, and that using "ifexist" should most often be
reliable in Lua and that SPOE should most often work without any
prior declaration. It may be doable to turn "ifexist" to 1 by default
in Lua to further ease the transition. Note: regtests were adjusted.
Cc: Tim Düsterhus <tim@bastelstu.be>
We certainly do not want that a permanent variable (one that is listed
in the configuration) be erased by accident by an "unset-var" action.
Let's make sure these ones are only reset to an empty sample, like at
the moment of their initial registration. One trick is that the same
function is used to purge the memory at the end and to delete, so we
need to add an extra "force" argument to make the choice.
In order to continue to honor the ifexist Lua option and prevent rogue
SPOA agents from creating too many variables, we'll need to keep the
ability to mark certain proc.* variables as permanent when they're
known from the config file.
Let's add a flag there for this. It's added to the variable when the
variable is created with this flag set by the caller.
Another approach could have been to use a distinct list or distinct
scope but that sounds complicated and bug-prone.
Passing this flag to var_set() will result in the variable to only be
created if it did not exist, otherwise nothing is done (it's not even
updated). This will be used for pre-registering names.
When setting variables, there are currently two variants, one which will
always create the variable, and another one, "ifexist", which will only
create or update a variable if a similarly named variable in any scope
already existed before.
The goal was to limit the risk of injecting random names in the proc
scope, but it was achieved by making use of the somewhat limited name
indexing model, which explains the scope-agnostic restriction.
With this change, we're moving the check downwards in the chain, at the
variable level, and only variables under the scope "proc" will be subject
to the restriction. A new set of VF_* flags was added to adjust how
variables are set, and VF_UPDATEONLY is used to mention this restriction.
In this exact state of affairs, this is not completely exact, as if a
similar name was not known in any scope, the variable will continue to
be rejected like before, but this will change soon.
The vars_init() name is particularly confusing as it does not initialize
the variables code but the head of a list of variables passed in
arguments. And we'll soon need to have proper initialization code, so
let's rename it now.