This patch adds new statement "server" into ring section, and the
related "timeout connect" and "timeout server".
server <name> <address> [param*]
Used to configure a syslog tcp server to forward messages from ring buffer.
This supports for all "server" parameters found in 5.2 paragraph.
Some of these parameters are irrelevant for "ring" sections.
timeout connect <timeout>
Set the maximum time to wait for a connection attempt to a server to succeed.
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
timeout server <timeout>
Set the maximum time for pending data staying into output buffer.
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
Example:
global
log ring@myring local7
ring myring
description "My local buffer"
format rfc3164
maxlen 1200
size 32764
timeout connect 5s
timeout server 10s
server mysyslogsrv 127.0.0.1:6514
Commit 04f5fe87d3 introduced an rwlock in the pools to deal with the risk
that pool_flush() dereferences an area being freed, and commit 899fb8abdc
turned it into a spinlock. The pools already contain a spinlock in case of
locked pools, so let's use the same and simplify the code by removing ifdefs.
At this point I'm really suspecting that if pool_flush() would instead
rely on __pool_get_first() to pick entries from the pool, the concurrency
problem could never happen since only one user would get a given entry at
once, thus it could not be freed by another user. It's not certain this
would be faster however because of the number of atomic ops to retrieve
one entry compared to a locked batch.
HTTP_1XX, HTTP_3XX and HTTP_4XX message templates are no longer used. Only
HTTP_302 and HTTP_303 are used during configuration parsing by "errorloc" family
directives. So these templates are removed from the generic http code. And
HTTP_302 and HTTP_303 templates are moved as static strings in the function
parsing "errorloc" directives.
There is no reason to not use proxy's error replies to emit 401/407
responses. The function http_reply_40x_unauthorized(), responsible to emit those
responses, is not really complex. It only adds a
WWW-Authenticate/Proxy-Authenticate header to a generic message.
So now, error replies can be defined for 401 and 407 status codes, using
errorfile or http-error directives. When an http-request auth rule is evaluated,
the corresponding error reply is used. For 401 responses, all occurrences of the
WWW-Authenticate header are removed and replaced by a new one with a basic
authentication challenge for the configured realm. For 407 responses, the same
is done on the Proxy-Authenticate header. If the error reply must not be
altered, "http-request return" rule must be used instead.
During pool_free(), when the ->allocated value is 125% of needed_avg or
more, instead of putting the object back into the pool, it's immediately
freed using free(). By doing this we manage to significantly reduce the
amount of memory pinned in pools after transient traffic spikes.
During a test involving a constant load of 100 concurrent connections
each delivering 100 requests per second, the memory usage was a steady
21 MB RSS. Adding a 1 minute parallel load of 40k connections all looping
on 100kB objects made the memory usage climb to 938 MB before this patch.
With the patch it was only 660 MB. But when this parasit load stopped,
before the patch the RSS would remain at 938 MB while with the patch,
it went down to 480 then 180 MB after a few seconds, to stabilize around
69 MB after about 20 seconds.
This can be particularly important to improve reloads where the memory
has to be shared between the old and new process.
Another improvement would be welcome, we ought to have a periodic task
to check pools usage and continue to free up unused objects regardless
of any call to pool_free(), because the needed_avg value depends on the
past and will not cover recently refilled objects.
This adds a sliding estimate of the pools' usage. The goal is to be able
to use this to start to more aggressively free memory instead of keeping
lots of unused objects in pools. The average is calculated as a sliding
average over the last 1024 consecutive measures of ->used during calls to
pool_free(), and is bumped up for 1/4 of its history from ->allocated when
allocation from the pool fails and results in a call to malloc().
The result is a floating value between ->used and ->allocated, that tries
to react fast to under-estimates that result in expensive malloc() but
still maintains itself well in case of stable usage, and progressively
goes down if usage shrinks over time.
This new metric is reported as "needed_avg" in "show pools".
Sadly due to yet another include dependency hell, we couldn't reuse the
functions from freq_ctr.h so they were temporarily duplicated into memory.h.
Recent commit 2bdcc70fa7 ("MEDIUM: hpack: use a pool for the hpack table")
made the hpack code finally use a pool with very unintrusive code that was
assumed to be trivial enough to adjust if the code needed to be reused
outside of haproxy. Unfortunately the code in contrib/hpack already uses
it and broke the oss-fuzz tests as it doesn't build anymore.
This patch adds an HPACK_STANDALONE macro to decide if we should use the
pools or malloc+free. The resulting macros are called hpack_alloc() and
hpack_free() respectively, and the size must be passed into the pool
itself.
The htx_copy_msg() function can now be used to copy the HTX message stored in a
buffer in an existing HTX message. It takes care to not overwrite existing
data. If the destination message is empty, a raw copy is performed. All the
message is copied or nothing.
This function is used instead of channel_htx_copy_msg().
This reverts commit 957ec59571.
As discussed with Emeric, the current syntax is not extensible enough,
this will be turned to a section instead in a forthcoming patch.
Instead of using malloc/free to allocate an HPACK table, let's declare
a pool. However the HPACK size is configured by the H2 mux, so it's
also this one which allocates it after post_check.
This patch adds the new global statement:
ring <name> [desc <desc>] [format <format>] [size <size>] [maxlen <length>]
Creates a named ring buffer which could be used on log line for instance.
<desc> is an optionnal description string of the ring. It will appear on
CLI. By default, <name> is reused to fill this field.
<format> is the log format used when generating syslog messages. It may be
one of the following :
iso A message containing only the ISO date, followed by the text.
The PID, process name and system name are omitted. This is
designed to be used with a local log server.
raw A message containing only the text. The level, PID, date, time,
process name and system name are omitted. This is designed to be
used in containers or during development, where the severity only
depends on the file descriptor used (stdout/stderr). This is
the default.
rfc3164 The RFC3164 syslog message format. This is the default.
(https://tools.ietf.org/html/rfc3164)
rfc5424 The RFC5424 syslog message format.
(https://tools.ietf.org/html/rfc5424)
short A message containing only a level between angle brackets such as
'<3>', followed by the text. The PID, date, time, process name
and system name are omitted. This is designed to be used with a
local log server. This format is compatible with what the systemd
logger consumes.
timed A message containing only a level between angle brackets such as
'<3>', followed by ISO date and by the text. The PID, process
name and system name are omitted. This is designed to be
used with a local log server.
<length> is the maximum length of event message stored into the ring,
including formatted header. If the event message is longer
than <length>, it would be truncated to this length.
<name> is the ring identifier, which follows the same naming convention as
proxies and servers.
<size> is the optionnal size in bytes. Default value is set to BUFSIZE.
Note: Historically sink's name and desc were refs on const strings. But with new
configurable rings a dynamic allocation is needed.
The EVP_MD_CTX_create() and EVP_MD_CTX_destroy() functions were renamed to
EVP_MD_CTX_new() and EVP_MD_CTX_free() in OpenSSL 1.1.0, respectively. These
functions are used by the digest converter, introduced by the commit 8e36651ed
("MINOR: sample: Add digest and hmac converters"). So for prior versions of
openssl, macros are used to fallback on old functions.
This patch must only be backported if the commit 8e36651ed is backported too.
This one really ought to be defined in hathreads.h like all other thread
definitions, which is what this patch does. As expected, all files but
one (regex.h) were already including hathreads.h when using THREAD_LOCAL;
regex.h was fixed for this.
This was the last entry in config.h which is now useless.
The setting of CONFIG_HAP_LOCKLESS_POOLS depending on threads and
compat was done in config.h for use only in memory.h and memory.c
where other settings are dealt with. Further, the default pool cache
size was set there from a fixed value instead of being set from
defaults.h
Let's move the decision to enable lockless pools via
CONFIG_HAP_LOCKLESS_POOLS to memory.h, and set the default pool
cache size in defaults.h like other default settings.
This was the next-to-last setting in config.h.
CONFIG_HAP_MEM_OPTIM was introduced with memory pools in 1.3 and dropped
in 1.6 when pools became the only way to allocate memory. Still the
option remained present in config.h. Let's kill it.
Just like in previous patch, it happens that HA_ATOMIC_UPDATE_MIN() and
HA_ATOMIC_UPDATE_MAX() would evaluate the (val) argument up to 3 times.
However this time it affects both thread and non-thread versions. It's
strange because the copy was properly performed for the (new) argument
in order to avoid this. Anyway it was done for the "val" one as well.
A quick code inspection showed that this currently has no effect as
these macros are fairly limited in usage.
It would be best to backport this for long-term stability (till 1.8)
but it will not fix an existing bug.
When threads are disabled, HA_ATOMIC_CAS() becomes a simple compound
expression. However this expression presents a problem, which is that
its arguments are evaluated multiple times, once for the comparison
and once again for the assignement. This presents a risk of performing
some side-effect operations twice in the non-threaded case (e.g. in
case of auto-increment or function return).
The macro was rewritten using local copies for arguments like the
other macros do.
Fortunately a complete inspection of the code indicates that this case
currently never happens. It was however responsible for the strict-aliasing
warning emitted when building fd.c without threads but with 64-bit CAS.
This may be backported as far as 1.8 though it will not fix any existing
bug and is more of a long-term safety measure in case a future fix would
depend on this behavior.
I changed my mind twice on this one and pushed after the last test with
threads disabled, without re-enabling long long, causing this rightful
build warning.
This needs to be backported if the previous commit ff64d3b027 ("MINOR:
threads: export the POSIX thread ID in panic dumps") is backported as
well.
It is very difficult to map a panic dump against a gdb thread dump
because the thread numbers do not match. However gdb provides the
pthread ID but this one is supposed to be opaque and not to be cast
to a scalar.
This patch provides a fnuction, ha_get_pthread_id() which retrieves
the pthread ID of the indicated thread and casts it to an unsigned
long long so as to lose the least possible amount of information from
it. This is done cleanly using a union to maintain alignment so as
long as these IDs are stored on 1..8 bytes they will be properly
reported. This ID is now presented in the panic dumps so it now
becomes possible to map these threads. When threads are disabled,
zero is returned. For example, this is a panic dump:
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7fe92b825180 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=5119122 now=2009446995 diff=2004327873
curr_task=0xc99bf0 (task) calls=4 last=0
fct=0x592440(task_run_applet) ctx=0xca9c50(<CLI>)
strm=0xc996a0 src=unix fe=GLOBAL be=GLOBAL dst=<CLI>
rqf=848202 rqa=0 rpf=80048202 rpa=0 sif=EST,200008 sib=EST,204018
af=(nil),0 csf=0xc9ba40,8200
ab=0xca9c50,4 csb=(nil),0
cof=0xbf0e50,1300:PASS(0xc9cee0)/RAW((nil))/unix_stream(20)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
call trace(20):
| 0x59e4cf [48 83 c4 10 5b 5d 41 5c]: wdt_handler+0xff/0x10c
| 0x7fe92c170690 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13690
| 0x7ffce29519d9 [48 c1 e2 20 48 09 d0 48]: linux-vdso:+0x9d9
| 0x7ffce2951d54 [eb d9 f3 90 e9 1c ff ff]: linux-vdso:__vdso_gettimeofday+0x104/0x133
| 0x57b484 [48 89 e6 48 8d 7c 24 10]: main+0x157114
| 0x50ee6a [85 c0 75 76 48 8b 55 38]: main+0xeaafa
| 0x50f69c [48 63 54 24 20 85 c0 0f]: main+0xeb32c
| 0x59252c [48 c7 c6 d8 ff ff ff 44]: task_run_applet+0xec/0x88c
Thread 2 : id=0x7fe92b6e6700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=786738 now=1086955 diff=300217
curr_task=0
Thread 3 : id=0x7fe92aee5700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=828056 now=1129738 diff=301682
curr_task=0
Thread 4 : id=0x7fe92a6e4700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=818900 now=1153551 diff=334651
curr_task=0
And this is the gdb output:
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7fe92b825180 (LWP 15234) 0x00007fe92ba81d6b in raise () from /lib64/libc.so.6
2 Thread 0x7fe92b6e6700 (LWP 15235) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fe92a6e4700 (LWP 15237) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fe92aee5700 (LWP 15236) 0x00007fe92bb56a56 in epoll_wait () from /lib64/libc.so.6
We can clearly see that while threads 1 and 2 are the same, gdb's
threads 3 and 4 respectively are haproxy's threads 4 and 3.
This may be backported to 2.0 as it removes some confusion in github issues.
A shared tcp-check ruleset is now created to support LDAP check. This way no
extra memory is used if several backends use a LDAP check.
The following sequance is used :
tcp-check send-binary "300C020101600702010304008000"
tcp-check expect rbinary "^30" min-recv 14 \
on-error "Not LDAPv3 protocol"
tcp-check expect custom
The last expect rule relies on a custom function to check the LDAP server reply.
A share tcp-check ruleset is now created to support smtp checks. This way no
extra memory is used if several backends use a smtp check.
The following sequence is used :
tcp-check connect default linger
tcp-check expect rstring "^[0-9]{3}[ \r]" min-recv 4 \
error-status "L7RSP" on-error "%[check.payload(),cut_crlf]"
tcp-check expect rstring "^2[0-9]{2}[ \r]" min-recv 4 \
error-status "L7STS" \
on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
status-code "check.payload(0,3)"
tcp-echeck send "%[var(check.smtp_cmd)]\r\n" log-format
tcp-check expect rstring "^2[0-9]{2}[- \r]" min-recv 4 \
error-status "L7STS" \
on-error %[check.payload(4,0),ltrim(' '),cut_crlf] \
on-success "%[check.payload(4,0),ltrim(' '),cut_crlf]" \
status-code "check.payload(0,3)"
The variable check.smtp_cmd is by default the string "HELO localhost" by may be
customized setting <helo> and <domain> parameters on the option smtpchk
line. Note there is a difference with the old smtp check. The server gretting
message is checked before send the HELO/EHLO comand.
A share tcp-check ruleset is now created to support redis checks. This way no
extra memory is used if several backends use a redis check.
The following sequence is used :
tcp-check send "*1\r\n$4\r\nPING\r\n"
tcp-check expect string "+PONG\r\n" error-status "L7STS" \
on-error "%[check.payload(),cut_crlf]" on-success "Redis server is ok"
list_for_each_entry_rev() and list_for_each_entry_from_rev() and corresponding
safe versions have been added to iterate on a list in the reverse order. All
these functions work the same way than the forward versions, except they use the
.p field to move for an element to another.
The url_decode() function used by the url_dec converter and a few other
call points is ambiguous on its processing of the '+' character which
itself isn't stable in the spec. This one belongs to the reserved
characters for the query string but not for the path nor the scheme,
in which it must be left as-is. It's only in argument strings that
follow the application/x-www-form-urlencoded encoding that it must be
turned into a space, that is, in query strings and POST arguments.
The problem is that the function is used to process full URLs and
paths in various configs, and to process query strings from the stats
page for example.
This patch updates the function to differentiate the situation where
it's parsing a path and a query string. A new argument indicates if a
query string should be assumed, otherwise it's only assumed after seeing
a question mark.
The various locations in the code making use of this function were
updated to take care of this (most call places were using it to decode
POST arguments).
The url_dec converter is usually called on path or url samples, so it
needs to remain compatible with this and will default to parsing a path
and turning the '+' to a space only after a question mark. However in
situations where it would explicitly be extracted from a POST or a
query string, it now becomes possible to enforce the decoding by passing
a non-null value in argument.
It seems to be what was reported in issue #585. This fix may be
backported to older stable releases.
As reported in issue #596, the edx register isn't marked as clobbered
in div64_32(), which could technically allow gcc to try to reuse it
if it needed a copy of the 32 highest bits of the o1 register after
the operation.
Two attempts were tried, one using a dummy 32-bit local variable to
store the intermediary edx and another one switching to "=A" and making
result a long long. It turns out the former makes the resulting object
code significantly dirtier while the latter makes it better and was
kept. This is due to gcc's difficulties at working with register pairs
mixing 32- and 64- bit values on i386. It was verified that no code
change happened at all on x86_64, armv7, aarch64 nor mips32.
In practice it's only used by the frequency counters so this bug
cannot even be triggered but better fix it.
This may be backported to stable branches though it will not fix any
issue.
If haproxy fails to start and emits an alert, then it can be useful
to have it also emit the version and the path used to load it. Some
users may be mistakenly launching the wrong binary due to a misconfigured
PATH variable and this will save them some troubleshooting time when it
reports that some keywords are not understood.
What we do here is that we *try* to extract the binary name from the
AUX vector on glibc, and we report this as a NOTICE tag before the
very first alert is emitted.
Use the refcount of the SSL_CTX' to free them instead of freeing them on
certains conditions. That way we can free the SSL_CTX everywhere its
pointer is used.
The flush_lock was introduced, mostly to be sure that pool_gc() will never
dereference a pointer that has been free'd. __pool_get_first() was acquiring
the lock to, the fear was that otherwise that pointer could get free'd later,
and then pool_gc() would attempt to dereference it. However, that can not
happen, because the only functions that can free a pointer, when using
lockless pools, are pool_gc() and pool_flush(), and as long as those two
are mutually exclusive, nobody will be able to free the pointer while
pool_gc() attempts to access it.
So change the flush_lock to a spinlock, and don't bother acquire/release
it in __pool_get_first(), that way callers of __pool_get_first() won't have
to wait while the pool is flushed. The worst that can happen is we call
__pool_refill_alloc() while the pool is getting flushed, and memory can
get allocated just to be free'd.
This may help with github issue #552
This may be backported to 2.1, 2.0 and 1.9.
In the struct fdtab, introduce a new mask, running_mask. Each thread should
add its bit before using the fd.
Use the running_mask instead of a lock, in fd_insert/fd_delete, we'll just
spin as long as the mask is non-zero, to be sure we access the data
exclusively.
fd_set_running_excl() spins until the mask is 0, fd_set_running() just
adds the thread bit, and fd_clr_running() removes it.
With these debug options we still get these warnings:
include/common/memory.h:501:23: warning: null pointer dereference [-Wnull-dereference]
*(volatile int *)0 = 0;
~~~~~~~~~~~~~~~~~~~^~~
include/common/memory.h:460:22: warning: null pointer dereference [-Wnull-dereference]
*(volatile int *)0 = 0;
~~~~~~~~~~~~~~~~~~~^~~
These are purposely there to crash the process at specific locations.
But the annoying warnings do not help with debugging and they are not
even reliable as the compiler may decide to optimize them away. Let's
pass the pointer through DISGUISE() to avoid this.
It's more generic and versatile than the previous shut_your_big_mouth_gcc()
that was used to silence annoying warnings as it's not limited to ignoring
syscalls returns only. This allows us to get rid of the aforementioned
function and the shut_your_big_mouth_gcc_int variable, that started to
look ugly in multi-threaded environments.
Tim reported that BUG_ON() issues warnings on his distro, as the libc marks
some syscalls with __attribute__((warn_unused_result)). Let's pass the
write() result through DISGUISE() to hide it.
This does exactly the same as ALREADY_CHECKED() but does it inline,
returning an identical copy of the scalar variable without letting
the compiler know how it might have been transformed. This can
forcefully disable certain null-pointer checks or result checks when
known undesirable. Typically forcing a crash with *(DISGUISE(NULL))=0
will not cause a null-deref warning.
These are mostly comments in the code. A few error messages were fixed
and are of low enough importance not to deserve a backport. Some regtests
were also fixed.
Implement mt_list_to_list() and list_to_mt_list(), to be able to convert
from a struct list to a struct mt_list, and vice versa.
This is normally of no use, except for struct connection's list field, that
can go in either a struct list or a struct mt_list.
Ryan O'Hara reported that haproxy breaks on fedora-32 using gcc-10
(pre-release). It turns out that constructs such as:
while (item != head) {
item = LIST_ELEM(item.n);
}
loop forever, never matching <item> to <head> despite a printf there
showing them equal. In practice the problem is that the LIST_ELEM()
macro is wrong, it assigns the subtract of two pointers (an integer)
to another pointer through a cast to its pointer type. And GCC 10 now
considers that this cannot match a pointer and silently optimizes the
comparison away. A tested workaround for this is to build with
-fno-tree-pta. Note that older gcc versions even with -ftree-pta do
not exhibit this rather surprizing behavior.
This patch changes the test to instead cast the null-based address to
an int to get the offset and subtract it from the pointer, and this
time it works. There were just a few places to adjust. Ideally
offsetof() should be used but the LIST_ELEM() API doesn't make this
trivial as it's commonly called with a typeof(ptr) and not typeof(ptr*)
thus it would require to completely change the whole API, which is not
something workable in the short term, especially for a backport.
With this change, the emitted code is subtly different even on older
versions. A code size reduction of ~600 bytes and a total executable
size reduction of ~1kB are expected to be observed and should not be
taken as an anomaly. Typically this loop in dequeue_proxy_listeners() :
while ((listener = MT_LIST_POP(...)))
used to produce this code where the comparison is performed on RAX
while the new offset is assigned to RDI even though both are always
identical:
53ded8: 48 8d 78 c0 lea -0x40(%rax),%rdi
53dedc: 48 83 f8 40 cmp $0x40,%rax
53dee0: 74 39 je 53df1b <dequeue_proxy_listeners+0xab>
and now produces this one which is slightly more efficient as the
same register is used for both purposes:
53dd08: 48 83 ef 40 sub $0x40,%rdi
53dd0c: 74 2d je 53dd3b <dequeue_proxy_listeners+0x9b>
Similarly, retrieving the channel from a stream_interface using si_ic()
and si_oc() used to cause this (stream-int in rdi):
1cb7: c7 47 1c 00 02 00 00 movl $0x200,0x1c(%rdi)
1cbe: f6 47 04 10 testb $0x10,0x4(%rdi)
1cc2: 74 1c je 1ce0 <si_report_error+0x30>
1cc4: 48 81 ef 00 03 00 00 sub $0x300,%rdi
1ccb: 81 4f 10 00 08 00 00 orl $0x800,0x10(%rdi)
and now causes this:
1cb7: c7 47 1c 00 02 00 00 movl $0x200,0x1c(%rdi)
1cbe: f6 47 04 10 testb $0x10,0x4(%rdi)
1cc2: 74 1c je 1ce0 <si_report_error+0x30>
1cc4: 81 8f 10 fd ff ff 00 orl $0x800,-0x2f0(%rdi)
There is extremely little chance that this fix wakes up a dormant bug as
the emitted code effectively does what the source code intends.
This must be backported to all supported branches (dropping MT_LIST_ELEM
and the spoa_example parts as needed), since the bug is subtle and may
not always be visible even when compiling with gcc-10.
In MT_LIST_DEL_SAFE(), when the code was changed to use a temporary variable
instead of using the provided pointer directly, we shouldn't have changed
the code that set the pointer to NULL, as we really want the pointer
provided to be nullified, otherwise other parts of the code won't know
we just deleted an element, and bad things will happen.
This should be backported to 2.1.
The splice() syscall has been supported in glibc since version 2.5 issued
in 2006 and is present on supported systems so there's no need for having
our own arch-specific syscall definitions anymore.
This was made to support epoll on patched 2.4 kernels, and on early 2.6
using alternative libcs thanks to the arch-specific syscall definitions.
All the features we support have been around since 2.6.2 and present in
glibc since 2.3.2, neither of which are found in field anymore. Let's
simply drop this and use epoll normally.
The accept4() syscall has been present for a while now, there is no more
reason for maintaining our own arch-specific syscall implementation for
systems lacking it in libc but having it in the kernel.