haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-10 09:07:02 +02:00

Author	SHA1	Message	Date
Willy Tarreau	2f877304ef	OPTIM/MEDIUM: epoll: fuse active events into polled ones during polling changes

When trying to speculatively send data to a server being connected to, we see the following pattern :

    connect() = EINPROGRESS
    send() = EAGAIN
    epoll_ctl(add, W)
    epoll_wait() = EPOLLOUT
    send() = success
  > epoll_ctl(del, W)
  > recv() = EAGAIN
  > epoll_ctl(add, R)
    recv() = success
    epoll_ctl(del, R)

The reason for the failed recv() call is that the reading was marked as speculative while we already have a polled I/O there. So when removing the write poll, we already know that the read is pending. Thus, let's improve this by merging speculative I/O into polled I/O when polled state changes. The result is now the following as expected :

    connect() = EINPROGRESS
    send() = EAGAIN
    epoll_ctl(add, W)
    epoll_wait() = EPOLLOUT
    send() = success
    epoll_ctl(mod, R)
    recv() = success
    epoll_ctl(del, R)

This is specific to epoll(), it doesn't make much sense at the moment to do so for other pollers, because the cost of updating them is very small.

The average performance gain on small requests is 1.6% in TCP mode, which is easily explained with the syscall stats below for 10000 forwarded connections :

Before :

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     91.02    0.024608           0     60000         1 epoll_wait
      2.19    0.000593           0     20000           shutdown
      1.52    0.000412           0     10000     10000 connect
      1.36    0.000367           0     29998      9998 sendto
      1.09    0.000294           0     49993           epoll_ctl
      0.93    0.000252           0     50004     20002 recvfrom
      0.79    0.000214           0     20005           close
      0.62    0.000167           0     20001     10001 accept4
      0.25    0.000067           0     20002           setsockopt
      0.13    0.000035           0     10001           socket
      0.10    0.000028           0     10001           fcntl

After :

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     87.59    0.024269           0     50012         1 epoll_wait
      3.19    0.000884           0     20000           shutdown
      2.33    0.000646           0     29996      9996 sendto
      2.02    0.000560           0     10005     10003 connect
      1.40    0.000387           0     40013     10013 recvfrom
      1.35    0.000374           0     40000           epoll_ctl
      0.64    0.000178           0     20001     10001 accept4
      0.55    0.000152           0     20005           close
      0.45    0.000124           0     20002           setsockopt
      0.31    0.000086           0     10001           fcntl
      0.17    0.000047           0     10001           socket

Overall :
    -16.6% epoll_wait
    -20% recvfrom
    -20% epoll_ctl

On HTTP, the gain is even better :

Before :

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     80.43    0.015386           0     60006         1 epoll_wait
      4.61    0.000882           0     30000     10000 sendto
      3.74    0.000715           0     20001     10001 accept4
      3.35    0.000640           0     10000     10000 connect
      2.66    0.000508           0     20005           close
      1.34    0.000257           0     30002     10002 recvfrom
      1.27    0.000242           0     30005           epoll_ctl
      1.20    0.000230           0     10000           shutdown
      0.62    0.000119           0     20003           setsockopt
      0.40    0.000077           0     10001           socket
      0.39    0.000074           0     10001           fcntl

After :

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     83.47    0.020301           0     50008         1 epoll_wait
      4.26    0.001036           0     20005           close
      3.30    0.000803           0     30000     10000 sendto
      2.55    0.000621           0     20001     10001 accept4
      1.76    0.000428           0     10000     10000 connect
      1.20    0.000292           0     10000           shutdown
      1.14    0.000278           0     20001         1 recvfrom
      0.86    0.000210           0     20003           epoll_ctl
      0.71    0.000173           0     20003           setsockopt
      0.49    0.000120           0     10001           socket
      0.25    0.000060           0     10001           fcntl

Overall :
    -16.6% epoll_wait
    -33% recvfrom
    -33% epoll_ctl	2013-11-15 23:15:10 +01:00
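The fusing described in commit 2f877304ef can be sketched as a small decision function. This is only an illustration of the idea, not HAProxy's code: the names `POLL_R`, `POLL_W`, `enum poll_op` and `fuse_and_pick()` are hypothetical, and the operation codes merely mirror `EPOLL_CTL_ADD`, `EPOLL_CTL_MOD` and `EPOLL_CTL_DEL`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical direction bits, not HAProxy's real flags. */
enum { POLL_R = 1, POLL_W = 2 };

/* Hypothetical operation codes mirroring EPOLL_CTL_ADD/MOD/DEL. */
enum poll_op { OP_NONE, OP_ADD, OP_MOD, OP_DEL };

/*
 * When the polled set is about to change anyway, fold the speculative
 * directions into it, then pick the single operation that moves the
 * kernel state from <polled_old> to the fused state stored in <*result>.
 */
static enum poll_op fuse_and_pick(unsigned int polled_old,
                                  unsigned int polled_new,
                                  unsigned int speculative,
                                  unsigned int *result)
{
    assert(result != NULL);

    if (polled_new != polled_old)
        polled_new |= speculative;   /* merge speculative I/O into polled I/O */

    *result = polled_new;

    if (polled_new == polled_old)
        return OP_NONE;              /* nothing to tell the kernel */
    if (polled_old == 0)
        return OP_ADD;               /* FD was not polled at all */
    if (polled_new == 0)
        return OP_DEL;               /* FD is no longer polled */
    return OP_MOD;                   /* one syscall instead of del + add */
}
```

With the pattern from the commit message (write was polled, read is speculative, write polling is being dropped), the function yields a single "mod" to read polling instead of a "del" followed by an "add".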
Willy Tarreau	cf181c9d40	BUG/MINOR: epoll: use a fixed maxevents argument in epoll_wait() epoll_wait() takes a number of returned events, not the number of fds to consider. We must not pass it the number of the smallest fd, as it leads to value zero being used, which is invalid in epoll_wait(). The effect may sometimes be observed with peers sections trying to connect and causing 2-second CPU loops upon a soft reload because epoll_wait() immediately returns -1 EINVAL instead of waiting for the timeout to happen. This fix should be backported to 1.4 too (into ev_epoll and ev_sepoll).	2013-01-18 15:31:03 +01:00
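The failure mode in commit cf181c9d40 can be reproduced on Linux with a few lines. This is a minimal sketch, not HAProxy's code: `MAX_POLL_EVENTS` and `epoll_wait_errno()` are hypothetical names, and the batch size of 200 is just an example value.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_POLL_EVENTS 200   /* hypothetical fixed batch size */

/*
 * Calls epoll_wait() on a fresh, empty epoll instance with a zero timeout
 * and returns the resulting errno, or 0 when the call succeeded.
 */
static int epoll_wait_errno(int maxevents)
{
    struct epoll_event evs[MAX_POLL_EVENTS];
    int epfd = epoll_create(1);
    int err = 0;

    assert(epfd >= 0);
    assert(maxevents <= MAX_POLL_EVENTS);

    /* maxevents is the capacity of <evs>, not an fd count */
    if (epoll_wait(epfd, evs, maxevents, 0) < 0)
        err = errno;

    close(epfd);
    return err;
}
```

Passing zero fails immediately with EINVAL rather than honouring the timeout, whereas a fixed positive capacity simply returns no events on an empty set.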
Willy Tarreau	1c07b0755d	OPTIM: epoll: make use of EPOLLRDHUP epoll may report pending shutdowns using EPOLLRDHUP. Since this flag is missing from a number of libcs despite being available since kernel 2.6.17, let's define it ourselves. Doing so saves one syscall by allowing us to avoid the read()==0 when the server closes with the response.	2013-01-07 16:39:47 +01:00
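Defining the flag locally, as commit 1c07b0755d describes, might look like the sketch below. The value 0x2000 is the Linux value of EPOLLRDHUP; the helper `peer_has_shut()` is a hypothetical name used only for illustration.

```c
#include <sys/epoll.h>

/* Older libcs may lack this flag even though the kernel supports it. */
#ifndef EPOLLRDHUP
#define EPOLLRDHUP 0x2000
#endif

/*
 * Hypothetical helper: returns 1 when the reported events say the peer has
 * shut down its sending side, so a read() returning 0 would be redundant.
 */
static int peer_has_shut(unsigned int events)
{
    return (events & EPOLLRDHUP) != 0;
}
```

The flag also has to be requested in the event mask passed to epoll_ctl() for the kernel to report it.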
Willy Tarreau	39ebef82aa	BUG/MINOR: poll: the I/O handler was called twice for polled I/Os When a polled I/O event is detected, the event is added to the updates list and the I/O handler is called. Upon return, if the event handler did not experience an EAGAIN, the event remains in the updates list so that it will be processed later. But if the event was already in the spec list, its state is updated and it will be called again immediately upon exit, by fd_process_spec_events(), so this creates unfairness between speculative events and polled events. So don't call the I/O handler upon I/O detection when the FD already is in the spec list. The fd events are still updated so that the spec list is up to date with the possible I/O change.	2012-12-14 00:17:03 +01:00
Willy Tarreau	fb5470d144	OPTIM: epoll: current fd does not count as a new one The epoll loop checks for newly appeared FDs in order to process them early if they're accepted sockets. Since the introduction of the fd_ev_set() calls before the iocb(), the current FD is always in the update list, and we don't want to check it again, so we must assign the old_updt index just before calling the I/O handler.	2012-12-14 00:13:23 +01:00
Willy Tarreau	6320c3cb46	OPTIM: epoll: use a temp variable for intermediary flag computations Playing with fdtab[fd].ev makes gcc constantly reload the pointers because it does not know they don't alias. Use a temporary variable instead. This saves a few operations in the fast path.	2012-12-13 23:52:58 +01:00
Willy Tarreau	db9cb0b9b7	CLEANUP: poll: remove a useless double-check on fdtab[fd].owner This check is already performed a few lines above in the same loop, remove it from the condition.	2012-12-13 23:41:12 +01:00
Willy Tarreau	462c7206bc	CLEANUP: polling: gcc doesn't always optimize constants away In ev_poll and ev_epoll, we have a bit-to-bit mapping between the POLL_ constants and the FD_POLL_ constants. A comment said that gcc was able to detect this and to automatically apply a mask. Things have possibly changed since the output assembly doesn't always reflect this. So let's perform an explicit assignment when bits are equal.	2012-12-13 22:30:17 +01:00
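The explicit assignment from commit 462c7206bc, together with the local temporary from 6320c3cb46, can be sketched as follows. The `MY_POLL_*` constants and `map_poll_events()` are hypothetical stand-ins for the internal flags, and their values are an assumption chosen to equal the usual Linux `POLL*` values.

```c
#include <poll.h>

/* Hypothetical internal flags; values assumed equal to Linux POLL* bits. */
#define MY_POLL_IN  0x01
#define MY_POLL_PRI 0x02
#define MY_POLL_OUT 0x04
#define MY_POLL_ERR 0x08
#define MY_POLL_HUP 0x10

/* Converts poll() revents into internal flags using a local temporary. */
static unsigned int map_poll_events(short revents)
{
    unsigned int n = 0;   /* computed locally, stored once by the caller */

#if (POLLIN == MY_POLL_IN) && (POLLPRI == MY_POLL_PRI) && \
    (POLLOUT == MY_POLL_OUT) && (POLLERR == MY_POLL_ERR) && \
    (POLLHUP == MY_POLL_HUP)
    /* bits are identical: one explicit mask instead of five tests */
    n = (unsigned int)revents &
        (MY_POLL_IN | MY_POLL_PRI | MY_POLL_OUT | MY_POLL_ERR | MY_POLL_HUP);
#else
    /* bits differ on this platform: translate them one by one */
    if (revents & POLLIN)  n |= MY_POLL_IN;
    if (revents & POLLPRI) n |= MY_POLL_PRI;
    if (revents & POLLOUT) n |= MY_POLL_OUT;
    if (revents & POLLERR) n |= MY_POLL_ERR;
    if (revents & POLLHUP) n |= MY_POLL_HUP;
#endif
    return n;
}
```

Making the choice at preprocessing time means the fast path no longer depends on the optimizer noticing that the two sets of constants match.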
Willy Tarreau	26d7cfce32	BUG/MAJOR: polling: do not set speculative events on ERR nor HUP Errors and Hangups are sticky events, which means that once they're detected, we never clear them, allowing them to be handled later if needed. Till now when an error was reported, it used to register a speculative I/O event for both recv and send. Since the connection had not requested such events, it was not able to detect a change and did not clear them, so the events were called in loops until a timeout caused their owner task to die. So this patch does two things : - stop registering spec events when no I/O activity was requested, so that we don't end up with non-disablable polling state ; - keep the sticky polling flags (ERR and HUP) when leaving the connection handler so that an error notification doesn't magically become a normal recv() or send() report once the event is converted to a spec event. It is normally not needed to make the connection handler emit an error when it detects POLL_ERR because either a registered data handler will have done it, or the event will be disabled by the wake() callback.	2012-12-07 00:09:43 +01:00
Willy Tarreau	70c6fd82c3	MAJOR: polling: remove unused callbacks from the poller struct Since no poller uses poller->{set,clr,wai,is_set,rem} anymore, let's remove them and remove the associated pointer tests in proto/fd.h.	2012-11-11 21:02:34 +01:00
Willy Tarreau	e9f49e78fe	MAJOR: polling: replace epoll with sepoll and remove sepoll Now that all pollers make use of speculative I/O, there is no point having two epoll implementations, so replace epoll with the sepoll code and remove sepoll which has just become the standard epoll method.	2012-11-11 20:53:30 +01:00
Willy Tarreau	f8cfa447c6	BUG/MINOR: epoll: correctly disable FD polling in fd_rem() When calling fd_rem(), the polling was not correctly disabled because the ->prev state was set to zero instead of the previous value. fd_rem() is very rarely used, only just before closing a socket. The effect is that upon an error reported at the connection level, if the task assigned to the connection was too slow to be woken up because of too many other tasks in the run queue, the FD was still not disabled and caused the connection handler to be called again with the same event until the task was finally executed to close the fd. This issue only affects the epoll poller, not the sepoll variant nor any of the other ones. It was already present in 1.4 and even 1.3 with the same almost unnoticeable effects. The bug can in fact only be discovered during development where it emphasizes other bugs. It should be backported anyway.	2012-10-04 22:26:09 +02:00
Willy Tarreau	babd05a6c6	MEDIUM: fd: add fd_poll_{recv,send} for use when explicit polling is required The old EV_FD_SET() macro was confusing, as it would enable receipt but there was no way to indicate that EAGAIN was received, hence the recently added FD_WAIT_* flags. They're not enough as we're still facing a conflict between EV_FD_* and FD_WAIT_*. So let's offer I/O functions what they need to explicitly request polling.	2012-09-02 21:53:11 +02:00
Willy Tarreau	3788e4c874	MEDIUM: fd: remove the EV_FD_COND_* primitives These primitives were initially introduced so that callers were able to conditionally set/disable polling on a file descriptor and check in return what the state was. It's been long since we last had an "if" on this, and all pollers' functions were the same for cond_* and their systematic counterparts, except that this required a check and a specific return value that are not always necessary. So let's simplify the FD API by removing this now unused distinction and by making all specific functions return void.	2012-09-02 21:53:10 +02:00
Willy Tarreau	076be25ab8	CLEANUP: remove the now unused fdtab direct I/O callbacks They were all left to NULL since last commit so we can safely remove them all now and remove the temporary dual polling logic in pollers.	2012-09-02 21:51:29 +02:00
Willy Tarreau	9845e75d23	MEDIUM: polling: prepare to call the iocb() function when defined. We will need this to centralize I/O callbacks. Nobody sets it right now so the code should have no impact.	2012-09-02 21:51:27 +02:00
Willy Tarreau	db3b32610f	REORG/MEDIUM: fd: remove FD_STCLOSE from struct fdtab In an attempt to get rid of fdtab[].state, and to move the relevant parts to the connection struct, we remove the FD_STCLOSE state which can easily be deduced from the <owner> pointer as there is a 1:1 match.	2012-09-02 21:51:25 +02:00
Willy Tarreau	491c498d97	BUG/MINOR: polling: some events were not set in various pollers fdtab[].ev was only set in ev_sepoll. Unfortunately, some I/O handling functions now rely on this, so depending on the polling mechanism, some useless operations might have been performed, such as performing a useless recv() when a HUP was reported. This is a very old issue, the flags were only added to the fdtab and not propagated into any poller. Then they were used in ev_sepoll which needed them for the cache. It is unsure whether a backport to 1.4 is appropriate or not.	2012-07-31 07:55:31 +02:00
Willy Tarreau	45a1251515	[MEDIUM] poll: add a measurement of idle vs work time We now measure the work and idle times in order to report the idle time in the stats. It's expected that we'll be able to use it at other places later.	2011-09-10 18:01:41 +02:00
Willy Tarreau	43d8fb2d3a	[REORG] build: move syscall redefinition to specific places Some older libc don't define splice() and don't define _syscall() either, which causes build errors if splicing is enabled. To solve this, we now split the syscall redefinition into two layers : - one file per syscall (epoll, splice) - one common file to declare the _syscall() macros The code is cleaner because files using the syscalls just have to include their respective file. It's not advised to merge multiple syscall families into the same file if all are not intended to be used simultaneously, because defining unused static functions causes warnings to be emitted during build. As a result, the new USE_MY_SPLICE parameter was added in order to be able to define the splice() syscall separately.	2011-08-23 00:11:25 +02:00
Willy Tarreau	d79e79b436	[BUG] O(1) pollers should check their FD before closing it epoll, sepoll and kqueue pollers should check that their fd is not closed before attempting to close it, otherwise we can end up with multiple closes of fd #0 upon exit, which is harmless but dirty.	2009-05-10 10:18:54 +02:00
Willy Tarreau	332740dab2	[MEDIUM] pollers: don't wait if a signal is pending If an asynchronous signal is received outside of the poller, we don't want the poller to wait for a timeout to occur before processing it, so we set its timeout to zero, just like we do with pending tasks in the run queue.	2009-05-10 09:57:21 +02:00
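The rule from commit 332740dab2 (and from 3a6281199a for the run queue) amounts to a timeout computation like the one below. This is a minimal sketch; `poll_timeout_ms()` and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Returns the timeout to hand to the poller: zero when work is already
 * waiting (runnable tasks or a pending signal), else the delay until the
 * next timer.
 */
static int poll_timeout_ms(int next_timer_ms, int run_queue_len,
                           int signal_pending)
{
    assert(run_queue_len >= 0);

    if (run_queue_len > 0 || signal_pending)
        return 0;   /* do not sleep: something must be processed now */

    return next_timer_ms;
}
```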
Willy Tarreau	a534fea478	[CLEANUP] remove 65 useless NULL checks before free C specification clearly states that free(NULL) is a no-op. So remove useless checks before calling free.	2008-08-03 20:48:50 +02:00
Willy Tarreau	ec6c5df018	[CLEANUP] remove many #include <types/xxx> from C files It should be stated as a rule that a C file should never include types/xxx.h when proto/xxx.h exists, as it gives less exposure to declaration conflicts (one of which was caught and fixed here) and it complicates the file headers for nothing. Only types/global.h, types/capture.h and types/polling.h have been found to be valid includes from C files.	2008-07-16 10:30:42 +02:00
Willy Tarreau	0c303eec87	[MAJOR] convert all expiration timers from timeval to ticks This is the first attempt at moving all internal parts from using struct timeval to integer ticks. These provide simpler and faster code due to simplified operations, and this change also saved about 64 bytes per session. A new header file has been added : include/common/ticks.h. It is possible that some functions should finally not be inlined because they're used quite a lot (eg: tick_first, tick_add_ifset and tick_is_expired). More measurements are required in order to decide whether this is interesting or not. Some function and variable names are still subject to change for better overall logic.	2008-07-07 00:09:58 +02:00
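Integer ticks of the kind introduced in commit 0c303eec87 can be sketched as wrapping 32-bit millisecond counters. The details here are assumptions made for illustration, not the contents of include/common/ticks.h: the names `tick_add_ms()` and `tick_expired()`, the 32-bit width, and the use of 0 to mean "never expires".

```c
#include <assert.h>
#include <stdint.h>

#define TICK_ETERNITY 0u   /* assumption: 0 means "no expiration" */

/* Returns <now> + <ms>, skipping the reserved "never" value on wrap. */
static uint32_t tick_add_ms(uint32_t now, uint32_t ms)
{
    uint32_t t = now + ms;       /* unsigned arithmetic wraps naturally */

    if (t == TICK_ETERNITY)
        t++;
    return t;
}

/* Returns 1 when <timer> is set and is not after <now>, else 0. */
static int tick_expired(uint32_t timer, uint32_t now)
{
    if (timer == TICK_ETERNITY)
        return 0;
    /* the signed difference stays correct across the 32-bit wrap */
    return (int32_t)(timer - now) <= 0;
}
```

Compared with struct timeval, a comparison is one subtraction and one sign test, with no separate handling of seconds and microseconds.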
Willy Tarreau	b0b37bcd65	[MEDIUM] further improve monotonic clock by checking forward jumps The first implementation of the monotonic clock did not verify forward jumps. The consequence is that a fast changing time may expire a lot of tasks. While it does seem minor, in fact it is problematic because most machines which boot with a wrong date are in the past and suddenly see their time jump by several years in the future. The solution is to check if we spent more apparent time in a poller than allowed (with a margin applied). The margin is currently set to 1000 ms. It should be large enough for any poll() to complete. Tests with randomly jumping clock show that the result is quite accurate (error less than 1 second at every change of more than one second).	2008-06-23 14:00:57 +02:00
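The jump detection described in commit b0b37bcd65 can be sketched with plain millisecond counters. This is an illustration under stated assumptions, not the actual implementation: `struct mono_clock` and `clock_update()` are hypothetical names, and the choice to assume that exactly the poll timeout elapsed whenever a jump is detected is a simplification. Only the 1000 ms margin comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DELAY_MARGIN_MS 1000   /* margin quoted in the commit message */

struct mono_clock {
    int64_t date_ms;     /* last wall-clock reading, what the user sees */
    int64_t offset_ms;   /* correction applied to the wall clock */
    int64_t now_ms;      /* internal time used for scheduled events */
};

/* Updates <c> from a wall-clock reading taken after a poll of at most
 * <max_wait_ms> milliseconds. */
static void clock_update(struct mono_clock *c, int64_t wall_ms,
                         int64_t max_wait_ms)
{
    int64_t candidate = wall_ms + c->offset_ms;

    assert(max_wait_ms >= 0);

    if (candidate < c->now_ms ||
        candidate > c->now_ms + max_wait_ms + MAX_DELAY_MARGIN_MS) {
        /* backward or forward jump: assume the full wait elapsed and
         * recompute the offset so later readings line up again */
        candidate = c->now_ms + max_wait_ms;
        c->offset_ms = candidate - wall_ms;
    }

    c->date_ms = wall_ms;
    c->now_ms = candidate;
}
```

The wall-clock value is still recorded as-is for logs, while the internal time only ever advances by a plausible amount per poll.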
Willy Tarreau	b7f694f20e	[MEDIUM] implement a monotonic internal clock If the system date is set backwards while haproxy is running, some scheduled events are delayed by the amount of time the clock went backwards. This is particularly problematic on systems where the date is set at boot, because it seldom happens that health-checks do not get sent for a few hours. Before switching to use clock_gettime() on systems which provide it, we can at least ensure that the clock is not going backwards and maintain two clocks : the "date" which represents what the user wants to see (mostly for logs), and an internal date stored in "now", used for scheduled events.	2008-06-22 17:18:02 +02:00
Willy Tarreau	3a6281199a	[BUG] event pollers must not wait if a task exists in the run queue Under some circumstances, a task may already lie in the run queue (eg: inter-task wakeup). It is disastrous to wait for an event in this case because some processing gets delayed.	2008-06-20 15:05:56 +02:00
Willy Tarreau	70bcfb77a7	[OPTIM] GCC4's builtin_expect() is suboptimal GCC4 is stupid (unbelievable news!). When some code uses __builtin_expect(x != 0, 1), it really performs the check of x != 0 then tests that the result is not zero! This is a double check when only one was expected. Some performance drops of 10% in the HTTP parser code have been observed due to this bug. GCC 3.4 is fine though. A solution consists in expecting that the tested value is 1. In this case, it emits the correct code, but it's still not optimal it seems. Finally the best solution is to ignore likely() and to pray for the compiler to emit correct code. However, we still have to fix unlikely() to remove the test there too, and to fix all code which passed pointers over there to pass integers instead.	2008-02-14 23:14:33 +01:00
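The macros that result from the reasoning in commit 70bcfb77a7 might look like the sketch below. It assumes a compiler providing `__builtin_expect()` (GCC or Clang); the exact definitions in HAProxy may differ, and `safe_len()` is a hypothetical function that only shows the calling convention.

```c
#include <stddef.h>

/* likely() is a no-op: the compiler is left to lay out the branch itself. */
#define likely(x)   (x)

/* unlikely() passes the value through without an extra "!= 0" test, so
 * callers must hand it an integer expression, never a raw pointer. */
#define unlikely(x) (__builtin_expect((x), 0))

/* Hypothetical example: returns the length of <s>, or 0 for NULL. */
static size_t safe_len(const char *s)
{
    size_t n = 0;

    if (unlikely(s == NULL))      /* pointer turned into an integer test */
        return 0;

    while (likely(s[n] != '\0'))
        n++;
    return n;
}
```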
Willy Tarreau	1db37710dc	[MEDIUM] limit the number of events returned by poll By default, epoll/kqueue used to return as many events as possible. This could sometimes cause huge latencies (latencies of up to 400 ms have been observed with many thousands of fds at once). Limiting the number of events returned also reduces the latency by avoiding too much blind processing. The value is set to 200 by default and can be changed in the global section using the tune.maxpollevents parameter.	2007-06-03 17:16:49 +02:00
Willy Tarreau	fb8983f21b	[BUG] the epoll FD must not be shared between processes Recreate the epoll file descriptor after a fork(). It will ensure that all processes will not share their epoll_fd. Some side effects were encountered because of this, such as epoll_wait() returning an FD which was previously deleted, in multi-process mode.	2007-06-03 16:40:44 +02:00
Willy Tarreau	bdefc513a0	[BUG] fix null timeouts in poll-based pollers Introduction of timeval timers broke poll-based pollers, because the call to tv_ms_remain may return 0 while the event is not elapsed yet. Now we carefully check for those cases and round the result up by 1 ms.	2007-05-14 02:02:04 +02:00
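The rounding problem fixed in commit bdefc513a0 can be illustrated with a ceiling conversion from microseconds to milliseconds. This is one possible way to avoid a premature zero, not necessarily what the fix does (the message says the result is rounded up by 1 ms); `ms_remain_roundup()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Milliseconds left from <now> until <deadline>, rounded up so that 0 is
 * returned only when the deadline has really been reached.
 */
static long ms_remain_roundup(const struct timeval *now,
                              const struct timeval *deadline)
{
    long long us;

    assert(now != NULL && deadline != NULL);

    us = (long long)(deadline->tv_sec - now->tv_sec) * 1000000LL
       + (long long)(deadline->tv_usec - now->tv_usec);

    if (us <= 0)
        return 0;                        /* already expired */

    return (long)((us + 999) / 1000);    /* 1 us left still waits 1 ms */
}
```

A truncating division would return 0 for any remainder below one millisecond, and a poll-based poller given a zero timeout returns immediately and spins until the event really expires.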
Willy Tarreau	d825eef9c5	[MAJOR] replaced all timeouts with struct timeval The timeout functions were difficult to manipulate because they were rounding results to the millisecond. Thus, it was difficult to compare and to check what expired and what did not. Also, the comparison functions were heavy with multiplies and divides by 1000. Now, all timeouts are stored in timevals, reducing the number of operations for updates and leading to cleaner and more efficient code.	2007-05-12 22:35:00 +02:00
Willy Tarreau	ef1d1f859b	[MAJOR] auto-registering of pollers at load time Gcc provides __attribute__((constructor)) which is very convenient to execute functions at startup right before main(). All the pollers have been converted to have their register() function declared like this, so that it is not necessary anymore to call them from a centralized file.	2007-04-16 00:25:25 +02:00
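Self-registration through `__attribute__((constructor))`, as described in commit ef1d1f859b, can be sketched like this. The structure, the function names and the preference values are hypothetical; only the attribute itself is the mechanism named in the commit, and it requires GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POLLERS 8

/* Hypothetical poller descriptor: a name and a preference score. */
struct poller {
    const char *name;
    int pref;
};

static struct poller pollers[MAX_POLLERS];
static int nb_pollers;

static void register_poller(const char *name, int pref)
{
    assert(nb_pollers < MAX_POLLERS);
    pollers[nb_pollers].name = name;
    pollers[nb_pollers].pref = pref;
    nb_pollers++;
}

/* Each poller file would carry its own constructor, run before main(). */
__attribute__((constructor))
static void epoll_register(void)
{
    register_poller("epoll", 300);
}

__attribute__((constructor))
static void poll_register(void)
{
    register_poller("poll", 200);
}

/* Returns the registered poller with the highest preference, or NULL. */
static const struct poller *best_poller(void)
{
    const struct poller *best = NULL;
    int i;

    for (i = 0; i < nb_pollers; i++)
        if (best == NULL || pollers[i].pref > best->pref)
            best = &pollers[i];
    return best;
}
```

No central file has to list the pollers: adding a new one only means linking another object that contains its own constructor.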
Willy Tarreau	b40d42006c	[BUILD] declare epoll_* as static when using our own functions We will have to share this code among several implementations.	2007-04-15 23:57:41 +02:00
Willy Tarreau	58094f2fd9	[MAJOR] ev_epoll: do not rely on fd_sets anymore The new epoll-based poller uses a list of changes in order to process only the fds which have changed.	2007-04-10 01:43:43 +02:00
Willy Tarreau	2ff7622c0c	[MAJOR] delay registering of listener sockets at startup Some pollers such as kqueue lose their FD across fork(), meaning that the registered file descriptors are lost too. Now when the proxies are started by start_proxies(), the file descriptors are not registered yet, leaving enough time for the fork() to take place and to get a new pollfd. It will be the first call to maintain_proxies that will register them.	2007-04-09 19:29:56 +02:00
Willy Tarreau	63455a9be5	[MINOR] use 'is_set' instead of 'isset' in struct poller 'isset' was defined as a macro in /usr/include/sys/param.h, and it breaks build on at least OpenBSD.	2007-04-09 15:34:49 +02:00
Willy Tarreau	69801b8e77	[MINOR] removed proto/polling.h which was not used anymore	2007-04-09 15:28:51 +02:00
Willy Tarreau	e54e9176a3	[MINOR] ev_* : moved the poll function closer to fd_*	2007-04-09 09:23:31 +02:00
Willy Tarreau	97129b5408	[MINOR] changed fd_set/fd_clr functions to return ints The fd_* functions now return ints so that they can be factored when appropriate.	2007-04-09 00:54:46 +02:00
Willy Tarreau	28d86862bc	[MEDIUM] pollers: store the events in arrays Instead of managing StaticReadEvent/StaticWriteEvent, use evts[dir]	2007-04-08 17:42:27 +02:00
Willy Tarreau	4f60f16dd3	[MAJOR] modularize the polling mechanisms select, poll and epoll now have their dedicated functions and have been split into distinct files. Several FD manipulation primitives have been provided with each poller. The rest of the code needs to be cleaned to remove traces of StaticReadEvent/StaticWriteEvent. A trick involving a macro has temporarily been used right now. Some work needs to be done to factorize tests and sets everywhere.	2007-04-08 16:39:58 +02:00

43 Commits