mirror of
https://git.haproxy.org/git/haproxy.git/
synced 2025-10-08 22:21:29 +02:00
A suboptimal behaviour was appearing quite often with sepoll. When a
speculative write failed after a connect(), the socket was added to the
poll list using epoll_ctl(ADD). Then when epoll_wait() returned a write
event, the send() was performed and the write event was disabled, which
put the FD back into the spec list so that it could be disabled later.
But if some new accept() succeeded in the same run, then fd_created was
not null, causing a new run of the spec list to happen. This run would
then detect the old event in STOP state and remove it from the poll list
using epoll_ctl(DEL). After this, process_session() enabled reading on
the FD and attempted a speculative recv(), which failed, so the FD was
added again using epoll_ctl(ADD) in order to retry later. So the total
sequence of syscalls looked like this :

    connect(fd) = EAGAIN
    send(fd) = EAGAIN
    epoll_ctl(ADD(fd:OUT))
    epoll_wait() = fd:OUT
    send(fd) > 0
    epoll_ctl(DEL(fd))
    recv(fd) = EAGAIN
    epoll_ctl(ADD(fd:IN))
    recv(fd) > 0

In order to fix this stupid situation, we must compute the epoll_ctl()
parameters at the last moment, just before doing epoll_wait(). This is
what was already done, except that the spec events were processed just
before that, without leaving time for the tasks to adjust the FDs if
needed. This is also the reason we had the re_poll_once label, to try to
catch new events in case of a successful accept().

The new solution consists in doing the opposite :

  - compute epoll_ctl()
  - call epoll_wait()
  - call spec events

This significantly reduces the number of iterations on the spec events
and avoids a huge number of epoll_ctl() ping/pongs. The sequence above
simply becomes :

    connect(fd) = EAGAIN
    send(fd) = EAGAIN
    epoll_ctl(ADD(fd:OUT))
    epoll_wait() = fd:OUT
    send(fd) > 0
    epoll_ctl(MOD(fd:IN))
    recv(fd) > 0

Also, there is no need to re-run the spec events after an accept(), as
the new FD will automatically be detected in the spec list after the
return from polled events.

The gains are significant, with up to a 4.5% global performance increase
in connection rate on HTTP with small objects. The code is less tricky:
it no longer needs to skip epoll_wait() every other call, nor to track
the number of newly created FDs.
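
The idea of computing the epoll_ctl() parameters at the last moment can be illustrated with a minimal sketch. It is not the sepoll code from this commit: the names fd_state, fd_want(), pick_op() and fd_flush(), the EV_* bits and the OP_* values are all hypothetical, and no real syscall is issued. Under those assumptions, tasks only record the events they want, and a single operation per FD is derived just before the poller would call epoll_wait().

```c
#include <assert.h>

/* Hypothetical event bits and operations, standing in for EPOLLIN/EPOLLOUT
 * and EPOLL_CTL_ADD/MOD/DEL from <sys/epoll.h>.
 */
enum { EV_IN = 1, EV_OUT = 2 };
enum ctl_op { OP_NONE, OP_ADD, OP_MOD, OP_DEL };

#define MAX_FD 64

/* Hypothetical per-FD state: what the poller knows vs. what tasks want. */
struct fd_state {
    unsigned polled;  /* events currently registered in the poll list */
    unsigned wanted;  /* events requested since the last flush */
};

static struct fd_state fds[MAX_FD];

/* Record the desired events only; nothing is sent to the kernel here. */
static void fd_want(int fd, unsigned events)
{
    assert(fd >= 0 && fd < MAX_FD);
    fds[fd].wanted = events;
}

/* Pick the single operation that moves an FD from <polled> to <wanted>. */
static enum ctl_op pick_op(unsigned polled, unsigned wanted)
{
    if (polled == wanted)
        return OP_NONE;
    if (!polled)
        return OP_ADD;
    if (!wanted)
        return OP_DEL;
    return OP_MOD;
}

/* Meant to be called right before epoll_wait(): returns the operation a
 * real poller would pass to epoll_ctl(), then commits the new state.
 */
static enum ctl_op fd_flush(int fd)
{
    enum ctl_op op;

    assert(fd >= 0 && fd < MAX_FD);
    op = pick_op(fds[fd].polled, fds[fd].wanted);
    fds[fd].polled = fds[fd].wanted;
    return op;
}
```

With this sketch, an FD that first wants EV_OUT yields OP_ADD at the first flush. If writes are then stopped and reads are enabled before the next flush, the two changes collapse into one OP_MOD instead of an OP_DEL followed by an OP_ADD, which matches the shorter syscall sequence described above.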