When a server terminates a connection, the next session in its
own queue is immediately processed. Because of this, if all
server queues are always full, no new anonymous request will
ever be processed. Compare the oldest request between the global
and server queues to choose which one to pick the request from.
A further improvement would consist in adding a configurable
offset when comparing expiration dates, so that cookie-less
requests can be given either less or more priority.
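A minimal sketch of that selection, assuming illustrative structure and
field names (the actual haproxy queue code differs):

struct pending_req {
        long long expire_ms;   /* absolute expiration date, in milliseconds */
};

/* Pick the older of the two oldest pending requests. A positive
 * offset_ms makes the global (cookie-less) request look older, hence
 * more urgent; a negative one gives it less priority. */
static struct pending_req *pick_next(struct pending_req *global_oldest,
                                     struct pending_req *server_oldest,
                                     long long offset_ms)
{
        if (!global_oldest)
                return server_oldest;
        if (!server_oldest)
                return global_oldest;
        if (global_oldest->expire_ms - offset_ms <= server_oldest->expire_ms)
                return global_oldest;
        return server_oldest;
}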
Under some circumstances, a task may already lie in the run queue
(e.g. an inter-task wakeup). It is disastrous to wait for an event in
this case because some processing gets delayed.
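A minimal sketch of the resulting rule, with illustrative names (the
real scheduler and poller are more involved):

#include <sys/epoll.h>

/* Never block in the poller when a task is already runnable: poll with
 * a zero timeout so pending work is processed without extra delay. */
static int poll_once(int epfd, struct epoll_event *events, int max_events,
                     int run_queue_len, int next_expiration_ms)
{
        int wait_ms = run_queue_len ? 0 : next_expiration_ms;
        return epoll_wait(epfd, events, max_events, wait_ms);
}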
Reported by Cherife Li: just doing a "make install" fails because it
depends on "all", which is equivalent to "help" if no TARGET was
specified. Make it depend on "haproxy" instead.
Released version 1.3.14.5 with the following main changes :
- [BUILD] fix build with gcc 4.3
- [TESTS] add a debug patch to help trigger the stats bug
- [BUG] Flush buffers also when there are exactly 0 bytes left
- [DOC] fix unescaped space in httpchk example.
- [DOC] update the README file with new build options
- [MEDIUM] reduce risk of event starvation in ev_sepoll
If too many events are set for spec I/O, they can starve the
polled events. Experiments show that when polled events starve, they
quickly turn into spec I/O, making the situation even worse. While
we can reduce the number of polled events processed at once, we
cannot do this for speculative events because most of them are new
ones (on average 2/3 new, 1/3 old in experiments).
The solution to this problem relies on two factors:
1) one FD registered as a spec event cannot be polled at the same time
2) even during very high loads, we will almost never be interested in
simultaneous read and write streaming on the same FD.
The first point implies that during starvation, we will not have more
than half of our FDs in the poll list; otherwise it means there are
fewer than that in the spec list, implying there is no starvation.
The second point implies that we are statistically only interested in
half of the maximum number of file descriptors at once, because we are
unlikely to have simultaneous reads and writes on the same buffer for
long periods.
So, if we make it possible to drain maxsock/2/2 events during peak
loads, we can ensure that there will be no starvation effect. This
means that we must always allocate maxsock/4 events for the poller.
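As a rough sketch of that sizing rule, with illustrative names (not the
actual ev_sepoll code):

#include <stdlib.h>
#include <sys/epoll.h>

/* At most half the FDs can sit in the poll list (point 1), and only
 * one direction per FD matters at a time (point 2), so maxsock/2/2
 * event slots suffice to drain the poller even at peak load. */
static struct epoll_event *alloc_poll_events(int maxsock)
{
        return calloc(maxsock / 4, sizeof(struct epoll_event));
}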
Last, sepoll uses an optimization consisting in reducing the number of
calls to epoll_wait() to once every two polls. However, when dealing
with many spec events, we can wait for a very long time, and skipping
epoll_wait() every second time then increases latency. For this reason,
we try to detect when we are beyond a reasonable limit and stop
skipping calls at that point.
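A sketch of this heuristic, under assumed names (the real code ties the
limit to the spec event list):

/* Poll only once every two passes, but give up on skipping as soon as
 * the number of speculative events goes beyond a reasonable limit,
 * since skipping then only adds latency for polled events. */
static int must_call_epoll_wait(int pass, int nb_spec_events, int spec_limit)
{
        if (nb_spec_events >= spec_limit)
                return 1;               /* too many spec events: always poll */
        return (pass & 1) == 0;         /* otherwise poll every second pass */
}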
Fedora 9 will ship with gcc 4.3, and right now haproxy does not
compile with gcc 4.3.
It appears that there is a reordering of headers or something along
those lines. This is the patch that gets haproxy to compile with
gcc 4.3. I'm not sure if this is the correct approach you would want
to use, so please correct me.
If this works for you, I'll go ahead and put this patch in the src rpm
until a version of haproxy which compiles with gcc 4.3 is released.
About: [BUG] Flush buffers also when there are exactly 0 bytes left
I'm also attaching a debug patch that helps to trigger this bug.
Without the fix:
# echo -ne "GET /haproxy?stats;csv;norefresh HTTP/1.0\r\n\r\n"|nc 127.0.0.1 801|wc -c
16384
With the fix:
# echo -ne "GET /haproxy?stats;csv;norefresh HTTP/1.0\r\n\r\n"|nc 127.0.0.1 801|wc -c
33089
Best regards,
Krzysztof Oledzki
I noticed it was sometimes possible to get truncated http/csv stats.
Usually the problem disappeared as fast as it appeared, but once my
http-stats page stayed truncated for about one hour.
It was quite weird as it happened independently for csv and http
output, and it took me some time to track down and fix this bug.
Both buffer_write and buffer_write_chunk used to return 0 in two
situations: in case of success or when there were exactly 0 bytes
left. The first one is intentional, but I believe the second one
is not, as it made it impossible to distinguish a successful
write from an unsuccessful one. This means that if the buffer was
100% full, it was never flushed and it was not possible to write
more data.
This patch fixes this problem.
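The ambiguity can be pictured with a small sketch (illustrative code,
not the actual haproxy buffer functions): if 0 means both "written" and
"no room", a full buffer is indistinguishable from a successful write,
so the caller never flushes. Returning the number of unwritten bytes
removes the ambiguity:

#include <string.h>

/* All-or-nothing write: returns 0 on success, or the number of bytes
 * that could not be written, so 0 unambiguously means success. */
static size_t buffer_write_sketch(char *buf, size_t *used, size_t size,
                                  const char *msg, size_t len)
{
        if (len > size - *used)
                return len;     /* no room: caller must flush and retry */
        memcpy(buf + *used, msg, len);
        *used += len;
        return 0;
}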
Released version 1.3.14.4 with the following main changes :
- [BUILD] Replace hardcoded 'LD = gcc' with 'LD = $(CC)'
- [BUILD] Added support for 'make install'
- [BUILD] Added 'install-man' make target for installing the man page
- [BUILD] Added 'install-bin' make target
- [BUILD] Added 'install-doc' make target
- [BUILD] Removed "/" after '$(DESTDIR)' in install targets
- [BUILD] Changed 'install' target to install the binaries first
- [MEDIUM] fix stats socket limitation to 16 kB
To be flexible when installing haproxy, the following variables have
been added to the Makefile:
- DESTDIR, useful e.g. when installing in a sandbox (not set by default)
- PREFIX, which defines the default install prefix (default: /usr/local)
- SBINDIR, which defines the directory the haproxy binary gets installed
  in (default: $PREFIX/sbin)
haproxy relies on linking the binary using gcc, so there is no real need to
hardcode both (CC and LD). Setting 'LD = $(CC)' will make the build system
a bit more cross-compile friendly because only the right cross-compiler has
to be passed via make.
Due to the way the stats socket works, it was not possible to
maintain the information related to the command entered, so
after filling a whole buffer, the request was lost and it was
considered that there was nothing to write anymore.
The major reason was that some flags were passed directly
during the first call to stats_dump_raw() instead of being
stored persistently in the session.
To definitively fix this problem, flags were added to the stats
member of the session structure.
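A minimal illustration of the idea, with made-up names (not the actual
haproxy structures):

/* Dump options and progress live in the session, so they survive a
 * buffer-full reschedule instead of being lost with the request. */
struct session_stats {
        int flags;      /* persistent dump options (e.g. CSV vs HTML output) */
        int state;      /* where stats_dump_raw() must resume after a flush */
};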
A second problem appeared. When the stats were produced, a first
call to client_retnclose() was performed, then one or multiple
subsequent calls to buffer_write_chunk() were made. But once the
stats buffer was full and a reschedule occurred, the buffer was
flushed, the write flag was cleared from the buffer, and nothing
was done to re-arm it.
For this reason, a check was added in the proto_uxst_stats()
function in order to re-call the client FSM when data were added
by stats_dump_raw(). Finally, the whole unix stats dump FSM was
rewritten to avoid all the magic it depended on. It is now
simpler and looks more like the HTTP one.
Released version 1.3.14.3 with the following main changes :
- [BUG]: Restore clearing t->logs.bytes
- [DOC] Update a "contrib" file with a hint about the scheme used for formatting subjects
- [BUG] Don't increment server connections too much + fix retries
- [BUG] appsession lookup in URL does not work
- [MINOR] report correct section type for unknown keywords.
- [BUILD] update MacOS Makefile to build on newer versions
- [DOC] fix erroneous "useallbackups" option in the doc
- [DOC] applied small fixes from early readers
- [BUG] failed conns were sometimes incremented in the frontend!
- [TESTS] add test-pollers.cfg to easily report pollers in use
- [BUILD] ensure that makefile understands USE_DLMALLOC=1
- [CLEANUP] update .gitignore to ignore more temporary files
- [CLEANUP] report dlmalloc's source path only if explicitly specified
- [BUG] str2sun could leak a small buffer in case of error during parsing
- [BUG] option allbackups was not working anymore in roundrobin mode
Commit 3168223a7b33a1d5aad1e11b8f2ad917645d7f27 broke option
"allbackups" in roundrobin mode due to an erroneous structure
member replacement in backend.c. The PR_O_USE_ALL_BK flag was
not tested in the right member anymore.
This bug uncovered another one, by which all backup servers would
be used whatever the option's value, if all of them had been seen
as simultaneously failed at one moment.
This patch fixes the two stupid errors. Correctness has been tested
using the test-fwrr.cfg config example.
(cherry picked from commit f4cca45b5e6c6ed88a0062cf92ae57e01405ab12)
Matt Farnsworth reported a memory leak in str2sun() when too large a
socket path is passed. The bug is very minor because it only happens
once during config parsing, but has to be fixed nevertheless. The patch
Matt provided could even be improved by completely removing the useless
strdup() in this function.
(cherry picked from commit caf720d3ff7758273278aecab26bb7624ec2f555)
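The leak pattern looks roughly like this (an illustrative sketch, not
the actual str2sun() code); removing the strdup() entirely, as
suggested, avoids the issue altogether:

#include <stdlib.h>
#include <string.h>

static int parse_sun_path(const char *arg, char *dst, size_t dst_size)
{
        char *copy = strdup(arg);

        if (!copy)
                return -1;
        if (strlen(copy) >= dst_size)
                return -1;      /* BUG: 'copy' leaks on this error path */
        strcpy(dst, copy);
        free(copy);
        return 0;
}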
Commit 98937b875798e10fac671d109355cde29d2a411a, while fixing
one bug, introduced another one. With "retries 4" and
"option redispatch", haproxy tries to connect 4 times to
one server and 1 time to a second one. However, the
logs showed 5 connections to the first server (the
last one was counted twice) and 2 to the second.
This patch also fixes srv->retries and be->retries increments.
Now I get: 3 retries and 1 error on the first server (4 cum_sess)
and 1 error on the second server (1 cum_sess) with:
retries 4
option redispatch
and: 4 retries and 1 error (5 cum_sess) with:
retries 4
So, the number of connections effectively served by a server is:
srv->cum_sess - srv->failed_conns - srv->retries
(cherry picked from commit 626a19b66f769a87e7c995267ccedf14149e03b3)
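Checking this formula against the numbers above: with redispatch, the
first server shows 4 cum_sess, 1 failed conn and 3 retries, giving
4 - 1 - 3 = 0 connections effectively served, and the second shows
1 - 1 - 0 = 0, which is consistent since every attempt in that test
ended in an error.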
USE_DLMALLOC=1 has been ignored since the last makefile update. It's
better to keep it working for existing setups.
(cherry picked from commit f14358bd1ad4f7c9fd32c3900ac3a2848bed1b9a)
We've been trying to use the latest release (1.3.14.2) of haproxy to do
sticky sessions. Cookie insertion is not an option for us, although we
would much rather use it, as we are trying to work around a problem where
cookies are unreliable. The appsession functionality only partially worked
(it wouldn't read the session id out of a query string) until we made the
following code change to the get_srv_from_appsession function in
proto_http.c.
(cherry picked from commit 6d0b1fac23517f16b3972b529ea41718b3643c9f)
With each new patch I had to search for the e-mail from Willy
describing the scheme used for formatting subjects. No more. ;)
(cherry picked from commit 4ad3b40a2d7c78bcdbf16a853647833cad78b050)
An unknown keyword was always reported in section "listen" for any
section type (defaults, listen, frontend, backend, ...).
(cherry picked from commit 6daf34352f325699efa8f731e5525275523786b9)
Commit 8b3977ffe36190e45fb974bf813bfbd935ac99a1 removed "t->logs.bytes_in = 0;"
but it should instead have changed it into "t->logs.bytes_out = 0;", since
as of 583bc966064e248771e75c253dddec7351afd8a2 counters are incremented, not set.
It should be incremented in session_process_counters while sending data to a
client:
bytes = s->rep->total - s->logs.bytes_out;
s->logs.bytes_out = s->rep->total;
However, if we set (rather than increment) s->logs.bytes_out while
processing "logasap", the statistics get wrong values added for the
headers: 0, or even negative ones if haproxy adds some headers itself.
To test it, please enable logasap, download one empty file and look at
the stats. Without my fix, the information available on that page is
invalid, for example:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,
www,b,0,0,0,1,,1,24,-92,,0,,0,0,0,,UP,1,1,0,0,0,3121,0,,1,2,1,,1,
www,BACKEND,0,0,0,1,0,1,24,-92,0,0,,0,0,0,0,UP,1,1,0,,0,3121,0,,1,2,0,,1,
Released version 1.3.14.2 with the following main changes :
- bug: increment server connections for each connect()
- bug: fix typo in redispatched connection
- bug: connect_server: server might not exist when sending error report
- bug: use backend's source and not server's source with tproxy
- bug: fix overlapping server flags
- bug: log response byte count, not request
- bug: fix truncated responses with sepoll
- large update to the configuration manual
- major rework of the GNU Makefile
- provide inversion for some options
- add support for "show info" on the unix socket
- add support for the "backlog" parameter
- introduce global parameter "tune.maxaccept"
- introduce "timeout http-request" in frontends
- tarpit timeout is also allowed in backends
- code did not build in full debug mode
- fix configuration hint about timeouts
Due to the way Linux delivers EPOLLIN and EPOLLHUP, a closed connection
received after some server data sometimes results in truncated responses
if the client disconnects before the server starts to respond. The
reason is that the EPOLLHUP flag is processed as an indication of end
of transfer, while some data may remain in the system's socket buffers.
This problem could only be triggered with sepoll, although nothing should
prevent it from happening with normal epoll. In fact, the work factoring
performed by sepoll increases the risk that this bug appears.
The fix consists in making FD_POLL_HUP and FD_POLL_ERR sticky, and only
checking them when FD_POLL_IN is not set, meaning that we have read all
pending data.
That way, the problem is definitely fixed and sepoll still remains about
17% faster than epoll since it can take into account all information
returned by the kernel.
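A minimal sketch of the sticky-flag logic, using assumed names for the
fd table (the real fd handling code differs):

#include <sys/epoll.h>

#define FD_POLL_IN  0x1
#define FD_POLL_HUP 0x2
#define FD_POLL_ERR 0x4

/* HUP and ERR are remembered until the FD is released ("sticky"). */
static void update_fd_flags(unsigned int *fd_flags, unsigned int ev)
{
        if (ev & EPOLLIN)
                *fd_flags |= FD_POLL_IN;
        if (ev & EPOLLHUP)
                *fd_flags |= FD_POLL_HUP;
        if (ev & EPOLLERR)
                *fd_flags |= FD_POLL_ERR;
}

/* Only honour HUP/ERR once FD_POLL_IN is clear, i.e. once all data
 * still pending in the socket buffers has been read. */
static int may_close(unsigned int fd_flags)
{
        return !(fd_flags & FD_POLL_IN) &&
               (fd_flags & (FD_POLL_HUP | FD_POLL_ERR));
}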
The documentation now lists all keywords except the req* and rsp*. The
"server" keyword has been documented for mandatory parameters. Specific
settings are still waiting to be written in a dedicated section.
- options tcplog, tcpsplice and transparent have been documented.
- keywords "srvtimeout", "timeout queue", "timeout server" and
"timeout tarpit" have been documented
- keywords "transparent" and "use_backend" have been documented
Only "server", "source" and "stats *" remain undocumented
Options nolinger, persist, smtpchk and ssl-hello-chk have been
documented. All keywords and options up to and including option
tcpka are now documented.
A copy-paste typo was present in the reconnection code responsible
for redispatching. The client's FSM would not be re-evaluated if an
error occurred. It looks harmless, but it's better to fix it.