haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-12 22:41:03 +01:00

Author	SHA1	Message	Date
Christopher Faulet	13b007d583	BUG/MINOR: kqueue/threads: Don't forget to close kqueue_fd[tid] on each thread in deinit_kqueue_per_thread, kqueue_fd[tid] must be closed, except for the main thread (the first one, tid==0). This patch must be backported in 1.8 with commit 7a2364d4.	2018-01-25 17:52:57 +01:00
Christopher Faulet	23d86d157e	BUG/MEDIUM: checks: Don't try to release undefined conn_stream when a check is freed When a health-check is released, the attached conn_stream may be undefined. For instance, this happens when 'no-check' option is used on a server line. So we must check it is defined before trying to release it. This patch must be backported in 1.8.	2018-01-25 13:51:23 +01:00
Christopher Faulet	8d01fd6b3c	BUG/MEDIUM: threads/server: Fix deadlock in srv_set_stopping/srv_set_admin_flag Because of a typo (HA_SPIN_LOCK instead of HA_SPIN_UNLOCK), there is a deadlock in srv_set_stopping and srv_set_admin_flag when there is at least one tracker. This patch must be backported in 1.8.	2018-01-25 13:51:23 +01:00
Willy Tarreau	c20d737338	BUG/MINOR: threads: always set an owner to the thread_sync pipe The owner of the fd used by the synchronization pipe was set to NULL, making it ignored by maxfd computation. The risk would be that some synchronization events get delayed between threads when using poll() or select(). However this is only theoretical since the pipe is created before listeners are bound so normally its FD should be lower and this should normally not happen. The only possible situation would be if all listeners are bound to inherited FDs which are lower than the pipe's. This patch must be backported to 1.8.	2018-01-25 07:31:08 +01:00
Olivier Houchard	0048dd04c9	MINOR: threads: Fix build when we're not compiling with threads. Only declare the start_lock if threads are compiled in, otherwise HA_SPINLOCK_T won't be defined. This should be backported to 1.8 when/if 1605c7ae6154d8c2cfcf3b325872b1a7266c5bc2 is backported.	2018-01-24 21:41:29 +01:00
Willy Tarreau	46ec48bc1a	BUG/MINOR: mworker: only write to pidfile if it exists A missing test causes a write(-1, $PID) to appear in strace output when in master-worker mode. This is totally harmless though. This fix must be backported to 1.8.	2018-01-23 19:20:19 +01:00
Willy Tarreau	1605c7ae61	BUG/MEDIUM: threads/mworker: fix a race on startup Marc Fournier reported an interesting case when using threads with the master-worker mode : sometimes, a listener would have its FD closed during startup. Sometimes it could even be health checks seeing this. What happens is that after the threads are created, and the pollers enabled on each thread, the master-worker pipe is registered, and at the same time a close() is performed on the write side of this pipe since the children must not use it. But since this is replicated in every thread, what happens is that the first thread closes the pipe, thus releases the FD, and the next thread starting a listener in parallel gets this FD reassigned. Then another thread closes the FD again, which this time corresponds to the listener. It can also happen with the health check sockets if they're started early enough. This patch splits the mworker_pipe_register() function in two, so that the close() of the write side of the FD is performed very early after the fork() and long before threads are created (we don't need to delay it anyway). Only the pipe registration is done in the threaded code since it is important that the pollers are properly allocated for this. The mworker_pipe_register() function now takes care of registering the pipe only once, and this is guaranteed by a new surrounding lock. The call to protocol_enable_all() looks fragile in theory since it scans the list of proxies and their listeners, though in practice all threads scan the same list and take the same locks for each listener so it's not possible that any of them escapes the process and finishes before all listeners are started. And the operation is idempotent. This fix must be backported to 1.8. Thanks to Marc for providing very detailed traces clearly showing the problem.	2018-01-23 19:18:57 +01:00
Willy Tarreau	7a2364d474	BUG/MEDIUM: kqueue/threads: use one kqueue_fd per thread This is the same principle as the previous patch (BUG/MEDIUM: epoll/threads: use one epoll_fd per thread) except that this time it's for kqueue. We don't want all threads to wake up because of activity on a single other thread that the other ones are not interested in. Just like with previous patch, this one shows that the polling state doesn't need to be changed here and that some simplifications are now possible. This patch only implements the minimum required for a stable backport. This should be backported to 1.8.	2018-01-23 15:50:03 +01:00
Willy Tarreau	d9e7e36c6e	BUG/MEDIUM: epoll/threads: use one epoll_fd per thread There currently is a problem regarding epoll(). While select() and poll() compute their polling state on the fly upon each call, epoll() keeps a shared state between all threads via the epoll_fd. The problem is that once an fd is registered on any thread, all other threads receive events for that FD as well. It is clearly visible when binding a listener to a single thread like in the configuration below where all 4 threads will work, 3 of them simply spinning to skip the event :

  global
      nbthread 4

  frontend foo
      bind :1234 process 1/1

The worst case happens when some slow operations are in progress on a busy thread, preventing it from processing its task and causing the other ones to wake up without being able to do anything with this event. Typically computing a large TLS key will delay processing of next events on the same thread while others will still wake up. All this simply shows that the poller must remain thread-specific, with its own events and its own ability to sleep when it doesn't have anything to do. This patch does exactly this. For this, it proceeds like this :

  - have one epoll_fd per thread instead of one per process
  - initialize these epoll_fd when threads are created.
  - mark all known FDs as updated so that the next invocation of _do_poll() recomputes their polling status (including a possible removal of undesired polling from the original FD) ;
  - use each fd's polled_mask to maintain an accurate status of the current polling activity for this FD.
  - when scanning updates, only focus on events whose new polling status differs from the existing one
  - during updates, always verify the thread_mask to resist migration
  - on __fd_clo(), for cloned FDs (typically listeners inherited from the parent during a graceful shutdown), run epoll_ctl(DEL) on all epoll_fd. This is the reason why epoll_fd is stored in a shared array and not in a thread_local storage. Note: maybe this can be moved to an update instead.

Interestingly, this shows that we don't need the FD's old state anymore and that we only use it to convert it to the new state based on stable information. It appears clearly that the FD code can be further improved by computing the final state directly when manipulating it. With this change, the config above goes from 22000 cps at 380% CPU to 43000 cps at 100% CPU : not only the 3 unused threads are not activated, but they do not disturb the activity anymore. The output of "show activity" before and after the patch on a 4-thread config where a first listener on thread 2 forwards over SSL to threads 3 & 4 shows a much smaller amount of undesired events (thread 1 doesn't wake up anymore, poll_skip remains zero, fd_skip stays low) :

  // before: 400% CPU, 7700 cps, 13 seconds
  loops:         11380717     65879   5733468   5728129
  wake_cache:           0     63986    317547    314174
  wake_tasks:           0         0         0         0
  wake_applets:         0         0         0         0
  wake_signal:          0         0         0         0
  poll_exp:             0     63986    317547    314174
  poll_drop:            1         0     49981     48893
  poll_dead:        65514         0     31334     31934
  poll_skip:     46293690     34071  22867786  22858208
  fd_skip:       66068135    174157  33732685  33825727
  fd_lock:              0         2      2809      2905
  fd_del:               0    494361     80890     79464
  conn_dead:            0         0         0         0
  stream:               0    407747     50526     49474
  empty_rq:      11380718      1914   5683023   5678715
  long_rq:              0         0         0         0

  // after: 200% cpu, 9450 cps, 11 seconds
  loops:               17     66147   1001631    450968
  wake_cache:           0     66119    865139    321227
  wake_tasks:           0         0         0         0
  wake_applets:         0         0         0         0
  wake_signal:          0         0         0         0
  poll_exp:             0     66119    865139    321227
  poll_drop:            6         5     38279     60768
  poll_dead:            0         0         0         0
  poll_skip:            0         0         0         0
  fd_skip:             54    172661   4411407   2008198
  fd_lock:              0         0     10890      5394
  fd_del:               0    492829     58965    105091
  conn_dead:            0         0         0         0
  stream:               0    406223     38663     61338
  empty_rq:            18        40    962999    390549
  long_rq:              0         0         0         0

This patch presents a few risks but fixes a real problem with threads, and as such it needs to be backported to 1.8. It depends on previous patch ("MINOR: fd: add a bitmask to indicate that an FD is known by the poller").

Special thanks go to Samuel Reed for providing a large amount of useful debugging information and for testing fixes.	2018-01-23 15:48:08 +01:00
Willy Tarreau	c9c8378c2b	MINOR: fd: add a bitmask to indicate that an FD is known by the poller Some pollers like epoll() need to know if the fd is already known or not in order to compute the operation to perform (add, mod, del). For now this is performed based on the difference between the previous FD state and the new state but this will not be usable anymore once threads become responsible for their own polling. Here we come with a different approach : a bitmask is stored with the fd to indicate which pollers already know it, and the pollers will be able to simply perform the add/mod/del operations based on this bit combined with the new state. This patch only adds the bitmask declaration and initialization, it is not yet used. It will be needed by the next two fixes and will need to be backported to 1.8.	2018-01-23 15:42:57 +01:00
Willy Tarreau	ebc78d78a2	BUG/MEDIUM: fd: maintain a per-thread update mask Since the fd update tables are per-thread, we need to have a bit per thread to indicate whether an update exists, otherwise this can lead to lost update events every time multiple threads want to update the same FD. In practice for now, it only happens at start time when listeners are enabled and ask for polling after facing their first EAGAIN. But since the pollers are still shared, a lost event is still recovered by a neighbor thread. This will not reliably work anymore with per-thread pollers, where it has been observed a few times on startup that a single-threaded listener would not always accept incoming connections upon startup. It's worth noting that during this code review it appeared that the "new" flag in the fdtab isn't used anymore. This fix should be backported to 1.8.	2018-01-23 15:41:19 +01:00
Christopher Faulet	32467fef98	BUG/MEDIUM: threads/polling: Use fd_cache_mask instead of fd_cache_num fd_cache_num is the number of FDs in the FD cache. It is a global variable. So it is suboptimal because we may be led to consider there are waiting FDs for the current thread in the FD cache while in fact all FDs are assigned to the other threads. So, in such cases, the polling loop will be evaluated many more times than necessary. Instead, we now check if the thread id is set in the bitfield fd_cache_mask. [wt: it's not exactly a bug, rather a design limitation of the threads which was not addressed in time for the 1.8 release. It can appear more often than we initially predicted, when more threads are running than the number of assigned CPU cores, or when certain threads spend milliseconds computing crypto keys while other threads spin on epoll_wait(0)=0] This patch should be backported to 1.8.	2018-01-23 15:39:51 +01:00
Christopher Faulet	69553fe62c	MINOR: threads/fd: Use a bitfield to know if there are FDs for a thread in the FD cache A bitfield has been added to know if there are some FDs processable by a specific thread in the FD cache. When a FD is inserted in the FD cache, the bits corresponding to its thread_mask are set. On each thread, the bitfield is updated when the FD cache is processed. If there is no FD processed, the thread is removed from the bitfield by unsetting its tid_bit. Note that this bitfield is updated but not checked in fd_process_cached_events. So, when this function is called, the FD cache is always processed. [wt: should be backported to 1.8 as it will help fix a design limitation]	2018-01-23 15:39:10 +01:00
Willy Tarreau	d80cb4ee13	MINOR: global: add some global activity counters to help debugging A number of counters have been added at special places helping better understanding certain bug reports. These counters are maintained per thread and are shown using "show activity" on the CLI. The "clear counters" commands also reset these counters. The output is sent as a single write(), which currently produces up to about 7 kB of data for 64 threads. If more counters are added, it may be necessary to write into multiple buffers, or to reset the counters. To backport to 1.8 to help collect more detailed bug reports.	2018-01-23 15:38:33 +01:00
Willy Tarreau	421f02e738	MINOR: threads: add a MAX_THREADS define instead of LONGBITS This one allows not to inflate some structures when threads are disabled. Now struct global is 1.4 kB instead of 33 kB. Should be backported to 1.8 for ease of backporting of upcoming patches.	2018-01-23 15:28:20 +01:00
Willy Tarreau	f4571a027f	MINOR: global/threads: move cpu_map at the end of the global struct The "thread" part is 32kB long, better move it at the end of the structure since it's only used during initialization, to keep the rest grouped together. Should be backported to 1.8 to ease backporting of upcoming patches, no functional impact.	2018-01-23 15:27:52 +01:00
Olivier Houchard	e9bad0a936	MINOR: servers: Don't report duplicate dyncookies for disabled servers. Especially with server-templates, it can happen that servers start with a placeholder IP, in the disabled state. In this case, we don't want to report that the same cookie was generated for multiple servers. So defer the test until the server is enabled. This should be backported to 1.8.	2018-01-23 14:05:17 +01:00
Emeric Brun	5548291395	BUG/MEDIUM: peers: fix expire date wasn't updated if entry is modified remotely. The stktable_touch_remote function considers the expire field stored in the stksess struct. The expire field was updated on the newly created stksess to store, but if a stksess with the same key was already present, its expire field was not updated. This patch postpones the update of the expire field of the stksess until just before processing the "touch". This bug was introduced in commit "MEDIUM: threads/stick-tables: handle multithreads on stick tables", and the fix should be backported to 1.8.	2018-01-22 16:03:25 +01:00
Etienne Carriere	a792a0aa93	MINOR: sample: add date_us sample Add date_us sample that returns the microsecond part of the timeval structure representing the date of the structure. The "second" part of the timeval can already be fetched by the "date" sample	2018-01-21 07:56:42 +01:00
Willy Tarreau	cc35923c32	BUG/MINOR: poll: too large size allocation for FD events Commit 80da05a ("MEDIUM: poll: do not use FD_* macros anymore") which appeared in 1.5-dev18 and which was backported to 1.4.23 made explicit use of arrays of FDs mapped to unsigned ints. The problem lies in the allocated size for poll(), as the resulting size is in bits and not bytes, resulting in poll() arrays being 8 times larger than necessary! In practice poll() is not used on highly loaded systems, explaining why nobody noticed. But it definitely has to be addressed. This fix needs to be backported to all stable versions.	2018-01-17 15:52:11 +01:00
Willy Tarreau	260bf5c106	CONTRIB: debug: fix a few flags definitions Commit f4cfcf9 ("MINOR: debug/flags: Add missing flags") added a number of missing flags but a few of them were incorrect, hiding real values. This can be backported to 1.8.	2018-01-15 18:59:16 +01:00
Jérôme Magnin	4a326cba5b	DOC: clarify the scope of ssl_fc_is_resumed Clarify that it's for incoming connections.	2018-01-15 14:18:25 +01:00
Christopher Faulet	333694d771	MINOR: spoe: Don't queue a SPOE context if nothing is sent When some messages must be sent to an agent, the SPOE context of the stream is queued to be handled by an SPOE applet. If there is no available applet, a new one is created, thus opening a connection with the agent. Since the support of ACLs on messages, some processing can now be discarded. So, to avoid opening a connection for nothing, the SPOE context is now queued after the messages encoding.	2018-01-15 13:48:03 +01:00
Christopher Faulet	336d3ef0e7	MINOR: spoe: add register-var-names directive in spoe-agent configuration In addition to "option force-set-var", recently added, this directive can be used to selectively register unknown variable names, without totally relaxing their registration during the runtime, like "option force-set-var" does. So there is no way for a malicious agent to exhaust memory by defining too many variable names. On the other hand, you need to enumerate all variable names. This could be painful in some circumstances. Remember, this directive is only useful when the variable names are not referenced anywhere in the HAProxy configuration or the SPOE one. Thanks to Etienne Carrière for his help on this part.	2018-01-15 13:47:27 +01:00
Willy Tarreau	d651ba14d4	BUG/MEDIUM: stream: properly handle client aborts during redispatch James Mc Bride reported an interesting case affecting all versions since at least 1.5 : if a client aborts a connection on an empty buffer at the exact moment a server redispatch happens, the CF_SHUTW_NOW flag on the channel is immediately turned into CF_SHUTW, which is not caught by check_req_may_abort(), leading the redispatch to be performed anyway with the channel marked as shut in both directions while the stream interface correctly establishes. This situation makes no sense. Ultimately the transfer times out and the server-side stream interface remains in EST state while the client is in CLO state, and this case doesn't correspond to anything we can handle in process_stream, leading to poll() being woken up all the time without any progress being made. And the session cannot even be killed from the CLI. So we must ensure that check_req_may_abort() also considers the case where the channel is already closed, which is what this patch does. Thanks to James for providing detailed captures allowing to diagnose the problem. This fix must be backported to all maintained versions.	2018-01-12 10:47:48 +01:00
David Carlier	ec5e84552a	BUILD/MINOR: ancient gcc versions atomic fix Commit 1a69af6d3892fe1946bb8babb3044d2d26afd46e introduced atomics code for gcc versions prior to 4.7. Unfortunately clang also defines those version constants, which is misleading.	2018-01-11 15:31:07 +01:00
Willy Tarreau	1a69af6d38	MINOR: hathreads: add support for gcc < 4.7 Till now the use of __atomic_* gcc builtins required gcc >= 4.7. Since some supported and quite common operating systems like CentOS 6 still come with older versions (4.4) and the mapping to the older builtins is reasonably simple, let's implement it. This code is only used for gcc < 4.7. It has been quickly tested on a machine using gcc 4.4.4 and provided expected results. This patch should be backported to 1.8.	2018-01-10 07:51:56 +01:00
William Lallemand	29f690c945	BUG/MEDIUM: mworker: execvp failure depending on argv[0] The copy_argv() function lacks a check on '-' to remove the -x, -sf and -st parameters. When reloading a master process with a path starting with /st, /sf, or /x.., the copy_argv() function skipped argv[0], leading to an execvp() without the binary.	2018-01-09 23:44:18 +01:00
Olivier Houchard	2ec2db9725	MINOR: dns: Handle SRV record weight correctly. A SRV record weight can range from 0 to 65535, while haproxy weight goes from 0 to 256, so we have to divide it by 256 before handing it to haproxy. Also, a SRV record with a weight of 0 doesn't mean the server shouldn't be used, so use a minimum weight of 1. This should probably be backported to 1.8.	2018-01-09 15:43:11 +01:00
Tim Duesterhus	119a5f10e4	BUG/MINOR: lua: Fix return value of Socket.settimeout The `socket.tcp.settimeout` method of Lua returns `1` in all cases, while the `Socket.settimeout` method of haproxy returns `0` in all cases. This breaks the `socket.http` module, because it validates the return value of `settimeout`. This bug was introduced in commit 7e7ac32dad1e15c19152d37aaf9ea6b3f00a7226 (which is the very first commit adding the Socket class to Lua). This bugfix should be backported to every branch containing that commit:

  - 1.6
  - 1.7
  - 1.8

A test case for this bug is as follows: The 'Test' response header will contain an HTTP status code with the patch applied and will be zero (nil) without the patch applied.

http.lua:

  local http = require("socket.http")

  core.register_action("bug", { "http-req" }, function(txn)
      local b, c, h = http.request {
          url = "http://93.184.216.34",
          headers = {
              Host = "example.com"
          },
          create = core.tcp,
          redirect = false
      }
      txn:set_var("txn.foo", c)
  end)

haproxy.cfg:

  global
      lua-load /scratch/haproxy/http.lua

  frontend fe
      bind 127.0.0.1:8080
      http-request lua.bug
      http-response set-header Test %[var(txn.foo)]
      default_backend be

  backend be
      server s example.com:80	2018-01-09 15:22:55 +01:00
Tim Duesterhus	6edab865f6	BUG/MEDIUM: lua: Fix IPv6 with separate port support for Socket.connect The `socket.tcp.connect` method of Lua requires at least two parameters: The host and the port. The `Socket.connect` method of haproxy requires only one when a host with a combined port is provided. This stems from the fact that `str2sa_range` is used internally in `hlua_socket_connect`. This very fact unfortunately causes a divergence in the behaviour of Lua's socket class and haproxy's for IPv6 addresses:

  sock:connect("::1", "80")

works fine with Lua, but fails with:

  connect: cannot parse destination address '::1'

in haproxy, because `str2sa_range` parses the trailing `:1` as the port. This patch forcefully adds a `:` to the end of the address iff a port number greater than `0` is given as the second parameter. Technically this breaks backwards compatibility, because the docs state:

  > The syntax "127.0.0.1:1234" is valid. in this case, the
  > parameter port is ignored.

But: The connect() call can only succeed if the second parameter is left out (which causes no breakage) or if the second parameter is an integer or a numeric string. It seems unlikely that someone would provide an address with a port number and would also provide a second parameter containing a number other than zero. Thus I feel this breakage is warranted to fix the mismatch between haproxy's socket class and Lua's one. This commit should be backported to haproxy 1.8 only, because of the possible breakage of existing Lua scripts.	2018-01-09 15:22:55 +01:00
Tim Duesterhus	b33754ce86	DOC: lua: Fix typos in comments of hlua_socket_receive	2018-01-09 15:22:49 +01:00
Tim Duesterhus	c6e377e6bb	BUG/MINOR: lua: Fix default value for pattern in Socket.receive The default value of the pattern in `Socket.receive` is `l` according to the documentation and in the `socket.tcp.receive` method of Lua. The default value of `wanted` in `int hlua_socket_receive(struct lua_State *L)` reflects this requirement, but the function fails to ensure this nonetheless: If no parameter is given the top of the Lua stack will have the index 1. `lua_pushinteger(L, wanted);` then pushes the default value onto the stack (with index 2). The following `lua_replace(L, 2);` then pops the top index (2) and tries to replace the index 2 with it. I am not sure why exactly that happens (possibly, because one cannot replace non-existent stack indices), but this causes the stack index to be lost. `hlua_socket_receive_yield` then tries to read the stack index 2, to determine what to read and gets the value `0`, instead of the correct HLSR_READ_LINE, thus taking the wrong branch. Fix this by ensuring that the top of the stack is not replaced by itself. This bug was introduced in commit 7e7ac32dad1e15c19152d37aaf9ea6b3f00a7226 (which is the very first commit adding the Socket class to Lua). This bugfix should be backported to every branch containing that commit:

  - 1.6
  - 1.7
  - 1.8

A test case for this bug is as follows: The 'Test' response header will contain an HTTP status line with the patch applied and will be empty without the patch applied. Replacing the `sock:receive()` with `sock:receive("*l")` will cause the status line to appear with and without the patch.

http.lua:

  core.register_action("bug", { "http-req" }, function(txn)
      local sock = core.tcp()
      sock:settimeout(60)
      sock:connect("127.0.0.1:80")
      sock:send("GET / HTTP/1.0\r\n\r\n")
      local response = sock:receive()
      sock:close()
      txn:set_var("txn.foo", response)
  end)

haproxy.cfg (bits omitted for brevity):

  global
      lua-load /scratch/haproxy/http.lua

  frontend fe
      bind 127.0.0.1:8080
      http-request lua.bug
      http-response set-header Test %[var(txn.foo)]
      default_backend be

  backend be
      server s 127.0.0.1:80	2018-01-09 15:22:46 +01:00
William Lallemand	99b90af621	BUG/MEDIUM: ssl: cache doesn't release shctx blocks Since the rework of the shctx with the hot list system, the ssl cache was putting sessions inside the hot list without removing them. Once all blocks were used, they were all locked in the hot list, which prevented reusing them for new sessions. Bug introduced by 4f45bb9 ("MEDIUM: shctx: separate ssl and shctx") Thanks to Jeffrey J. Persch for reporting this bug. Must be backported to 1.8.	2018-01-05 11:46:54 +01:00
Olivier Houchard	e2a34967a9	CLEANUP: rbtree: remove Remove the rbtree implementation. It's not used, it's not even connected to the build, and we probably have no use for it.	2018-01-05 10:56:32 +01:00
Willy Tarreau	5d4cafb610	BUILD: ssl: silence a warning when building without NPN nor ALPN support When building with a library not offering any of these, ssl_conf_cur is not used. Can be backported to 1.8.	2018-01-04 19:04:08 +01:00
Willy Tarreau	4a28da1e9d	BUG/MEDIUM: h2: properly handle the END_STREAM flag on empty DATA frames Peter Lindegaard Hansen reported a problem affecting some POST requests sent by MSIE on 1.8.3. Lukas found that we incorrectly dealt with the END_STREAM flag on empty DATA frames. What happens in fact is that while we correctly report that we've read a zero-byte frame, since commit 8fc016d ("BUG/MEDIUM: h2: support uploading partial DATA frames") backported into 1.8.2, we've been able to return without updating the parser's state nor checking the frame flags in this case. The fix is trivial, we just need not to return too early. This fix must be backported to 1.8.	2018-01-04 14:41:00 +01:00
Willy Tarreau	8ec140604a	MEDIUM: h2: prepare a graceful shutdown when the frontend is stopped During a reload operation, instead of keeping the H2 connections opened forever causing confusion during configuration changes, let's send a graceful shutdown so that the client knows that it would better open a new connection for future requests. We can't really catch the signal from H2, but we can advertise this graceful shutdown upon the next I/O event (eg: a WINDOW_UPDATE from the client or a new request). One of the visible effects is that the old process quits much faster. This patch should be backported to 1.8 since it is affected by this problem.	2017-12-30 18:08:13 +01:00
Willy Tarreau	4576424174	CONTRIB: hpack: add an hpack decoder This decoder takes a series of hex codes on stdin using one line per HEADERS frame and shows the decoded headers.	2017-12-30 17:43:28 +01:00
Willy Tarreau	c775f8372b	DEBUG: hpack: add more traces to the hpack decoder These ones are only enabled when DEBUG_HPACK is defined so they have no effect on the production code.	2017-12-30 17:37:08 +01:00
Willy Tarreau	4f03436c48	DEBUG: hpack: make hpack_dht_dump() expose the output file It's more convenient to be able to choose between stdout and stderr.	2017-12-30 17:17:07 +01:00
Willy Tarreau	3083276187	MINOR: h2: add a function to report pseudo-header names For debugging we need to be able to dump pseudo headers when we know their name, let's put this there as we already have the other way around.	2017-12-30 17:17:07 +01:00
Willy Tarreau	bb39b4945b	BUG/MAJOR: hpack: don't return direct references to the dynamic headers table Maximilian Böhm and Lucas Rolff both reported some random failed requests with HTTP/2. Upon deep investigation on detailed traces provided by Lucas, it turned out that some header names were occasionally corrupted and used to point to random strings within the dynamic headers table. The HPACK decoder must always return copies of header names that point to the dynamic headers table. Otherwise, the insertion of a header after the current one leading to a reorganization of the table will change the data the pointer designates. Unfortunately, one such copy was missing for indexed names, leading to random request failures due to invalid header names. Many thanks to Lucas who ran a large number of tests with full traces helping to capture a reproducible sequence exhibiting this issue. This patch must be backported to 1.8.	2017-12-30 17:17:06 +01:00
Willy Tarreau	ff47b3f41d	BUG/MEDIUM: http: don't automatically forward request close Maximilian Böhm and Lucas Rolff reported some frequent HTTP/2 POST failures affecting version 1.8.2 that were not affecting 1.8.1. Lukas Tribus determined that these ones appeared consecutive to commit a48c141 ("BUG/MAJOR: connection: refine the situations where we don't send shutw()"). It turns out that the HTTP request forwarding engine lets a shutr from the client be automatically forwarded to the server unless chunked encoding is in use. It's a bit tricky to meet this condition as it only happens if the shutr is not reported in the initial request. So if a request is large enough or the body is delayed after the headers (eg: Expect: 100-continue), the function quits with channel_auto_close() left enabled. The patch above was not really related in fact. It's just that a previous bug was causing this shutw to be skipped at the lower layers, and the two bugs used to cancel each other out. In the HTTP request we should only pass the close in tunnel mode, as other cases either need to keep the connection alive (eg: for reuse) or will force-close it. Also the forced close will properly take care of avoiding the painful time-wait, which is not possible with the early close. This patch must be backported to 1.8 as it directly impacts HTTP/2, and may be backported to older versions to save them from being abused by clients causing TIME_WAITs between haproxy and the server. Thanks to Lukas and Lucas for running many tests with captures allowing the bug to be narrowed down.	2017-12-29 17:23:40 +01:00
William Lallemand	e134041910	MINOR: don't close stdio anymore Closing the standard IO FDs (0,1,2) can be troublesome, especially in the case of the master-worker. Instead of closing those FDs, they are now pointing to /dev/null which prevents sending debugging messages to the wrong FDs. This patch could be backported in 1.8.	2017-12-29 16:33:41 +01:00
PiBa-NL	149a81a443	BUG/MEDIUM: mworker: don't close stdio several times This patch makes sure that a frontend socket that gets created after initialization won't be closed when the master gets re-executed. When used in daemon mode, the master-worker is closing the FDs 0, 1, 2 after the fork of the children. When the master was reloading, those FDs were assigned again during the parsing of the configuration (probably for some listeners), and the workers were closing them thinking it was the stdio. This patch must be backported to 1.8.	2017-12-29 16:31:10 +01:00
Willy Tarreau	d790143d99	BUG/MEDIUM: h2: ensure we always know the stream before sending a reset The recent patch introducing the H2_CS_FRAME_E state to emit stream resets was not totally correct in that in the rare case where there is no room left to emit the reset, the next call to process it later could use an uninitialized stream. This only affects responses to frames that are sent on closed streams though. This fix must be backported to 1.8.	2017-12-29 11:34:40 +01:00
Davor Ocelic	e9ed281e9f	DOC/MINOR: configuration: typo, formatting fixes - Add simple typo and formatting fixes - Eliminate a couple of > 80 column lines Changes do not affect technical content and can be backported.	2017-12-27 19:03:32 +01:00
Willy Tarreau	ab83750a29	BUG/MEDIUM: h2: improve handling of frames received on closed streams The h2spec utility found certain situations where we're returning an RST_STREAM while a GOAWAY is expected. While we can't always reliably decide which one to use (eg: after a stream has been closed for a long time), in practice we often still have the stream available until it's destroyed at the application level. This provides the flags we need to verify the conditions that led to its closure, namely if RST was sent or received, or if it was regularly closed using a double ES. The first step consists in marking all closed streams as having already sent an RST_STREAM frame. This will ensure that we can send an RST_STREAM for a late transmission on a stream we have forgotten about instead of risking to break the connection. The next steps consist in re-arranging the H2_SS_CLOSED checks so that we can deliver a GOAWAY frame for the few cases where an unexpected frame was received after a double ES. By carefully taking care of these specificities, we can reduce by 4 the number of remaining compliance issues. Note: some tests start to become a bit long and to be repeated at various places. Probably adding a bitmask of allowed/forbidden frame types per state and/or per situation could significantly help. It's likely that some deeper tests in the frame handlers could also be removed now as they can't be triggered anymore. This fix should be backported to 1.8.	2017-12-27 18:44:22 +01:00
Willy Tarreau	a20a519b8f	BUG/MEDIUM: h2: properly handle and report some stream errors Some stream errors applied to half-closed and closed streams are not properly reported, especially after the stream transitions to the closed state. The reason is that the code checks for this "error" stream state in order to send an RST frame. But if the stream was just closed or was already closed, there's no way to validate this condition, and the error is never reported to the peer. In order to address this situation, we'll add a new FRAME_E demux state which indicates that the previously parsed frame triggered a stream error of type STREAM CLOSED that needs to be reported. Proceeding like this will ensure that we don't lose that information even if we can't immediately send the message. It also removes the confusion where FRAME_A could be used either for ACKs or for RST. The state transition has been added after every h2s_error() on the demux path. It seems that we might need to have two distinct h2s_error() functions, one for the mux and another one for the demux, though it would provide little benefit. It also becomes more apparent that the H2_SS_ERROR state is only used to detect the need to report an error on the mux direction. Maybe this will have to be revisited later. This simple change managed to eliminate 5 bugs reported by h2spec. This fix must be backported to 1.8.	2017-12-27 18:34:50 +01:00


7307 Commits