haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-17 10:31:33 +02:00

Author	SHA1	Message	Date
Christopher Faulet	1703478e2d	BUG/MINOR: h1: Report the right error position when a header value is invalid During H1 message parsing, when the parser has finished parsing a full header line, some tests are performed on its value, depending on its name, to be sure it is valid. The content-length is checked and converted to an integer and the host header is also checked. If an error occurs during this step, the error position must point to the header value. But from the parser's point of view, we are already at the start of the next header. Thus the position effectively reported in the error capture is the beginning of the unparsed header line. It is a bit confusing when we try to figure out why a message is rejected. Now, the parser state is updated to point to the invalid value. This way, the error position really points to the right position. This patch must be backported as far as 1.9.	2020-01-06 13:58:21 +01:00
Christopher Faulet	bc7c03eba3	BUG/MINOR: h1: Don't test the host header during response parsing During the H1 message parsing, the host header is tested to be sure it matches the request's authority, if defined. When there are multiple host headers, we also take care they are all the same. Of course, these tests must only be performed on the requests. A host header in a response has no special meaning. This patch must be backported to 2.1.	2019-11-27 14:01:17 +01:00
Christopher Faulet	531b83e039	MINOR: h1: Reject requests if the authority does not match the header host As stated in RFC7230#5.4, a client must send a field-value for the header host that is identical to the authority if the target URI includes one. So, now, by default, if the authority, when provided, does not match the value of the header host, an error is triggered. To mitigate this behavior, it is possible to set the option "accept-invalid-http-request". In that case, an http error is captured without interrupting the request parsing.	2019-10-14 22:28:50 +02:00
Christopher Faulet	497ab4f519	MINOR: h1: Reject requests with different occurrences of the header host There is no reason for a client to send several headers host. It even may be considered as a bug. However, it is totally invalid to have different values for those. So now, in such case, an error is triggered during the request parsing. In addition, when several headers host are found with the same value, only the first instance is kept and others are skipped.	2019-10-14 22:28:50 +02:00
Christopher Faulet	84f06533e1	BUG/MINOR: h1: Properly reset h1m when parsing is restarted Otherwise some processing may be performed twice. For instance, if the header "Content-Length" is parsed on the first pass, when the parsing is restarted, we skip it because we think another header with the same value was already seen. In fact, it is currently the only existing bug that can be encountered. But it is safer to reset all the h1m on restart to avoid any future bugs. This patch must be backported to 2.0 and 1.9.	2019-09-04 10:30:11 +02:00
Christopher Faulet	711ed6ae4a	MAJOR: http: Remove the HTTP legacy code First of all, all legacy HTTP analyzers and all functions exclusively used by them were removed. So most of the functions in proto_http.{c,h} were removed. Only functions to deal with the HTTP transaction have been kept. Then, http_msg and hdr_idx modules were entirely removed. And finally the structure http_msg was stripped of all its useless information about the legacy HTTP. The structure hdr_ctx was also removed because it is now unused, just like unused states in the enum h1_state. Note that the memory pool "hdr_idx" was removed and "http_txn" is now smaller.	2019-07-19 09:24:12 +02:00
Christopher Faulet	a51ebb7f56	MEDIUM: h1: Add an option to sanitize connection headers during parsing The flag H1_MF_CLEAN_CONN_HDR has been added to let the H1 parser sanitize connection headers. It means it will remove all "close" and "keep-alive" values during the parsing. One noticeable effect is that connection headers may be unfolded. In practice, this is not a problem because it is not frequent to have multiple values for the connection headers. If this flag is set, during the parsing, the function h1_parse_next_connection_header() is called in a loop instead of h1_parse_connection_header(). No need to backport this patch.	2019-04-12 22:06:53 +02:00
Christopher Faulet	68b1bbd767	BUG/MEDIUM: h1: Get the h1m state when restarting the headers parsing Since the commit 0f8fb6b7f ("MINOR: h1: make the H1 headers block parser able to parse headers only"), when headers are not all received at once, a parsing error is returned because the local state in the function h1_headers_to_hdr_list() was not initialized with the previous one (in fact, it was not initialized at all). So now, we start the parsing of headers with the state H1_MSG_HDR_FIRST when the flag H1_MF_HDRS_ONLY is set. Otherwise, we always get it from the h1m. This patch must be backported to 1.9.	2019-01-04 16:23:03 +01:00
Willy Tarreau	0f8fb6b7f9	MINOR: h1: make the H1 headers block parser able to parse headers only Currently the H1 headers parser works for either a request or a response because it starts from the start line. It is also able to resume its processing when it was interrupted, but in this case it doesn't update the list. Make it support a new flag, H1_MF_HDRS_ONLY so that the caller can indicate it's only interested in the headers list and not the start line. This will be convenient to parse H1 trailers.	2019-01-04 10:48:03 +01:00
Willy Tarreau	afba57ae80	REORG: h1: merge types+proto into common/h1.h These two files are self-contained and do not depend on other layers, so let's remerge them together for easier manipulation.	2018-12-11 17:15:13 +01:00
Willy Tarreau	538746ad38	REORG: h1: move legacy http functions to http_msg.c Now that h1 and legacy HTTP are two distinct things, there's no need to keep the legacy HTTP parsers in h1.c since they're only used by the legacy code in proto_http.c, and h1.h doesn't need to include hdr_idx anymore. This concerns the following functions : - http_parse_reqline(); - http_parse_stsline(); - http_msg_analyzer(); - http_forward_trailers(); All of these were moved to http_msg.c.	2018-12-11 17:15:13 +01:00
Christopher Faulet	25da9e34f1	MINOR: h1: Add the flag H1_MF_NO_PHDR to not add pseudo-headers during parsing Some pseudo-headers are added during the headers parsing, mainly for the mux H2. With this flag, it is possible to not add them. This avoids some boring filtering in the mux H1.	2018-10-12 16:15:18 +02:00
Christopher Faulet	1dc2b49556	MINOR: h1: Change the union h1_sl to use indirect strings to store infos Instead of using offsets relating to the parsed buffer to store start line infos, we now use indirect strings. So now, these infos remain valid only if the origin buffer remains untouched. But it's not a real problem because this union is used during the parsing and never stored to a later use.	2018-10-12 16:14:57 +02:00
Christopher Faulet	ff08a92797	MINOR: h1: Add EOH marker during headers parsing When headers parsing ends, a pseudo header with an empty name and an empty value is added to the array of parsed headers to mark its end. It is convenient to loop on this array, but not really useful if we want to remove the last header or add a new one, because we don't really know where the last CRLF (the empty line ending the headers block) is. So now, instead, the name of this pseudo header points to this last CRLF. Its length is still 0 and its value is still empty, so loops on the array remain unchanged.	2018-10-12 16:08:27 +02:00
Christopher Faulet	2912f87443	BUG/MEDIUM: h1: Really skip all updates when incomplete messages are parsed In h1_headers_to_hdr_list, when an incomplete message is parsed, all updates must be skipped until the end of the message is found. Then the parsing is restarted from the beginning. But not all updates were skipped, leading to invalid rewriting or segfault. No backport is needed.	2018-09-19 15:08:05 +02:00
Willy Tarreau	73373ab43a	MEDIUM: h1: deduplicate the content-length header Just like we used to do in proto_http, we now check that each and every occurrence of the content-length header field and each of its values are exactly identical, and we normalize the header to return the last value of the first header with spaces trimmed.	2018-09-14 19:04:28 +02:00
Willy Tarreau	2557f6a3e2	MEDIUM: h1: better handle transfer-encoding vs content-length The transfer-encoding header processing was a bit lenient in this part because it was made to read messages already validated by haproxy. We absolutely need to reinstate the strict processing defined in RFC7230 as is currently being done in proto_http.c. That is, transfer-encoding presence alone is enough to cancel content-length, and must be terminated by the "chunked" token, except in the response where we can fall back to the close mode if it's not last. For this we now use a specific parsing function which updates the flags and we introduce a new flag H1_MF_XFER_ENC indicating that the transfer-encoding header is present. Last, if such a header is found, we delete all content-length header fields found in the message.	2018-09-14 17:40:35 +02:00
Willy Tarreau	2ea6bb5c31	MINOR: h1: add headers to the list after controls, not before This will ease removal/skipping of duplicates such as content-length.	2018-09-14 17:40:35 +02:00
Willy Tarreau	98f5cf7a59	MINOR: h1: parse the Connection header field The new function h1_parse_connection_header() is called when facing a connection header in the generic parser, and it will set up to 3 bits in h1m->flags indicating if at least one "close", "keep-alive" or "upgrade" token was seen.	2018-09-13 14:52:31 +02:00
Willy Tarreau	ba5fbca33f	MINOR: h1: report in the h1m struct if the HTTP version is 1.1 or above This will be needed for the mux to know how to process the Connection header, and will save it from having to re-parse the request line since it's captured on the fly.	2018-09-13 14:34:09 +02:00
Willy Tarreau	db72da0432	BUG/MINOR: h1: don't consider the status for each header While it was possible to consider the status before parsing response headers, it's wrong to do it for request headers and could lead to random behaviours due to this status matching other fields instead. Additionally there is little to no value in doing this for each and every new header field. It's much better to reset the content-length at once in the caller when seeing such statuses (which currently is only the H2 mux). No backport is needed, this is purely 1.9.	2018-09-13 14:30:23 +02:00
Willy Tarreau	eb528db60b	MINOR: h1: add H1_MF_TOLOWER to decide when to turn header names to lower case The h1 parser used to systematically turn header field names to lower case because it was designed for H2. Let's add a flag which is off by default to condition this behaviour so that when using it from an H1 parser it will not affect the message.	2018-09-12 17:38:26 +02:00
Willy Tarreau	c2ab9f5163	MEDIUM: h1: implement the request parser as well The original H1 request parsing code was reintroduced into the generic H1 parser so that it can be used regardless of the direction. If the parser is interrupted and restarts, it makes use of the H1_MF_RESP flag to decide whether to re-parse a request or a response. While parsing the request, the method is decoded and set into the start line structure.	2018-09-12 17:38:25 +02:00
Willy Tarreau	11da5674c3	MINOR: h1: remove the HTTP status from the H1M struct It has nothing to do there and is not used from there anymore, let's get rid of it.	2018-09-12 17:38:25 +02:00
Willy Tarreau	001823c304	MEDIUM: h1: remove the useless H1_MSG_BODY state This state was only a delimiter between headers and body but it now causes more harm than good because it requires someone to change it. Since the H1 parser knows if we're in DATA or CHUNK_SIZE, simply let it set the right next state so that h1m->state constantly matches what is expected afterwards.	2018-09-12 17:38:25 +02:00
Willy Tarreau	4c34c0e74a	MEDIUM: h1: support partial message parsing While it was not needed in the H2 mux which was reading full H1 messages from the channel, it is mandatory for the H1 mux reading contents from outside to be able to restart on a message. The problem is that the headers are indexed on the fly, and it's not fun to have to store everything between calls. The solution here is to complete the first pass doing a partial restart, and only once the end of message was found, to start over it again at once, filling entries. This way there is a bounded number of passes on the contents and no need to store an intermediary result anymore. Later this principle could even be used to decide to completely drop an output buffer to save memory.	2018-09-12 17:38:25 +02:00
Willy Tarreau	5384aac0cb	MINOR: h1: make the message parser support a null <hdr> argument This will allow some iterative calls to be made on incomplete messages without having to store all the headers.	2018-09-12 17:38:25 +02:00
Willy Tarreau	4433c083ec	MEDIUM: h1: let the caller pass the initial parser's state This way the caller controls if it's the request or response which has to be used, and it will allow to restart after an incomplete parsing.	2018-09-12 17:38:25 +02:00
Willy Tarreau	a41393fc61	MEDIUM: h1: make the parser support a pointer to a start line This will allow the parser to fill some extra fields like the method or status without having to store them permanently in the HTTP message. At this point however the parser cannot restart from an interrupted read.	2018-09-12 17:38:25 +02:00
Willy Tarreau	9aec30557b	MEDIUM: h1: consider err_pos before deciding to accept a header name or not Till now the H1 parser made for H2 used to be lenient on invalid header field names because they were supposed to be produced by haproxy. Now instead we'll rely on err_pos to know how to act (ie: -2 == must block).	2018-09-12 17:38:25 +02:00
Willy Tarreau	801250e07d	REORG: h1: create a new h1m_state This is the parsing state of an HTTP/1 message. Currently the h1_state is composite as it's made both of parsing and control (100SENT, BODY, DONE, TUNNEL, ENDING etc). The purpose here is to have a purely H1 state that can be used by H1 parsers. For now it's equivalent to h1_state.	2018-09-12 17:38:25 +02:00
Willy Tarreau	35b51c6e5b	REORG: http: move the HTTP semantics definitions to http.h/http.c It's a bit painful to have to deal with HTTP semantics for each protocol version (H1 and H2), and working on the version-agnostic code further emphasizes the problem. This patch creates http.h and http.c which are agnostic to the version in use, and which borrow a few parts from proto_http and from h1. For example the once thought h1-specific h1_char_classes array is in fact dictated by RFC7231 and is used to parse HTTP headers. A few changes were made to a few files which were including proto_http.h while they only needed http.h. Certain string definitions pre-dated the introduction of indirect strings (ist) so some were used to simplify the definition of the known HTTP methods. The current lookup code saves 2 kB of a heavily used table and is faster than the previous table based lookup (typ. 14 ns vs 16 before).	2018-09-11 10:30:25 +02:00
Willy Tarreau	950a8a6fde	BUG/MINOR: h1: fix buffer shift after realignment Commit 5e74b0b ("MEDIUM: h1: port to new buffer API.") introduced a minor bug by which a buffer's head could stay shifted by the amount of removed CRLF if it started with empty lines. This would cause the second request (or response) not to work until it would receive a few extra characters. This mostly impacts only requests sent by hand though. This is purely 1.9, no backport is needed.	2018-09-06 10:48:15 +02:00
Willy Tarreau	c9fa0480af	MAJOR: buffer: finalize buffer detachment Now the buffers only contain the header and a pointer to the storage area which can be anywhere. This will significantly simplify buffer swapping and will make it possible to map chunks on buffers as well. The buf_empty variable was removed, as now it's enough to have size==0 and area==NULL to designate the empty buffer (thus a non-allocated head is the empty buffer by default). buf_wanted for now is indicated by size==0 and area==(void *)1. The channels and the checks now embed the buffer's head, and the only pointer is to the storage area. This slightly increases the unallocated buffer size (3 extra ints for the empty buffer) but considerably simplifies dynamic buffer management. It will also later permit to detach unused checks. The way the struct buffer is arranged has proven quite efficient on a number of tests, which makes sense given that size is always accessed and often first, followed by the other ones.	2018-07-19 16:23:43 +02:00
Willy Tarreau	72a100b386	MINOR: buffer: replace bi_fast_delete() with b_del() There's no distinction between in and out data now. The latter covers the needs of the former and supports wrapping. The extra cost is negligible given the locations where it's used.	2018-07-19 16:23:43 +02:00
Willy Tarreau	5e74b0ba3b	MEDIUM: h1: port to new buffer API. The parser now uses the channel exclusively to access the data. In order to avoid the cost of indirection, a local variable "input" was added to the function that replaces buf->p. Given that this part is on the critical path, it will have to be tested again for any visible performance loss.	2018-07-19 16:23:42 +02:00
Willy Tarreau	f40e68227b	MINOR: h1: make h1_measure_trailers() use an offset and a count This will be needed by the H2 encoder to restart after wrapping.	2018-07-19 16:23:41 +02:00
Willy Tarreau	7314be8e2c	MINOR: h1: make h1_measure_trailers() take the byte count in argument The principle is that it should not have to take this value from the buffer itself anymore.	2018-07-19 16:23:40 +02:00
Willy Tarreau	188e230704	MINOR: buffer: convert most b_ptr() calls to c_ptr() The latter uses the channel wherever a channel is known.	2018-07-19 16:23:40 +02:00
Willy Tarreau	8f9c72d301	MINOR: buffer: remove bi_end() It was replaced by ci_tail() when the channel is known, or b_tail() in other cases.	2018-07-19 16:23:40 +02:00
Willy Tarreau	41e38ac0ee	MINOR: buffer: remove bo_end() It was replaced by either b_tail() when the buffer has no input data, or b_peek(b, b->o).	2018-07-19 16:23:40 +02:00
Willy Tarreau	1b4cf9b754	BUG/MINOR: h1: make the HTTP/1 status code parser check for digits The H1 parser used by the H2 gateway was a bit lax and could validate non-numbers in the status code. Since it computes the code on the fly it's problematic, as "30:" is read as status code 310. Let's properly check that it's a number now. No backport needed.	2017-11-09 11:15:45 +01:00
Willy Tarreau	2510f702f9	MINOR: h1: add a function to measure the trailers length This is needed in the H2->H1 gateway so that we know how long the trailers block is in chunked encoding. It returns the number of bytes, or 0 if some are missing, or -1 in case of parse error.	2017-10-31 17:18:10 +01:00
Willy Tarreau	d22e83abd9	MINOR: h1: store the status code in the H1 message It was painful not to have the status code available, especially when it was computed. Let's store it and ensure we don't claim content-length anymore on 1xx, only 0 body bytes.	2017-10-31 08:43:29 +01:00
Willy Tarreau	8ea0f38c75	MEDIUM: h1: ensure that 1xx, 204 and 304 don't have a payload body It's important for the H2 to H1 gateway that the response parser properly clears the H1 message's body_len when seeing these status codes so that we don't hang waiting to transfer data that will not come.	2017-10-30 19:33:22 +01:00
Willy Tarreau	794f9af894	MEDIUM: h1: reimplement the http/1 response parser for the gateway The HTTP/2->HTTP/1 gateway will need to process HTTP/1 responses. We cannot sanely rely on the HTTP/1 txn to parse a response because : 1) responses generated by haproxy such as error messages, redirects, stats or Lua are neither parsed nor indexed ; this could be addressed over the long term but will take time. 2) the http txn is useless to parse the body : the states present there are only meaningful to received bytes (ie next bytes to parse) and not at all to sent bytes. Thus chunks cannot be followed at all. Even when implementing this later, it's unsure whether it will be possible when dealing with compression. So using the HTTP txn is now out of the equation and the only remaining solution is to call an HTTP/1 message parser. We already have one, it was slightly modified to avoid keeping states by benefitting from the fact that the response was produced by haproxy and this is entirely available. It assumes the following rules are true, or that incurring an extra cost to work around them is acceptable : - the response buffer is read-write and supports modifications in place - headers sent through / by haproxy are not folded. Folding is still implemented by replacing CR/LF/tabs/spaces with spaces if encountered - HTTP/0.9 responses are never sent by haproxy and have never been supported at all - haproxy will not send partial responses, the whole headers block will be sent at once ; this means that we don't need to keep expensive states and can afford to restart the parsing from the beginning when facing a partial response ; - response is contiguous (does not wrap). This was already the case with the original parser and ensures we can safely dereference all fields with (ptr,len) The parser replaces all of the http_msg fields that were necessary with local variables. The parser is not called on an http_msg but on a string with a start and an end. The HTTP/1 states were reused for ease of use, though the request-specific ones have not been implemented for now. The error position and error state are supported and optional ; these ones may be used later for bug hunting. The parser issues the list of all the headers into a caller-allocated array of struct ist. The content-length/transfer-encoding header are checked and the relevant info fed the h1 message state (flags + body_len).	2017-10-22 09:54:15 +02:00
Willy Tarreau	8740c8b1b2	REORG: http: move the HTTP/1 header block parser to h1.c Since it still depends on http_msg, it was not renamed yet.	2017-10-22 09:54:13 +02:00
Willy Tarreau	db4893d6a4	REORG: http: move the HTTP/1 chunk parser to h1.{c,h} Functions http_parse_chunk_size(), http_skip_chunk_crlf() and http_forward_trailers() were moved to h1.h and h1.c respectively so that they can be called from outside. The parts that were inline remained inline as it's critical for performance (+41% perf difference reported in an earlier test). For now the "http_" prefix remains in their name since they still depend on the http_msg type.	2017-10-22 09:54:13 +02:00
Willy Tarreau	0da5b3bddc	REORG: http: move some very http1-specific parts to h1.{c,h} Certain types and enums are very specific to the HTTP/1 parser, and we'll need to share them with the HTTP/2 to HTTP/1 translation code. Let's move them to h1.c/h1.h. Those with very few occurrences or only used locally were renamed to explicitly mention the relevant HTTP version : - enum ht_state -> h1_state ; - http_msg_state_str -> h1_msg_state_str ; - HTTP_FLG_* -> H1_FLG_* ; - http_char_classes -> h1_char_classes. Others like HTTP_IS_, HTTP_MSG_ are left to be done later.	2017-10-22 09:54:13 +02:00
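
Commit 1b4cf9b754 above notes that a parser computing the status code on the fly without checking for digits reads "30:" as 310. The following is a minimal sketch of why that happens, not haproxy's actual code: both function names are hypothetical, and the three-digit requirement in the strict variant is an assumption taken from the status-code grammar (3DIGIT) rather than from the commit. The character ':' follows '9' in ASCII, so `':' - '0'` evaluates to 10 and the lax accumulation yields 3 * 100 + 0 * 10 + 10.

```c
#include <stddef.h>

/* Hypothetical lax variant: accumulates every byte as if it were a digit.
 * Intended only for short inputs; no overflow protection. */
int status_lax(const char *s, size_t len)
{
    int code = 0;

    for (size_t i = 0; i < len; i++)
        code = code * 10 + (s[i] - '0');
    return code;
}

/* Hypothetical strict variant: returns -1 unless the input is exactly
 * three ASCII digits (assumption: status-code = 3DIGIT). */
int status_strict(const char *s, size_t len)
{
    int code = 0;

    if (len != 3)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        code = code * 10 + (s[i] - '0');
    }
    return code;
}
```

Calling `status_lax("30:", 3)` reproduces the 310 mentioned in the commit message, while `status_strict("30:", 3)` rejects the input.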

49 Commits
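
Commit 73373ab43a above describes checking that every occurrence of the content-length header, and every value inside each occurrence, is identical. A rough sketch of that check is shown below. It is not the haproxy implementation: the function `cl_dedup()` and its signature are hypothetical, values are passed as NUL-terminated strings for simplicity, and "identical" is approximated here by numeric equality after trimming spaces and tabs.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 for the optional whitespace allowed around list items. */
static int is_ows(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Hypothetical helper: walks every comma-separated item of every
 * Content-Length value. Returns 0 and stores the common length in *out
 * when all items are non-empty digit strings with the same numeric
 * value, or -1 otherwise (mismatch, empty item, non-digit, overflow).
 */
int cl_dedup(const char *const *values, size_t count, unsigned long long *out)
{
    unsigned long long ref = 0;
    int seen = 0;

    for (size_t i = 0; i < count; i++) {
        const char *p = values[i];

        for (;;) {
            unsigned long long cur = 0;
            int digits = 0;

            while (is_ows(*p))
                p++;
            while (*p >= '0' && *p <= '9') {
                unsigned int d = (unsigned int)(*p - '0');

                if (cur > (ULLONG_MAX - d) / 10)
                    return -1;          /* value would overflow */
                cur = cur * 10 + d;
                digits++;
                p++;
            }
            while (is_ows(*p))
                p++;
            if (!digits || (*p != ',' && *p != '\0'))
                return -1;              /* empty item or stray byte */
            if (seen && cur != ref)
                return -1;              /* two different lengths */
            ref = cur;
            seen = 1;
            if (*p == '\0')
                break;
            p++;                        /* skip the comma */
        }
    }
    if (!seen)
        return -1;
    *out = ref;
    return 0;
}
```

With the values `"42"` and `" 42 , 42"` the sketch reports a single length of 42; with `"42"` and `"43"` it reports an error, which a caller would treat as an invalid message.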