38 Commits

Author SHA1 Message Date
Christopher Faulet
25da9e34f1 MINOR: h1: Add the flag H1_MF_NO_PHDR to not add pseudo-headers during parsing
Some pseudo-headers are added during the headers parsing, mainly for the mux
H2. With this flag, it is possible to not add them. This avoid some boring
filtering in the mux H1.
2018-10-12 16:15:18 +02:00
Christopher Faulet
1dc2b49556 MINOR: h1: Change the union h1_sl to use indirect strings to store infos
Instead of using offsets relating to the parsed buffer to store start line
infos, we now use indirect strings. So now, these infos remain valid only if the
origin buffer remains untouched. But it's not a real problem because this union
is used during the parsing and never stored to a later use.
2018-10-12 16:14:57 +02:00
Christopher Faulet
ff08a92797 MINOR: h1: Add EOH marker during headers parsing
When headers parsing ends, a pseudo header with an empty name and an empty value
is added to the array of parsed headers to mark its end. It is convenient to
loop on this array, but not really useful if we want remove the last header or
add a new one, because we don't really know where is the last CRLF (the empty
line ending the headers block). So now, instead the name of this pseudo header
points on this last CRLF. Its length is still 0 and its value is still empty, so
loops on the array remains unchanged.
2018-10-12 16:08:27 +02:00
Christopher Faulet
2912f87443 BUG/MEDIUM: h1: Really skip all updates when incomplete messages are parsed
In h1_headers_to_hdr_list, when an incomplete message is parsed, all updates
must be skipped until the end of the message is found. Then the parsing is
restarted from the beginning. But not all updates were skipped, leading to
invalid rewritting or segfault.

No backport is needed.
2018-09-19 15:08:05 +02:00
Willy Tarreau
73373ab43a MEDIUM: h1: deduplicate the content-length header
Just like we used to do in proto_http, we now check that each and every
occurrence of the content-length header field and each of its values are
exactly identical, and we normalize the header to return the last value
of the first header with spaces trimmed.
2018-09-14 19:04:28 +02:00
Willy Tarreau
2557f6a3e2 MEDIUM: h1: better handle transfer-encoding vs content-length
The transfer-encoding header processing was a bit lenient in this part
because it was made to read messages already validated by haproxy. We
absolutely need to reinstate the strict processing defined in RFC7230
as is currently being done in proto_http.c. That is, transfer-encoding
presence alone is enough to cancel content-length, and must be
terminated by the "chunked" token, except in the response where we
can fall back to the close mode if it's not last.

For this we now use a specific parsing function which updates the
flags and we introduce a new flag H1_MF_XFER_ENC indicating that the
transfer-encoding header is present.

Last, if such a header is found, we delete all content-length header
fields found in the message.
2018-09-14 17:40:35 +02:00
Willy Tarreau
2ea6bb5c31 MINOR: h1: add headers to the list after controls, not before
This will ease removal/skipping of duplicates such as content-length.
2018-09-14 17:40:35 +02:00
Willy Tarreau
98f5cf7a59 MINOR: h1: parse the Connection header field
The new function h1_parse_connection_header() is called when facing a
connection header in the generic parser, and it will set up to 3 bits
in h1m->flags indicating if at least one "close", "keep-alive" or "upgrade"
tokens was seen.
2018-09-13 14:52:31 +02:00
Willy Tarreau
ba5fbca33f MINOR: h1: report in the h1m struct if the HTTP version is 1.1 or above
This will be needed for the mux to know how to process the Connection
header, and will save it from having to re-parse the request line since
it's captured on the fly.
2018-09-13 14:34:09 +02:00
Willy Tarreau
db72da0432 BUG/MINOR: h1: don't consider the status for each header
While it was possible to consider the status before parsing response
headers, it's wrong to do it for request headers and could lead to
random behaviours due to this status matching other fields instead.
Additionnally there is little to no value in doing this for each and
every new header field. It's much better to reset the content-length
at once in the callerwhen seeing such statuses (which currently is only
the H2 mux).

No backport is needed, this is purely 1.9.
2018-09-13 14:30:23 +02:00
Willy Tarreau
eb528db60b MINOR: h1: add H1_MF_TOLOWER to decide when to turn header names to lower case
The h1 parser used to systematically turn header field names to lower
case because it was designed for H2. Let's add a flag which is off by
default to condition this behaviour so that when using it from an H1
parser it will not affect the message.
2018-09-12 17:38:26 +02:00
Willy Tarreau
c2ab9f5163 MEDIUM: h1: implement the request parser as well
The original H1 request parsing code was reintroduced into the generic
H1 parser so that it can be used regardless of the direction. If the
parser is interrupted and restarts, it makes use of the H1_MF_RESP
flag to decide whether to re-parse a request or a response. While
parsing the request, the method is decoded and set into the start line
structure.
2018-09-12 17:38:25 +02:00
Willy Tarreau
11da5674c3 MINOR: h1: remove the HTTP status from the H1M struct
It has nothing to do there and is not used from there anymore, let's
get rid of it.
2018-09-12 17:38:25 +02:00
Willy Tarreau
001823c304 MEDIUM: h1: remove the useless H1_MSG_BODY state
This state was only a delimiter between headers and body but it now
causes more harm than good because it requires someone to change it.
Since the H1 parser knows if we're in DATA or CHUNK_SIZE, simply let
it set the right next state so that h1m->state constantly matches
what is expected afterwards.
2018-09-12 17:38:25 +02:00
Willy Tarreau
4c34c0e74a MEDIUM: h1: support partial message parsing
While it was not needed in the H2 mux which was reading full H1 messages
from the channel, it is mandatory for the H1 mux reading contents from
outside to be able to restart on a message. The problem is that the
headers are indexed on the fly, and it's not fun to have to store
everything between calls.

The solution here is to complete the first pass doing a partial restart,
and only once the end of message was found, to start over it again at
once, filling entries. This way there is a bounded number of passes on
the contents and no need to store an intermediary result anymore. Later
this principle could even be used to decide to completely drop an output
buffer to save memory.
2018-09-12 17:38:25 +02:00
Willy Tarreau
5384aac0cb MINOR: h1: make the message parser support a null <hdr> argument
This will allow some iterative calls to be made on incomplete messages
without having to store all the headers.
2018-09-12 17:38:25 +02:00
Willy Tarreau
4433c083ec MEDIUM: h1: let the caller pass the initial parser's state
This way the caller controls if it's the request or response which has
to be used, and it will allow to restart after an incomplete parsing.
2018-09-12 17:38:25 +02:00
Willy Tarreau
a41393fc61 MEDIUM: h1: make the parser support a pointer to a start line
This will allow the parser to fill some extra fields like the method or
status without having to store them permanently in the HTTP message. At
this point however the parser cannot restart from an interrupted read.
2018-09-12 17:38:25 +02:00
Willy Tarreau
9aec30557b MEDIUM: h1: consider err_pos before deciding to accept a header name or not
Till now the H1 parser made for H2 used to be lenient on invalid header
field names because they were supposed to be produced by haproxy. Now
instead we'll rely on err_pos to know how to act (ie: -2 == must block).
2018-09-12 17:38:25 +02:00
Willy Tarreau
801250e07d REORG: h1: create a new h1m_state
This is the *parsing* state of an HTTP/1 message. Currently the h1_state
is composite as it's made both of parsing and control (100SENT, BODY,
DONE, TUNNEL, ENDING etc). The purpose here is to have a purely H1 state
that can be used by H1 parsers. For now it's equivalent to h1_state.
2018-09-12 17:38:25 +02:00
Willy Tarreau
35b51c6e5b REORG: http: move the HTTP semantics definitions to http.h/http.c
It's a bit painful to have to deal with HTTP semantics for each protocol
version (H1 and H2), and working on the version-agnostic code further
emphasizes the problem.

This patch creates http.h and http.c which are agnostic to the version
in use, and which borrow a few parts from proto_http and from h1. For
example the once thought h1-specific h1_char_classes array is in fact
dictated by RFC7231 and is used to parse HTTP headers. A few changes
were made to a few files which were including proto_http.h while they
only needed http.h.

Certain string definitions pre-dated the introduction of indirect
strings (ist) so some were used to simplify the definition of the known
HTTP methods. The current lookup code saves 2 kB of a heavily used table
and is faster than the previous table based lookup (typ. 14 ns vs 16
before).
2018-09-11 10:30:25 +02:00
Willy Tarreau
950a8a6fde BUG/MINOR: h1: fix buffer shift after realignment
Commit 5e74b0b ("MEDIUM: h1: port to new buffer API.") introduced a
minor bug by which a buffer's head could stay shifted by the amount
of removed CRLF if it started with empty lines. This would cause the
second request (or response) not to work until it would receive a few
extra characters. This most only impacts requests sent by hand though.

This is purely 1.9, no backport is needed.
2018-09-06 10:48:15 +02:00
Willy Tarreau
c9fa0480af MAJOR: buffer: finalize buffer detachment
Now the buffers only contain the header and a pointer to the storage
area which can be anywhere. This will significantly simplify buffer
swapping and will make it possible to map chunks on buffers as well.

The buf_empty variable was removed, as now it's enough to have size==0
and area==NULL to designate the empty buffer (thus a non-allocated head
is the empty buffer by default). buf_wanted for now is indicated by
size==0 and area==(void *)1.

The channels and the checks now embed the buffer's head, and the only
pointer is to the storage area. This slightly increases the unallocated
buffer size (3 extra ints for the empty buffer) but considerably
simplifies dynamic buffer management. It will also later permit to
detach unused checks.

The way the struct buffer is arranged has proven quite efficient on a
number of tests, which makes sense given that size is always accessed
and often first, followed by the othe ones.
2018-07-19 16:23:43 +02:00
Willy Tarreau
72a100b386 MINOR: buffer: replace bi_fast_delete() with b_del()
There's no distinction between in and out data now. The latter covers
the needs of the former and supports wrapping. The extra cost is
negligible given the locations where it's used.
2018-07-19 16:23:43 +02:00
Willy Tarreau
5e74b0ba3b MEDIUM: h1: port to new buffer API.
The parser now uses the channel exclusively to access the data. In order
to avoid the cost of indirection, a local variable "input" was added to
the function that replaces buf->p. Given that this part is on the critical
path, it will have to be tested again for any visible performance loss.
2018-07-19 16:23:42 +02:00
Willy Tarreau
f40e68227b MINOR: h1: make h1_measure_trailers() use an offset and a count
This will be needed by the H2 encoder to restart after wrapping.
2018-07-19 16:23:41 +02:00
Willy Tarreau
7314be8e2c MINOR: h1: make h1_measure_trailers() take the byte count in argument
The principle is that it should not have to take this value from the
buffer itself anymore.
2018-07-19 16:23:40 +02:00
Willy Tarreau
188e230704 MINOR: buffer: convert most b_ptr() calls to c_ptr()
The latter uses the channel wherever a channel is known.
2018-07-19 16:23:40 +02:00
Willy Tarreau
8f9c72d301 MINOR: buffer: remove bi_end()
It was replaced by ci_tail() when the channel is known, or b_tail() in
other cases.
2018-07-19 16:23:40 +02:00
Willy Tarreau
41e38ac0ee MINOR: buffer: remove bo_end()
It was replaced by either b_tail() when the buffer has no input data, or
b_peek(b, b->o).
2018-07-19 16:23:40 +02:00
Willy Tarreau
1b4cf9b754 BUG/MINOR: h1: the HTTP/1 make status code parser check for digits
The H1 parser used by the H2 gateway was a bit lax and could validate
non-numbers in the status code. Since it computes the code on the fly
it's problematic, as "30:" is read as status code 310. Let's properly
check that it's a number now. No backport needed.
2017-11-09 11:15:45 +01:00
Willy Tarreau
2510f702f9 MINOR: h1: add a function to measure the trailers length
This is needed in the H2->H1 gateway so that we know how long the trailers
block is in chunked encoding. It returns the number of bytes, or 0 if some
are missing, or -1 in case of parse error.
2017-10-31 17:18:10 +01:00
Willy Tarreau
d22e83abd9 MINOR: h1: store the status code in the H1 message
It was painful not to have the status code available, especially when
it was computed. Let's store it and ensure we don't claim content-length
anymore on 1xx, only 0 body bytes.
2017-10-31 08:43:29 +01:00
Willy Tarreau
8ea0f38c75 MEDIUM: h1: ensure that 1xx, 204 and 304 don't have a payload body
It's important for the H2 to H1 gateway that the response parser properly
clears the H1 message's body_len when seeing these status codes so that we
don't hang waiting to transfer data that will not come.
2017-10-30 19:33:22 +01:00
Willy Tarreau
794f9af894 MEDIUM: h1: reimplement the http/1 response parser for the gateway
The HTTP/2->HTTP/1 gateway will need to process HTTP/1 responses. We
cannot sanely rely on the HTTP/1 txn to parse a response because :

  1) responses generated by haproxy such as error messages, redirects,
     stats or Lua are neither parsed nor indexed ; this could be
     addressed over the long term but will take time.

  2) the http txn is useless to parse the body : the states present there
     are only meaningful to received bytes (ie next bytes to parse) and
     not at all to sent bytes. Thus chunks cannot be followed at all.
     Even when implementing this later, it's unsure whether it will be
     possible when dealing with compression.

So using the HTTP txn is now out of the equation and the only remaining
solution is to call an HTTP/1 message parser. We already have one, it was
slightly modified to avoid keeping states by benefitting from the fact
that the response was produced by haproxy and this is entirely available.
It assumes the following rules are true, or that incuring an extra cost
to work around them is acceptable :
  - the response buffer is read-write and supports modifications in place

  - headers sent through / by haproxy are not folded. Folding is still
    implemented by replacing CR/LF/tabs/spaces with spaces if encountered

  - HTTP/0.9 responses are never sent by haproxy and have never been
    supported at all

  - haproxy will not send partial responses, the whole headers block will
    be sent at once ; this means that we don't need to keep expensive
    states and can afford to restart the parsing from the beginning when
    facing a partial response ;

  - response is contiguous (does not wrap). This was already the case
    with the original parser and ensures we can safely dereference all
    fields with (ptr,len)

The parser replaces all of the http_msg fields that were necessary with
local variables. The parser is not called on an http_msg but on a string
with a start and an end. The HTTP/1 states were reused for ease of use,
though the request-specific ones have not been implemented for now. The
error position and error state are supported and optional ; these ones
may be used later for bug hunting.

The parser issues the list of all the headers into a caller-allocated
array of struct ist.

The content-length/transfer-encoding header are checked and the relevant
info fed the h1 message state (flags + body_len).
2017-10-22 09:54:15 +02:00
Willy Tarreau
8740c8b1b2 REORG: http: move the HTTP/1 header block parser to h1.c
Since it still depends on http_msg, it was not renamed yet.
2017-10-22 09:54:13 +02:00
Willy Tarreau
db4893d6a4 REORG: http: move the HTTP/1 chunk parser to h1.{c,h}
Functions http_parse_chunk_size(), http_skip_chunk_crlf() and
http_forward_trailers() were moved to h1.h and h1.c respectively so
that they can be called from outside. The parts that were inline
remained inline as it's critical for performance (+41% perf
difference reported in an earlier test). For now the "http_" prefix
remains in their name since they still depend on the http_msg type.
2017-10-22 09:54:13 +02:00
Willy Tarreau
0da5b3bddc REORG: http: move some very http1-specific parts to h1.{c,h}
Certain types and enums are very specific to the HTTP/1 parser, and we'll
need to share them with the HTTP/2 to HTTP/1 translation code. Let's move
them to h1.c/h1.h. Those with very few occurrences or only used locally
were renamed to explicitly mention the relevant HTTP version :

  enum ht_state      -> h1_state.
  http_msg_state_str -> h1_msg_state_str
  HTTP_FLG_*         -> H1_FLG_*
  http_char_classes  -> h1_char_classes

Others like HTTP_IS_*, HTTP_MSG_* are left to be done later.
2017-10-22 09:54:13 +02:00