Now that h1 and legacy HTTP are two distinct things, there's no need
to keep the legacy HTTP parsers in h1.c since they're only used by
the legacy code in proto_http.c, and h1.h doesn't need to include
hdr_idx anymore. This concerns the following functions :
- http_parse_reqline();
- http_parse_stsline();
- http_msg_analyzer();
- http_forward_trailers();
All of these were moved to http_msg.c.
Lots of HTTP code still uses struct http_msg. Not only this code is
still huge, but it's part of the legacy interface. Let's move most
of these functions to a separate file http_msg.c to make it more
visible which file relies on what. It's mostly symmetrical with
what is present in http_htx.c.
The function http_transform_header_str() which used to rely on two
function pointers to look up a header was simplified to rely on
two variants http_legacy_replace_{,full_}header(), making both
sides of the function much simpler.
No code was changed beyond these moves.
All the HTX definition is self-contained and doesn't really depend on
anything external since it's a mostly protocol. In addition, some
external similar files (like h2) also placed in common used to rely
on it, making it a bit awkward.
This patch moves the two htx.h files into a single self-contained one.
The historical dependency on sample.h could be also removed since it
used to be there only for http_meth_t which is now in http.h.
The cache is now able to store and resend HTX messages. When an HTX message is
stored in the cache, the headers are prefixed with their block's info (an
uint32_t), containing its type and its length. Data, on their side, are stored
without any prefix. Only the value is copied in the cache. 2 fields have been
added in the structure cache_entry, hdrs_len and data_len, to known the size, in
the cache, of the headers part and the data part. If the message is chunked, the
trailers are also copied, the same way as data. When the HTX message is
recreated in the cache applet, the trailers size is known removing the headers
length and the data lenght from the total object length.
The CLI proxy was not handling payload. To do that, we needed to keep a
connection active on a server and to transfer each new line over that
connection until we receive a empty line.
The CLI proxy handles the payload in the same way that the CLI do it.
Examples:
$ echo -e "@1;add map #-1 <<\n$(cat data)\n" | socat /tmp/master-socket -
$ socat /tmp/master-socket readline
prompt
master> @1
25130> add map #-1 <<
+ test test
+ test2 test2
+ test3 test3
+
25130>
There were a number of ugly setsockopt() calls spread all over
proto_http.c, proto_htx.c and hlua.c just to manipulate the front
connection's TOS, mark or TCP quick-ack. These ones entirely relied
on the connection, its existence, its control layer's presence, and
its addresses. Worse, inet_set_tos() was placed in proto_http.c,
exported and used from the two other ones, surrounded in #ifdefs.
This patch moves this code to connection.h and makes the other ones
rely on it without ifdefs.
The new function hpack_encode_path() supports encoding a path into
the ":path" header. It knows about "/" and "/index.html" which use
a single byte, and falls back to literal encoding for other ones,
with a fast path for short paths < 127 bytes.
The new function hpack_encode_scheme() supports encoding a scheme
into the ":scheme" header. It knows about "https" and "http" which use
a single byte, and falls back to literal encoding for other ones.
The new function hpack_encode_method() supports encoding a method.
It knows about GET and POST which use a single byte, and falls back
to literal encoding for other ones.
This header exists with 7 different values, it's worth taking them
into account for the encoding, hence these functions. One of them
makes use of an integer only and computes the 3 output bytes in case
of literal. The other one benefits from the knowledge of an existing
string, which for example exists in the case of H1 to H2 encoding.
For long header values whose index is known, hpack_encodde_long_idx()
may now be used. This function emits the short index and follows with
the header's value.
Most direct calls to HPACK functions are made to encode short header
fields like methods, schemes or statuses, whose lengths and indexes
are known. Let's have a small function to do this.
We'll need these functions from other inline functions, let's make them
accessible. len_to_bytes() was renamed to hpack_len_to_bytes() since it's
now exposed.
This macro may be used to block constant propagation that lets the compiler
detect a possible NULL dereference on a variable resulting from an explicit
assignment in an impossible check. Sometimes a function is called which does
safety checks and returns NULL if safe conditions are not met. The place
where it's called cannot hit this condition and dereferencing the pointer
without first checking it will make the compiler emit a warning about a
"potential null pointer dereference" which is hard to work around. This
macro "washes" the pointer and prevents the compiler from emitting tests
branching to undefined instructions. It may only be used when the developer
is absolutely certain that the conditions are guaranteed and that the
pointer passed in argument cannot be NULL by design.
A typical use case is a top-level function doing this :
if (frame->type == HEADERS)
parse_frame(frame);
Then parse_frame() does this :
void parse_frame(struct frame *frame)
{
const char *frame_hdr;
frame_hdr = frame_hdr_start(frame);
if (*frame_hdr == FRAME_HDR_BEGIN)
process_frame(frame);
}
and :
const char *frame_hdr_start(const struct frame *frame)
{
if (frame->type == HEADERS)
return frame->data;
else
return NULL;
}
Above parse_frame() is only called for frame->type == HEADERS so it will
never get a NULL in return from frame_hdr_start(). Thus it's always safe
to dereference *frame_hdr since the check was already performed above.
It's then safe to address it this way instead of inventing dummy error
code paths that may create real bugs :
void parse_frame(struct frame *frame)
{
const char *frame_hdr;
frame_hdr = frame_hdr_start(frame);
ALREADY_CHECKED(frame_hdr);
if (*frame_hdr == FRAME_HDR_BEGIN)
process_frame(frame);
}
Calling tolower/toupper for each character is slow, a lookup into a
256-byte table is cheaper, especially for common characters used in
header field names which all fit into a cache line. Let's create these
two variables marked weak so that they're included only once.
The ist functions were missing functions to copy an IST into a target
buffer, making some code have to resort to memcpy(), which tends to be
overkill for small strings, that the compiler cannot guess. In addition
sometimes there is a need to turn a string to lower or upper case so it
had to be overwritten after the operation.
This patch adds 6 functions to copy an ist to a buffer, as binary or as a
string (i.e. a zero is or is not appended), and optionally to apply a
lower case or upper case transformation on the fly.
A number of tests were performed to optimize the processing for small
strings. The loops are marked unlikely to dissuade the compilers from
over-optimizing them and switching to SIMD instructions. The lower case
or upper case transformations used to rely on external functions for
each character and to crappify the code due to clobbered registers,
which is not acceptable when we know that only a certain class of chars
has to be transformed, so the test was open-coded.
CS_FL_RCV_MORE is used in two cases, to let the conn_stream
know there may be more data available, and to let it know that
it needs more room. We can't easily differentiate between the
two, and that may leads to hangs, so split it into two flags,
CS_FL_RCV_MORE, that means there may be more data, and
CS_FL_WANT_ROOM, that means we need more room.
This should not be backported.
If we try to receive before the connection is established, we lose the
send event and are not woken up anymore once the connection is established.
This was diagnosed by Olivier.
No backport is needed.
There are some situations where we need to wait for the other side to
be connected. None of the current blocking flags support this. It used
to work more or less by accident using the old flags. Let's add a new
flag to mention we're blocking on this, it's removed by si_chk_rcv()
when a connection is established. It should be enough for now.
The master is not supposed to run (at the moment) any task before the
polling loop, the created tasks should be run only in the workers but in
the master they should be disabled or removed.
No backport needed.
To ease the fast forwarding and the infinte forwarding on HTX proxies, 2
functions have been added to let the channel be almost aware of the way data are
stored in its buffer. By calling these functions instead of legacy ones, we are
sure to forward the right amount of data.
Now, the function htx_from_buf() will set the buffer's length to its size
automatically. In return, the caller should call htx_to_buf() at the end to be
sure to leave the buffer hosting the HTX message in the right state. When the
caller can use the function htxbuf() to get the HTX message without any update
on the underlying buffer.
The small HTX overhead is enough to make the system perform multiple
reads and unaligned memory copies. Here we provide a function whose
purpose is to reduce the apparent room in a buffer by the size of the
overhead for DATA blocks, which is the struct htx plus 2 blocks (one
for DATA, one for the end of message so that small blocks can fit at
once). The muxes using HTX will be encouraged to use this one instead
of b_room() to compute the available buffer room and avoid filling
their demux buf with more data than can fit at once into the HTX
buffer.
This one is used a lot during transfers, let's avoid resetting its
size when there are already data in the buffer since it implies the
size is correct.
Add a new keyword for servers, "idle-timeout". If set, unused connections are
kept alive until the timeout happens, and will be picked for reuse if no
other connection is available.
Add a new method to muxes, "max_streams", that returns the max number of
streams the mux can handle. This will be used to know if a mux is in use
or not.
We currently have conn_get_best_mux() to return the best mux for a
given protocol name, side and proxy mode. But we need the mux entry
as well in order to fix the bind_conf and servers at the end of the
config parsing. Let's split the function in two parts. It's worth
noting that the <conn> argument is never used anymore so this part
is eligible to some cleanup.
Till now we could only produce an HTTP/1 request from a list of H2
request headers. Now the new function h2_make_htx_request() does the
same but using the HTX encoding instead, while respecting the H2
semantics. The code is not much different from the first version,
only the encoding differs.
For now it's not used.
First, to be called on HTX streams, a filter must explicitly be declared as
compatible by setting the flag STRM_FLT_FL_HAS_FILTERS on the filter's config at
HAProxy startup. This flag is checked when a filter implementation is attached
to a stream.
Then, some changes have been made on HTTP callbacks. The callback http_payload
has been added to filter HTX data. It will be called on HTX streams only. It
replaces the callbacks http_data, http_chunk_trailers and http_forward_data,
called on legacy HTTP streams only and marked as deprecated. The documention
(once updated)) will give all information to implement this new callback. Other
HTTP callbacks will be called for HTX and HTTP legacy streams. So it is the
filter's responsibility to known which kind of data it handles. The macro
IS_HTX_STRM should be used in such cases.
There is at least a noticeable changes in the way data are forwarded. In HTX,
after the call to the callback http_headers, all the headers are considered as
forwarded. So, in http_payload, only the body and eventually the trailers will
be filtered.
First of all, an dedicated error snapshot, h1_snapshot, has been added. It
contains more or less the some info than http_snapshot but adapted for H1
messages. Then, the function h1_capture_bad_message() has been added to capture
bad H1 messages. And finally, the function h1_show_error_snapshot() is used to
dump these errors. Only Headers or data parsing are captured.