Commit Graph

54 Commits

Author SHA1 Message Date
Christopher Faulet
62dc8750a9 MINOR: http: Add support for HTTP 414/431 status codes
414-Uri-Too-Long and 431-Request-Header-Fields-Too-Large are now part of
supported status codes that can be define as error files. The hash table
defined in http_get_status_idx() was updated accordingly.
2024-11-19 15:29:40 +01:00
Willy Tarreau
4cc25f26f9 MEDIUM: http: add the ability to redefine http-err-codes and http-fail-codes
The new global keywords "http-err-codes" and "http-fail-codes" allow to
redefine which HTTP status codes indicate a client-induced error or a
server error, as tracked by stick-table counters. This is only done
globally, though everything was done so that it could easily be extended
to a per-proxy mechanism if there was a real need for this (but it would
eat quite more RAM then).

A simple reg-test was added (http-err-fail.vtc).
2024-01-11 15:10:08 +01:00
Willy Tarreau
3c135569c5 MINOR: http: add infrastructure to choose status codes for err / fail
At the moment, http_err_cnt and http_fail_cnt are incremented on a
well-defined set of status codes, which are checked at various places.
Over time, there have been some complains about 404, 401 or 407
triggering errors, or 500 triggering failures in SOAP environments
for example. With a small bit field that fits in a cache line we
can match the presence of a status code from 100 to 599, so that
remains cheap.

This patch adds two such bit fields, one per code class, and the
accompanying functions to set/clear/test the codes. The arrays are
preset at boot time. For now they are not used and it's not possible
to adjust them.
2024-01-11 15:10:08 +01:00
Willy Tarreau
59c01f1091 CLEANUP: http: avoid duplicating literals in find_http_meth()
The function does the inverse of http_known_methods[], better rely on
that array with its indices, that makes the code clearer. Note that
we purposely don't use a loop because the compiler is able to build
an evaluation tree of the size checks and content checks that's very
efficient for the most common methods. Moving a few unimportant
entries even simplified the output code a little bit (they're now
groupped by size without changing anything for the first ones).
2024-01-11 15:10:08 +01:00
Willy Tarreau
19def65228 OPTIM: http: simplify http_get_status_idx() using a hash
This function uses a large switch/case, but the problem is that due to the
numerous holes in the range, the compiler implemented a large jump table.
With a bit of experimentations, some trivial perfect-hash code works, and
since the number of entries is 19, it was enlarged to match the nearest
next power of two to avoid a large modulo operation, and fills the holes
with the default return value (HTTP_ERR_500). Jumping to 32 also results
in a lot of valid keys and allows us to pick small values, resulting in
very fast and compact code. The new function, despite keeping a 32-bytes
table, saves slightly more than 800 bytes of code+data compared to the
previous code, and avoids table jumps that affect the CPU's branch history.

Note that another simple hash worked fine and produced exactly 19 codes
(hence no need to pad holes): ((status * 8675725) >> 13) % 19

But it's still about 24 bytes larger in code to save 13 bytes of data
that are aligned anyway, and it was a bit more expensive so that was
definitely not worth it.

The validity of the table was verified with this test code added just after
it:

  __attribute__((constructor)) void http_hash_test(void)
  {
	int i;
	for (i = 0; i <= 600; i++)
		printf("code %d => %d\n", i, http_get_status_idx(i));
	exit(0);
  }

And starting haproxy |grep -vw 14 correctly shows all ordered values
(except 500 of course which is 14).

In case new codes would be added, just play again with dev/phash to
updated the table. As long as there are less than 32 effective entries
it will remain easy to update without having to modify phash.
2024-01-11 15:10:08 +01:00
Ruei-Bang Chen
7a1ec235cd MINOR: sample: Add fetcher for getting all cookie names
This new fetcher can be used to extract the list of cookie names from
Cookie request header or from Set-Cookie response header depending on
the stream direction. There is an optional argument that can be used
as the delimiter (which is assumed to be the first character of the
argument) between cookie names. The default delimiter is comma (,).

Note that we will treat the Cookie request header as a semi-colon
separated list of cookies and each Set-Cookie response header as
a single cookie and extract the cookie names accordingly.
2023-11-03 09:57:06 +01:00
Willy Tarreau
22731762d9 BUG/MINOR: http: skip leading zeroes in content-length values
Ben Kallus also noticed that we preserve leading zeroes on content-length
values. While this is totally valid, it would be safer to at least trim
them before passing the value, because a bogus server written to parse
using "strtol(value, NULL, 0)" could inadvertently take a leading zero
as a prefix for an octal value. While there is not much that can be done
to protect such servers in general (e.g. lack of check for overflows etc),
at least it's quite cheap to make sure the transmitted value is normalized
and not taken for an octal one.

This is not really a bug, rather a missed opportunity to sanitize the
input, but is marked as a bug so that we don't forget to backport it to
stable branches.

A combined regtest was added to h1or2_to_h1c which already validates
end-to-end syntax consistency on aggregate headers.
2023-08-09 11:28:48 +02:00
Willy Tarreau
6492f1f29d BUG/MAJOR: http: reject any empty content-length header value
The content-length header parser has its dedicated function, in order
to take extreme care about invalid, unparsable, or conflicting values.
But there's a corner case in it, by which it stops comparing values
when reaching the end of the header. This has for a side effect that
an empty value or a value that ends with a comma does not deserve
further analysis, and it acts as if the header was absent.

While this is not necessarily a problem for the value ending with a
comma as it will be cause a header folding and will disappear, it is a
problem for the first isolated empty header because this one will not
be recontructed when next ones are seen, and will be passed as-is to the
backend server. A vulnerable HTTP/1 server hosted behind haproxy that
would just use this first value as "0" and ignore the valid one would
then not be protected by haproxy and could be attacked this way, taking
the payload for an extra request.

In field the risk depends on the server. Most commonly used servers
already have safe content-length parsers, but users relying on haproxy
to protect a known-vulnerable server might be at risk (and the risk of
a bug even in a reputable server should never be dismissed).

A configuration-based work-around consists in adding the following rule
in the frontend, to explicitly reject requests featuring an empty
content-length header that would have not be folded into an existing
one:

    http-request deny if { hdr_len(content-length) 0 }

The real fix consists in adjusting the parser so that it always expects a
value at the beginning of the header or after a comma. It will now reject
requests and responses having empty values anywhere in the C-L header.

This needs to be backported to all supported versions. Note that the
modification was made to functions h1_parse_cont_len_header() and
http_parse_cont_len_header(). Prior to 2.8 the latter was in
h2_parse_cont_len_header(). One day the two should be refused but the
former is also used by Lua.

The HTTP messaging reg-tests were completed to test these cases.

Thanks to Ben Kallus of Dartmouth College and Narf Industries for
reporting this! (this is in GH #2237).
2023-08-09 09:27:38 +02:00
Christopher Faulet
e3e4e00063 BUG/MINOR: http: Return the right reason for 302
Because of a cut/paste error, the wrong reason was returned for 302
code. The 301 reason was returned instead. Thus now, "Found" is returned for
302, instead of "Moved Permanently".

This pathc should fix the issue 2208. It must be backported to all stable
versions.
2023-07-17 11:14:10 +02:00
Martin DOLEZ
110e4a8733 MINOR: http_fetch: add case insensitive support for smp_fetch_url_param
This commit adds a new argument to smp_fetch_url_param
that makes the parameter key comparison case-insensitive.
Several levels of callers were modified to pass this info.
2023-03-30 14:11:10 +02:00
Amaury Denoyelle
15f3cc4b38 MINOR: http: extract content-length parsing from H2
Extract function h2_parse_cont_len_header() in the generic HTTP module.
This allows to reuse it for all HTTP/x parsers. The function is now
available as http_parse_cont_len_header().

Most notably, this will be reused in the next bugfix for the H3 parser.
This is necessary to check that content-length header match the length
of DATA frames.

Thus, it must be backported to 2.6.
2022-12-14 11:34:18 +01:00
Ilya Shipitsin
6f86eaae4f CLEANUP: assorted typo fixes in the code and comments
This is 33rd iteration of typo fixes
2022-11-30 14:02:36 +01:00
Christopher Faulet
99ade9e0da MINOR: http: Considere empty ports as valid default ports
In RFC3986#6.2.3, following URIs are considered as equivalent:

      http://example.com
      http://example.com/
      http://example.com:/
      http://example.com:80/

The third one is interristing because the port is empty and it is still
considered as a default port. Thus, http_get_host_port() does no longer
return IST_NULL when the port is empty. Now, a ist is returned, it points on
the first character after the colon (':') with a length of 0. In addition,
http_is_default_port() now considers an empty port as a default port,
regardless the scheme.

This patch must not be backported, if so, without the previous one ("MINOR:
h1: Consider empty port as invalid in authority for CONNECT").
2022-11-22 16:56:49 +01:00
Christopher Faulet
ca7218aaf0 MINOR: http: Add function to detect default port
http_is_default_port() can be used to test if a port is a default HTTP/HTTPS
port. A scheme may be specified. In this case, it is used to detect defaults
ports, 80 for "http://" and 443 for "https://". Otherwise, with no scheme, both
are considered as default ports.
2022-07-06 17:54:03 +02:00
Christopher Faulet
658f971621 MINOR: http: Add function to get port part of a host
http_get_host_port() function can be used to get the port part of a host. It
will be used to get the port of an uri authority or a host header
value. This function only look for a port starting from the end of the
host. It is the caller responsibility to call it with a valid host value. An
indirect string is returned.
2022-07-06 17:54:03 +02:00
Willy Tarreau
1ba30167a0 MEDIUM: h1: enlarge the scope of accepted version chars with accept-invalid-http-request
We used to support both RTSP and HTTP protocol version names with and
without accept-invalid-http-request, but since this is based on the
characters themselves, any protocol made of chars {0-9/.HPRST} was
possible and not others. Now that such non-standard protocols are
restricted to accept-invalid-http-request, there's no reason for not
allowing other letters. With this patch, characters {0-9./A-Z} are
permitted when the option is set.
2022-05-24 15:38:54 +02:00
Christopher Faulet
92cafb39e7 MINOR: http: Add 422-Unprocessable-Content error message
The last HTTP/1.1 draft adds the 422 status code in the list of client
errors. It normalizes the WebDav specific one (422-Unprocessable-Entity).
2021-09-28 16:21:25 +02:00
Willy Tarreau
d3d8d03d98 MINOR: http: add a new function http_validate_scheme() to validate a scheme
While http_parse_scheme() extracts a scheme from a URI by extracting
exactly the valid characters and stopping on delimiters, this new
function performs the same on a fixed-size string.
2021-08-17 10:16:22 +02:00
Amaury Denoyelle
c453f9547e MINOR: http: use http uri parser for path
Replace http_get_path by the http_uri_parser API. The new functions is
renamed http_parse_path. Replace duplicated code for scheme and
authority parsing by invocations to http_parse_scheme/authority.

If no scheme is found for an URI detected as an absolute-uri/authority,
consider it to be an authority format : no path will be found. For an
absolute-uri or absolute-path, use the remaining of the string as the
path. A new http_uri_parser state is declared to mark the path parsing
as done.
2021-07-08 17:11:17 +02:00
Amaury Denoyelle
69294b20ac MINOR: http: use http uri parser for authority
Replace http_get_authority by the http_uri_parser API.

The new function is renamed http_parse_authority. Replace duplicated
scheme parsing code by http_parse_scheme invocation. A new
http_uri_parser state is declared to mark the authority parsing as done.
2021-07-08 17:11:17 +02:00
Amaury Denoyelle
8ac8cbfd72 MINOR: http: use http uri parser for scheme
Replace http_get_scheme by the http_uri_parser API. The new function is
renamed http_parse_scheme. A new http_uri_parser state is declared to
mark the scheme parsing as completed.
2021-07-08 17:11:17 +02:00
Amaury Denoyelle
ef08811240 MINOR: http: implement http_get_scheme
This method can be used to retrieve the scheme part of an uri, with the
suffix '://'. It will be useful to implement scheme-based normalization.
2021-07-07 15:34:01 +02:00
Christopher Faulet
e095f31d36 MINOR: http: Add HTTP 501-not-implemented error message
Add the support for the 501-not-implemented status code with the
corresponding default message. The documentation is updated accordingly
because it is now part of status codes HAProxy may emit via an errorfile or
a deny/return HTTP action.
2021-01-21 15:21:12 +01:00
Thayne McCombs
8f0cc5c4ba CLEANUP: Fix spelling errors in comments
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
2021-01-08 14:56:32 +01:00
Remi Tricot-Le Breton
56e46cb393 MINOR: http: Add helper functions to trim spaces and tabs
Add two helper functions that trim leading or trailing spaces and
horizontal tabs from an ist string.
2020-12-24 17:18:00 +01:00
Maciej Zdeb
dea7c209f8 BUG/MINOR: http-fetch: Extract cookie value even when no cookie name
HTTP sample fetches dealing with the cookies (req/res.cook,
req/res.cook_val and req/res.cook_cnt) must be prepared to be called
without cookie name. For the first two, the first cookie value is
returned, regardless its name. For the last one, all cookies are counted.

To do so, http_extract_cookie_value() may now be called with no cookie
name (cookie_name_l set to 0). In this case, the matching on the cookie
name is ignored and the first value found is returned.

Note this patch also fixes matching on cookie values in ACLs.

This should be backported in all stable versions.
2020-11-13 16:26:10 +01:00
Remi Tricot-Le Breton
bcced09b91 MINOR: http: Add etag comparison function
Add a function that compares two etags that might be of different types.
If any of them is weak, the 'W/' prefix is discarded and a strict string
comparison is performed.

Co-authored-by: Tim Duesterhus <tim@bastelstu.be>
2020-10-22 16:06:20 +02:00
Christopher Faulet
5563392554 BUG/MINOR: http: Fix content-length of the default 500 error
96 bytes is announce in the C-L header for a message of body of 97 bytes. This
bug was introduced by the patch 46a030cdd ("CLEANUP: assorted typo fixes in the
code and comments").

This patch must be backported in all versions where the patch above is (the 2.2
for now).
2020-10-09 10:02:09 +02:00
Ilya Shipitsin
46a030cdda CLEANUP: assorted typo fixes in the code and comments
This is 11th iteration of typo fixes
2020-07-06 14:34:32 +02:00
Anthonin Bonnefoy
85048f80c9 MINOR: http: Add support for http 413 status
Add 413 http "payload too large" status code. This will allow 413 to be
used in deny_status and errorfile.
2020-06-26 11:30:02 +02:00
Willy Tarreau
2c4dfaeff6 MINOR: http: do not close connections anymore after internal responses
Since we dropped support for legacy mode, it's not the stream which
deals with the connection but the mux, and there's no point in closing
the client connection after most internal status codes. For example if
the client gets a 401 or a 503 because a server doesn't respond, it
makes no sense forcing the connection to close after reporting this
status, because it's already done by the mux if the client asks for it
or is not compatible with keep-alive. This current state was inherited
from the early days but is still limiting the amount of client-side
connection reuse in a number of circumstances (typically server-side
errors). This change was planned for 2.1 but forgotten.

The status codes for which the connection is not closed anymore are those
that do not depend on the client side connection itself, which are all
except 400 and 408. This could be backported to 2.1 but not further, in
order to make sure legacy and HTX behave strictly similarly.
2020-06-16 17:41:32 +02:00
Willy Tarreau
48fbcae07c REORG: tools: split common/standard.h into haproxy/tools{,-t}.h
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.

They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
2020-06-11 10:18:57 +02:00
Willy Tarreau
cd72d8c981 REORG: include: split common/http.h into haproxy/http{,-t}.h
So the enums and structs were placed into http-t.h and the functions
into http.h. This revealed that several files were dependeng on http.h
but not including it, as it was silently inherited via other files.
2020-06-11 10:18:57 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Christopher Faulet
0bac4cdf1a CLEANUP: http: Remove unused HTTP message templates
HTTP_1XX, HTTP_3XX and HTTP_4XX message templates are no longer used. Only
HTTP_302 and HTTP_303 are used during configuration parsing by "errorloc" family
directives. So these templates are removed from the generic http code. And
HTTP_302 and HTTP_303 templates are moved as static strings in the function
parsing "errorloc" directives.
2020-05-28 15:07:20 +02:00
Christopher Faulet
612f2eafe9 MINOR: http-ana: Use proxy's error replies to emit 401/407 responses
There is no reason to not use proxy's error replies to emit 401/407
responses. The function http_reply_40x_unauthorized(), responsible to emit those
responses, is not really complex. It only adds a
WWW-Authenticate/Proxy-Authenticate header to a generic message.

So now, error replies can be defined for 401 and 407 status codes, using
errorfile or http-error directives. When an http-request auth rule is evaluated,
the corresponding error reply is used. For 401 responses, all occurrences of the
WWW-Authenticate header are removed and replaced by a new one with a basic
authentication challenge for the configured realm. For 407 responses, the same
is done on the Proxy-Authenticate header. If the error reply must not be
altered, "http-request return" rule must be used instead.
2020-05-28 15:07:20 +02:00
Tim Duesterhus
241e29ef9c MINOR: ist: Add IST_NULL macro
`IST_NULL` is equivalent to an `struct ist` with `.ptr = NULL` and
`.len = 0`.
2020-03-05 19:52:07 +01:00
Willy Tarreau
02ac950a11 CLEANUP: http/h1: rely on HA_UNALIGNED_LE instead of checking for CPU families
Now that we have flags indicating the CPU's capabilities, better use
them instead of missing some updates for new CPU families (ARMv8 was
missing there).
2020-02-21 16:32:57 +01:00
Florian Tham
9205fea13a MINOR: http: Add 404 to http-request deny
This patch adds http status code 404 Not Found to http-request deny. See
issue #80.
2020-01-08 16:15:23 +01:00
Florian Tham
272e29b5cc MINOR: http: Add 410 to http-request deny
This patch adds http status code 410 Gone to http-request deny. See
issue #80.
2020-01-08 16:15:23 +01:00
Christopher Faulet
16fdc55f79 MINOR: http: Add a function to get the authority into a URI
The function http_get_authority() may be used to parse a URI and looks for the
authority, between the scheme and the path. An option may be used to skip the
user info (part before the '@'). Most of time, the user info will be ignored.
2019-10-09 11:05:31 +02:00
Christopher Faulet
341fac1eb2 MINOR: http: Add function to parse value of the header Status
It will be used by the mux FCGI to get the status a response.
2019-09-17 10:18:54 +02:00
Christopher Faulet
f734638976 MINOR: http: Don't store raw HTTP errors in chunks anymore
Default HTTP error messages are stored in an array of chunks. And since the HTX
was added, these messages are also converted in HTX and stored in another
array. But now, the first array is not used anymore because the legacy HTTP mode
was removed.

So now, only the array with the HTX messages are kept. The other one was
removed.
2019-07-19 09:46:23 +02:00
Willy Tarreau
b5ba2b0177 MINOR: http: turn default error files to HTTP/1.1
For quite a long time we've been saying that the default error files
should produce HTTP/1.1 responses and since it's of low importance, it
always gets forgotten.

So here it finally comes. Each status code now properly contains a
content-length header so that the output is clean and doesn't force
upstream proxies to switch to chunked encoding or to close the connection
immediately after the response, which is particularly annoying for 401
or 407 for example. It's worth noting that the 3xx codes had already
been turned to HTTP/1.1.

This patch will obviously not change anything for user-provided error files.
2019-06-11 16:37:13 +02:00
Willy Tarreau
8de1df92a3 BUILD: do not specify "const" on functions returning structs or scalars
Older compilers (like gcc-3.4) warn about the use of "const" on functions
returning a struct, which makes sense since the return may only be copied :

  include/common/htx.h:233: warning: type qualifiers ignored on function return type

Let's simply drop "const" here.
2019-04-15 21:55:48 +02:00
Christopher Faulet
a7b677cd0d MEDIUM: proto_htx: Convert all HTTP error messages into HTX
During startup, after the configuration parsing, all HTTP error messages
(errorloc, errorfile or default messages) are converted into HTX messages and
stored in dedicated buffers. We use it to return errors in the HTX analyzers
instead of using ugly OOB blocks.
2018-12-01 17:37:27 +01:00
Tim Duesterhus
3f024f3be5 CLEANUP: http: Fix typo in init_http's comment
It read "non-zero" where it should read zero.
2018-11-28 04:20:51 +01:00
Joseph Herlant
942eea3f5c CLEANUP: Fix typos in the http subsystem
Fix typos in code comment of the http subsystem.
2018-11-18 22:26:42 +01:00
Christopher Faulet
8277ca72b1 MINOR: http: Add standalone functions to parse a start-line or a header
These 2 functions are pretty naive. They only split a start-line into its 3
substrings or a header line into its name and value. Spaces before and after
each part are skipped. No CRLF at the end are expected.
2018-11-18 21:45:49 +01:00
Frédéric Lécaille
9ca51aa288 MINOR: http: Implement "early-hint" http request rules.
This patch implements http_apply_early_hint_rule() function is responsible of
building HTTP 103 Early Hint responses each time a "early-hint" rule is matched.
2018-11-12 21:08:55 +01:00