Commit Graph

99 Commits

Author SHA1 Message Date
Willy Tarreau
6204cd9f27 BUG/MAJOR: vars: always retrieve the stream and session from the sample
This is the continuation of previous patch called "BUG/MAJOR: samples:
check smp->strm before using it".

It happens that variables may have a session-wide scope, and that their
session is retrieved by dereferencing the stream. But nothing prevents them
from being used from a streamless context such as tcp-request connection,
thus crashing the process. Example :

    tcp-request connection accept if { src,set-var(sess.foo) -m found }

In order to fix this, we have to always ensure that variable manipulation
only happens via the sample, which contains the correct owner and context,
and that we never use one from a different source. This results in quite a
large change since a lot of functions are inderctly involved in the call
chain, but the change is easy to follow.

This fix must be backported to 1.6, and requires the last two patches.
2016-03-10 17:28:04 +01:00
Willy Tarreau
7560dd4b6a MINOR: sample: always set a new sample's owner before evaluating it
Some functions like sample_conv_var2smp(), var_get_byname(), and
var_set_byname() directly or indirectly need to access the current
stream and/or session and must find it in the sample itself and not
as a distinct argument. Thus we first need to call smp_set_owner()
prior to each such calls.
2016-03-10 16:42:58 +01:00
Willy Tarreau
1777ea63e0 MINOR: sample: add a new helper to initialize the owner of a sample
Since commit 6879ad3 ("MEDIUM: sample: fill the struct sample with the
session, proxy and stream pointers") merged in 1.6-dev2, the sample
contains the pointer to the stream and sample fetch functions as well
as converters use it heavily. This requires from a lot of call places
to initialize 4 fields, and it was even forgotten at a few places.

This patch provides a convenient helper to initialize all these fields
at once, making it easy to prepare a new sample from a previous one for
example.

A few call places were cleaned up to make use of it. It will be needed
by further fixes.

At one place in the Lua code, it was moved earlier because we used to
call sample casts with a non completely initialized sample, which is
not clean eventhough at the moment there are no consequences.
2016-03-10 16:42:58 +01:00
Dragan Dosen
0b85ecee53 MEDIUM: logs: add a new RFC5424 log-format for the structured-data
This patch adds a new RFC5424-specific log-format for the structured-data
that is automatically send by __send_log() when the sender is in RFC5424
mode.

A new statement "log-format-sd" should be used in order to set log-format
for the structured-data part in RFC5424 formatted syslog messages.
Example:

    log-format-sd [exampleSDID@1234\ bytes=\"%B\"\ status=\"%ST\"]
2015-09-28 14:01:27 +02:00
Thierry FOURNIER
136f9d34a9 MINOR: samples: rename union from "data" to "u"
The union name "data" is a little bit heavy while we read the source
code because we can read "data.data.sint". The rename from "data" to "u"
makes the read easiest like "data.u.sint".
2015-08-20 17:13:46 +02:00
Thierry FOURNIER
8c542cac07 MEDIUM: samples: Use the "struct sample_data" in the "struct sample"
This patch remove the struct information stored both in the struct
sample_data and in the striuct sample. Now, only thestruct sample_data
contains data, and the struct sample use the struct sample_data for storing
his own data.
2015-08-20 17:13:46 +02:00
Thierry FOURNIER
cc4d1716a2 MINOR: sample: Add ipv6 to ipv4 and sint to ipv6 casts
The RFC4291 says that when the IPv6 adress have the followin form:
0000::ffff:a.b.c.d, if can be converted to an IPv4 adress. This patch
enable this conversion in casts.

As the sint can be casted as ipv4, and ipv4 can be casted as ipv6, we
can directly cast sint as ipv6 using the RFC4291.
2015-08-11 14:14:10 +02:00
Thierry FOURNIER
5d86fae234 MEDIUM: vars/sample: operators can use variables as parameter
This patch allow the existing operators to take a variable as parameter.
This is useful to add the content of two variables. This patch modify
the behavior of operators.
2015-07-22 00:48:24 +02:00
Thierry FOURNIER
00c005c726 MEDIUM: sample: switch to saturated arithmetic
This patch check calculus for overflow and returns capped values.
This permits to protect against integer overflow in certain operations
involving ratios, percentages, limits or anything. That can sometimes
be critically important with some operations (eg: content-length < X).
2015-07-22 00:48:24 +02:00
Thierry FOURNIER
bf65cd4d77 MAJOR: arg: converts uint and sint in sint
This patch removes the 32 bits unsigned integer and the 32 bit signed
integer. It replaces these types by a unique type 64 bit signed.
2015-07-22 00:48:23 +02:00
Thierry FOURNIER
07ee64ef4d MAJOR: sample: converts uint and sint in 64 bits signed integer
This patch removes the 32 bits unsigned integer and the 32 bit signed
integer. It replaces these types by a unique type 64 bit signed.

This makes easy the usage of integer and clarify signed and unsigned use.
With the previous version, signed and unsigned are used ones in place of
others, and sometimes the converter loose the sign. For example, divisions
are processed with "unsigned", if one entry is negative, the result is
wrong.

Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.

There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new more natural, unified "int()" function. These functions were only
introduced in the latest 1.6-dev2 so there's no impact on regular
deployments.
2015-07-22 00:48:23 +02:00
Thierry FOURNIER
fac9ccfb70 BUG/MINOR: http/sample: gmtime/localtime can fail
The man said that gmtime() and localtime() can return a NULL value.
This is not tested. It appears that all the values of a 32 bit integer
are valid, but it is better to check the return of these functions.

However, if the integer move from 32 bits to 64 bits, some 64 values
can be unsupported.
2015-07-20 12:21:35 +02:00
Willy Tarreau
28d976d5ee MINOR: args: add new context for servers
We'll have to support fetch expressions and args on server lines for
"usesrc", "usedst", "sni", etc...
2015-07-09 11:39:33 +02:00
Adis Nezirovic
79beb248b9 CLEANUP: sample: generalize sample_fetch_string() as sample_fetch_as_type()
This modification makes possible to use sample_fetch_string() in more places,
where we might need to fetch sample values which are not plain strings. This
way we don't need to fetch string, and convert it into another type afterwards.

When using aliased types, the caller should explicitly check which exact type
was returned (e.g. SMP_T_IPV4 or SMP_T_IPV6 for SMP_T_ADDR).

All usages of sample_fetch_string() are converted to use new function.
2015-07-06 16:17:25 +02:00
Dragan Dosen
93b38d9191 MEDIUM: 51Degrees code refactoring and cleanup
Moved 51Degrees code from src/haproxy.c, src/sample.c and src/cfgparse.c
into a separate files src/51d.c and include/import/51d.h.

Added two new functions init_51degrees() and deinit_51degrees(), updated
Makefile and other code reorganizations related to 51Degrees.
2015-06-30 10:43:03 +02:00
Thierry FOURNIER
cc103299c7 MINOR: samples: add samples which returns constants
This patch adds sample which returns constants values. This is useful
for intialising variables.
2015-06-13 23:01:37 +02:00
Thierry FOURNIER
9687c77c91 MINOR: debug: add a special converter which display its input sample content.
This converter displays its input sample type and content. It is useful
for debugging some complex configurations.
2015-06-13 23:01:36 +02:00
Thierry FOURNIER
9c627e84b2 MEDIUM: sample: Add type any
This type is used to accept any type of sample as input, and prevent
any automatic "cast". It runs like the type "ADDR" which accept the
type "IPV4" and "IPV6".
2015-06-13 22:59:14 +02:00
Thierry FOURNIER
0f811440d5 BUG/MINOR: sample: wrong conversion of signed values
The signed values are casted as unsigned before conversion. This patch
use the good converters according with the sample type.

Note: it depends on previous patch to parse signed ints.
2015-06-13 22:59:14 +02:00
Thierry FOURNIER
4c2479e1c4 BUG/MINOR: debug: display (null) in place of "meth"
The array which contains names of types, miss the METH entry.

[wt: should be backported to 1.5 as well]
2015-06-09 10:58:14 +02:00
Thomas Holmes
4d441a759c MEDIUM: sample: add trie support to 51Degrees
Trie or pattern algorithm is used depending on what 51Degrees source
files are provided to MAKE.
2015-06-02 19:30:53 +02:00
Thomas Holmes
951d44d24d MEDIUM: sample: add fiftyone_degrees converter.
It takes up to 5 string arguments that are to be 51Degrees property names.
It will then create a chunk with values detected based on the request header
supplied (this should be the User-Agent).
2015-06-02 14:00:25 +02:00
Willy Tarreau
f63386ad27 CLEANUP: da: move the converter registration to da.c
There's no reason to put it into sample.c, it's better to register it
locally in da.c, it removes a number of ifdefs and exports.
2015-06-02 13:42:12 +02:00
David Carlier
4542b10ae1 MEDIUM: sample: add the da-csv converter
This diff declares the deviceatlas module and can accept up to 5
property names for the API lookup.

[wt: this should probably be moved to its own file using the keyword
      registration mechanism]
2015-06-02 13:24:50 +02:00
Willy Tarreau
e2dc1fa8ca MEDIUM: stick-table: remove the now duplicate find_stktable() function
Since proxy_tbl_by_name() already does the same job, let's not keep
duplicate functions and use this one only.
2015-05-26 12:08:07 +02:00
Willy Tarreau
9e0bb1013e CLEANUP: proxy: make the proxy lookup functions more user-friendly
First, findproxy() was renamed proxy_find_by_name() so that its explicit
that a name is required for the lookup. Second, we give this function
the ability to search for tables if needed. Third we now provide inline
wrappers to pass the appropriate PR_CAP_* flags and to explicitly look
up a frontend, backend or table.
2015-05-26 11:24:42 +02:00
Thierry FOURNIER
0786d05a04 MEDIUM: sample: change the prototype of sample-fetches functions
This patch removes the "opt" entry from the prototype of the
sample-fetches fucntions. This permits to remove some weight
in the prototype call.
2015-05-11 20:03:08 +02:00
Thierry FOURNIER
1d33b882d2 MINOR: sample: fill the struct sample with the options.
Options are relative to the sample. Each sample fetched is associated with
fetch options or fetch flags.

This patch adds the 'opt' vaue in the sample struct. This permits to reduce
the sample-fetch function prototype. In other way, the converters will have
more detail about the origin of the sample.
2015-05-11 20:02:11 +02:00
Thierry FOURNIER
0a9a2b8cec MEDIUM: sample change the prototype of sample-fetches and converters functions
This patch removes the structs "session", "stream" and "proxy" from
the sample-fetches and converters function prototypes.

This permits to remove some weight in the prototype call.
2015-05-11 20:01:42 +02:00
Thierry FOURNIER
6879ad31a5 MEDIUM: sample: fill the struct sample with the session, proxy and stream pointers
Some sample analyzer (sample-fetch or converters) needs to known the proxy,
session and stream attached to the sampel. The sample-fetches and the converters
function pointers cannot be called without these 3 pointers filled.

This patch permits to reduce the sample-fetch and the converters called
prototypes, and provides a new mean to add information for this type of
functions.
2015-05-11 20:00:03 +02:00
Willy Tarreau
192252e2d8 MAJOR: sample: pass a pointer to the session to each sample fetch function
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.

So as of now, sample fetch functions will be called with this :

   - sess = NULL,  strm = NULL                     : never
   - sess = valid, strm = NULL                     : tcp-req connection
   - sess = valid, strm = valid, strm->txn = NULL  : tcp-req content
   - sess = valid, strm = valid, strm->txn = valid : http-req / http-res
2015-04-06 11:37:25 +02:00
Willy Tarreau
15e91e1b36 MAJOR: sample: don't pass l7 anymore to sample fetch functions
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.

The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
2015-04-06 11:35:53 +02:00
Willy Tarreau
87b09668be REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.

In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.

The files stream.{c,h} were added and session.{c,h} removed.

The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.

Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.

Once all changes are completed, we should see approximately this :

   L7 - http_txn
   L6 - stream
   L5 - session
   L4 - connection | applet

There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.

Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-06 11:23:56 +02:00
Thierry FOURNIER
8fd1376014 MINOR: converters: add function to browse converters
This patch adds a fucntion to browse each converter. This
is used with Lua for using the converters with a wrapper.
2015-03-11 19:55:10 +01:00
Thierry FOURNIER
4d9a1d1a5c MINOR: sample: add function for browsing samples.
This function is useful with the incoming lua functions.
2015-02-28 23:12:32 +01:00
Thierry FOURNIER
f41a809dc9 MINOR: sample: add private argument to the struct sample_fetch
The add of this private argument is to prepare the integration
of the lua fetchs.
2015-02-28 23:12:31 +01:00
Thierry FOURNIER
68a556e282 MINOR: converters: give the session pointer as converter argument
Some usages of the converters need to know the attached session. The Lua
needs the session for retrieving his running context. This patch adds
the "session" as an argument of the converters prototype.
2015-02-28 23:12:31 +01:00
Thierry FOURNIER
1edc971919 MINOR: converters: add a "void *private" argument to converters
This permits to store specific configuration pointer. It is useful
with future Lua integration.
2015-02-28 23:12:31 +01:00
Willy Tarreau
9770787e70 MEDIUM: samples: provide basic arithmetic and bitwise operators
This commit introduces a new category of converters. They are bitwise and
arithmetic operators which support performing basic operations on integers.
Some bitwise operations are supported (and, or, xor, cpl) and some arithmetic
operations are supported (add, sub, mul, div, mod, neg). Some comparators
are provided (odd, even, not, bool) which make it possible to report a match
without having to write an ACL.

The detailed list of new operators as they appear in the doc is :

add(<value>)
  Adds <value> to the input value of type unsigned integer, and returns the
  result as an unsigned integer.

and(<value>)
  Performs a bitwise "AND" between <value> and the input value of type unsigned
  integer, and returns the result as an unsigned integer.

bool
  Returns a boolean TRUE if the input value of type unsigned integer is
  non-null, otherwise returns FALSE. Used in conjunction with and(), it can be
  used to report true/false for bit testing on input values (eg: verify the
  presence of a flag).

cpl
  Takes the input value of type unsigned integer, applies a twos-complement
  (flips all bits) and returns the result as an unsigned integer.

div(<value>)
  Divides the input value of type unsigned integer by <value>, and returns the
  result as an unsigned integer. If <value> is null, the largest unsigned
  integer is returned (typically 2^32-1).

even
  Returns a boolean TRUE if the input value of type unsigned integer is even
  otherwise returns FALSE. It is functionally equivalent to "not,and(1),bool".

mod(<value>)
  Divides the input value of type unsigned integer by <value>, and returns the
  remainder as an unsigned integer. If <value> is null, then zero is returned.

mul(<value>)
  Multiplies the input value of type unsigned integer by <value>, and returns
  the product as an unsigned integer. In case of overflow, the higher bits are
  lost, leading to seemingly strange values.

neg
  Takes the input value of type unsigned integer, computes the opposite value,
  and returns the remainder as an unsigned integer. 0 is identity. This
  operator is provided for reversed subtracts : in order to subtract the input
  from a constant, simply perform a "neg,add(value)".

not
  Returns a boolean FALSE if the input value of type unsigned integer is
  non-null, otherwise returns TRUE. Used in conjunction with and(), it can be
  used to report true/false for bit testing on input values (eg: verify the
  absence of a flag).

odd
  Returns a boolean TRUE if the input value of type unsigned integer is odd
  otherwise returns FALSE. It is functionally equivalent to "and(1),bool".

or(<value>)
  Performs a bitwise "OR" between <value> and the input value of type unsigned
  integer, and returns the result as an unsigned integer.

sub(<value>)
  Subtracts <value> from the input value of type unsigned integer, and returns
  the result as an unsigned integer. Note: in order to subtract the input from
  a constant, simply perform a "neg,add(value)".

xor(<value>)
  Performs a bitwise "XOR" (exclusive OR) between <value> and the input value
  of type unsigned integer, and returns the result as an unsigned integer.
2015-01-27 15:41:13 +01:00
Willy Tarreau
d817e468bf BUG/MINOR: sample: fix case sensitivity for the regsub converter
Two commits ago in 7eda849 ("MEDIUM: samples: add a regsub converter to
perform regex-based transformations"), I got caught for the second time
with the inverted case sensitivity usage of regex_comp(). So by default
it is case insensitive and passing the "i" flag makes it case sensitive.
I forgot to recheck that case before committing the cleanup. No harm
anyway, nobody had the time to use it.
2015-01-23 20:27:41 +01:00
Willy Tarreau
7eda849dce MEDIUM: samples: add a regsub converter to perform regex-based transformations
We can now replace matching regex parts with a string, a la sed. Note
that there are at least 3 different behaviours for existing sed
implementations when matching 0-length strings. Here is the result
of the following operation on each implementationt tested :

  echo 'xzxyz' | sed -e 's/x*y*/A/g'

  GNU sed 4.2.1       => AzAzA
  Perl's sed 5.16.1   => AAzAAzA
  Busybox v1.11.2 sed => AzAz

The psed behaviour was adopted because it causes the least exceptions
in the code and seems logical from a certain perspective :

  - "x"  matches x*y*  => add "A" and skip "x"
  - "z"  matches x*y*  => add "A" and keep "z", not part of the match
  - "xy" matches x*y*  => add "A" and skip "xy"
  - "z"  matches x*y*  => add "A" and keep "z", not part of the match
  - ""   matches x*y*  => add "A" and stop here

Anyway, given the incompatibilities between implementations, it's unlikely
that some processing will rely on this behaviour.

There currently is one big limitation : the configuration parser makes it
impossible to pass commas or closing parenthesis (or even closing brackets
in log formats). But that's still quite usable to replace certain characters
or character sequences. It will become more complete once the config parser
is reworked.
2015-01-22 14:24:53 +01:00
Willy Tarreau
469477879c MINOR: args: implement a new arg type for regex : ARGT_REG
This one will be used when a regex is expected. It is automatically
resolved after the parsing and compiled into a regex. Some optional
flags are supported in the type-specific flags that should be set by
the optional arg checker. One is used during the regex compilation :
ARGF_REG_ICASE to ignore case.
2015-01-22 14:24:53 +01:00
Willy Tarreau
8059977d3e MINOR: samples: provide a "crc32" converter
This converter hashes a binary input sample into an unsigned 32-bit quantity
using the CRC32 hash function. Optionally, it is possible to apply a full
avalanche hash function to the output if the optional <avalanche> argument
equals 1. This converter uses the same functions as used by the various hash-
based load balancing algorithms, so it will provide exactly the same results.
It is provided for compatibility with other software which want a CRC32 to be
computed on some input keys, so it follows the most common implementation as
found in Ethernet, Gzip, PNG, etc... It is slower than the other algorithms
but may provide a better or at least less predictable distribution.
2015-01-20 19:48:08 +01:00
Vincent Bernat
1228dc0e7a BUG/MEDIUM: sample: fix random number upper-bound
random() will generate a number between 0 and RAND_MAX. POSIX mandates
RAND_MAX to be at least 32767. GNU libc uses (1<<31 - 1) as
RAND_MAX.

In smp_fetch_rand(), a reduction is done with a multiply and shift to
avoid skewing the results. However, the shift was always 32 and hence
the numbers were not distributed uniformly in the specified range. We
fix that by dividing by RAND_MAX+1. gcc is smart enough to turn that
into a shift:

    0x000000000046ecc8 <+40>:    shr    $0x1f,%rax
2014-12-10 22:45:34 +01:00
Emeric Brun
c9a0f6d023 MINOR: samples: add the word converter.
word(<index>,<delimiters>)
  Extracts the nth word considering given delimiters from an input string.
  Indexes start at 1 and delimiters are a string formatted list of chars.
2014-11-25 14:48:39 +01:00
Emeric Brun
f399b0debf MINOR: samples: adds the field converter.
field(<index>,<delimiters>)
  Extracts the substring at the given index considering given delimiters from
  an input string. Indexes start at 1 and delimiters are a string formatted
  list of chars.
2014-11-24 17:44:02 +01:00
Emeric Brun
54c4ac8417 MINOR: samples: adds the bytes converter.
bytes(<offset>[,<length>])
  Extracts a some bytes from an input binary sample. The result is a
  binary sample starting at an offset (in bytes) of the original sample
  and optionnaly truncated at the given length.
2014-11-24 17:44:02 +01:00
Willy Tarreau
0f30d26dbf MINOR: sample: add a few basic internal fetches (nbproc, proc, stopping)
Sometimes, either for debugging or for logging we'd like to have a bit
of information about the running process. Here are 3 new fetches for this :

nbproc : integer
  Returns an integer value corresponding to the number of processes that were
  started (it equals the global "nbproc" setting). This is useful for logging
  and debugging purposes.

proc : integer
  Returns an integer value corresponding to the position of the process calling
  the function, between 1 and global.nbproc. This is useful for logging and
  debugging purposes.

stopping : boolean
  Returns TRUE if the process calling the function is currently stopping. This
  can be useful for logging, or for relaxing certain checks or helping close
  certain connections upon graceful shutdown.
2014-11-24 17:44:02 +01:00
Emeric Brun
4b9e80268e BUG/MINOR: samples: fix unnecessary memcopy converting binary to string. 2014-11-24 17:44:02 +01:00
Thierry FOURNIER
317e1c4f1e MINOR: sample: add "json" converter
This converter escapes string to use it as json/ascii escaped string.
It can read UTF-8 with differents behavior on errors and encode it in
json/ascii.

json([<input-code>])
  Escapes the input string and produces an ASCII ouput string ready to use as a
  JSON string. The converter tries to decode the input string according to the
  <input-code> parameter. It can be "ascii", "utf8", "utf8s", "utf8"" or
  "utf8ps". The "ascii" decoder never fails. The "utf8" decoder detects 3 types
  of errors:
   - bad UTF-8 sequence (lone continuation byte, bad number of continuation
     bytes, ...)
   - invalid range (the decoded value is within a UTF-8 prohibited range),
   - code overlong (the value is encoded with more bytes than necessary).

  The UTF-8 JSON encoding can produce a "too long value" error when the UTF-8
  character is greater than 0xffff because the JSON string escape specification
  only authorizes 4 hex digits for the value encoding. The UTF-8 decoder exists
  in 4 variants designated by a combination of two suffix letters : "p" for
  "permissive" and "s" for "silently ignore". The behaviors of the decoders
  are :
   - "ascii"  : never fails ;
   - "utf8"   : fails on any detected errors ;
   - "utf8s"  : never fails, but removes characters corresponding to errors ;
   - "utf8p"  : accepts and fixes the overlong errors, but fails on any other
                error ;
   - "utf8ps" : never fails, accepts and fixes the overlong errors, but removes
                characters corresponding to the other errors.

  This converter is particularly useful for building properly escaped JSON for
  logging to servers which consume JSON-formated traffic logs.

  Example:
     capture request header user-agent len 150
     capture request header Host len 15
     log-format {"ip":"%[src]","user-agent":"%[capture.req.hdr(1),json]"}

  Input request from client 127.0.0.1:
     GET / HTTP/1.0
     User-Agent: Very "Ugly" UA 1/2

  Output log:
     {"ip":"127.0.0.1","user-agent":"Very \"Ugly\" UA 1\/2"}
2014-10-26 06:41:12 +01:00