It's cheaper and cleaner than using br_count()==1 given that it just compares
two indexes, and that a ring holding a single buffer is a special case where
it is between empty and used up to 1. In other words, it's not congested.
The commit 5e1b0e7bf ("BUG/MEDIUM: connection: Clear flags when a conn is
removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and
CO_FL_IDLE_LIST flags are used when the connection is released to properly
decrement used/idle connection counters. If a connection is idle, these
flags must be preserved till the connection is really released. It may be
removed from the list but not immediately released. If these flags are lost
when it is finally released, the current number of used connections is
erroneously decremented. It means this counter may become negative and the
counter tracking the number of idle connections is not decremented,
suggesting a leak.
So, the above commit is reverted and instead we improve a bit the way to
detect an idle connection. The function conn_get_idle_flag() must now be
used to know if a connection is in an idle list. It returns the connection
flag corresponding to the idle list if the connection is idle
(CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection
is scheduled to be removed, 0 is also returned, regardless of the connection
flags.
This new function is used when the connection is temporarily removed from
the list to be used, mainly in muxes.
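For illustration, a minimal sketch of what such a function may look like;
the conn_is_scheduled_for_removal() helper below is hypothetical, only the
flags are real:

  static inline unsigned int conn_get_idle_flag(const struct connection *conn)
  {
      /* a connection queued for removal must not report itself as idle,
       * otherwise the used/idle counters would be updated twice
       */
      if (conn_is_scheduled_for_removal(conn))  /* hypothetical helper */
          return 0;

      /* return the flag of the idle list the connection belongs to, or 0 */
      return conn->flags & (CO_FL_SAFE_LIST | CO_FL_IDLE_LIST);
  }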
This patch should fix #2078 and #2057. It must be backported as far as 2.2.
Instead of having a dedicated httpclient instance and its own code,
separate from the actual auto-update one, the "update ssl ocsp-response"
command will now use the update task in order to perform updates.
Since the cli command allows to update responses that were never
included in the auto update tree, a new flag was added to the
certificate_ocsp structure so that the said entry can be inserted into
the tree "by hand" and it won't be reinserted back into the tree after
the update process is performed. The 'update_once' flag "stole" a bit
from the 'fail_count' counter since it is the one least likely to reach
UINT_MAX among the ocsp counters of the certificate_ocsp structure.
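As an illustration, the bit stealing may look like the sketch below (field
widths and neighboring fields are assumptions, not the actual layout):

  struct certificate_ocsp {
      /* ... */
      unsigned int fail_count:31;  /* consecutive update failures */
      unsigned int update_once:1;  /* entry inserted "by hand" from the CLI,
                                    * do not reinsert it after the update */
      /* ... */
  };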
This new logic required that every certificate_ocsp entry contain all
the ocsp-related information at all times since entries that are not
supposed to be configured automatically can still be updated through the
cli. The logic of the ssl_sock_load_ocsp was changed accordingly.
This one is printed as the iocb in the "show fd" output, and arguably
this wasn't very convenient as-is:
293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90)
Let's unstatify it and export it so that the symbol can now be resolved
from the various points that need it.
In environments where SYSTEM_MAXCONN is defined at compile time, the
master will use this value instead of the original minimal value, which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need such a high maxconn (for
example if SYSTEM_MAXCONN was set to 100000 or more).
This patch fixes the issue by using the new define MASTER_MAXCONN, which
defines a default maxconn of 100 for the master process.
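A minimal sketch of the define, assuming the usual compile-time override
pattern used for such constants:

  #ifndef MASTER_MAXCONN
  #define MASTER_MAXCONN 100  /* default maxconn for the master process,
                               * independent of SYSTEM_MAXCONN */
  #endif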
Must be backported as far as 2.5.
As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.
Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.
First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.
Second, the check after fd_clr_running() for the current thread being
the last one is not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_delete_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().
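The "first one to clear the flag does the close" pattern may be sketched
with C11 atomics as below; the flag value and fdtab layout are assumptions,
not HAProxy's exact code:

  #include <stdatomic.h>

  #define FD_MUST_CLOSE 0x01

  struct fdentry {
      _Atomic unsigned int state;
  };
  extern struct fdentry fdtab[];
  extern void _fd_delete_orphan(int fd);

  static void fd_try_delete_orphan(int fd)
  {
      unsigned int old = atomic_fetch_and(&fdtab[fd].state, ~FD_MUST_CLOSE);

      /* only the thread that flipped the bit from 1 to 0 may close the
       * FD; any concurrent caller sees 0 and backs off
       */
      if (old & FD_MUST_CLOSE)
          _fd_delete_orphan(fd);
  }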
With both points addressed, there's no more visible race left:
- takeover() only happens under the connection list's lock and cannot
compete with fd_delete() since fd_delete() must first remove the
connection from the list before deleting the FD. That's also why it
doesn't need to call _fd_delete_orphan() when dropping its running
bit.
- takeover() sets its running bit then atomically replaces the thread
mask, so that until that's done, it doesn't validate the condition
to end the synchronization loop in fd_update_events(). Once it's OK,
the previous thread's bit is lost, and this is checked for in
fd_update_events().
- fd_update_events() can compete with fd_delete() at various places
which are explained above. Since fd_delete() clears the thread mask
only after setting its running bit and after setting the FD_MUST_CLOSE
bit, the synchronization loop guarantees that the thread mask is seen
before going further, and that once it's seen, the FD_MUST_CLOSE flag
is already present.
- fd_delete() may start while fd_update_events() has already started,
but fd_delete() must hold a bit in thread_mask before starting, and
that is checked by the first test in fd_update_events() before setting
the running_mask.
- the poller's _update_fd() will not compete against _fd_delete_orphan()
nor fd_insert() thanks to the fd_grab_tgid() that's always done before
updating the polled_mask, and guarantees that we never pretend that a
polled_mask has a bit before the FD is added.
The issue is very hard to reproduce and is extremely time-sensitive.
Some tests were required with a 1-ms timeout with request rates
closely matching 1 kHz per server, though certain tests sometimes
benefitted from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:
{ volatile int i = 10000; while (i--); }
Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.
Many thanks to Olivier Houchard for his expert help analysing these
races and reviewing candidate fixes.
The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it stays stable in the long term.
This bug arrived with this commit:
b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.
Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame. It returns the CID to
be retired into the <cid_to_retire> variable passed as a parameter to this
function if the frame is valid and if the CID was not already retired.
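The core check may be sketched as below, with simplified types; RFC 9000
forbids a frame from retiring the CID carried in its own packet's DCID field:

  #include <string.h>

  struct quic_cid {
      unsigned char data[20];
      unsigned char len;
  };

  /* return 0 (protocol violation) if the CID designated by the frame's
   * sequence number is the DCID of the packet carrying the frame
   */
  static int retire_cid_is_valid(const struct quic_cid *cid_to_retire,
                                 const struct quic_cid *pkt_dcid)
  {
      if (cid_to_retire->len == pkt_dcid->len &&
          memcmp(cid_to_retire->data, pkt_dcid->data, pkt_dcid->len) == 0)
          return 0;
      return 1;
  }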
Must be backported to 2.7.
Since the following commit :
commit fb375574f9
MINOR: quic: mark quic-conn as jobs on socket allocation
quic-conn instances are marked as jobs. This prevents the haproxy process
from stopping while transfers are in progress. To not delay process
termination, idle connections are woken up through their MUX instances
to be able to release them immediately.
However, there is no mechanism to wake up quic connections left in
closing or draining state. This means that haproxy process termination
is delayed until every closing quic connection's timer has expired.
To improve this, a new function quic_handle_stopping() is called when
haproxy process is stopping. It simply wakes up the idle timer task of
all connections in the global closing list. These connections will thus
be released immediately to not interrupt haproxy process stopping.
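A minimal sketch of the wake-up, assuming the list and field names below
(they are illustrative only):

  void quic_handle_stopping(void)
  {
      struct quic_conn *qc;

      /* wake the idle timer task of every closing/draining connection so
       * that it is released immediately instead of waiting for its
       * closing timer to expire
       */
      list_for_each_entry(qc, &th_ctx->quic_conns_clo, el_th_ctx)
          task_wakeup(qc->idle_timer_task, TASK_WOKEN_OTHER);
  }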
This should be backported up to 2.7.
When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively in draining or closing state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained for a certain time to handle packet
reordering or loss.
A new global list has been defined for QUIC connections in
closing/draining state inside the thread_ctx structure. Each time a
connection enters one of these states, it will be moved from the
default global list to the new closing list.
The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.
A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attached to a quic-conn instance in case of a
running "show quic".
This should be backported up to 2.7.
Modify quic_transport_params_dump() and other functions related to
dumping transport parameter values from TRACE() to make their output
more compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.
Must be backported to 2.7.
Add a new QUIC_FL_RX_PACKET_SPIN_BIT RX packet flag to mark an RX packet as
having the spin bit set. Idem for the connection with the QUIC_FL_CONN_SPIN_BIT
flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.
Validated by quic-tracker spin bit test.
Must be backported to 2.7.
Add a new ->curr_cid_seq_num member to the quic_conn struct to store the
sequence number of the connection ID currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID based on
its sequence number.
Implement qc_build_new_connection_id_frm() to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.
Must be backported to 2.7.
Add a new ->next_cid_seq_num member to the quic_conn struct to store the
sequence number to be used for the next allocated connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called, without
giving the caller the possibility to pass the sequence number for the
connection ID to be allocated.
Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the handshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with the active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
these new connection IDs.
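The resulting count may be sketched as below; since the CID already in use
counts against the limit, one less NEW_CONNECTION_ID frame is built:

  #include <stdint.h>

  #define QUIC_MIN(a, b) ((a) < (b) ? (a) : (b))

  /* number of NEW_CONNECTION_ID frames to build post-handshake */
  static inline uint64_t quic_cids_to_build(uint64_t active_cid_limit)
  {
      return QUIC_MIN(4, active_cid_limit) - 1;
  }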
Must be backported to 2.7.
Dump the secret used to derive the next one during a key update initiated by
the client, and dump the resulting new secret and the new key and IV to be
used to decrypt Application level packets.
Also add a trace when the key update is supposed to be initiated on haproxy
side.
This has already helped in diagnosing an issue revealed by the key update
interop test with xquic as client.
Must be backported to 2.7.
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frames, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and scheduled for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.
Stream buffers are stored in an eb_tree indexed by the stream ID. To avoid
a tree lookup each time a STREAM frame is re-emitted, a lost STREAM frame
is flagged as QUIC_FL_TX_FRAME_LOST.
In most cases, this code is functional. However, there are several
potential issues which may cause a segfault :
- when explicitly probing with a STREAM frame, the frame won't be
flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
copied
To fix both these cases, the QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in the quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.
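The converted field may be sketched as below (the surrounding fields are
assumptions):

  struct quic_stream {
      uint64_t id;      /* stream ID */
      uint64_t offset;  /* offset of the data in the stream */
      uint64_t len;     /* length of the data */
      /* ... pointer to the underlying stream buffer ... */
      int dup;          /* the frame is a duplicated/retransmitted copy:
                         * check that the stream buffer still exists before
                         * using it; copied on split and by qc_frm_dup() */
  };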
This issue was encountered randomly with the following backtrace :
#0 __memmove_avx512_unaligned_erms ()
#1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,
This should fix github issue #2051.
This should be backported up to 2.6.
When adding a new certificate through the CLI and appending it to a
crt-list with the 'ocsp-update' option set, the new certificate would
not be added to the OCSP response update list.
The only thing that was missing was the copy of the ocsp_update mode
from the ssl_bind_conf into the ckch_store's object.
An extra wakeup of the update task also needed to happen in case the
newly inserted entry needs to be updated before the next wakeup of the
task.
This patch does not need to be backported.
The minimum and maximum delays between two automatic updates of a given
OCSP response can now be set via global options. It allows to limit the
update rate of OCSP responses for configurations that use many frontend
certificates with the ocsp-update option set if the updates are deemed
too costly.
In order to have some information about the frontend certificate when
dumping the contents of the ocsp update tree from the cli, we could
either keep a reference to a ckch_store in the certificate_ocsp
structure, which might cause some dangling reference problems, or
simply copy the path to the certificate in the ocsp response structure.
This latter solution was chosen because of its simplicity.
Those new specific error codes will make it possible to know a bit better
what went wrong during an OCSP update process. They will be of use in
future sample fetches as well as in debugging tools (via the cli or
future traces).
Implement qc_notify_send(). This function is responsible for notifying the
upper layer subscribed with SUB_RETRY_SEND if sending conditions are back
to normal.
For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended when subscribing the socket to the poller on sendto()
error is implemented. qc_notify_send() will thus be responsible for ensuring
that all conditions are met before waking up the upper layer.
This should be backported up to 2.7.
Send is conducted through qc_send_ppkts() for a QUIC connection. There
are two types of errors which can be encountered on sendto() or affiliated
syscalls :
* transient error. In this case, sending is simulated with the remaining
data and the retransmission process is used to have the opportunity to
retry emission
* fatal error. If this happens, the connection should be closed as soon
as possible. This is done via qc_kill_conn() function. Until this
patch, only ECONNREFUSED errno was considered as fatal.
Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf(), as sketched after the list below :
* on fatal error, a negative error code is returned. This is now the
case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
errno values above and also if a partial send has been conducted by
the kernel.
* on success, the return value of sendto() syscall is returned.
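A self-contained sketch of this convention (not the exact code):

  #include <errno.h>
  #include <sys/socket.h>

  ssize_t qc_snd_buf_sketch(int fd, const void *buf, size_t len)
  {
      ssize_t ret = sendto(fd, buf, len, 0, NULL, 0);

      if (ret < 0) {
          if (errno == EAGAIN || errno == EWOULDBLOCK ||
              errno == ENOTCONN || errno == EINPROGRESS || errno == EBADF)
              return 0;   /* transient: the send will be simulated */
          return -1;      /* fatal: the connection must be killed */
      }

      if ((size_t)ret < len)
          return 0;       /* a partial send is treated as transient */

      return ret;         /* success: number of bytes sent */
  }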
This commit will be useful to be able to handle transient errors with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.
This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.
On a final note, EBADF should be considered as fatal. This will be the
subject of a next commit.
This should be backported up to 2.7.
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
The listeners have a thr_conn[] array indexed on the thread number that
is used during connection redispatching to know what threads are the least
loaded. Since we introduced thread groups, and based on the fact that a
listener may only belong to one group, there's no point storing counters
for all threads, we just need to store them for all threads in the group.
Doing so reduces the struct listener from 1500 to 632 bytes. This may be
backported to 2.7 to save a bit of resources.
As for the H1 and H2 streams, the QUIC stream now states it does not expect
data from the server as long as the request is unfinished. The aim is the
same. We must be sure to not trigger a read timeout on server side if the
client is still uploading data.
From the moment the end of the request is received and forwarded to upper
layer, the QUIC stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
When the endpoint (applet or mux) is now willing to consume data while it
said it wouldn't, a send activity is reported. Indeed, writes were
blocked because of the endpoint. It is now ready to consume outgoing
data. So a send activity must be reported to reset the corresponding timers.
Concretely, when the SE_FL_WONT_CONSUME flag is removed, a send activity is
reported.
Since the previous patch, the ring's offset is not used anymore. The
haring utility remains backward-compatible since it can trust the
buffer element that's at the beginning of the map and which still
contains all the valid data.
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.
A suspended listener can be in two different valid states:
- LI_PAUSED: the listener is effectively paused, it will unpause on
resume_listener()
- LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
state, so it was unbound to satisfy the suspend request, it will
correctly re-bind on resume_listener()
Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().
We're also adding the li_suspend proxy variable to track the number of
currently suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.
The counter is increased on successful suspend in suspend_listener() and it
is decreased on successful resume in resume_listener().
--
Backport notes:
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
| /* PROXY_LOCK is require
| proxy_cond_resume(px);
By this:
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:
Replace this:
|@@ -253,6 +253,7 @@ struct listener {
|
| /* listener flags (16 bits) */
| #define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x0002 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
By this:
|@@ -222,6 +222,7 @@ struct li_per_thread {
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
| #define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x00000004 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
Some listeners are kept in LI_ASSIGNED state but are not supposed to be
started since they were bypassed on initial startup (eg: in protocol_bind_all()
or in enable_listener()...)
Introduce the LI_F_FINALIZED flag: when the flag is set, it means that the
listener made it past the LI_LISTEN state (finalized) at least once, so we
can safely pause / resume it. This way we won't risk starting a previously
bypassed listener which never made it that far and thus was not expected to
be lazy-started by accident.
As listener_pause() and listener_resume() are currently partially broken, such
unexpected lazy-start won't happen. But we're trying to restore pause() and
resume() behavior so this patch will be required before going any further.
We had to re-introduce the listener's 'flags' struct member since it was
recently moved into the bind_conf struct. But here we do have a legitimate
need for these listener-only flags.
This should only be backported if explicitly required by another commit.
--
Backport notes:
-> 2.4 and 2.5:
The 2-byte hole we're using in the current patch does not apply, let's
use the 4-byte hole located under the 'option' field.
Replace this:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
By this:
|@@ -209,6 +209,8 @@ struct listener {
| short int nice; /* nice value to assign to the instantiated tasks */
| int luid; /* listener universally unique ID, used for SNMP */
| int options; /* socket options : LI_O_* */
|+ uint16_t flags; /* listener flags: LI_F_* */
|+ /* 2-bytes hole here */
| __decl_thread(HA_RWLOCK_T lock);
|
| struct fe_counters *counters; /* statistics counters */
-> 2.4 only:
We need to adjust some contextual lines.
Replace this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| if (!lli)
| HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
By this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
And this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| if (MT_LIST_INLIST(&l->wait_queue))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
By this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
-> 2.6 and 2.7 only:
struct listener 'flags' member still exists, let's use it.
Remove this from the current patch:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
Then, replace this:
|@@ -251,6 +250,9 @@ struct listener {
| EXTRA_COUNTERS(extra_counters);
| };
|
|+/* listener flags (16 bits) */
|+#define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
| * function pointer can be NULL if not implemented. The function also has an
By this:
|@@ -221,6 +221,7 @@ struct li_per_thread {
| };
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
|+#define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
There is a need for a small difference between resuming and relaxing
a listener.
When resuming, we expect that the listener may completely resume, this includes
unpausing or rebinding if required.
Resuming a listener is a best-effort operation: no matter the current state,
try our best to bring the listener up to the LI_READY state.
There are some cases where we only want to "relax" listeners that were
previously restricted using limit_listener() or listener_full() functions.
Here we don't want to resuscitate listeners, we're simply interested in
cancelling out the previous restriction.
To this day, listener_resume() on an unbound listener is broken, that's why
the need for this wasn't felt yet.
But we're trying to restore historical listener_resume() behavior, so we better
prepare for this by introducing an explicit relax_listener() function that
only does what is expected in such cases.
This commit depends on:
- "MINOR: listener/api: add lli hint to listener functions"
Add listener lock hint (AKA lli) to (stop/resume/pause)_listener() functions.
All these functions implicitly take the listener lock when they are called:
it could be useful to be able to call them while already holding the lock, so
we're adding the lli hint to make them take the lock only when it is missing.
This should only be backported if explicitly required by another commit.
--
-> 2.4 and 2.5 common backport notes:
These 2 commits need to be backported first:
- 187396e34 "CLEANUP: listener: function comment typo in stop_listener()"
- a57786e87 "BUG/MINOR: listener: null pointer dereference suspected by
coverity"
-> 2.4 special backport notes:
In addition to the previously mentioned dependencies, the patch needs to be
slightly adapted to match the corresponding contextual lines:
Replace this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if (l->state <= LI_PAUSED)
| goto end;
By this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if ((global.mode & (MODE_DAEMON | MODE_MWORKER)) &&
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
Replace this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
| }
By this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
| if (!listener->bind_conf->frontend->grace)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
Replace this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!(p->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) && !p->li_ready) {
| /* might be just a backend */
By this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!p->disabled && !p->li_ready) {
| /* might be just a backend */
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer any use for the .hcto field in the stconn
structure. So let's remove it.
We now directly use the proxy settings to set the half-close timeout of a
stream-connector. The function sc_set_hcto() must be used to do so. This
timeout is only set when a shutw is performed. So it is not really a big
deal to use a dedicated function to do so.
We stop to use the channel's expiration dates to detect read and write
timeouts on the channels. We now rely on the stream-endpoint descriptor to
do so. All the stuff is handled in process_stream().
The stream relies on 2 helper functions to know if the receives or sends may
expire: sc_rcv_may_expire() and sc_snd_may_expire().
An endpoint should now set SE_FL_EXP_NO_DATA flag if it does not expect any
data from the opposite endpoint. This way, the stream will be able to
disable any read timeout on the opposite endpoint. Applets should use
applet_expect_no_data() and applet_expect_data() functions to set or clear
the flag. For now, only dns and sink forwarder applets are concerned.
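For illustration, these helpers likely reduce to setting or clearing the
flag on the stream-endpoint descriptor (a sketch, assuming the se_fl_set()
and se_fl_clr() accessors):

  static inline void applet_expect_no_data(struct appctx *appctx)
  {
      se_fl_set(appctx->sedesc, SE_FL_EXP_NO_DATA);
  }

  static inline void applet_expect_data(struct appctx *appctx)
  {
      se_fl_clr(appctx->sedesc, SE_FL_EXP_NO_DATA);
  }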
The stream endpoint descriptor now owns two dates, lra (last read activity)
and fsb (first send blocked).
The first one is updated every time a read activity is reported, including data
received from the endpoint, successful connect, end of input and shutdown for
reads. A read activity is also reported when receives are unblocked. It will be
used to detect read timeouts.
The other one is updated when no data can be sent to the endpoint and reset
when some data are sent. It is the date of the first send blocked by the
endpoint. It will be used to detect write timeouts.
Helper functions are added to report read/send activity and to retrieve the
lra/fsb dates.
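A sketch of the expected semantics, with assumed helper names; the key
point is that fsb keeps the FIRST blocked date until a send succeeds:

  /* any read activity refreshes the last-read-activity date */
  static inline void se_report_read_activity(struct sedesc *se)
  {
      se->lra = now_ms;
  }

  /* record the date of the first blocked send only */
  static inline void se_report_send_blocked(struct sedesc *se)
  {
      if (se->fsb == TICK_ETERNITY)
          se->fsb = now_ms;
  }

  /* a successful send resets the blocked date */
  static inline void se_report_send_activity(struct sedesc *se)
  {
      se->fsb = TICK_ETERNITY;
  }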
Read and write timeouts (.rto and .wto) are now replaced by a unique
timeout, called .ioto. Since the recent refactoring of the channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.
From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplifies timer manipulation.
Read and write timeouts concern the I/O. Thus, it is logical to move them
into the stconn. In the end, the stream is responsible for detecting the
timeouts. So it is logical to have these values in the stconn and not in
the SE descriptor. But it may change depending on the refactoring.
So, now:
* scf->rto is used instead of req->rto
* scf->wto is used instead of res->wto
* scb->rto is used instead of res->rto
* scb->wto is used instead of req->wto
This patch removes CF_READ_ERROR and CF_WRITE_ERROR flags. We now rely on
SE_FL_ERR_PENDING and SE_FL_ERROR flags. SE_FL_ERR_PENDING is used for write
errors and SE_FL_ERROR for read or unrecoverable errors.
When a connection error is reported, SE_FL_ERROR and SE_FL_EOS are now set and a
read event and a write event are reported to be sure the stream will properly
process the error. At the stream-connector level, it is similar. When an error
is reported during a send, a write event is triggered. On the read side, nothing
more is performed because an error at this stage is enough to wake the stream
up.
A major change is brought with this patch. We stop checking the flags of the
opposite channel to report an abort or a timeout. It also means that when a
read or write error is reported on a side, we no longer update the other
side. Thus a read error on the server side no longer leads to a write error
on the client side. This should make error reporting easier.
This flag was introduced in 1.3 to fix a design issue. It has remained
untouched since then but there is no reason to still have this trick. Note
it could be good to review what happens in HTTP when the server is waiting
for the end of the request. It could be good to be sure a client timeout is
always reported.
In bb581423b ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout
before the httpclient"), a new logic was implemented to make sure that
when a lua ctx destroyed, related httpclients are correctly destroyed too
to prevent a such httpclients from being resuscitated on a destroyed lua ctx.
This was implemented by adding a list of httpclients within the lua ctx,
and a new function, hlua_httpclient_destroy_all(), that is called under
hlua_ctx_destroy() and runs through the httpclients list in the lua context
to properly terminate them.
This was done with the assumption that no concurrent Lua garbage collection
cycles could occur on the same resources, which seems OK since the "lua"
context is about to be freed and is not explicitly being used by other threads.
But when 'lua-load' is used, the main lua stack is shared between multiple
OS threads, which means that all lua ctx in the process are linked to the
same parent stack.
Yet it seems that lua GC, which can be triggered automatically under
lua_resume() or manually through lua_gc(), does not limit itself to the
"coroutine" stack (the stack referenced in lua->T) when performing the cleanup,
but is able to perform some cleanup on the main stack plus coroutines stacks
that were created under the same main stack (via lua_newthread()) as well.
This can be explained by the fact that lua_newthread() coroutines are not meant
to be thread-safe by design.
Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author)
It did not cause other issues so far because most of the time when using
'lua-load', the global lua lock is taken when performing critical operations
that are known to interfere with the main stack.
But here in hlua_httpclient_destroy_all(), we don't run under the global lock.
Now that we properly understand the issue, the fix is pretty trivial:
We could simply guard the hlua_httpclient_destroy_all() under the global
lua lock, this would work but it could increase the contention over the
global lock.
Instead, we switched 'lua->hc_list', which was introduced with bb581423b,
from a simple list to an mt_list so that concurrent accesses between
hlua_httpclient_destroy_all() and hlua_httpclient_gc() are properly handled.
The issue was reported by @Mark11122 on Github #2037.
This must be backported with bb581423b ("BUG/MEDIUM: httpclient/lua: crash
when the lua task timeout before the httpclient") as far as 2.5.
Pretty often we have to emit a value (setting, limit etc) in an error
message, and this value is known at compile-time, and just doing this
forces the use of a printf format such as "%d". Let's have a simple macro
to turn any other macro or value into a string that can be concatenated
with the rest of the string around it. This simplifies error message
production on the CLI for example.
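A minimal sketch of such a macro; the two-level expansion ensures the
argument is macro-expanded before being stringified:

  #define TOSTR_(x) #x
  #define TOSTR(x) TOSTR_(x)

  /* usage (MAXCONN_LIMIT is a hypothetical compile-time constant) */
  #define MAXCONN_LIMIT 100
  static const char *err = "maxconn must be <= " TOSTR(MAXCONN_LIMIT) ".";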
Since commit cc9bf2e5f "MEDIUM: cache: Change caching conditions"
responses that do not have an explicit expiration time are not cached
anymore. But this mechanism wrongly used the TX_CACHE_IGNORE flag
instead of the TX_CACHEABLE one. The effect this had is that a cacheable
response that corresponded to a request having a "Cache-Control:
no-cache" for instance would not be cached.
Contrary to what was said in the other commit message, the "checkcache"
option should not be impacted by the use of the TX_CACHEABLE flag
instead of the TX_CACHE_IGNORE one. The response is indeed considered as
not cacheable if it has no expiration time, regardless of the presence
of a cookie in the response.
This should fix GitHub issue #2048.
This patch can be backported up to branch 2.4.
Implement a way to test if some options are enabled at run-time. For now,
following options may be detected:
POLL, EPOLL, KQUEUE, EVPORTS, SPLICE, GETADDRINFO, REUSEPORT,
FAST-FORWARD, SERVER-SSL-VERIFY-NONE
These options are those that can be disabled on the command line. This way
it is possible, from a reg-test for instance, to know if a feature is
supported or not :
feature cmd "$HAPROXY_PROGRAM -cc '!(global.tune & GTUNE_NO_FAST_FWD)'"