The congestion window is limited by minimum and maximum values which can
never be exceeded. The minimum value is hardcoded to 2 datagrams, as
recommended by the specification. The maximum value is specified via the
haproxy configuration.
These values must be respected each time the congestion window size is
adjusted. However, on some rare occasions, these limits were not
enforced. Fix this by implementing wrappers to set or increment the
congestion window. These functions ensure the limits are always applied
after the operation.
Additionally, the wrappers also ensure that if the window reaches a new
maximum value, it is saved in the <cwnd_last_max> field.
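A minimal sketch of what such wrappers could look like; the structure
layout and helper names below are illustrative assumptions, not the
exact haproxy API:

    #include <stdint.h>

    /* Illustrative path structure: only the fields relevant here. */
    struct path {
        uint64_t cwnd;          /* current congestion window (bytes) */
        uint64_t cwnd_last_max; /* highest value ever reached by cwnd */
        uint64_t limit_min;     /* lower bound, e.g. 2 datagrams */
        uint64_t limit_max;     /* upper bound from the configuration */
    };

    /* Set the window to <val>, clamped to [limit_min, limit_max], and
     * record a new maximum if one is reached.
     */
    static inline void path_cwnd_set(struct path *p, uint64_t val)
    {
        if (val < p->limit_min)
            val = p->limit_min;
        if (val > p->limit_max)
            val = p->limit_max;
        p->cwnd = val;
        if (p->cwnd > p->cwnd_last_max)
            p->cwnd_last_max = p->cwnd;
    }

    /* Increment the window by <inc>, reusing the same clamping logic. */
    static inline void path_cwnd_inc(struct path *p, uint64_t inc)
    {
        path_cwnd_set(p, p->cwnd + inc);
    }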
This should be backported up to 2.6, after a brief period of
observation.
Make minor adjustments to QUIC BBR functions. The objective is to
centralize every modification of the path cwnd field.
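As an illustration only, reusing the hypothetical wrapper sketched above
(not the exact haproxy helpers), the pattern is to funnel every update
through a single entry point:

    /* Hypothetical BBR helper that used to write the cwnd field directly. */
    static void bbr_apply_cwnd_example(struct path *p, uint64_t new_cwnd)
    {
        /* Before: p->cwnd = new_cwnd;  (limits possibly bypassed) */
        path_cwnd_set(p, new_cwnd);     /* after: one single entry point */
    }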
No functional change. This patch will be useful to simplify
implementation of global QUIC Tx memory usage limitation.
There was some possible confusion between the fields related to the
congestion window min and max limits, which cannot be exceeded, and the
maximum value previously reached by the window.
Fix this by adopting a new naming scheme. The enforced limits are now
named <limit_max>/<limit_min>, while the previously reached maximum
value is renamed <cwnd_last_max>.
This should be backported up to 3.1.
Rename one of the congestion algorithm pacing callbacks from pacing_rate
to pacing_inter. This better reflects that this function returns a delay
(in nanoseconds) which should be applied between each packet emission to
fill the congestion window with a perfectly smoothed emission.
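As an illustration of what such a callback computes (the names and the
prototype below are assumptions, not the exact haproxy code), the
inter-packet delay is the packet size divided by the sending rate,
converted to nanoseconds:

    #include <stdint.h>

    /* Return the delay in nanoseconds to wait between two packet
     * emissions so that <rate_bytes_per_s> is sustained smoothly.
     */
    static uint64_t bbr_pacing_inter_example(uint64_t rate_bytes_per_s,
                                             uint64_t pkt_size_bytes)
    {
        if (!rate_bytes_per_s)
            return UINT64_MAX; /* no rate estimated yet: do not pace */
        /* delay = pkt_size / rate, expressed in nanoseconds */
        return pkt_size_bytes * 1000000000ULL / rate_bytes_per_s;
    }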
This should be backported up to 3.1.
Rename bbr_is_probing_bw() to bbr_is_in_a_probe_state() and
bbr_is_accelerating_probing_bw() to bbr_is_probing_bw() to match
the function names of the BBR v3 internet draft.
Must be backported to 3.1 to ease any further backport to come.
The Startup state is also an accelerating bandwidth probing state.
This modification should have come with this previous commit:
BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)
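A minimal sketch of the intended predicate; the state names below are
modeled on the draft and on these commit messages, but the exact haproxy
code may differ:

    /* Accelerating bandwidth probing states: Startup now belongs to
     * them, alongside ProbeBW_REFILL and ProbeBW_UP.
     */
    enum bbr_state {
        BBR_ST_STARTUP,
        BBR_ST_DRAIN,
        BBR_ST_PROBE_BW_DOWN,
        BBR_ST_PROBE_BW_CRUISE,
        BBR_ST_PROBE_BW_REFILL,
        BBR_ST_PROBE_BW_UP,
        BBR_ST_PROBE_RTT,
    };

    static int bbr_is_probing_bw_example(enum bbr_state st)
    {
        return st == BBR_ST_STARTUP ||        /* added by this patch */
               st == BBR_ST_PROBE_BW_REFILL ||
               st == BBR_ST_PROBE_BW_UP;
    }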
Must be backported to 3.1.
This patch fixes the 3rd condition used by bbr_check_startup_high_loss()
to decide whether it has detected some high loss, as described by the
BBR v3 draft:
4.3.1.3. Exiting Startup Based on Packet Loss
...
There are at least BBRStartupFullLossCnt=6 discontiguous sequence ranges lost in that round trip.
where a <= operator was used in place of <.
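For illustration, the corrected guard could look like this (the names
below are assumptions based on the draft's BBRStartupFullLossCnt):

    #define BBR_STARTUP_FULL_LOSS_COUNT 6 /* BBRStartupFullLossCnt */

    /* Third condition of 4.3.1.3: "at least" 6 discontiguous lost
     * sequence ranges in the round trip, hence a strict "<" in the
     * negative guard.
     */
    static int startup_high_loss_3rd_cond(unsigned int lost_ranges)
    {
        if (lost_ranges < BBR_STARTUP_FULL_LOSS_COUNT) /* was "<=" */
            return 0; /* not enough lost ranges yet */
        return 1;
    }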
Must be backported to 3.1.
The role of bbr_congestion_event() is to track the start time of
recovery periods. This was done using the <ts> parameter. But this
parameter is the time the newest lost packet was sent.
The timestamp value to store in ->recovery_start_ts is <now_ms>.
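For illustration (the field and parameter names are taken from this
message, the rest is an assumption):

    #include <stdint.h>

    struct bbr_recovery_example {
        uint32_t recovery_start_ts; /* start time of the recovery period */
    };

    static void bbr_congestion_event_example(struct bbr_recovery_example *bbr,
                                             uint32_t ts, uint32_t now_ms)
    {
        (void)ts;                        /* send time of the newest lost
                                          * packet: not the value to store */
        bbr->recovery_start_ts = now_ms; /* the recovery period starts now */
    }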
Must be backported to 3.1.
Upon congestion events (for instance packet loss), the role of
bbr_adapt_lower_bounds_from_congestion() is to adapt some BBR internal
variables related to the estimated bandwidth (BBR.bw).
According to the BBR v3 draft, this function should do nothing when the
BBRIsProbingBW() pseudo-code returns true. That said, BBRIsProbingBW()
is not defined by the BBR v3 draft. But according to this part,
mentioned just before the pseudo-code for
BBRAdaptLowerBoundsFromCongestion():
4.5.10.3. When not Probing for Bandwidth
When not explicitly accelerating to probe for bandwidth (Drain, ProbeRTT,
ProbeBW_DOWN, ProbeBW_CRUISE), BBR responds to loss by slowing down to some extent.
This is because loss suggests that the available bandwidth and safe volume of
in-flight data may have decreased recently, and the flow needs to adapt, slowing
down toward the latest delivery process. BBR flows implement this response by
reducing the short-term model parameters, BBR.bw_lo and BBR.inflight_lo.
BBRIsProbingBW() should concern the accelerating bandwidth probing
states, which are BBR_ST_PROBE_BW_REFILL and BBR_ST_PROBE_BW_UP.
Adapt the code to match this latter assumption. This drastically reduces
the packet loss volumes, at least during ProbeBW_CRUISE.
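A sketch of the adapted logic; the names and the simplified reduction
below are illustrative, not the exact haproxy implementation. The point
is the early return: when accelerating to probe bandwidth, the lower
bounds are left untouched.

    #include <stdint.h>

    struct bbr_lo_bounds {
        uint64_t bw_lo;        /* BBR.bw_lo */
        uint64_t inflight_lo;  /* BBR.inflight_lo */
    };

    static void adapt_lower_bounds_example(struct bbr_lo_bounds *lb,
                                           int is_probing_bw,
                                           uint64_t bw_latest,
                                           uint64_t inflight_latest)
    {
        if (is_probing_bw)
            return; /* Refill/Up: do nothing, as 4.5.10.3 suggests */

        /* Otherwise slow down by shrinking the short-term model
         * parameters (simplified BBRLossLowerBounds(), beta = 0.7).
         */
        lb->bw_lo = lb->bw_lo * 7 / 10;
        if (lb->bw_lo < bw_latest)
            lb->bw_lo = bw_latest;
        lb->inflight_lo = lb->inflight_lo * 7 / 10;
        if (lb->inflight_lo < inflight_latest)
            lb->inflight_lo = inflight_latest;
    }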
As an example, on a 100 Mbit/s internet link with a ~94ms RTT, before
this patch 4329640 sent packets with 1617119 lost packets (!!!) were
needed to download a 3GB object. After this patch, 2843952 sent packets
vs 144134 lost packets are needed. There may still be a packet loss
issue: I suspect the maximum bandwidth may be overestimated, and the
more this is the case, the bigger the packet loss. That said, at this
time, it remains below 5% depending on the size of the objects, 5% being
reached for objects larger than 2GB.
Must be backported to 3.1.
Add a test to ensure that the value of a local variable used by
bbr_inflight_hi_from_lost_packet() is not impacted by underflow issues
when subtracting too big numbers, and make this function return a
correct value.
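An illustrative sketch of the kind of guard added (the names and the
exact computation are assumptions): an unsigned subtraction must not
wrap around when the subtrahend is larger.

    #include <stdint.h>

    static uint64_t inflight_hi_from_lost_pkt_example(uint64_t inflight_prev,
                                                      uint64_t lost_prefix)
    {
        /* Without this test, the subtraction below could underflow and
         * produce a huge bogus inflight value.
         */
        if (inflight_prev < lost_prefix)
            return 0;
        return inflight_prev - lost_prefix;
    }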
Must be backported to 3.1.
This bug arrived with this commit:
6404b7a18a BUG/MINOR: quic: fix bbr_inflight() calls with wrong gain value
This patch partially reverts it, after having checked the BBR v3 draft.
This bug was invisible when testing long BBR flows.
Must be backported to 3.1.
Remove the code related to BBR.ack_phase, as per this commit:
ee98c12ad6
I do not know at this time why such a request was pushed on GH for the BBR v3 draft
pseudo-code. That said, the use of such an ack phase seemed confusing, adding much
more information about a BBR flow state than needed. Indeed, the ack phase
state is modified several times in the BBR draft pseudo-code but only used to
decide if the max bandwidth filter virtual clock had to be incremented by
BBRAdvanceMaxBwFilter().
In addition to this, when discussing the haproxy BBR implementation with
Neal Cardwell on the BBR development Google group about an oscillation issue
of the max bandwidth (BBR.max_bw), I concluded that this was due to the fact
that its filter virtual clock was updated too often, because the ack phase
was stalled in the BBR_ACK_PHASE_ACKS_PROBE_STOPPING state for too long. This is
where Neal asked me to test the aforementioned commit. This definitively
makes the max bandwidth (BBR.max_bw) oscillation issue disappear.
Another solution would have been to add a new ack phase enum value after
BBR_ACK_PHASE_ACKS_PROBE_STOPPING. BBR_ACK_PHASE_ACKS_PROBE_STOPPED
would have been a good candidate.
Must be backported to 3.1.
This patch fixes two wrong calls to bbr_inflight().
The aim of bbr_target_inflight() is to compute the number of bytes BBR
should put on the network as bytes in flight (sent but not acked bytes).
It must call bbr_inflight() with the current window gain value (in place
of the wrong fixed 100 gain value used here, in percent).
bbr_is_time_to_cruise() also called bbr_inflight() with a wrong gain
value as parameter, due to a confusion between the value mentioned by
the draft (1 meaning 100% of the current window) and our implementation,
which expects a value in percent (so 100 in place of 1 here). Note that
the aim of bbr_is_time_to_cruise() is to make BBR decide whether to
leave the probing_bw down state. The bug had as side effect to make BBR
stay in this state for too long periods of time during which the
bottleneck bandwidth is decreasing, leading to big oscillations between
the minimum and maximum bottleneck bandwidth estimations.
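The two corrected call sites could look roughly like this; the stand-in
types and the simplified BDP computation below are assumptions, only the
gain convention (percent) matters here:

    #include <stdint.h>

    /* Minimal stand-in state, not the real haproxy types. */
    struct bbr_sketch {
        uint64_t max_bw;        /* estimated bottleneck bandwidth (bytes/s) */
        uint64_t min_rtt_us;    /* minimal RTT (microseconds) */
        unsigned int cwnd_gain; /* current window gain, in percent */
    };

    /* Simplified BBRInflight(): BDP scaled by a gain expressed in
     * percent (100 == a gain of 1.0).
     */
    static uint64_t bbr_inflight_sketch(const struct bbr_sketch *bbr,
                                        unsigned int gain_pct)
    {
        uint64_t bdp = bbr->max_bw * bbr->min_rtt_us / 1000000;
        return bdp * gain_pct / 100;
    }

    static uint64_t bbr_target_inflight_sketch(const struct bbr_sketch *bbr)
    {
        /* Use the current window gain, not a hardcoded 100. */
        return bbr_inflight_sketch(bbr, bbr->cwnd_gain);
    }

    static int bbr_is_time_to_cruise_sketch(const struct bbr_sketch *bbr,
                                            uint64_t inflight)
    {
        /* The draft's BBRInflight(1.0) translates to 100 (percent) here,
         * not 1.
         */
        return inflight <= bbr_inflight_sketch(bbr, 100);
    }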
This patch must be backported to 3.1 where BBR was first implemented.
Limit the BBR congestion control window size as is done for all the
other congestion control algorithms, either with
tune.quic.frontend.default-max-window-size or with the first argument
passed to the "bbr" option of "quic-cc-algo".
The aim of the ->app_limited member of the delivery rate struct
(quic_cc_drs) is to store the index of the last transmitted byte marked
as application-limited, so as to track the application-limited phases.
During these phases, BBR must ignore delivery rate samples to properly
estimate the delivery rate.
Without such a patch, the Startup phase could be exited very quickly
with a very low estimated bottleneck bandwidth. This had a very bad
impact on small objects with download times smaller than the expected
Startup phase duration. For such objects, with enough bandwidth, BBR
should stay in the Startup state.
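An illustrative sketch of how such a field can be used (loosely modeled
on quic_cc_drs; every name other than app_limited is an assumption):

    #include <stdint.h>

    struct drs_example {
        uint64_t delivered;   /* total delivered bytes so far */
        uint64_t app_limited; /* last tx byte marked app-limited, 0 if none */
    };

    /* Mark the current transmission as application-limited: samples
     * taken until <app_limited> is acknowledged must not be trusted to
     * estimate the delivery rate.
     */
    static void drs_mark_app_limited(struct drs_example *drs,
                                     uint64_t bytes_in_flight)
    {
        drs->app_limited = drs->delivered + bytes_in_flight;
    }

    /* A rate sample may update the estimated bandwidth only once the
     * application-limited phase is over.
     */
    static int drs_sample_usable(const struct drs_example *drs,
                                 uint64_t delivered_at_ack)
    {
        return !drs->app_limited || delivered_at_ack > drs->app_limited;
    }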
No need to backport, as BBR is only implemented in the current
development version.
This poor/inefficient code has been revealed by Coverity in GH issue
#2788, where some quic_cc_rs struct member initializations were
mentioned as being overwritten (after initialization) before being used,
as follows:
CID 1565821: Code maintainability issues (UNUSED_VALUE)
/src/quic_cc_bbr.c: 1373 in bbr_handle_lost_packet()
1367 }
1368
1369 static void bbr_handle_lost_packet(struct bbr *bbr, struct quic_cc_path *p,
1370 struct quic_tx_packet *pkt,
1371 uint32_t lost)
1372 {
>>> CID 1565821: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value "0UL" to "rs.tx_in_flight" here, but that stored value is overwritten before it can be used.
1373 struct quic_cc_rs rs = {0};
1374
1375 /* C.delivered = bbr->drs.delivered */
1376 bbr_note_loss(bbr, bbr->drs.delivered);
1377 if (!bbr->bw_probe_samples)
1378 return; /* not a packet sent while probing bandwidth */
Remove the {0} initializer for the <rs> variable. This is safe because
the members of the <rs> local variable passed to functions from
bbr_handle_lost_packet() are initialized before being used. Add a
comment to mention this.
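After the change, the declaration could look roughly like this (the
comment wording is illustrative):

    /* Note: <rs> is purposely left uninitialized: its members are set by
     * the functions it is passed to before being read.
     */
    struct quic_cc_rs rs;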
Implement version 3 of BBR for QUIC, as specified by the IETF in this draft:
https://datatracker.ietf.org/doc/draft-ietf-ccwg-bbr/
Here is an extract from the Abstract to sum up the capabilities of BBR:
BBR ("Bottleneck Bandwidth and Round-trip propagation time") uses recent
measurements of a transport connection's delivery rate, round-trip time, and
packet loss rate to build an explicit model of the network path. BBR then uses
this model to control both how fast it sends data and the maximum volume of data
it allows in flight in the network at any time. Relative to loss-based congestion
control algorithms such as Reno [RFC5681] or CUBIC [RFC9438], BBR offers
substantially higher throughput for bottlenecks with shallow buffers or random
losses, and substantially lower queueing delays for bottlenecks with deep buffers
(avoiding "bufferbloat"). BBR can be implemented in any transport protocol that
supports packet-delivery acknowledgment. Thus far, open source implementations
are available for TCP [RFC9293] and QUIC [RFC9000].
In haproxy, this implementation is still considered experimental. It depends
on the newly implemented pacing feature.
BBR was requested in GH #2516 by @KazuyaKanemura, @osevan and @kennyZ96.