2722 Commits

Author SHA1 Message Date
Alex Auvolat
7b119c0b4f bump version number to v2.3.0 v2.3.0 2026-04-16 18:34:27 +02:00
Alex Auvolat
02d5e67698 db: avoid iterating bounded from empty slice (fix #1401) (#1408)
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1408
Co-authored-by: Alex Auvolat <lx@deuxfleurs.fr>
Co-committed-by: Alex Auvolat <lx@deuxfleurs.fr>
2026-04-16 16:33:28 +00:00
maximilien
854280e957 Merge pull request 'helm: Conditionally skip CRD management RBAC rule' (#1248) from boris.m/garage:feat/drop-crd-management-rbac-rule into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1248
Reviewed-by: maximilien <git@mricher.fr>
2026-04-16 16:22:17 +00:00
B Marinov
9ea2b1d628 helm: Conditionally skip CRD management RBAC rule
Remove rule permitting changes to CRDs when garage.kubernetesSkipCrd is set to true.
2026-04-16 16:22:17 +00:00
maximilien
7b7548a4f7 Merge pull request 'Fix helm existing configmap volume ref in workload' (#1388) from PhilleZi/garage:fix-helm-existing-configmap into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1388
Reviewed-by: maximilien <git@mricher.fr>
2026-04-16 16:20:27 +00:00
Philip Zingmark
a2e410f8b6 Fix helm existing configmap volume ref in workload 2026-04-16 16:20:01 +00:00
Alex
690729ccdb Merge pull request 'fix: bound known_addrs growth and add TCP connect timeout' (#1345) from rajsinghtech/garage:fix/peering-stale-addr-reconnection into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1345
2026-04-15 11:42:38 +00:00
Alex Auvolat
ff743453b6 garage_net: make pruning logic simpler and add test 2026-04-15 11:42:38 +00:00
Raj Singh
f34a7db48a fix: bound known_addrs growth
known_addrs in PeerInfoInternal is append-only — addresses accumulate
via add_addr() and PeerList gossip but are never removed. In dynamic
environments (k8s pod restarts, DHCP, NAT traversal), this list grows
unboundedly with stale addresses.

Combined with sequential iteration in try_connect() and no TCP connect
timeout in netapp.rs, each unreachable address blocks reconnection for
the kernel's TCP SYN timeout (75-130s on Linux). With 10+ stale
addresses, worst-case reconnection exceeds 750s — a full outage for
replication_factor=3 clusters.

This commit contains the two following changes:

1. Address failure tracking and pruning (peering.rs): Track consecutive
   connection failures per address in PeerInfoInternal. After 3 failures,
   prune from known_addrs. Reset count when address is re-advertised via
   gossip or incoming connection. Prevents unbounded list growth.

2. Shuffle before connecting (peering.rs): Randomize address order in
   try_connect() so the valid address (often appended last) gets a fair
   chance instead of always trying stale addresses first.
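
A minimal sketch of the failure tracking and pruning described in point 1, assuming a simplified stand-in type. `KnownAddrs`, `advertise`, `record_failure` and `MAX_FAILURES` are hypothetical names for illustration, not the identifiers used in peering.rs. The shuffle from point 2 is left out because it needs a random source outside the standard library.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Number of consecutive failures after which an address is pruned
/// (3, as stated in the commit message).
const MAX_FAILURES: u32 = 3;

/// Hypothetical stand-in for the per-peer address bookkeeping.
#[derive(Default)]
struct KnownAddrs {
    /// Consecutive connection failures for each known address.
    failures: HashMap<SocketAddr, u32>,
}

impl KnownAddrs {
    /// An address was advertised via gossip or seen on an incoming
    /// connection: (re-)insert it and reset its failure count.
    fn advertise(&mut self, addr: SocketAddr) {
        self.failures.insert(addr, 0);
    }

    /// Record one failed connection attempt. Returns true if the address
    /// reached MAX_FAILURES and was pruned from the list.
    fn record_failure(&mut self, addr: SocketAddr) -> bool {
        let pruned = match self.failures.get_mut(&addr) {
            Some(count) => {
                *count += 1;
                *count >= MAX_FAILURES
            }
            None => false,
        };
        if pruned {
            self.failures.remove(&addr);
        }
        pruned
    }

    /// Number of addresses currently kept.
    fn len(&self) -> usize {
        self.failures.len()
    }
}
```

Because an address is removed once it reaches the threshold and re-advertising resets the count, stale addresses no longer accumulate indefinitely in this sketch.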
2026-04-15 11:42:38 +00:00
Raj Singh
3a355b1617 fix: add TCP connect timeout
known_addrs in PeerInfoInternal is append-only — addresses accumulate
via add_addr() and PeerList gossip but are never removed. In dynamic
environments (k8s pod restarts, DHCP, NAT traversal), this list grows
unboundedly with stale addresses.

Combined with sequential iteration in try_connect() and no TCP connect
timeout in netapp.rs, each unreachable address blocks reconnection for
the kernel's TCP SYN timeout (75-130s on Linux). With 10+ stale
addresses, worst-case reconnection exceeds 750s — a full outage for
replication_factor=3 clusters.

This patch includes a first change to fix this issue:

1. TCP connect timeout (netapp.rs): Wrap TcpStream::connect() in
   tokio::time::timeout(10s). Caps per-address attempt from 75-130s
   to 10s, reducing worst-case 10-addr reconnection from ~750s to ~100s.
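
The worst-case figures above follow from sequential attempts: each unreachable address costs one full timeout. A small sketch of that arithmetic, where `worst_case_reconnect` is a hypothetical helper and not part of netapp.rs:

```rust
use std::time::Duration;

/// Worst-case time spent walking a list of unreachable addresses one
/// after another, when every attempt runs until its timeout expires.
fn worst_case_reconnect(stale_addrs: u32, per_attempt: Duration) -> Duration {
    per_attempt * stale_addrs
}
```

With 10 stale addresses this gives 750s at the lower kernel bound of 75s per attempt, and 100s once each attempt is capped at 10s.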
2026-04-15 11:42:38 +00:00
Alex
0b5e82a18b Merge pull request 'Cherry-pick #1396 for main-v2' (#1404) from fix-starvation into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1404
2026-04-15 10:35:22 +00:00
Gauthier Zirnhelt
2798667345 Fix the LifecycleWorker being uncooperative (#1396)
## Summary

This PR ensures that the `LifecycleWorker` yields at least once to the Tokio scheduler in between each batch of 100 objects.

## Problem being solved

I administer a Garage cluster that has been experiencing timeouts on all endpoints while the lifecycle worker runs at midnight UTC: `Ping timeout` error messages, and even requests eventually failing due to `Could not reach quorum ...`.

I have found that this happens while the lifecycle worker is working on a big bucket (containing millions of objects) with a lifecycle rule that applies to very few objects.
The `process_object()` function does not hit any `await`:
- `last_bucket` is always the same, so the `bucket_table` is not read asynchronously
- no transaction is made on the `object_table` because my lifecycle rule (almost) never applies to any object

The first commit in this PR adds an executable which reproduces the problem that I've been experiencing in a self-contained way: the lifecycle worker starves the Tokio scheduler so much that no other task is able to run (or very rarely).
To run it: `cargo run -p garage_model --bin lifecycle-starvation-test`.
This commit can be dropped post-review, as it's only useful to demonstrate the starvation.

The error messages completely stopped after adding the extra yield to the nodes of my cluster.
The duration of the lifecycle worker task does not appear to have changed at all from what I can see (looking at the timestamps produced either by the self-contained binary or by each of my nodes with the `Lifecycle worker finished` message).

## Note

Another potential fix would have been to force the `WorkerProcessor` to yield before re-enqueuing a busy task, but this would have affected all Garage workers even though it's only the `LifecycleWorker` being uncooperative.

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1396
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Gauthier Zirnhelt <gauthier.zirnhelt@insimo.fr>
Co-committed-by: Gauthier Zirnhelt <gauthier.zirnhelt@insimo.fr>
2026-04-15 12:13:18 +02:00
Alex
b1660f0cba Merge pull request 'document known issues' (#1379) from doc-known-issues into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1379
2026-04-15 10:11:39 +00:00
Alex Auvolat
dfb20ba87f doc: write details of known issues 2026-04-15 10:11:39 +00:00
maximilien
7279cb9113 Add comment on tags 2026-04-15 10:11:39 +00:00
Alex Auvolat
56cb89d153 wip: list known issues in documentation 2026-04-15 10:11:39 +00:00
Armael
6fd9bba0cb WebsiteConfiguration: do not emit empty XML attributes for absent values (#1391)
This fixes a regression wrt garage-v1, likely caused by the version upgrade of quick_xml.

Currently, garage-v2 will emit empty ErrorDocument/IndexDocument/RedirectAllRequestsTo attributes in the response of GetBucketWebsite if there are no corresponding values.
This is somewhat wrong; at least, the S3 documentation for RedirectAllRequestsTo (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RedirectAllRequestsTo.html) writes that it has a required HostName field. So emitting an empty RedirectAllRequestsTo is invalid.

This PR skips emitting XML attributes for these parameters if they contain no value.

Co-authored-by: Armaël Guéneau <armael.gueneau@ens-lyon.org>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1391
Co-authored-by: Armael <armael@noreply.localhost>
Co-committed-by: Armael <armael@noreply.localhost>
2026-04-13 13:59:32 +00:00
Jul Lang
f9605fae78 fix typo (#1402)
found by [typos](https://github.com/crate-ci/typos)

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1402
Co-authored-by: Jul Lang <jullanggit@proton.me>
Co-committed-by: Jul Lang <jullanggit@proton.me>
2026-04-13 12:12:57 +00:00
Armael
9969c3e599 Fix: correctly parse CORS website configuration with no rules (#1392)
This is a port of #1320 on top of the main-v2 branch.

Co-authored-by: Armaël Guéneau <armael.gueneau@ens-lyon.org>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1392
Co-authored-by: Armael <armael@noreply.localhost>
Co-committed-by: Armael <armael@noreply.localhost>
2026-03-22 17:09:16 +00:00
Alex
a69a8d3b21 Merge pull request 'force uri encoding before check signature' (#1382) from gwenlg/garage:signature_doesnt_match_1155 into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1382
Reviewed-by: Alex <lx@deuxfleurs.fr>
2026-03-22 10:59:43 +00:00
Gwen Lg
3a97b13e2f wip: add percent_decode before uri_encode for check signature
this avoids errors when the request URI is not encoded for the signature
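
A rough sketch of the decode-then-encode idea, assuming simplified helpers. `percent_decode`, `uri_encode_path` and `normalize_path` here are illustrative functions written for this example, not Garage's implementations. Decoding first means an already-encoded path and a raw path end up in the same canonical form.

```rust
/// Value of one hexadecimal digit, if the byte is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode `%XX` sequences; malformed sequences are kept as-is.
fn percent_decode(input: &str) -> Vec<u8> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(h), Some(l)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(h * 16 + l);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// Percent-encode every byte except unreserved characters and `/`.
fn uri_encode_path(raw: &[u8]) -> String {
    let mut out = String::new();
    for &b in raw {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' | b'/' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Decode first, then encode, so both input forms normalize identically.
fn normalize_path(path: &str) -> String {
    uri_encode_path(&percent_decode(path))
}
```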
2026-03-22 10:59:43 +00:00
Gwen Lg
4efaea60bb tests: check request signatures with 'badly-encoded' uri
test related to issue #1155 and #1255
2026-03-22 10:59:43 +00:00
Gwen Lg
06e9756729 test: some error rework 2026-03-22 10:59:43 +00:00
trinity-1686a
8341b7f914 log api error in one self-sufficient line (fix #1381) (#1390)
this makes it easier to correlate an error with the request that caused it. This can be helpful during debugging, or when setting up some sort of automation based on log content

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1390
Reviewed-by: Alex <lx@deuxfleurs.fr>
Reviewed-by: maximilien <git@mricher.fr>
Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Co-committed-by: trinity-1686a <trinity@deuxfleurs.fr>
2026-03-20 20:22:34 +00:00
MrSnowy
96b986a0a0 Add completions sub-command for generating shell completions (#1386)
Made a quick pr to add a sub-command called completions for generating shell completions, was going pretty crazy that this wasn't a thing :P.

Tried my best to do everything properly, let me know if I need to change something, I tested it and it works perfectly.

Co-authored-by: MrSnowy <snow@mrsnowy.dev>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1386
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: MrSnowy <mrsnowy@noreply.localhost>
Co-committed-by: MrSnowy <mrsnowy@noreply.localhost>
2026-03-17 18:17:51 +00:00
trinity-1686a
60244b60dd don't panic on missing checksum (fix #1387) (#1389)
fix https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/1387

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1389
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: trinity-1686a <trinity-1686a@noreply.localhost>
Co-committed-by: trinity-1686a <trinity-1686a@noreply.localhost>
2026-03-17 18:16:37 +00:00
Alex
9848ec7f4e Merge pull request 'add missing admin API endpoints for admin UI' (#1376) from admin-json-statistics into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1376
2026-03-17 17:44:29 +00:00
Alex Auvolat
b81eae3f65 admin api: don't fail in getclusterstatistics when counting total objects/bytes 2026-03-17 17:44:29 +00:00
Alex Auvolat
6131318c80 admin api: don't gather all bucket statistics if too many buckets 2026-03-17 17:44:29 +00:00
Alex Auvolat
4566020360 admin api: convert new fields to Option<T> 2026-03-17 17:44:29 +00:00
Alex Auvolat
de10dc43d5 admin api: return total buckets, objects and bytes in GetClusterStatistics 2026-03-17 17:44:29 +00:00
Alex Auvolat
8abd0fee86 admin api: add fixme comments for cleanup for v3 release 2026-03-17 17:44:29 +00:00
Alex Auvolat
af5f68a34d admin api: allow updating website routing rules 2026-03-17 17:44:29 +00:00
Alex Auvolat
19e5f83164 admin api: update cors and lifecycle rules in UpdateBucket 2026-03-17 17:44:29 +00:00
Alex Auvolat
64087172ff admin api: expose routing rules, cors rules and lifecycle rules 2026-03-17 17:44:29 +00:00
Alex Auvolat
6c0bb1c9b6 refactoring: move xml definitions for bucket cors/lifecycle/website config
move these definitions to garage_api_common so that they can also be used
in admin api
2026-03-17 17:44:29 +00:00
Alex Auvolat
124a9eb521 admin api: export node statistics as structured json 2026-03-17 17:44:29 +00:00
Alex Auvolat
03e6020c6b admin api: report available space numerically in GetClusterStatistics 2026-03-17 17:44:29 +00:00
milouz1985
836657565e s3: fix DeleteObjects XML parsing with pretty-printed bodies (#1374)
## Summary

This PR fixes S3 `DeleteObjects` XML parsing when the request body is pretty-printed (contains indentation/newlines as whitespace text nodes).

Although PR #1324 already tried to address this, parsing could still fail with:

`InvalidRequest: Bad request: Invalid delete XML query`

because non-element nodes were validated but not actually skipped in the parsing loop.

## What changed

- In `src/api/s3/delete.rs`:
  - Properly skip non-element whitespace text nodes while iterating over `<Delete>` children.
  - Keep rejecting non-whitespace stray text content.
  - Parse the root `<Delete>` element more robustly by selecting the first element child.
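
The skipping rule above can be sketched on a simplified node model. `Node` and `element_children` are hypothetical and do not reflect the types used in `src/api/s3/delete.rs`. Note that `str::trim` uses Unicode whitespace, which is broader than XML whitespace.

```rust
/// Simplified child node of a `<Delete>` element (illustrative only).
enum Node {
    Element(String),
    Text(String),
}

/// Collect element children, skipping whitespace-only text nodes and
/// rejecting any text node that carries non-whitespace content.
fn element_children(nodes: &[Node]) -> Result<Vec<&str>, String> {
    let mut elements = Vec::new();
    for node in nodes {
        match node {
            Node::Element(name) => elements.push(name.as_str()),
            // Indentation and newlines from pretty-printing: skip.
            Node::Text(t) if t.trim().is_empty() => continue,
            // Stray text content: reject the whole request body.
            Node::Text(t) => return Err(format!("unexpected text: {:?}", t.trim())),
        }
    }
    Ok(elements)
}
```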

## Tests added

New unit tests in `src/api/s3/delete.rs`:

- `parse_delete_objects_xml_with_formatting`
  - pretty-printed valid XML is accepted.
- `parse_delete_objects_xml_accepts_compact_valid_xml`
  - compact valid XML is accepted.
- `parse_delete_objects_xml_rejects_non_whitespace_text_node`
  - compact XML with stray text is rejected.
- `parse_delete_objects_xml_rejects_pretty_print_with_stray_text`
  - pretty-printed XML with stray text is rejected.

## Validation

Executed:

```bash
cargo test -p garage_api_s3 parse_delete_objects_xml -- --nocapture
```

Result: all parser tests pass.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1374
Co-authored-by: milouz1985 <francois.hoyez@gmail.com>
Co-committed-by: milouz1985 <francois.hoyez@gmail.com>
2026-03-15 10:40:50 +00:00
trinity-1686a
76592723de don't send empty 404 on GetBucketCORS/GetBucketLifecycle (#1378)
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1378
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Co-committed-by: trinity-1686a <trinity@deuxfleurs.fr>
2026-03-10 09:41:08 +00:00
Ira Iva
d2f033641e Suppress log noise from /metrics and /health endpoints [#1292]. Change log level for 'netapp: incomming connection ...' message [#1310] (#1361)
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1361
Co-authored-by: Ira Iva <xatikopro@gmail.com>
Co-committed-by: Ira Iva <xatikopro@gmail.com>
2026-03-03 15:52:53 +00:00
Roman Ivanov
2cfd92e0c3 Use error NoSuchAccessKey in get info request processing (#1293) (#1356)
Fix for https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/1293

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1356
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Roman Ivanov <xatikopro@gmail.com>
Co-committed-by: Roman Ivanov <xatikopro@gmail.com>
2026-02-27 18:11:57 +00:00
Quentin Dufour
f796df8c34 Support streaming of gzip content involving multiple Content-Encoding headers (#1369)
## Problem

`hugo deploy` is broken with Garage on recent hugo versions when using gzip matchers

## Why?

We don't support multi-value headers correctly. In this case, this specific header combination:

```
Content-Encoding: gzip
Content-Encoding: aws-chunked
```

is interpreted as:

```
Content-Encoding: gzip
```

instead of:

```
Content-Encoding: gzip,aws-chunked
```

It fails both 1. the signature check and 2. the streaming check.

## Proposed fix

 - Taking into account multi-value headers when building Canonical Request (validated with hugo deploy + AWS SDK v2)
 - Taking into account multi-value headers (both comma separated and HeaderEntry separated) when removing `aws-chunked` (validated with hugo deploy + AWS SDK v2)
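
As an illustration of the second point, here is a sketch that removes `aws-chunked` whether the values arrive comma-separated or as separate header entries. `strip_aws_chunked` is a hypothetical helper operating on plain string slices, not the function from the PR.

```rust
/// Takes every `Content-Encoding` value (one slice item per header
/// entry), splits each on commas, drops `aws-chunked`, and returns the
/// remaining encodings joined by commas. Returns None if nothing is left.
fn strip_aws_chunked(values: &[&str]) -> Option<String> {
    let kept: Vec<&str> = values
        .iter()
        .copied()
        .flat_map(|v| v.split(','))
        .map(str::trim)
        .filter(|tok| !tok.is_empty() && !tok.eq_ignore_ascii_case("aws-chunked"))
        .collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept.join(","))
    }
}
```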

## Full explanation

Currently, `hugo deploy` on version `hugo v0.152.2` or more recent uses AWS SDK v2 only and supports sending gzipped content.
That's configured with a matcher like this:

```yaml
deployment:
  matchers:
    - pattern: "^.+\\.(woff2|woff|svg|ttf|otf|eot|js|css)$"
      cacheControl: "max-age=31536000, no-transform, public"
      gzip: true  # <-------- here
```

Also, with SDK v2, hugo is streaming all of its files.
Thus, it sends requests like this:

```python
Request {
  method: PUT,
  uri: /sebou/pagefind/pagefind.js?x-id=PutObject,
  version: HTTP/1.1,
  headers: {
    "host": "localhost",
    "user-agent": "aws-sdk-go-v2/1.39.2 ua/2.1 os/linux lang/go#1.25.6 md/GOOS#linux md/GOARCH#amd64 api/s3#1.84.0 ft/s3-transfer m/E,G,Z,g",
    "content-length": "10026",
    "accept-encoding": "identity",
    "amz-sdk-invocation-id": "aed6df34-a67c-4bab-b63b-2b3777b751a0",
    "amz-sdk-request": "attempt=1; max=3",
    "authorization": "AWS4-HMAC-SHA256 Credential=GKxxxxx/20260227/garage/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;cache-control;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-meta-md5chksum;x-amz-trailer, Signature=76cd9b77f693ca89c2e6dd2a4dc55f83d4a82eca0f563d9d095ff96076f7b057",
    "cache-control": "max-age=31536000, no-transform, public",
    "content-encoding": "gzip",                                           # <---- see here 1st instance of Content-Encoding
    "content-encoding": "aws-chunked",                                    # <---- 2nd instance of Content-Encoding
    "content-type": "text/javascript",
    "via": "2.0 Caddy",
    "x-amz-content-sha256": "STREAMING-UNSIGNED-PAYLOAD-TRAILER",
    "x-amz-date": "20260227T132212Z",
    "x-amz-decoded-content-length": "9982",
    "x-amz-meta-md5chksum": "aad88ac0bf704e91584b8d9ad9796670",
    "x-amz-trailer": "x-amz-checksum-crc32",
    "x-forwarded-for": "::1",
    "x-forwarded-host": "localhost",
    "x-forwarded-proto": "https"
  },
  body: Body(Streaming)
}
```

But our canonical request function only calls `HeaderMap.get()`, which returns only the first value, and not `HeaderMap.get_all()`, which returns all the values for a header.
This leads to the following invalid `CanonicalRequest` value:

```python
PUT
/sebou/pagefind/pagefind.js
x-id=PutObject
accept-encoding:identity
amz-sdk-invocation-id:aed6df34-a67c-4bab-b63b-2b3777b751a0
amz-sdk-request:attempt=1; max=3
cache-control:max-age=31536000, no-transform, public
content-encoding:gzip                                                             # <----- see here, we kept only gzip and dropped aws-chunked
content-length:10026
content-type:text/javascript
host:localhost
x-amz-content-sha256:STREAMING-UNSIGNED-PAYLOAD-TRAILER
x-amz-date:20260227T132212Z
x-amz-decoded-content-length:9982
x-amz-meta-md5chksum:aad88ac0bf704e91584b8d9ad9796670
x-amz-trailer:x-amz-checksum-crc32

accept-encoding;amz-sdk-invocation-id;amz-sdk-request;cache-control;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-meta-md5chksum;x-amz-trailer
```

Amazon is crystal clear that, instead of dropping the other values, we should concatenate them with a comma:

![20260227_17h26m20s_grim](/attachments/e3edf7bf-7dff-43d7-80d9-cf276ae94ed5)

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sigv-create-signed-request.html#create-canonical-request
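
A minimal sketch of the comma-joining rule, assuming headers are given as lowercase `(name, value)` pairs in arrival order and that `signed` is already sorted. `canonical_headers` is a hypothetical helper working on plain slices rather than on `HeaderMap`.

```rust
/// Build the canonical header lines for the signed header names.
/// All values of a repeated header are joined with a comma, in the
/// order they appear, instead of keeping only the first one.
fn canonical_headers(headers: &[(&str, &str)], signed: &[&str]) -> String {
    let mut out = String::new();
    for &name in signed {
        let mut values = Vec::new();
        for &(n, v) in headers {
            if n == name {
                values.push(v.trim());
            }
        }
        out.push_str(name);
        out.push(':');
        out.push_str(&values.join(","));
        out.push('\n');
    }
    out
}
```

For the request shown earlier, the two `content-encoding` entries would yield the single line `content-encoding:gzip,aws-chunked`.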
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1369
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Quentin Dufour <quentin@deuxfleurs.fr>
Co-committed-by: Quentin Dufour <quentin@deuxfleurs.fr>
2026-02-27 18:02:31 +00:00
trinity-1686a
668dfea4e2 fix silent write errors (#1360)
same as #1358 for garage-v2

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1360
Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Co-committed-by: trinity-1686a <trinity@deuxfleurs.fr>
2026-02-24 14:40:11 +00:00
maximilien
7f61bbbebb Merge pull request 'helm: add priorityClassName support' (#1357) from blue.lion4023/garage:helm-add-priority-class-name into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1357
Reviewed-by: maximilien <git@mricher.fr>
2026-02-21 08:23:14 +00:00
blue.lion4023
8105ca888d helm: add priorityClassName support to pod spec 2026-02-20 21:36:08 +00:00
Alex
d0166fe938 Merge pull request 'Upgrade quick-xml crate to 0.39' (#1319) from gwenlg/garage:quick_xml_upgrade into main-v2
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1319
Reviewed-by: Alex <lx@deuxfleurs.fr>
2026-02-20 21:29:26 +00:00
Gwen Lg
290a7f5ab6 fix: VersioningConfiguration xml reference
empty element handling is set to expanded, to be consistent.
2026-02-20 21:29:26 +00:00
Gwen Lg
2576626240 fix: configure xml serializer to expand empty elements 2026-02-20 21:29:26 +00:00
Gwen Lg
6591044c2e fix: set quote level to full for xml serialization
also remove use of intermediate String
2026-02-20 21:29:26 +00:00