known_addrs in PeerInfoInternal is append-only — addresses accumulate
via add_addr() and PeerList gossip but are never removed. In dynamic
environments (k8s pod restarts, DHCP, NAT traversal), this list grows
unboundedly with stale addresses.
Combined with sequential iteration in try_connect() and no TCP connect
timeout in netapp.rs, each unreachable address blocks reconnection for
the kernel's TCP SYN timeout (75-130s on Linux). With 10+ stale
addresses, worst-case reconnection exceeds 750s — a full outage for
replication_factor=3 clusters.
This commit contains the following two changes:
1. Address failure tracking and pruning (peering.rs): Track consecutive
connection failures per address in PeerInfoInternal. After 3 failures,
prune from known_addrs. Reset count when address is re-advertised via
gossip or incoming connection. Prevents unbounded list growth.
2. Shuffle before connecting (peering.rs): Randomize address order in
try_connect() so the valid address (often appended last) gets a fair
chance instead of always trying stale addresses first.
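For concreteness, the combined behavior could look like the sketch below. This is illustrative, not the actual diff: `MAX_ADDR_FAILURES`, `record_failure`, and `candidate_addrs` are hypothetical names, and the real `PeerInfoInternal` carries more state.
```rust
use rand::seq::SliceRandom;
use std::net::SocketAddr;

const MAX_ADDR_FAILURES: u8 = 3;

struct PeerInfoInternal {
    // Each known address now carries a consecutive-failure count.
    known_addrs: Vec<(SocketAddr, u8)>,
}

impl PeerInfoInternal {
    // Called when an address is advertised via gossip or an incoming
    // connection: insert it, or reset its failure count if already known.
    fn add_addr(&mut self, addr: SocketAddr) {
        match self.known_addrs.iter_mut().find(|(a, _)| *a == addr) {
            Some((_, failures)) => *failures = 0,
            None => self.known_addrs.push((addr, 0)),
        }
    }

    // Called after a failed connection attempt: bump the counter and
    // prune any address that has failed three times in a row.
    fn record_failure(&mut self, addr: SocketAddr) {
        if let Some((_, failures)) =
            self.known_addrs.iter_mut().find(|(a, _)| *a == addr)
        {
            *failures += 1;
        }
        self.known_addrs.retain(|(_, f)| *f < MAX_ADDR_FAILURES);
    }

    // try_connect() shuffles before iterating, so a freshly appended
    // valid address is not stuck behind a queue of stale ones.
    fn candidate_addrs(&self) -> Vec<SocketAddr> {
        let mut addrs: Vec<_> = self.known_addrs.iter().map(|(a, _)| *a).collect();
        addrs.shuffle(&mut rand::thread_rng());
        addrs
    }
}
```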
A companion commit in this series contains the first change made to fix this issue:
1. TCP connect timeout (netapp.rs): Wrap TcpStream::connect() in
tokio::time::timeout(10s). Caps per-address attempt from 75-130s
to 10s, reducing worst-case 10-addr reconnection from ~750s to ~100s.
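A minimal sketch of that wrapping, assuming the surrounding error handling in netapp.rs differs in detail:
```rust
use std::net::SocketAddr;
use std::time::Duration;
use tokio::net::TcpStream;
use tokio::time::timeout;

const TCP_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);

async fn connect_with_timeout(addr: SocketAddr) -> std::io::Result<TcpStream> {
    match timeout(TCP_CONNECT_TIMEOUT, TcpStream::connect(addr)).await {
        // connect() completed (successfully or not) within 10s
        Ok(res) => res,
        // connect() was still waiting on the kernel's SYN retries: give up
        Err(_elapsed) => Err(std::io::Error::new(
            std::io::ErrorKind::TimedOut,
            "TCP connect timed out",
        )),
    }
}
```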
## Summary
This PR ensures that the `LifecycleWorker` yields to the Tokio scheduler at least once between each batch of 100 objects.
## Problem being solved
I administer a Garage cluster that has been experiencing timeouts on all endpoints while the lifecycle worker runs at midnight UTC: `Ping timeout` error messages, and even requests eventually failing due to `Could not reach quorum ...`.
I have found that this happens while the lifecycle worker is working on a big bucket (containing millions of objects) with a lifecycle rule that applies to very few objects.
The `process_object()` function does not hit any `await`:
- `last_bucket` is always the same, so the `bucket_table` is not read asynchronously
- no transaction is made on the `object_table` because my lifecycle rule (almost) never applies to any object
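The fix itself amounts to yielding periodically inside the scan loop. A minimal self-contained sketch (not the actual Garage code; the loop structure is illustrative):
```rust
use tokio::task::yield_now;

// Yield to the Tokio scheduler once per batch of 100 items, so that a
// loop whose body never awaits cannot starve every other task.
async fn scan<T>(items: impl IntoIterator<Item = T>, mut process: impl FnMut(T)) {
    for (i, item) in items.into_iter().enumerate() {
        process(item); // may complete without hitting any .await
        if i % 100 == 99 {
            yield_now().await;
        }
    }
}
```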
The first commit in this PR adds an executable which reproduces the problem I've been experiencing in a self-contained way: the lifecycle worker starves the Tokio scheduler so much that no other task is able to run (or only very rarely).
To run it: `cargo run -p garage_model --bin lifecycle-starvation-test`.
This commit can be dropped post-review, as it's only useful to demonstrate the starvation.
The error messages stopped completely after deploying the extra yield to the nodes of my cluster.
As far as I can see, the duration of the lifecycle worker task has not changed at all (judging by the timestamps produced either by the self-contained binary or by the `Lifecycle worker finished` message on each of my nodes).
## Note
Another potential fix would have been to force the `WorkerProcessor` to yield before re-enqueuing a busy task, but this would have affected all Garage workers even though only the `LifecycleWorker` is being uncooperative.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1396
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Gauthier Zirnhelt <gauthier.zirnhelt@insimo.fr>
Co-committed-by: Gauthier Zirnhelt <gauthier.zirnhelt@insimo.fr>
This fixes a regression with respect to garage-v1, likely caused by the version upgrade of quick_xml.
Currently, garage-v2 will emit empty `ErrorDocument`/`IndexDocument`/`RedirectAllRequestsTo` elements in the response of GetBucketWebsite if there are no corresponding values.
This is somewhat wrong; at least, the S3 documentation for RedirectAllRequestsTo (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RedirectAllRequestsTo.html) states that it has a required HostName field, so emitting an empty RedirectAllRequestsTo is invalid.
This PR skips emitting these XML elements when they contain no value.
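For illustration, with quick_xml's serde support the shape of the fix is to make these fields optional and skip them when absent; the struct and field names below are illustrative, not the actual Garage types:
```rust
use serde::Serialize;

#[derive(Serialize)]
#[serde(rename = "WebsiteConfiguration")]
struct WebsiteConfiguration {
    // Omit each element entirely when there is no corresponding value.
    #[serde(rename = "ErrorDocument", skip_serializing_if = "Option::is_none")]
    error_document: Option<ErrorDocument>,
    #[serde(rename = "IndexDocument", skip_serializing_if = "Option::is_none")]
    index_document: Option<IndexDocument>,
    #[serde(
        rename = "RedirectAllRequestsTo",
        skip_serializing_if = "Option::is_none"
    )]
    redirect_all_requests_to: Option<RedirectAllRequestsTo>,
}

#[derive(Serialize)]
struct ErrorDocument {
    #[serde(rename = "Key")]
    key: String,
}

#[derive(Serialize)]
struct IndexDocument {
    #[serde(rename = "Suffix")]
    suffix: String,
}

#[derive(Serialize)]
struct RedirectAllRequestsTo {
    #[serde(rename = "HostName")]
    host_name: String, // required per the AWS docs cited above
}
```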
Co-authored-by: Armaël Guéneau <armael.gueneau@ens-lyon.org>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1391
Co-authored-by: Armael <armael@noreply.localhost>
Co-committed-by: Armael <armael@noreply.localhost>
This is a port of #1320 on top of the main-v2 branch.
Co-authored-by: Armaël Guéneau <armael.gueneau@ens-lyon.org>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1392
Co-authored-by: Armael <armael@noreply.localhost>
Co-committed-by: Armael <armael@noreply.localhost>
This makes it easier to correlate an error with the request that caused it, which can be helpful during debugging or when setting up automation based on log content.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1390
Reviewed-by: Alex <lx@deuxfleurs.fr>
Reviewed-by: maximilien <git@mricher.fr>
Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Co-committed-by: trinity-1686a <trinity@deuxfleurs.fr>
Made a quick PR to add a sub-command called `completions` for generating shell completions; it was driving me pretty crazy that this wasn't a thing :P.
Tried my best to do everything properly; let me know if I need to change anything. I tested it and it works perfectly.
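For reference, the mechanism is roughly the following (a self-contained sketch using clap_complete; the real Garage CLI structure differs, and the enum layout here is illustrative):
```rust
use clap::{CommandFactory, Parser, Subcommand};
use clap_complete::{generate, Shell};
use std::io;

#[derive(Parser)]
#[command(name = "garage")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Generate completions for the given shell
    Completions {
        #[arg(value_enum)]
        shell: Shell,
    },
}

fn main() {
    match Cli::parse().command {
        Command::Completions { shell } => {
            // Write the completion script for the chosen shell to stdout.
            let mut cmd = Cli::command();
            generate(shell, &mut cmd, "garage", &mut io::stdout());
        }
    }
}
```
Typical usage would then be something like `garage completions bash > /etc/bash_completion.d/garage`.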
Co-authored-by: MrSnowy <snow@mrsnowy.dev>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1386
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: MrSnowy <mrsnowy@noreply.localhost>
Co-committed-by: MrSnowy <mrsnowy@noreply.localhost>
## Summary
This PR fixes S3 `DeleteObjects` XML parsing when the request body is pretty-printed (contains indentation/newlines as whitespace text nodes).
Although PR #1324 already tried to address this, parsing could still fail with:
`InvalidRequest: Bad request: Invalid delete XML query`
because non-element nodes were validated but not actually skipped in the parsing loop.
## What changed
- In `src/api/s3/delete.rs`:
  - Properly skip whitespace-only text nodes while iterating over `<Delete>` children.
  - Keep rejecting non-whitespace stray text content.
  - Parse the root `<Delete>` element more robustly by selecting the first element child.
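For illustration, the skipping logic looks roughly like this (a sketch using roxmltree; the parser actually used in `src/api/s3/delete.rs` may differ in detail):
```rust
// Collect the keys of a <Delete> request, tolerating pretty-printing.
fn parse_delete(body: &str) -> Result<Vec<String>, String> {
    let doc = roxmltree::Document::parse(body).map_err(|e| e.to_string())?;
    let delete = doc.root_element(); // the <Delete> element
    let mut keys = Vec::new();
    for child in delete.children() {
        if child.is_text() {
            match child.text() {
                // Skip whitespace-only text nodes produced by pretty-printing...
                Some(t) if t.trim().is_empty() => continue,
                // ...but keep rejecting stray non-whitespace text content.
                _ => return Err("Invalid delete XML query".into()),
            }
        }
        if child.has_tag_name("Object") {
            if let Some(key) = child
                .children()
                .find(|n| n.has_tag_name("Key"))
                .and_then(|n| n.text())
            {
                keys.push(key.to_string());
            }
        }
    }
    Ok(keys)
}
```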
## Tests added
New unit tests in `src/api/s3/delete.rs`:
- `parse_delete_objects_xml_with_formatting`: pretty-printed valid XML is accepted.
- `parse_delete_objects_xml_accepts_compact_valid_xml`: compact valid XML is accepted.
- `parse_delete_objects_xml_rejects_non_whitespace_text_node`: compact XML with stray text is rejected.
- `parse_delete_objects_xml_rejects_pretty_print_with_stray_text`: pretty-printed XML with stray text is rejected.
## Validation
Executed:
```bash
cargo test -p garage_api_s3 parse_delete_objects_xml -- --nocapture
```
Result: all parser tests pass.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1374
Co-authored-by: milouz1985 <francois.hoyez@gmail.com>
Co-committed-by: milouz1985 <francois.hoyez@gmail.com>
## Problem
`hugo deploy` is broken with Garage on recent hugo versions when using gzip matchers
## Why?
We don't handle multi-value headers correctly; in this case, this specific header combination:
```
Content-Encoding: gzip
Content-Encoding: aws-chunked
```
is interpreted as:
```
Content-Encoding: gzip
```
instead of:
```
Content-Encoding: gzip,aws-chunked
```
This fails both (1) the signature check and (2) the streaming check.
## Proposed fix
- Take multi-value headers into account when building the Canonical Request (validated with hugo deploy + AWS SDK v2)
- Take multi-value headers into account (both comma-separated and separate HeaderEntry values) when removing `aws-chunked` (validated with hugo deploy + AWS SDK v2); see the sketch below
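A sketch of the `aws-chunked` removal, using the `http` crate's `HeaderMap` (the function name and exact call site are illustrative): collect every `Content-Encoding` entry, split each on commas, drop `aws-chunked`, and write back a single normalized value.
```rust
use http::header::{HeaderMap, HeaderValue, CONTENT_ENCODING};

fn strip_aws_chunked(headers: &mut HeaderMap) {
    // Gather all entries (separate HeaderEntry values and comma-separated
    // lists alike), minus any "aws-chunked" token.
    let remaining: Vec<String> = headers
        .get_all(CONTENT_ENCODING)
        .iter()
        .filter_map(|v| v.to_str().ok())
        .flat_map(|v| v.split(','))
        .map(|e| e.trim().to_string())
        .filter(|e| !e.is_empty() && !e.eq_ignore_ascii_case("aws-chunked"))
        .collect();

    // Replace however many entries there were with one canonical value.
    headers.remove(CONTENT_ENCODING);
    if !remaining.is_empty() {
        if let Ok(value) = HeaderValue::from_str(&remaining.join(",")) {
            headers.insert(CONTENT_ENCODING, value);
        }
    }
}
```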
## Full explanation
Currently, `hugo deploy` on `hugo v0.152.2` or more recent uses AWS SDK v2 only and supports sending gzipped content.
That is configured with a matcher like this:
```yaml
deployment:
  matchers:
    - pattern: "^.+\\.(woff2|woff|svg|ttf|otf|eot|js|css)$"
      cacheControl: "max-age=31536000, no-transform, public"
      gzip: true # <-------- here
```
Also, with SDK v2, hugo is streaming all of its files.
Thus, it sends this kind of request:
```python
Request {
method: PUT,
uri: /sebou/pagefind/pagefind.js?x-id=PutObject,
version: HTTP/1.1,
headers: {
"host": "localhost",
"user-agent": "aws-sdk-go-v2/1.39.2 ua/2.1 os/linux lang/go#1.25.6 md/GOOS#linux md/GOARCH#amd64 api/s3#1.84.0 ft/s3-transfer m/E,G,Z,g",
"content-length": "10026",
"accept-encoding": "identity",
"amz-sdk-invocation-id": "aed6df34-a67c-4bab-b63b-2b3777b751a0",
"amz-sdk-request": "attempt=1; max=3",
"authorization": "AWS4-HMAC-SHA256 Credential=GKxxxxx/20260227/garage/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;cache-control;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-meta-md5chksum;x-amz-trailer, Signature=76cd9b77f693ca89c2e6dd2a4dc55f83d4a82eca0f563d9d095ff96076f7b057",
"cache-control": "max-age=31536000, no-transform, public",
"content-encoding": "gzip", # <---- see here 1st instance of Content-Encoding
"content-encoding": "aws-chunked", # <---- 2nd instance of Content-Encoding
"content-type": "text/javascript",
"via": "2.0 Caddy",
"x-amz-content-sha256": "STREAMING-UNSIGNED-PAYLOAD-TRAILER",
"x-amz-date": "20260227T132212Z",
"x-amz-decoded-content-length": "9982",
"x-amz-meta-md5chksum": "aad88ac0bf704e91584b8d9ad9796670",
"x-amz-trailer": "x-amz-checksum-crc32",
"x-forwarded-for": "::1",
"x-forwarded-host": "localhost",
"x-forwarded-proto": "https"
},
body: Body(Streaming)
}
```
But our canonical request function only calls `HeaderMap::get()`, which returns just the first value, instead of `HeaderMap::get_all()`, which returns all the values for a header. This leads to the following invalid `CanonicalRequest` value:
```python
PUT
/sebou/pagefind/pagefind.js
x-id=PutObject
accept-encoding:identity
amz-sdk-invocation-id:aed6df34-a67c-4bab-b63b-2b3777b751a0
amz-sdk-request:attempt=1; max=3
cache-control:max-age=31536000, no-transform, public
content-encoding:gzip # <----- see here, we kept only gzip and dropped aws-chunked
content-length:10026
content-type:text/javascript
host:localhost
x-amz-content-sha256:STREAMING-UNSIGNED-PAYLOAD-TRAILER
x-amz-date:20260227T132212Z
x-amz-decoded-content-length:9982
x-amz-meta-md5chksum:aad88ac0bf704e91584b8d9ad9796670
x-amz-trailer:x-amz-checksum-crc32
accept-encoding;amz-sdk-invocation-id;amz-sdk-request;cache-control;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-meta-md5chksum;x-amz-trailer
```
Amazon is crystal clear that, instead of dropping the other values, we should concatenate them with a comma:

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sigv-create-signed-request.html#create-canonical-request
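The corresponding fix on the signing side, sketched with the `http` crate (the helper name is illustrative): build each canonical header value from `get_all()` and join with commas.
```rust
use http::HeaderMap;

// Join all values of a signed header with a comma, per the SigV4 rules
// linked above, instead of taking only the first value with get().
fn canonical_header_value(headers: &HeaderMap, name: &str) -> String {
    headers
        .get_all(name)
        .iter()
        .filter_map(|v| v.to_str().ok())
        .map(str::trim)
        .collect::<Vec<_>>()
        .join(",")
}
```
With the request above, `canonical_header_value(&headers, "content-encoding")` yields `gzip,aws-chunked`, matching what the SDK signed.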
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1369
Reviewed-by: Alex <lx@deuxfleurs.fr>
Co-authored-by: Quentin Dufour <quentin@deuxfleurs.fr>
Co-committed-by: Quentin Dufour <quentin@deuxfleurs.fr>