## Title
fix(cors): return single matching origin instead of multiple values in `Access-Control-Allow-Origin`
## Summary
This PR fixes bucket CORS responses when a single CORS rule contains multiple `AllowedOrigins`.
Previously, Garage returned the configured origins as a comma-separated list in `Access-Control-Allow-Origin`, for example:
```http
Access-Control-Allow-Origin: https://app.example.test, https://admin.example.test
```
Browsers reject this: `Access-Control-Allow-Origin` must carry a single origin or `*`, so a comma-separated list fails the CORS check.
When a request origin matches a configured rule, the response should reflect **only the matching request origin**, unless the rule contains `*`.
## What changed
- `Access-Control-Allow-Origin` now behaves as follows:
  - returns `*` when the matched rule contains a wildcard origin
  - otherwise returns the request `Origin` as a **single value**
- added `Vary: Origin` when ACAO reflects the request origin
- added preflight-specific `Vary` handling in the preflight path for:
  - `Origin`
  - `Access-Control-Request-Method`
  - `Access-Control-Request-Headers`
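
The selection rule above can be sketched as follows. This is only an illustration, not the code in this PR: `allow_origin_value` is a hypothetical helper, and it assumes origins are compared by exact string equality, which may differ from how Garage matches rules.

```rust
/// Hypothetical helper (not Garage's actual function): picks the value of
/// `Access-Control-Allow-Origin` for a CORS rule.
/// Assumption: origins are matched by exact string equality.
fn allow_origin_value(allowed_origins: &[&str], request_origin: &str) -> Option<String> {
    // A wildcard in the rule is preserved as `*`.
    if allowed_origins.iter().any(|o| *o == "*") {
        return Some("*".to_string());
    }
    // Otherwise reflect only the request origin, as a single value.
    if allowed_origins.iter().any(|o| *o == request_origin) {
        return Some(request_origin.to_string());
    }
    // The rule does not cover this origin: no header value.
    None
}
```

With a rule listing both `https://app.example.test` and `https://admin.example.test`, a request from the first origin yields only that origin, never the joined list.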
## Scope
This change applies to shared bucket CORS handling paths, including:
- S3 API responses
- K2V API responses
- S3 POST object responses
- web bucket responses
- preflight (`OPTIONS`) bucket CORS responses
This does **not** change admin API fixed CORS behavior.
## Reproduction
A direct repro script is included:
```bash
./script/test-cors-multi-origin.sh
```
It exercises two cases against a direct single-node Garage instance:
1. **single-origin control**
2. **multi-origin repro**
Before this fix, the multi-origin case returned a comma-separated ACAO value.
After this fix, both cases reflect only the request origin.
## Example behavior
### Before
```http
Access-Control-Allow-Origin: https://app.example.test, https://admin.example.test
```
### After
```http
Access-Control-Allow-Origin: https://app.example.test
```
## Tests
Added/updated tests in `src/api/common/cors.rs` for:
- single-origin control
- multiple allowed origins reflecting the request origin
- wildcard origin preserving `*`
- preserving existing `Vary` values while appending `Origin`
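
The last case, appending `Origin` without losing existing `Vary` values, could look roughly like this. `append_vary` is a hypothetical name, and the sketch works on plain strings rather than on the header types used in `cors.rs`.

```rust
/// Hypothetical helper: returns the new `Vary` value after adding `token`,
/// keeping whatever was already present.
fn append_vary(existing: Option<&str>, token: &str) -> String {
    match existing {
        // No usable previous value: the token becomes the whole header.
        None => token.to_string(),
        Some(v) if v.trim().is_empty() => token.to_string(),
        Some(v) => {
            // Leave the header untouched if it already lists the token
            // (header names are case-insensitive) or is `*`.
            let covered = v
                .split(',')
                .map(str::trim)
                .any(|t| t == "*" || t.eq_ignore_ascii_case(token));
            if covered {
                v.to_string()
            } else {
                format!("{}, {}", v.trim(), token)
            }
        }
    }
}
```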
## Validation
Commands used for validation:
```bash
cargo test -p garage_api_common cors::tests -- --nocapture
cargo build -p garage --bin garage
./script/test-cors-multi-origin.sh
```
## Reproducibility
For reviewers who want to validate behavior by commit:
- Before fix: `aa368e4b`
  - includes the direct repro script and the regression test setup
  - multi-origin ACAO is reproduced as a comma-separated value
- After fix: `f630eb92`
  - reflects only the matching request origin
  - preserves wildcard behavior
  - adds `Vary: Origin` and preflight-specific `Vary` handling
Branch:
- `fix/cors-multiple-allow-origin`
Base used during validation:
- `74ad3bf8` (`main-v2`)
Closes Deuxfleurs/garage#1149
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1419
## Summary
Garage's SigV4 canonical-request builder trims leading/trailing whitespace from signed header values but does not collapse sequential internal whitespace, which the SigV4 spec requires:
> Convert sequential spaces to a single space.
— https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html
AWS SDKs apply this normalization before computing the signature, but transmit the raw value on the wire. The receiver must therefore apply the same normalization when reconstructing the canonical request, otherwise the recomputed hash differs and the request is rejected as `Invalid signature`.
Same class of canonicalization-drift bug as #1155 / !1382, but on the canonical-headers axis rather than the canonical-URI axis.
## Reproduction
This surfaces in practice with `gitlab-runner`'s S3 cache uploader. I was in the midst of migrating my runner cache from AWS S3 to Garage when I noticed that some shared runner caches were no longer uploading.
I was using `sha256sum | sha256sum` to compute my cache keys, which leaves a trailing ` -` on the value. Once GitLab appends `-protected` for protected branches the resulting `x-amz-meta-cachekey` header value contains internal sequential whitespace and triggers the mismatch:
```
x-amz-meta-cachekey:php-  --protected
                        ^^
                        two spaces, preserved by Garage
```
Without the fix the included regression test (`test_presigned_put_with_user_metadata`) fails with HTTP 403; with the fix it returns 200.
`aws-cli` is unaffected because it signs `Content-Type` rather than user metadata, so the specific code path with whitespace-bearing signed header values isn't exercised.
## Fix
In `canonical_request` (`src/api/common/signature/payload.rs`), replace the `.trim()` call on the joined header value with the full SigV4 normalization — `split_whitespace().collect::<Vec<_>>().join(" ")` — which both trims edges and collapses internal runs.
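
To make the difference concrete, here is a minimal standalone sketch of the two behaviors. The function names are hypothetical and this is not the code in `payload.rs`; note that `split_whitespace` splits on any Unicode whitespace, not only the space character.

```rust
/// Old behavior (sketch): only the edges are trimmed, internal runs survive.
fn trim_only(raw: &str) -> String {
    raw.trim().to_string()
}

/// New behavior (sketch): trim the edges and collapse every internal run of
/// whitespace into a single space.
fn normalize_header_value(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

For the header value from the reproduction, `trim_only` keeps the two spaces while `normalize_header_value` produces the single-space form that the client signed.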
## Tests
* New regression test `test_presigned_put_with_user_metadata` covering a presigned PUT whose `x-amz-meta-*` value contains internal sequential whitespace.
* Full integration suite passes: `40 passed; 0 failed; 2 ignored`.
* `garage_api_common` unit tests pass: `18 passed; 0 failed`.
## Notes
* Backwards-compatible: any signature that validated before still validates, because clients are spec-required to collapse on their side; Garage was only rejecting requests where the client had collapsed correctly but Garage hadn't.
* No config or migration changes.
* Fix applies to both presigned-URL and Authorization-header code paths since they share the canonical-request builder.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1424
Reviewed-by: Alex <lx@deuxfleurs.fr>
Merging this first version as a baseline. For future work: write a security.md document to explain how to report security vulnerabilities, and split the release process off into a separate releasing.md document.
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1406
Otherwise the rustls dependency might be built with both the aws-lc and ring backends,
leading to the following error in the k2v_client tests when the
consul-discovery feature is enabled (which includes the reqwest dependency):
```
Could not automatically determine the process-level CryptoProvider from Rustls crate features.
Call CryptoProvider::install_default() before this point to select a provider manually, or make sure exactly one of the 'aws-lc-rs' and 'ring' features is enabled.
See the documentation of the CryptoProvider type for more information.
```
Co-authored-by: Yureka <yuka@yuka.dev>
Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/1412
Reviewed-by: Alex <lx@deuxfleurs.fr>
known_addrs in PeerInfoInternal is append-only — addresses accumulate
via add_addr() and PeerList gossip but are never removed. In dynamic
environments (k8s pod restarts, DHCP, NAT traversal), this list grows
unboundedly with stale addresses.
Combined with sequential iteration in try_connect() and no TCP connect
timeout in netapp.rs, each unreachable address blocks reconnection for
the kernel's TCP SYN timeout (75-130s on Linux). With 10+ stale
addresses, worst-case reconnection exceeds 750s — a full outage for
replication_factor=3 clusters.
This commit contains the following two changes:
1. Address failure tracking and pruning (peering.rs): Track consecutive
connection failures per address in PeerInfoInternal. After 3 failures,
prune from known_addrs. Reset count when address is re-advertised via
gossip or incoming connection. Prevents unbounded list growth.
2. Shuffle before connecting (peering.rs): Randomize address order in
try_connect() so the valid address (often appended last) gets a fair
chance instead of always trying stale addresses first.
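
A rough sketch of the failure tracking described in change 1, using only the
standard library. The `AddrTracker` type and its method names are hypothetical
and do not mirror PeerInfoInternal; shuffling is left out because it needs a
source of randomness outside the standard library.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Threshold taken from the description above: prune after 3 failures.
const MAX_CONSECUTIVE_FAILURES: u32 = 3;

/// Hypothetical stand-in for the per-peer address bookkeeping.
#[derive(Default)]
struct AddrTracker {
    known_addrs: Vec<SocketAddr>,
    failures: HashMap<SocketAddr, u32>,
}

impl AddrTracker {
    /// Address seen via gossip or an incoming connection:
    /// (re-)add it and reset its failure count.
    fn advertise(&mut self, addr: SocketAddr) {
        if !self.known_addrs.contains(&addr) {
            self.known_addrs.push(addr);
        }
        self.failures.remove(&addr);
    }

    /// Record one failed connection attempt.
    /// Returns true if the address was pruned.
    fn record_failure(&mut self, addr: SocketAddr) -> bool {
        let reached_limit = {
            let count = self.failures.entry(addr).or_insert(0);
            *count += 1;
            *count >= MAX_CONSECUTIVE_FAILURES
        };
        if reached_limit {
            self.known_addrs.retain(|a| *a != addr);
            self.failures.remove(&addr);
        }
        reached_limit
    }
}
```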
known_addrs in PeerInfoInternal is append-only — addresses accumulate
via add_addr() and PeerList gossip but are never removed. In dynamic
environments (k8s pod restarts, DHCP, NAT traversal), this list grows
unboundedly with stale addresses.
Combined with sequential iteration in try_connect() and no TCP connect
timeout in netapp.rs, each unreachable address blocks reconnection for
the kernel's TCP SYN timeout (75-130s on Linux). With 10+ stale
addresses, worst-case reconnection exceeds 750s — a full outage for
replication_factor=3 clusters.
This patch includes a first change to fix this issue:
1. TCP connect timeout (netapp.rs): Wrap TcpStream::connect() in
tokio::time::timeout(10s). Caps per-address attempt from 75-130s
to 10s, reducing worst-case 10-addr reconnection from ~750s to ~100s.
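
The worst-case figures above follow from sequential attempts: total time is
the number of unreachable addresses times the per-attempt timeout. A small
sketch of that arithmetic (the function name is hypothetical):

```rust
use std::time::Duration;

/// Worst-case time spent on unreachable addresses when they are tried
/// one after another, each blocking for `per_attempt`.
fn worst_case_reconnect(stale_addrs: u32, per_attempt: Duration) -> Duration {
    per_attempt * stale_addrs
}
```

With 10 stale addresses, a 75s kernel SYN timeout gives 750s, while the 10s
cap gives 100s.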