minio

mirror of https://github.com/minio/minio.git synced 2025-08-15 02:26:58 +02:00

Author	SHA1	Message	Date
Harshavardhana	53ce92b9ca	fix: use the right channel to feed the data in (#18605 ) this PR fixes a regression in batch replication where we weren't sending any data from the Walk() results due to incorrect channels being used.	2023-12-06 18:17:03 -08:00
Krishnan Parthasarathi	c397fb6c7a	Minor fixes to bucket replication (#18578 )	2023-12-01 16:13:08 -08:00
Harshavardhana	bd0819330d	avoid Walk() API listing objects without quorum (#18535 ) This allows batch replication to avoid attempting to copy objects that do not have read quorum. This PR also allows walk() to provide custom values for quorum under batch replication, and key rotation.	2023-11-27 17:20:04 -08:00
Harshavardhana	a4cfb5e1ed	return errors if dataDir is missing during HeadObject() (#18477 ) Bonus: allow replication to attempt Deletes/Puts when the remote returns quorum errors of some kind, this is to ensure that MinIO can rewrite the namespace with the latest version that exists on the source.	2023-11-20 21:33:47 -08:00
Anis Eleuch	02331a612c	batch-repl: Replicate missing metadata and standard headers (#18484 ) - Replicate Expires when the source is local or remote - Replicate metadata when the source is remote	2023-11-18 19:12:44 -08:00
Krishnan Parthasarathi	9569a85cee	Avoid allocs for MRF on-disk header (#18425 )	2023-11-10 19:54:46 -08:00
Poorna	03dc65e12d	Reload replication targets lazily if missing (#18333 ) There can be rare situations where errors seen in bucket metadata load on startup or subsequent metadata updates can result in missing replication remotes. Attempt a refresh of remote targets backed by a good replication config lazily in 5 minute intervals if there ever occurs a situation where remote targets go AWOL.	2023-10-27 21:08:53 -07:00
Harshavardhana	e1e33077e8	fix: tests and resync replication status (#18244 )	2023-10-13 17:03:34 -07:00
Harshavardhana	74e0c9ab9b	reduce unnecessary logging, simplify certain error handling (#18196 ) remove a bunch of unnecessary logs	2023-10-10 00:33:42 -07:00
Poorna	72871dbb9a	delete replication: avoid overwriting replication decision (#18174 ) from ObjectInfo unless version purge status is present. Otherwise there is potential to make an incorrect replication decision if Stat returned an error	2023-10-05 21:09:45 -06:00
Poorna	b73699fad8	replication: pass user tags while queueing (#18052 ) Continues from #18032 - otherwise replication will fail on tag based rules.	2023-09-19 03:18:28 -07:00
jiuker	9947c01c8e	feat: SSE-KMS use uuid instead of read all data to md5. (#17958 )	2023-09-18 10:00:54 -07:00
Harshavardhana	fa6d082bfd	reduce all major allocations in replication path (#18032 ) - remove targetClient for passing around via replicationObjectInfo{} - remove unnecessary cloning of object info - remove objectInfo from replicationObjectInfo{} (only require necessary fields)	2023-09-16 02:28:06 -07:00
Harshavardhana	a2aabfabd9	add backups for usage-caches to rely on upon error (#18029 ) This allows scanner to avoid lengthy scans, skip things appropriately and also not lose metrics in any manner. reduce longer deadlines for usage-cache loads/saves to match the disk timeout which is 2 minutes now per IOP.	2023-09-14 11:53:52 -07:00
Poorna	96fbf18201	replication: queue existing objects to same workers as incoming (#18020 ) Previously existing objects were queued to single worker and MRF re-queues are also handled by same worker - this does not fully use the available bandwidth in case there is no incoming workload.	2023-09-12 21:59:15 -07:00
Harshavardhana	1df5e31706	optimize MRF replication queue to avoid memory leaks (#18007 )	2023-09-11 20:59:11 -07:00
Poorna	703ed46d79	fix: replication of tags while removing (#17989 ) A tag removal was not being replicated prior to this change	2023-09-06 19:05:02 -07:00
Poorna	13a2dc8485	replication resync: avoid blocking on results channel. (#17981 ) continues fix in #17775	2023-09-05 20:22:39 -07:00
Harshavardhana	5b114b43f7	refactor bandwidth throttling for replication target (#17980 ) This refactor is to allow using the bandwidth throttling for other purposes.	2023-09-05 20:21:59 -07:00
Poorna	d665e855de	replication: remove check for empty version id (#17964 )	2023-09-01 13:46:10 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Poorna	4a6af93c83	mark replication target offline if network timeouts seen (#17907 ) regular target liveness check every 5 secs will toggle state back as target returns online.	2023-08-24 09:24:26 -07:00
Harshavardhana	1c5af7c31a	serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886 ) we expect a certain level of IOPs and latency so this is okay. fixes other miscellaneous bugs - such as hanging on mrfCh <- when the context is canceled - queuing MRF heal when the context is canceled - remove unused saveStateCh channel	2023-08-21 16:44:50 -07:00
Poorna	dfaf735073	replication: fix queuing of large uploads (#17831 ) Fixes regression from #17687	2023-08-10 15:48:42 -07:00
Harshavardhana	b732a673dc	reduce logging in bucket replication in retry scenarios (#17820 )	2023-08-08 13:27:40 -07:00
Poorna	26c23b30f4	replication: set context timeout for NewMultipartUpload calls (#17807 )	2023-08-05 12:27:07 -07:00
Poorna	311380f8cb	replication resync: fix queueing (#17775 ) Assign resync of all versions of object to the same worker to avoid locking contention. Fixes parallel resync implementation in #16707	2023-08-01 11:51:15 -07:00
Poorna	1a42693d68	replication: limit larger uploads to a subset of workers (#17687 ) Limit large uploads (> 128MiB) to a max of 10 workers, intent is to avoid larger uploads from using all replication bandwidth, giving room for smaller uploads to sync faster.	2023-07-25 20:02:02 -07:00
Harshavardhana	005a4a275a	add more bootstrap messages to provide latency (#17650 ) - simplify refreshing bucket metadata, wait() to depend on how fast the bucket metadata can load. - simplify resync to start resync in single pass.	2023-07-14 04:00:29 -07:00
Poorna	5e2f8d7a42	replication: Simplify mrf requeueing and add backlog handler (#17171 ) Simplify MRF queueing and add backlog handler - Limit re-tries to 3 to avoid repeated re-queueing. Fall offs to be re-tried when the scanner revisits this object or upon access. - Change MRF to have each node process only its MRF entries. - Collect MRF backlog by the node to allow for current backlog visibility	2023-07-12 23:51:33 -07:00
Kaan Kabalak	f64d62b01d	Fix style of logOnceIf calls w/unique identifiers (#17631 )	2023-07-11 13:17:45 -07:00
Poorna	e8c98c3246	Avoid extra GetObjectInfo call in DeleteObject API (#17599 ) Optimize DeleteObject API to avoid extra GetObjectInfo call on the replicating side. For receiving side, it is just a regular DeleteObject call. Bonus: Fix a corner case where version purged is absent on target (either due to replication not yet complete or target version already deleted in a one-way replication or when replication was disabled). In such cases, mark version purge complete.	2023-07-10 07:57:56 -07:00
Klaus Post	ff5988f4e0	Reduce allocations (#17584 ) * Reduce allocations * Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating. * Reuse all msgp.Readers * Reuse metadata buffers when not reading data. * Make type safe. Make buffer 4K instead of 8. * Unslice	2023-07-06 16:02:08 -07:00
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484 )	2023-06-24 20:29:13 -07:00
jiuker	b6b68be052	fix: replication check for duplicate endpoints detection with wrong route (#17474 )	2023-06-20 09:27:54 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Harshavardhana	1443b5927a	allow quorum fileInfo to pick same parityBlocks (#17454 ) Bonus: allow replication to proceed for 503 errors such as with error code SlowDownRead	2023-06-18 18:20:15 -07:00
Poorna	c4d0c49a5f	ensure metadata updates go to same pool where version exists (#17451 ) This PR also returns the replication status in proxy calls and defers replication attempt if HEAD on object version returned an error different from NoSuchKey	2023-06-17 07:30:53 -07:00
Poorna Krishnamoorthy	f986b0c493	replication: perform bucket resync in parallel (#16707 ) Default number of parallel resync operations for a bucket to 10 to speed up resync.	2023-06-11 16:09:55 -07:00
Harshavardhana	b210ea79bc	do not save MTime in newMultipartUpload() to avoid side effects (#17340 )	2023-06-02 14:38:09 -07:00
Poorna	e95825a42e	replication: use latest object info for metrics update (#17333 )	2023-06-01 18:52:55 -07:00
Poorna	2131046427	replication: fix audit log reporting (#17222 )	2023-05-16 15:35:08 -07:00
Klaus Post	aaf1abc993	simplify HardLimitReader by using LimitReader for internal usage (#17218 )	2023-05-16 13:14:37 -07:00
jiuker	413549bcf5	fix: loadStatsFromDisk() should return nil for configNotFound (#17217 )	2023-05-16 12:23:38 -07:00
Poorna	e07c2ab868	Use hash.NewLimitReader for internal multipart calls (#17191 )	2023-05-12 11:19:08 -07:00
Poorna	c5c1426262	Validate if replication config being added is self referential (#17142 )	2023-05-06 13:35:43 -07:00
Harshavardhana	6825bd7e75	fix: inlined objects don't need to honor long locks (#17039 )	2023-04-17 12:16:37 -07:00
Harshavardhana	c06e0bfef9	set correct `Host:` value for replication event notification (#16984 )	2023-04-06 10:20:53 -07:00
Anis Eleuch	d90d0c8931	Use one http response recorder per external http call (#16938 )	2023-03-31 09:37:29 -07:00
Allan Roger Reid	483b226cc1	fix: avoid logging when object/version not found in replication (#16919 )	2023-03-29 15:02:45 -07:00
Harshavardhana	8e02660a0d	update all our deps (#16899 )	2023-03-28 03:45:24 -07:00
Poorna	fb6ab1cca2	fix: allow replication of 'null' delete markers (#16773 )	2023-03-08 07:03:29 -08:00
Poorna	ee54643004	Avoid unnecessary replication heal attempts (#16769 )	2023-03-07 07:43:38 -08:00
Poorna	c33a237067	fix: under site replication disallow remote target modification (#16628 )	2023-02-15 20:22:13 -08:00
jiuker	a15b6f21b8	remove incorrect use of WaitGroup (#16596 )	2023-02-12 20:59:45 -08:00
Poorna	876e1a91b2	replication: Fix typo checking PreconditionFailed status code (#16517 )	2023-02-02 19:22:02 +05:30
Poorna	820d94447c	replication: fix target bucket passed on GET proxy (#16495 )	2023-01-27 10:24:51 -08:00
Poorna	ed20134a7b	replication: detect proxy header presence correctly (#16489 )	2023-01-27 01:29:32 -08:00
Harshavardhana	e64b9f6751	fix: disallow SSE-C encrypted objects on replicated buckets (#16467 )	2023-01-24 15:46:33 -08:00
Poorna	ddad231921	replication: Avoid logging PreConditionFailed error (#16450 )	2023-01-21 07:33:04 +05:30
Poorna	1b02e046c2	Fix bandwidth monitoring to be per remote target (#16360 )	2023-01-19 18:52:16 +05:30
Harshavardhana	2937711390	fix: DeleteObject() API with versionId under replication (#16325 )	2022-12-28 22:48:33 -08:00
Anis Elleuch	acc9c033ed	debug: Add X-Amz-Request-ID to lock/unlock calls (#16309 )	2022-12-23 19:49:07 -08:00
Poorna	de0b43de32	persist replication stats with leader lock (#16282 )	2022-12-22 14:25:13 -08:00
Poorna	6423e4c767	Remove site replication config if it succeeded locally (#16279 )	2022-12-22 01:31:20 -08:00
Harshavardhana	2fc182d8e6	fix: iso8601TimeFormat padding issue for certain nanoseconds (#16207 )	2022-12-12 10:28:30 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Klaus Post	a713aee3d5	Run staticcheck on CI (#16170 )	2022-12-05 11:18:50 -08:00
Klaus Post	1cd875de1e	Persist updated metadata (#16160 )	2022-12-02 08:35:04 -08:00
Anis Elleuch	641ab24aec	repl: resync orchestrator to use global shared lock (#16154 )	2022-12-01 12:10:09 -08:00
Klaus Post	a22b4adf4c	distribute replication ops based on names (#16083 )	2022-11-17 15:20:09 -08:00
Klaus Post	b7bb122be8	fix: replication auto-scaling deadlock (#16084 )	2022-11-17 07:35:02 -08:00
Klaus Post	8a07000e58	fix: refactor getReplicationDiff for safe use (#16051 )	2022-11-15 07:59:21 -08:00
Poorna	d6bc141bd1	feat: Add support for site level resync (#15753 )	2022-11-14 07:16:40 -08:00
Poorna	34d28dd79f	replication: Avoid blocking on mrf save (#16045 )	2022-11-10 10:20:02 -08:00
Klaus Post	2894dd4d1a	fix: hold lock while serializing replication stats (#16007 )	2022-11-04 09:59:14 -07:00
Klaus Post	0f0e154315	fix: inconsistent replication delete marker timestamps (#15956 )	2022-10-27 09:46:52 -07:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Anis Elleuch	fc6c794972	Audit dangling object removal (#15933 )	2022-10-24 11:35:07 -07:00
Poorna	e4e90b53c1	fix: delete-marker replication check properly (#15923 )	2022-10-21 14:45:06 -07:00
Harshavardhana	59e33b3b21	validate setBucketTarget properly as per BucketExists() call (#15860 )	2022-10-13 17:46:49 -07:00
Poorna	0e3c92c027	attempt delete marker replication after object is replicated (#15857 ) Ensure delete marker replication success, especially since the recent optimizations to heal on HEAD, LIST and GET can force replication attempts on delete marker before underlying object version could have synced.	2022-10-13 17:45:23 -07:00
Harshavardhana	97112c69be	fix: replication stats() to not crash under any situation (#15851 ) Co-authored-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>	2022-10-12 15:47:41 -07:00
Anis Elleuch	e856e10ac2	ignore VersionNotFound in addition to ObjectNotFound while replicating (#15814 )	2022-10-07 16:11:41 -07:00
Poorna	8ea6fb368d	Add auto configuration of replication workers (#15636 )	2022-09-24 16:20:28 -07:00
Harshavardhana	50a8ba6a6f	fix: parse and save retainUntilDate in correct time format (#15741 )	2022-09-23 08:49:27 -07:00
Harshavardhana	124544d834	add pre-conditions support for PUT calls during replication (#15674 ) PUT shall only proceed if pre-conditions are met, the new code uses - x-minio-source-mtime - x-minio-source-etag to verify if the object indeed needs to be replicated or not, allowing us to avoid StatObject() call.	2022-09-14 18:44:04 -07:00
Poorna	a0fb0c1835	panic if replication config could not be read from disk (#15685 ) If replication config could not be read from bucket metadata for some reason, issue a panic so that unexpected replication outcomes can be avoided for replicated buckets. For similar reasons, adding a panic while fetching object-lock config if it failed for reason other than non-existence of config.	2022-09-13 21:23:33 -07:00
Poorna	6b9fd256e1	Persist in-memory replication stats to disk (#15594 ) to avoid relying on scanner-calculated replication metrics. This will improve the accuracy of the replication stats reported. This PR also adds on to #15556 by handing replication traffic that could not be queued by available workers to the MRF queue so that entries in `PENDING` status are healed faster.	2022-09-12 12:40:02 -07:00
Anis Elleuch	cf52691959	Save resync status in the backend using a last update timestamp (#15638 ) Currently, there is a short time window where the code is allowed to save the status of a replication resync. Currently, the window is `now.Sub(st.EndTime) <= resyncTimeInterval`. Also, any failure to write in the backend disks is not retried. Refactor the code a little bit to rely on the last timestamp of a successful write of the resync status of any given bucket in the backend disks.	2022-09-01 16:53:36 -07:00
Anis Elleuch	10e75116ef	Avoid replicating dirs in listing with replication enabled (#15641 ) When replication is enabled in a particular bucket, the listing will send objects to bucket replication, but it is also sending prefixes for non recursive listing which is useless and shows a lot of error logs. This commit will ignore prefixes.	2022-09-01 15:22:11 -07:00
Harshavardhana	433b6fa8fe	upgrade golang-lint to the latest (#15600 )	2022-08-26 12:52:29 -07:00
Harshavardhana	edba7c987b	fix: objects matching prefixes should not leave delete markers (#15586 ) This is needed to ensure that we do not leave prefixes where version is suspended, instead we never leave versions on these paths.	2022-08-24 13:46:29 -07:00
Poorna	4155c5b695	replication: improve MRF healing. (#15556 ) This PR improves the replication failure healing by persisting most recent failures to disk and re-queuing them until the replication is successful. While this does not eliminate the need for healing during a full scan, queuing MRF vastly improves the ETA to keeping replicated buckets in sync as it does not wait for the scanner visit to detect unreplicated object versions.	2022-08-22 16:53:06 -07:00
Harshavardhana	ae4ee95d25	change default lock retry interval to 50ms (#15560 ) Competing mutating calls on the same object in a versioned bucket may unexpectedly see higher delays. This can be reproduced with a replicated bucket by repeatedly overwriting and deleting the same object. For longer locks, such as the scanner's, keep the 1sec interval	2022-08-19 16:21:05 -07:00
Harshavardhana	e9055e9ef7	fix: walk() should cancel itself upon context cancellation (#15553 ) This PR fixes possible leaks that may emanate from not listening on context cancelation or timeouts. ``` goroutine 60957610 [chan send, 16 minutes]: github.com/minio/minio/cmd.(erasureServerPools).Walk.func1.1.1(...) github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368 github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},... github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4 github.com/minio/minio/cmd.(erasureServerPools).Walk.func1.1() github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528 created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1 github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7 ```	2022-08-18 17:49:08 -07:00
Poorna	21fe14201f	replication: centralize healthcheck for remote targets (#15516 ) This PR moves health check from minio-go client to being managed on the server. Additionally integrating health check into site replication	2022-08-16 17:46:22 -07:00
Poorna	21bf5b4db7	replication: heal proactively upon access (#15501 ) Queue failed/pending replication for healing during listing and GET/HEAD API calls. This includes healing of existing objects that were never replicated or those in the middle of a resync operation. This PR also fixes a bug in ListObjectVersions where lifecycle filtering should be done.	2022-08-09 15:00:24 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Poorna	5e0776e96a	replication: Include replica object versions for resync (#15427 )	2022-07-28 13:43:02 -07:00

280 Commits