Commit Graph

24 Commits

Author SHA1 Message Date
jrconlin
18f7c22ae3
f address pjenvey's todo
added user_collection.last_modified to bso data pull
2020-04-10 14:59:35 -07:00
jrconlin
0cb62b98ec
f pip8
2020-04-10 14:44:48 -07:00
jrconlin
a74ed7b6d2
f fetch count with users, kick hoarders early
2020-04-10 14:41:41 -07:00
Philip Jenvey
d6b2dc2187
fix: don't replace user_collections
since bsos are INTERLEAVEd w/ DELETE CASCADE

- persist unique_key_filter across writes
- fix new bundling of bso_values for inserting bsos
- add TODO for fixing user_collections' modified time
2020-04-09 18:25:37 -07:00
jrconlin
1adfb6449e
f break user percentage into its own function
2020-04-09 14:07:27 -07:00
jrconlin
edd0017d2c
feat: latest ops requests
* Add --hoard_limit to limit max number of records per user
* add reason to `failure_*.csv`
2020-04-09 12:05:18 -07:00
jrconlin
b74e529231
f fix "helpful" argparse help string parsing.
TLDR: Don't use a single %
2020-04-08 16:59:09 -07:00
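
The pitfall behind b74e529231 can be illustrated with a minimal sketch. argparse applies %-formatting to help strings, so a literal percent sign has to be written as `%%`. The parser, the default value, and the help wording below are hypothetical and are not taken from the migration script.

```python
import argparse

# Hypothetical parser: only the "%%" escaping is the point of this sketch.
parser = argparse.ArgumentParser(description="hypothetical migration CLI")
parser.add_argument(
    "--user_percent",
    default="1:100",  # assumed default, for illustration only
    # "%%" renders as a literal "%"; "%(default)s" is expanded by argparse.
    help="block#:percentage, e.g. 2:33 is the second 33%% (default: %(default)s)",
)

# Rendering the help is where a lone "%" would typically fail.
help_text = parser.format_help()
print(help_text)
```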
jrconlin
f3d358caee
f general cleanup
2020-04-07 11:34:28 -07:00
jrconlin
84e1efbd27
f fix --user argument
2020-04-07 10:50:26 -07:00
jrconlin
4f9cb14b78
f add PID to success_*.csv and failure_*.csv files.
2020-04-07 08:43:33 -07:00
jrconlin
f9c1e5a532
f fix uid references, warning logic
2020-04-07 08:26:07 -07:00
jrconlin
d4a4ff885c
f convert k_c_a & generation to ints
2020-04-07 07:52:00 -07:00
jrconlin
2a6d5e28b4
f correct error reporting
2020-04-07 07:41:18 -07:00
jrconlin
00e67b4baf
f trap for "NULL" as client state
2020-04-06 19:28:28 -07:00
jrconlin
29185f28c3
f Dockerfile fix #4
add success / fail uid files.
2020-04-06 17:09:33 -07:00
jrconlin
99e152b5d8
f flake8 fixes
2020-04-06 16:02:28 -07:00
jrconlin
edca5ef0a5
f alter default anonymization
* check for "NULL" client_state in user.csv and skip if need be.
2020-04-06 15:57:50 -07:00
jrconlin
3df4c34d87
f r's
2020-04-02 17:36:14 -07:00
jrconlin
55edc74ad7
f add --ms_delay flag.
use `ms_delay` to pause between spanner transaction `--readchunk`s. This allows
some primitive throttling for feeding spanner data.
Reminder: `readchunk` sets the max number of items to try to write per chunk
to spanner in any given transaction, default value 1000.
2020-04-02 09:38:54 -07:00
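
The throttling described in 55edc74ad7 can be sketched roughly as follows. This is an illustration under assumptions, not the script's code: `write_in_chunks` and the `write_chunk` callback are hypothetical names, and the callback stands in for whatever performs one spanner transaction.

```python
import time


def write_in_chunks(items, write_chunk, readchunk=1000, ms_delay=0):
    """Write items in chunks of at most `readchunk`, pausing between chunks.

    `write_chunk` is a hypothetical callback standing in for a single
    spanner transaction. Returns the number of items handed to it.
    """
    written = 0
    for start in range(0, len(items), readchunk):
        chunk = items[start:start + readchunk]
        write_chunk(chunk)
        written += len(chunk)
        # Pause only if another chunk follows; ms_delay is in milliseconds.
        if ms_delay > 0 and start + readchunk < len(items):
            time.sleep(ms_delay / 1000.0)
    return written


# Usage: 2500 items with the default chunk size gives chunks of 1000, 1000, 500.
seen = []
total = write_in_chunks(list(range(2500)), seen.append, ms_delay=1)
```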
jrconlin
08a646a36e
feat: add --user_percent option
The `--user_percent` option will divvy up the users into blocks and
move the specified block. It takes an option formatted as
"block#:percentage". Block numbers are 1 based. For example,
--user_percent=2:33 will divide the total distinct users into
non-overlapping blocks of approximately 33%, and then move the second
block (e.g. the 33-65th users in the list). Extra users that may not be
evenly divided into percentage blocks will be appended to the last
block. (e.g. for `--user_percent=3:33`, users 66-99 would be copied
over, a total of 34 users)

Issue #407
2020-04-02 09:38:54 -07:00
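
The block arithmetic described in 08a646a36e can be sketched as below. The function names are hypothetical and the real script may compute the blocks differently; the sketch only reproduces the behaviour stated in the message (1-based blocks, leftover users appended to the last block).

```python
def parse_user_percent(value):
    """Parse a "block#:percentage" string into (block, percentage) ints."""
    block_str, pct_str = value.split(":")
    block, pct = int(block_str), int(pct_str)
    if block < 1 or not 0 < pct <= 100:
        raise ValueError("expected block >= 1 and 0 < percentage <= 100")
    return block, pct


def select_block(users, block, pct):
    """Return the users in the given 1-based block of roughly pct percent."""
    total = len(users)
    size = total * pct // 100   # users per block
    num_blocks = 100 // pct     # number of whole blocks
    if block > num_blocks:
        raise ValueError("block number exceeds the number of blocks")
    start = (block - 1) * size
    # The last block absorbs users that do not divide evenly.
    end = total if block == num_blocks else start + size
    return users[start:end]


# Usage with 100 users, indexed from 0 as in the commit message.
users = list(range(100))
second = select_block(users, *parse_user_percent("2:33"))  # indices 33..65
third = select_block(users, *parse_user_percent("3:33"))   # indices 66..99
```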
jrconlin
0a9cf9c650
f pjenvey fix
2020-03-18 15:02:33 -07:00
jrconlin
be3b18f879
f add rust_migration WIP
* make user sorting optional
* formatting tweaks to dump_mysql.py and sync.avsc
2020-03-17 16:53:07 -07:00
jrconlin
a65123bcf2
feat: Add --abort and --user_range flags
* --abort stops copying BSO records after N instances.
* --user_range limits copy to offset:limit users.
* sorts users by fxa_uid
2020-03-16 13:32:38 -07:00
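
A rough sketch of the `--user_range` behaviour from a65123bcf2, assuming the users are available as a list of dicts with an `fxa_uid` key. `apply_user_range` is a hypothetical name, and the real script may apply the offset and limit in its SQL query instead.

```python
def apply_user_range(users, user_range):
    """Sort users by fxa_uid, then keep `limit` users starting at `offset`.

    `user_range` is an "offset:limit" string, as described in the commit.
    """
    offset_str, limit_str = user_range.split(":")
    offset, limit = int(offset_str), int(limit_str)
    ordered = sorted(users, key=lambda user: user["fxa_uid"])
    return ordered[offset:offset + limit]


# Usage: skip the first user (after sorting) and copy the next two.
sample = [{"fxa_uid": uid} for uid in ("d4", "a1", "c3", "b2")]
picked = apply_user_range(sample, "1:2")
```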
JR Conlin
ecfca9fdf5
feat: more user_migration stuff (#450)
* feat: more user_migration stuff

* create script to move users by node directly
* moved old scripts to `old` directory (for historic reasons, as well as
possible future use)
* cleaned up README
* try to solve the `parent row` error
an intermittent error may result from one of two things:
1) a transaction failure resulted in a premature add of the unique key
to the UC filter.
2) an internal spanner update error resulting from trying to write the
bso before the user_collection row was written.
* Added "fix_collections.sql" script to update collections table to add
well known collections for future rectification.
* returned collection name lookup
* add "--user" arg to set bso and user id
* add `--dryrun` mode
2020-03-02 20:26:07 -08:00