Commit Graph

30 Commits

Author SHA1 Message Date
JR Conlin
b0f1590f4a
feat: Allow for failure "replay" from failure file (#644)
New option:
   `--retry_file=` takes a previous failure file and will
        retry the bso/UIDs contained in it.

Closes #642
2020-06-03 10:30:46 -07:00
JR Conlin
fa96964f07
bug: Make bso_num in migrate_node less truthy (#637)
Closes #636
2020-05-14 16:05:54 -07:00
JR Conlin
8aaa4492e9
User migration5 (#601)
* bug: Fix typos in tick, string replacements
* f multi-thread gen_bso_users
* added `--start_bso`, `--end_bso` to `gen_bso_users.py`
* added `bso_num` arg (same as `--start_bso=# --end_bso=#`) to `migrate_node.py`
* `gen_bso_users.py` takes same `bso_users_file` template as `migrate_node.py`
* f remove default value for BSO_Users.run bso_num
* f fix lock issue in gen_bso_users, trap for `` states in gen_fxa_users
* f make threading optional.
 There's a locking issue that appears to be inside of mysql.
 Turning threading off for now (can be run in parallel)
* f fix tick, threading flag
* f rename confusing args in gen_bso and gen_fxa
 gen_bso_users:
  `--bso_users_file` => `--output_file`
 gen_fxa_users:
  `--fxa_file` => `--users_file`
  `--fxa_users_file` => `--output_file`
* f more tick fixes
* f don't use threading on Report if threading isn't available.
* f make `--bso_users_file` / `--fxa_users_file` consistent
* `--bso_user_file` is now `--bso_users_file`

Issue #407
2020-04-29 13:09:07 -07:00
Philip Jenvey
16058f20a4
feat: add a --wipe_user mode
deletes pre-existing user data on spanner before migrating. only usable
in --user mode.

and fix parsing of the new gen_fxa_users.py output

Closes #596
2020-04-20 11:41:36 -07:00
jrconlin
c4ffdb636a
f break apart migrate_node into submodules
Yeah, this one's full of stuff.

* `gen_fxa_users.py` takes the tokendata file and generates a file
containing the converted uid => fxa_uid/fxa_kid values. See
`gen_fxa_users.py --help` for arguments.

* `gen_bso_users.py` takes the generated `fxa_users_{date}.lst` file
from `gen_fxa_users.py`, pulls the users from the `--bso_num`, and dumps
them to `bso_users_{bso_num}_{date}.lst`

* `{success,failure}_*` files are now only generated when needed. In addition, they are now suffixed with `.log`. Hopefully a bit easier to find and clean up.

* `migrate_node.py` now takes `--bso_users_file`, which is either the
name of a file that will be used for all BSOs, or a template that will
be used to find the bso_users_file. For example, if you specify
`--bso_users_file=users/bso_users_#_2020_04_14.lst` and
`--bso_start=1 --bso_end=3`, migrate user will pull from
`users/bso_users_1_2020_04_14.lst` for users in BSO#1,
`users/bso_users_2_2020_04_14.lst` for users in BSO#2, etc.

*NOTE*: By default, scripts will date stamp various cached files. Ideally, we should use reasonably "fresh" ones to avoid potentially missing users that are suddenly added to nodes. This is not a requirement, and all scripts allow for a custom file name.
2020-04-14 16:12:56 -07:00
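The `--bso_users_file` template lookup described in the commit above can be sketched as follows. This is a minimal illustration, assuming the `#` in the template is simply replaced by the BSO number; the function name `expand_bso_users_files` is hypothetical and is not taken from `migrate_node.py`.

```python
def expand_bso_users_files(template, bso_start, bso_end):
    """Map each BSO number to the users file it would read from.

    Assumption: "#" in the template stands for the BSO number. A plain
    file name without "#" is returned unchanged for every BSO.
    """
    return {
        bso_num: template.replace("#", str(bso_num))
        for bso_num in range(bso_start, bso_end + 1)
    }


files = expand_bso_users_files("users/bso_users_#_2020_04_14.lst", 1, 3)
for bso_num, path in files.items():
    print(f"BSO#{bso_num}: {path}")
```

Because a name without `#` passes through untouched, the same helper covers both cases from the commit message: one file for all BSOs, or one file per BSO.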
jrconlin
caddb661ed
f fix comment
2020-04-14 16:12:56 -07:00
jrconlin
18f7c22ae3
f address pjenvey's todo
added user_collection.last_modified to bso data pull
2020-04-10 14:59:35 -07:00
jrconlin
0cb62b98ec
f pip8
2020-04-10 14:44:48 -07:00
jrconlin
a74ed7b6d2
f fetch count with users, kick hoarders early
2020-04-10 14:41:41 -07:00
Philip Jenvey
d6b2dc2187
fix: don't replace user_collections
since bsos INTERLEAVE's w/ DELETE CASCADE

- persist unique_key_filter across writes
- fix new bundling of bso_values for inserting bsos
- add TODO for fixing user_collections' modified time
2020-04-09 18:25:37 -07:00
jrconlin
1adfb6449e
f break user percentage into its own function
2020-04-09 14:07:27 -07:00
jrconlin
edd0017d2c
feat: latest ops requests
* Add --hoard_limit to limit max number of records per user
* add reason to `failure_*.csv`
2020-04-09 12:05:18 -07:00
jrconlin
b74e529231
f fix "helpful" argparse help string parsing.
TLDR: Don't use a single %
2020-04-08 16:59:09 -07:00
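The "don't use a single %" advice in the commit above follows from how `argparse` treats help strings: it applies `%`-formatting to them (so `%(default)s` works), which means a literal percent sign has to be written as `%%`. A minimal sketch; the help text here is illustrative and is not the actual help string from the migration scripts.

```python
import argparse


def build_parser(help_text):
    """Build a tiny parser with a single option using the given help text."""
    parser = argparse.ArgumentParser(prog="migrate_node.py")
    parser.add_argument("--user_percent", help=help_text)
    return parser


# Doubling the percent sign yields a literal "%" in the rendered help.
good_help = build_parser(
    "block#:percentage, e.g. 2:33 for 33%% blocks"
).format_help()
print(good_help)

# A lone "%" is read as the start of a format specifier and is rejected,
# either when the argument is added or when the help is rendered,
# depending on the Python version.
try:
    build_parser("block#:percentage, e.g. 2:33 for 33% blocks").format_help()
    single_percent_failed = False
except (ValueError, TypeError):
    single_percent_failed = True
```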
jrconlin
f3d358caee
f general cleanup
2020-04-07 11:34:28 -07:00
jrconlin
84e1efbd27
f fix --user argument
2020-04-07 10:50:26 -07:00
jrconlin
4f9cb14b78
f add PID to success_*.csv and failure_*.csv files.
2020-04-07 08:43:33 -07:00
jrconlin
f9c1e5a532
f fix uid references, warning logic
2020-04-07 08:26:07 -07:00
jrconlin
d4a4ff885c
f convert k_c_a & generation to ints
2020-04-07 07:52:00 -07:00
jrconlin
2a6d5e28b4
f correct error reporting
2020-04-07 07:41:18 -07:00
jrconlin
00e67b4baf
f trap for "NULL" as client state
2020-04-06 19:28:28 -07:00
jrconlin
29185f28c3
f Dockerfile fix #4
add success / fail uid files.
2020-04-06 17:09:33 -07:00
jrconlin
99e152b5d8
f flake8 fixes
2020-04-06 16:02:28 -07:00
jrconlin
edca5ef0a5
f alter default anonymization
* check for "NULL" client_state in user.csv and skip if need be.
2020-04-06 15:57:50 -07:00
jrconlin
3df4c34d87
f r's
2020-04-02 17:36:14 -07:00
jrconlin
55edc74ad7
f add --ms_delay flag.
use `ms_delay` to pause between spanner transaction `--readchunk`s. This allows
some primitive throttling for feeding spanner data.
Reminder: `readchunk` sets the max number of items to try to write per chunk
to spanner in any given transaction, default value 1000.
2020-04-02 09:38:54 -07:00
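The throttling described for `--ms_delay` in the commit above can be sketched as a loop that writes at most `readchunk` items per transaction and sleeps between chunks. This is only an illustration under stated assumptions: `write_in_chunks` and the `write_chunk` callback are hypothetical names standing in for the real spanner transaction code, which is not shown in the commit message.

```python
import time


def write_in_chunks(items, write_chunk, readchunk=1000, ms_delay=0):
    """Write items in chunks of at most `readchunk`, pausing between chunks.

    `write_chunk` is a hypothetical callback standing in for one spanner
    transaction. `ms_delay` is the pause in milliseconds.
    """
    written = 0
    for start in range(0, len(items), readchunk):
        chunk = items[start:start + readchunk]
        write_chunk(chunk)
        written += len(chunk)
        # Skip the pause after the final chunk; there is nothing left to throttle.
        if ms_delay and start + readchunk < len(items):
            time.sleep(ms_delay / 1000.0)
    return written


chunk_sizes = []
total_written = write_in_chunks(
    list(range(2500)),
    lambda chunk: chunk_sizes.append(len(chunk)),
    readchunk=1000,
    ms_delay=5,
)
print(chunk_sizes, total_written)
```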
jrconlin
08a646a36e
feat: add --user_percent option
The `--user_percent` option will divvy up the users into blocks and
move the specified block. It takes an option formatted as
"block#:percentage". Block numbers are 1 based. For example,
--user_percent=2:33 will divide the total distinct users into
non-overlapping blocks of approximately 33%, and then move the second
block (e.g. the 33-65th users in the list). Extra users that may not be
evenly divided into percentage blocks will be appended to the last
block. (e.g. for `--user_percent=3:33`, users 66-99 would be copied
over, a total of 34 users)

Issue #407
2020-04-02 09:38:54 -07:00
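The block arithmetic described for `--user_percent` in the commit above can be sketched as below. It is a rough reading of the commit message, not the script's actual code: `user_block` is a hypothetical helper, indices are assumed to be zero-based offsets into the list of distinct users, and the "last block" is assumed to be block number `100 // percentage`.

```python
def user_block(total_users, user_percent):
    """Return the half-open [start, end) slice for a "block#:percentage" option.

    Assumptions: block numbers are 1-based, offsets are 0-based, and any
    leftover users are appended to the last block.
    """
    block, percentage = (int(part) for part in user_percent.split(":"))
    if block < 1 or not 0 < percentage <= 100:
        raise ValueError(f"invalid user_percent: {user_percent!r}")
    block_size = total_users * percentage // 100
    block_count = 100 // percentage
    if block > block_count:
        raise ValueError(f"block {block} exceeds {block_count} blocks")
    start = (block - 1) * block_size
    # The last block absorbs users that do not divide evenly.
    end = total_users if block == block_count else start + block_size
    return start, end


second = user_block(100, "2:33")
third = user_block(100, "3:33")
print(second, third)
```

With 100 users, the second block covers offsets 33 through 65 and the third covers 66 through 99 (34 users), which matches the figures given in the commit message.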
jrconlin
0a9cf9c650
f pjenvey fix
2020-03-18 15:02:33 -07:00
jrconlin
be3b18f879
f add rust_migration WIP
* make user sorting optional
* formatting tweaks to dump_mysql.py and sync.avsc
2020-03-17 16:53:07 -07:00
jrconlin
a65123bcf2
feat: Add --abort and --user_range flags
* --abort stops copying BSO records after N instances.
* --user_range limits copy to offset:limit users.
* sorts users by fxa_uid
2020-03-16 13:32:38 -07:00
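The `--user_range` behaviour in the commit above (an `offset:limit` pair applied to users sorted by `fxa_uid`) might look roughly like this. The helper name `select_user_range`, the dictionary shape of a user record, and the choice to sort before slicing are all assumptions made for illustration.

```python
def select_user_range(users, user_range):
    """Pick `limit` users starting at `offset`, after sorting by fxa_uid.

    `users` is assumed to be a list of dicts with an "fxa_uid" key.
    """
    offset, limit = (int(part) for part in user_range.split(":"))
    ordered = sorted(users, key=lambda user: user["fxa_uid"])
    return ordered[offset:offset + limit]


# Ten made-up users whose fxa_uid order differs from their uid order.
users = [{"uid": n, "fxa_uid": f"{(n * 7) % 10:02d}"} for n in range(10)]
picked = select_user_range(users, "2:3")
print([user["fxa_uid"] for user in picked])
```

Sorting first keeps the selection stable between runs, so separate invocations with adjacent ranges would not overlap.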
JR Conlin
ecfca9fdf5
feat: more user_migration stuff (#450)
* feat: more user_migration stuff

* create script to move users by node directly
* moved old scripts to `old` directory (for historic reasons, as well as
possible future use)
* cleaned up README
* try to solve the `parent row` error
an intermittent error may result from one of two things:
1) a transaction failure resulted in a premature add of the unique key
to the UC filter.
2) an internal spanner update error resulting from trying to write the
bso before the user_collection row was written.
* Added "fix_collections.sql" script to update collections table to add
well known collections for future rectification.
* returned collection name lookup
* add "--user" arg to set bso and user id
* add `--dryrun` mode
2020-03-02 20:26:07 -08:00