pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2026-01-27 01:51:15 +01:00

Author	SHA1	Message	Date
Dimitri Fontaine	2147a1d07b	Implement ALTER TABLE ... SET TABLESPACE ... as a pgloader clause. This allows creating tables in any target tablespace rather than the default one, and is supported for the various sources having support for the ALTER TABLE clause already.	2019-01-08 22:50:24 +01:00
Dimitri Fontaine	f28f8e577d	Review log-level for stored procedures. Some MySQL schema level features (on update current_timestamp) are migrated to stored procedures and triggers. We would log the CREATE PROCEDURE statements as LOG level entries instead of SQL level entries, most likely a stray devel/debug choice.	2019-01-08 22:44:07 +01:00
Dimitri Fontaine	bda06f8ac0	Implement Citus support from a MySQL database.	2018-12-17 16:31:47 +01:00
Dimitri Fontaine	290ad68d61	Implement materialized views in PostgreSQL source support.	2018-12-16 23:17:37 +01:00
Dimitri Fontaine	af2995b918	Apply quoting rules to SQLite index column names. The previous fix was wrong for missing the point: rather than unquote column names in the table definition when matching the column names in the index definition, we should in the first place have quoted the index column names when needed. Fixes #872 for real this time.	2018-12-02 00:17:26 +01:00
Dimitri Fontaine	a939d20dff	Unquote names when searching for an index column name in its table. If the source database is using a keyword (such as "order") as a column name, then pgloader is going to quote this column name in its internal catalogs. In that case, unquote the column in the pgloader catalogs when matching it against the unquoted column name we have in the index definition. Fixes #872.	2018-12-01 21:27:26 +01:00
Dimitri Fontaine	ab2cadff24	Simplify the regular expression parsing the PostgreSQL version string. The Debian/Ubuntu packaging would defeat the quite simple regexp parsing the PostgreSQL version string that we have in pgloader. To make it more robust, make it more open to unforeseen strings. See #800, see #810.	2018-11-30 15:39:27 +01:00
Dimitri Fontaine	3f2f10eef1	Finish implementation of CAST rules for PostgreSQL source databases. Add a link to the table from the internal catalogs for columns so that we can match table-source-name in cast rules when migrating from PostgreSQL.	2018-11-19 19:33:37 +01:00
Dimitri Fontaine	16dda01f37	Deal with SSL verify error the wrong way. This patch adds an option --no-ssl-cert-verification that allows bypassing OpenSSL server certificate verification. It's hopefully a temporary measure that we set up in order to make progress when confronted with: SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY The real solution is of course to install the SSL certificates at a place where pgloader will look for them, which defaults to ~/.postgresql/postgresql.crt at the moment. It's not clear what the story is with the defaults from /etc/ssl, or how to make things happen in a better way. See #648, See #679, See #768, See #748, See #775.	2018-11-15 00:13:21 +01:00
Dimitri Fontaine	6c80404249	Implement support for Redshift "identity" columns. At this stage we don't even parse the details of the Redshift identity such as the seed and step values and consider them the same as a MySQL auto_increment extra description field. Fixes #860 (again).	2018-11-09 22:41:14 +01:00
Dimitri Fontaine	794bc7fc64	Improve redshift support: string_agg() doesn't exist there. Neither does array_agg(), unnest() and other very useful PostgreSQL functions. Redshift is from 8.0 times, so do things the old way: parse the output of the index definition that we get from calling pg_index_def(). For that, this patch introduces the notion of SQL support that depends on PostgreSQL major version. If no major-version specific query is found in the pgloader source tree, then we use the generic one. Fixes #860.	2018-11-07 21:23:56 +01:00
Dimitri Fontaine	d3b21ac54d	Implement automatic discovery of the Citus distribution rules. With this patch, the following distribution rule distribute companies using id is equivalent to the following distribution rule set, given foreign keys in the source schema: distribute companies using id distribute campaigns using company_id distribute ads using company_id from campaigns distribute clicks using company_id from ads, campaigns distribute impressions using company_id from ads, campaigns In the current code (of this patch) pgloader walks the foreign-keys dependency tree and knows how to automatically derive distribution rules from a single rule and the foreign keys.	2018-10-18 15:31:29 +02:00
Dimitri Fontaine	8112a9b54f	Improve Citus Distribution Support. With this patch it's now actually possible to backfill the data on the fly when using the "distribute" new commands. The schema is modified to add the distribution key where specified, and changes to the primary and foreign keys happen automatically. Then a JOIN is generated to get the data directly during the COPY streaming to the Citus cluster.	2018-10-16 18:53:41 +02:00
Dimitri Fontaine	760763be4b	Use the constraint name when we have it. That's important for Citus, which doesn't know how to ADD a constraint without a name.	2018-10-10 15:44:21 -07:00
Dimitri Fontaine	381ac9d1a2	Add initial support for Citus distribution from pgloader. The idea is for pgloader to tweak the schema from a description of the sharding model, the distribute clause. Here's an example of such a clause: distribute company using id distribute campaign using company_id distribute ads using company_id from campaign distribute clicks using company_id from ads, campaign Given such commands, pgloader adds the distribution key to the table when needed, to the primary key definition of the table, and also to the foreign keys that are pointing to the changed primary key. Then when SELECTing the data from the source database, the idea is for pgloader to automatically JOIN the base table with the source table where to find the distribution key, in case it was just added in the schema. Finally, pgloader also calls the following Citus commands: SELECT create_distributed_table('company', 'id'); SELECT create_distributed_table('campaign', 'company_id'); SELECT create_distributed_table('ads', 'company_id'); SELECT create_distributed_table('clicks', 'company_id');	2018-10-10 14:35:12 -07:00
Dimitri Fontaine	5119d864f4	Assorted bug fixes in the context of Redshift support as a source. The catalog queries used in pgloader have to be adjusted for Redshift because this thing forked PostgreSQL 8.0, which is a long time ago now. Also, we had a couple bugs here and there that were not really related to Redshift support but were shown in that context. Fixes #813.	2018-09-04 11:49:21 +02:00
Dimitri Fontaine	4fbfd9e522	Refrain from using regexp_match() function, introduced in Pg10. Instead use the substring() function which has been there all along. See #813.	2018-08-22 10:52:01 +02:00
Dimitri Fontaine	cb633aa092	Refrain from some introspections on non-PGDG PostgreSQL variants. When dealing with PostgreSQL protocol compatible databases, often enough they don't support the same catalogs as PostgreSQL itself. Redshift for instance lacks foreign key support.	2018-08-20 11:52:59 +02:00
Dimitri Fontaine	d3bfb1db31	Bugfix previous commit: filter list format changed. We now accept the more general string and regex match rules, but the code to generate including and excluding lists from the catalogs had not been updated.	2018-08-20 11:50:50 +02:00
Dimitri Fontaine	fc3a1949f7	Add support for PostgreSQL as a source database. It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL. That might be useful for several reasons, including applying user defined cast rules at COPY time, or just moving from a hosted solution to another.	2018-08-20 11:09:52 +02:00
Dimitri Fontaine	a0bac47101	Refrain from TRUNCAT'ing an empty list of tables. Fixed #789.	2018-06-15 17:46:31 +02:00
Dimitri Fontaine	3db3ecf81b	Review Redshift data type dumb-down choices. It's a little more involved than what was done previously. In particular we need to pay attention to MySQL varchar(x) and transform them into something big enough when counting bytes rather than chars, like varchar(3x). Then there's the "text" datatype to take into account, and some more.	2018-05-23 13:43:28 +02:00
Dimitri Fontaine	d4dc4499a8	Add schema migration support for Redshift as a target. Redshift looks like a very old PostgreSQL (8.0.2) with some extra features and a very limited selection of data types. In this patch we parse the PostgreSQL version() function output and automatically determine if we're connected to Redshift. When connected to Redshift, we then dumb-down our target catalogs to the subset of data types that Redshift actually does support. Also, some catalog queries can't be done in Redshift, and 8.0 didn't have a fully compliant VALUES statement, so we use a temporary table in places where we used to use SELECT ... FROM (VALUES(...)) in pgloader. COPYing data to Redshift isn't possible with just this set of changes, because Redshift also doesn't support the COPY FROM STDIN form. COPY sources are limited, and another patch will have to be cooked to prepare the data from pgloader into a format and location that Redshift knows how to handle. At least, it's possible to migrate a database schema to Redshift already.	2018-05-19 19:16:58 +02:00
Dimitri Fontaine	48af01dbbc	Fix implementation of foreign keys in data only mode. In data-only mode, the foreign keys parameter (which defaults to True) means something special: we remove the fkey definitions prior to the data only load then re-install the fkeys. This got broken in a previous commit, the WITH clause option being processed like the other DDL ones that only make sense when creating the schema. While fixing the setting in copy-database, we have to also fix a nesting bug in complete-pgsql-database that would prevent fkeys from being installed again at the end of the load. This patch not only fixes that choice, but also reviews the implementation of the drop-pgsql-fkeys support function to use a more modern internal API, preparing a list of SQL statements to be sent to the psql-execute level. Fixes #745.	2018-02-19 22:07:43 +01:00
Dimitri Fontaine	e129e77eb6	Fix SQL execute counters maintenance.	2018-02-19 22:06:51 +01:00
Dimitri Fontaine	4fed8c5eca	Fix support for newid() from MS SQL. Several places in the code are involved to deal with the default values from MS SQL. The catalog query is dealing with strange quoting rules on the source side and used to fill in directly the PostgreSQL expected value. But then the quoting of a function call wasn't properly handled. Rather than coping with the quoting rules here, have the catalog query return a pgloader specific placeholder "GENERATE_UUID". Then the MS SQL specific code can normalize that to the symbol :generate_uuid. Then the generic PostgreSQL DDL code can implement the proper replacement for that symbol, not having to know where it comes from. Fix #742.	2018-02-17 00:25:33 +01:00
Dimitri Fontaine	5e3acbb462	When merging catalogs, "float" and "double precision" are the same type. PostgreSQL understands both spellings of the data type name and implements float as being a double precision value, so we should refrain from any warning about that non-discrepancy when doing a data-only load. Should fix #746.	2018-02-16 23:42:46 +01:00
Dimitri Fontaine	4612e68435	Implement support for new casting rules guards and actions. Namely the actions are “keep extra” and “drop extra” and the casting rule guard is “with extra on update current timestamp”. Having support for those elements in the casting rules allows such a definition as the following: type timestamp with extra on update current timestamp to "timestamp with time zone" drop extra The effect of such a cast rule would be to ignore the MySQL extra definition and then keep pgloader from creating the PostgreSQL triggers that implement the same behavior. Fix #735.	2018-01-31 15:17:05 +01:00
Dimitri Fontaine	5ba42edb0c	Review misleading error message with schema not found. It might be that the schema exists but we didn't find what we expected to in there, so that it didn't make it to pgloader's internal catalogs. Be friendly to the user with a better error message. Fix #713.	2018-01-25 23:29:36 +01:00
Dimitri Fontaine	adf03c47ad	Clean up source code organisation. The copy format and batch facilities are no longer the meat of our PostgreSQL support in the src/pgsql directory, so have them live in their own space.	2018-01-23 19:52:13 +01:00
Dimitri Fontaine	3bb128c5db	Review format-vector-row. This function prepares the data to be sent down to PostgreSQL as a clean COPY text with unicode handled correctly. This commit is mainly a clean-up of the function, and also adds some smarts to try and make it faster. In testing, the function is now tangentially faster than before, but not by much. The hope here is that it's now easier to optimize it.	2018-01-22 21:37:14 +01:00
Dimitri Fontaine	c05183fcba	Implement support for Foreign Tables and Partitioned Tables. Due to the way pgloader queries the PostgreSQL catalogs, it restricted the target table to be “ordinary” tables, as per the relkind description in the https://www.postgresql.org/docs/current/static/catalog-pg-class.html PostgreSQL documentation. Extend this to support relkind of 'r', 'f' and 'p'. Fixes #587, fixes #690.	2017-12-01 22:13:47 +01:00
Dimitri Fontaine	6964764fb4	Find schema names unquoted. When doing a MySQL to PostgreSQL migration in data only mode, pgloader matches schema names found on both source and target database, and much like with table names must do so ensuring unquoted schema names. Otherwise we fail to find the schema name again, because one spelling has the quotes, but not the other one, when using the “quote identifiers” option. Fix #659, at least some forms of it.	2017-11-19 17:12:21 +01:00
Dimitri Fontaine	db7a91d6c4	Add the MySQL target schema to the search_path. In the next release, pgloader defaults to targeting a new schema named the same as the MySQL database, because that's what makes more sense. But people are used to having 'public' in the search_path and everything in there. So when creating our target schema, when migrating from MySQL, arrange it so that the new schema is in the search_path by issuing a command like: ALTER DATABASE plop SET search_path TO public, f1db; And make this command visible in verbose (NOTICE) mode too, so that user can see what happens. Fix #654. I think.	2017-11-02 12:40:21 +01:00
Dimitri Fontaine	0a88645eb5	Fix time measurements of the write activity. When using --verbose or more detailed log messages, the summary prints timings for both read and write operations separately. The write summary timing took into account only the PostgreSQL batch activity, discarding the formatting of the data done by pgloader. As this formatting is quite heavy at the moment, the results are pretty misleading without that information.	2017-10-21 21:04:55 +02:00
Dimitri Fontaine	8a361a0ff8	Add support for multiple on update columns per table. The MySQL special syntax "on update current_timestamp()" used to support only a single column per table (in MySQL), and so did pgloader. In MariaDB version 10 it's now possible to have several columns with that special treatment, so adapt pgloader to migrate that too. What pgloader does is recognize that several columns are to receive the same pre-update processing, and create a single function that handles all of them, as in the following example, from pgloader logs in a test case: CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN NEW.update_date = now(); NEW.calc_date = now(); RETURN NEW; END; $$; CREATE TRIGGER on_update_current_timestamp BEFORE UPDATE ON mysql.onupdate FOR EACH ROW EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate(); Fixes #629.	2017-09-15 01:04:57 +02:00
Dimitri Fontaine	a498313074	Implement support for MySQL FULLTEXT indexes. PostgreSQL btree indexes are limited in the size of the values they can index: values must fit in an index page (8kB). So when porting a MySQL full text index over full documents, we might get into an error like the following: index row size 2872 exceeds maximum 2712 for index "idx_5199509_search" To fix, query MySQL for the index type which is FULLTEXT rather than BTREE in those cases, and port it over to a PostgreSQL Full Text index with a hard-coded 'simple' configuration, such as the following test case: CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload)); Of course users might want to use a better configuration, including a proper dictionary for the documents. When using PostgreSQL each document may have its own configuration attached and yet they can all get indexed into the same index, so that's a task for the application developers, not for pgloader. In passing, fix the list-typenames-without-btree-support.sql query to return separate entries for each index type rather than an {array,representation} of the result, as Postmodern won't turn the PostgreSQL array into a Common Lisp array by default. I keep wondering how it worked before. Fix #569.	2017-09-14 15:40:34 +02:00
Dimitri Fontaine	987c0703ad	Some default values come properly quoted from MariaDB now. Adjust the default value formatting to check if the default value is already single-quoted and only add new 'single quotes' when it's not the case. Apparently ENUM default values in MariaDB 10 are now properly single quoted.	2017-09-14 15:39:04 +02:00
Dimitri Fontaine	72c58306ba	Fix the previous fix. See #614. Again. Should be ok now.	2017-08-25 01:56:34 +02:00
Dimitri Fontaine	f20a5a0667	Fix schema name comparing with quoted schema names. In the previous commit we introduced support for database names including spaces, which means that by default pgloader creates a target schema in PostgreSQL with a space in its name. That works well as long as you always double-quote the schema name, which pgloader does. Now, in our internal catalogs, we keep the schema name double-quoted. And when comparing those quoted schema names to the raw schema name from PostgreSQL, they won't match, and pgloader tries to create the schema again: ERROR Database error 42P06: schema "my sql" already exists Fix the comparison to use the unquoted schema name, fix #614 again: the previous fix would only work the first time.	2017-08-25 01:47:49 +02:00
Dimitri Fontaine	4fcb24f448	Reintroduce manual Garbage Collect in SBCL. It seems that SBCL still needs some help in deciding when to GC with very large values. In a test case with a “data” column averaging 375kB (up to about 3 MB per datum), it allows much larger batch size and prefetch rows settings without entering lldb.	2017-08-23 16:27:14 +02:00
Dimitri Fontaine	4f9eb8c06b	Track bytes sent to PostgreSQL. The pgstate infrastructure already had lots of details about what's going on, add to it the information about how many bytes are sent in every batch, and use this information in the monitor when something long is happening to display how many rows we sent from the beginning for this (supposedly) huge table, along with bytes and speed (bytes per seconds).	2017-08-23 11:55:49 +02:00
Dimitri Fontaine	1f242cd29e	Fix comment support to schema qualify target tables.	2017-08-23 11:26:08 +02:00
Dimitri Fontaine	28db6b9f13	Desultory cleanup of a useless declaim.	2017-08-21 16:46:32 +02:00
Dimitri Fontaine	03a8d57a50	Review --verbose log message. The verbosity is not that easy to adjust. Remove useless messages and add a new one telling when the COPY of a table is done. As we might have to wait for some time for indexes being built, keep the CREATE INDEX lines. Also keep the ALTER TABLE both for primary keys and foreign keys, again because the user might have to wait for quite some time.	2017-08-21 15:27:13 +02:00
Dimitri Fontaine	952e7da191	Bug fix CREATE TYPE in schema (previous patch). The previous patch fixed CREATE TYPE so that ENUM types are created in the same schema as the table using them, but failed to update the DROP TYPE statements to also target this schema...	2017-08-10 21:19:25 +02:00
Dimitri Fontaine	5a65da2147	Create new types in the proper schema. Previously to this patch, pgloader wouldn't care about which schema it creates extra types in. Extra types are mainly ENUM and SET support from MySQL. Now, pgloader creates those extra PostgreSQL ENUM types in the same schema as the table using them, which is a more sound default.	2017-08-10 18:57:09 +02:00
Dimitri Fontaine	5c1c4bf3ff	Fix MySQL Enum parsing. We use a CSV parser for the MySQL enum values, but the quote escaping wasn't properly setup: MySQL quotes ENUM values with a single-quote (') and uses two of them ('') for escaping single-quotes when found in the ENUM value itself. Fixes #597.	2017-08-01 18:40:27 +02:00
Dimitri Fontaine	dfe5c38185	Fix quoting policy in PostgreSQL ddl formatting. We already have apply-identifier-case and identifier-case to decide how and when to quote our SQL object names, so don't force extra quotes in format string: refrain from using ~s.	2017-07-06 09:47:48 +02:00
Dimitri Fontaine	9da012ca51	Fix identifiers quoting when reading PostgreSQL catalogs. We sure can trust PostgreSQL to use names it knows how to handle. Still, it will be happy to store in its catalogs names containing upper case, and in that case we must quote them.	2017-07-06 03:16:06 +02:00
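
Commit d3b21ac54d above describes deriving a full set of Citus distribution rules from a single rule plus the foreign keys of the source schema. The following Python sketch illustrates that idea only; it is not pgloader's implementation (pgloader is written in Common Lisp). The function names and the foreign-key tuples are hypothetical, and the foreign keys are an assumed schema chosen to match the table names in the commit message.

```python
from collections import deque

# Assumed foreign keys: (child_table, child_column, parent_table, parent_column).
# These are illustrative only; the commit message does not list them.
FOREIGN_KEYS = [
    ("campaigns", "company_id", "companies", "id"),
    ("ads", "campaign_id", "campaigns", "id"),
    ("clicks", "ad_id", "ads", "id"),
    ("impressions", "ad_id", "ads", "id"),
]


def derive_rules(root, key, foreign_keys):
    """Walk the foreign-key tree breadth-first, starting from the root rule.

    Returns a dict mapping table -> (distribution_key, from_path), where
    from_path lists the tables to join through to backfill the key.
    """
    rules = {root: (key, [])}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        parent_key, parent_path = rules[parent]
        for child, child_col, ref_table, ref_col in foreign_keys:
            if ref_table != parent or child in rules:
                continue
            if ref_col == parent_key:
                # The child's foreign key column already holds the
                # distribution key, so no join is needed.
                rules[child] = (child_col, [])
            else:
                # The key is missing in the child: reuse the parent's key
                # name and record the join path needed to fetch it.
                rules[child] = (parent_key, [parent] + parent_path)
            queue.append(child)
    return rules


def format_rule(table, key, path):
    """Render one rule in the 'distribute ... using ... from ...' form."""
    rule = f"distribute {table} using {key}"
    if path:
        rule += " from " + ", ".join(path)
    return rule


if __name__ == "__main__":
    for table, (key, path) in derive_rules("companies", "id", FOREIGN_KEYS).items():
        print(format_rule(table, key, path))
```

Under the assumed foreign keys, the sketch yields the same five rules as listed in the commit message. A breadth-first walk is used here so that each table is reached through its shortest foreign-key path; the commit message does not say which traversal order pgloader uses.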


218 Commits