When the intarray extension is installed, our PostgreSQL catalog query fails
because there is now more than one operator matching smallint[] <@ smallint[].
It is easy to avoid that problem by casting to integer[]; smallint is an
implementation detail here anyway.
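A minimal sketch of the disambiguating cast, not the actual pgloader query
(catalog column and values are illustrative):
    -- with intarray installed, <@ over smallint[] operands finds more than
    -- one candidate operator; casting to integer[] picks exactly one
    SELECT conname
      FROM pg_constraint
     WHERE conkey::integer[] <@ ARRAY[1, 2, 3];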
Fix #532.
Force double-quoting of object names in DROP INDEX commands by using the
format directive ~s. The names of the objects we are dropping usually come
from a PostgreSQL catalog, but might still meet conditions that force
quoting, such as starting with a number, as shown in #530.
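For instance, with a hypothetical index whose name begins with a digit:
    DROP INDEX "1_idx_user_tags";   -- a syntax error without the quotes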
This fix certainly means we will have to review all the DDL formatting we do
in pgloader and apply a single quoting method throughout. The simplest one is
of course to force-quote every object name in "", but it might not be the
smartest (what if some sources send already-quoted object names? that needs a
check), and it's certainly not the prettiest way to go about it: people
usually like to avoid unnecessary quotes, calling them clutter.
Fix #530.
Many options are now available to pgloader users, including shortcuts that
were not defined clearly enough. That could result in stupid things being
done at times.
In particular, when picking the "data only" option, indexes are not to be
dropped before loading the data, but pgloader would still try to create them
again at the end of the load, because the option that controls that behavior
defaults to true and was not impacted by the "data only" choice.
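For instance, with the following in the WITH clause of the load command,
pgloader should neither drop indexes before the load nor try to create them
again afterwards:
    with data only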
In this patch we review the logic and ensure it's applied in the same
fashion in the different phases of the database migration: preparation,
copying, rebuilding of indexes and completion of the database model.
See also 96b2af6b2a2b163f7e9e3c0ba744da1733b23979 where we began fixing
oddities but didn't go far enough.
In PostgreSQL it is possible at CREATE TABLE time to set some extra storage
parameters, the most useful of them in the context of pgloader being
FILLFACTOR. For the setting to be useful, it needs to be set at CREATE TABLE
time, before we load the data.
The BEFORE LOAD clause of the pgloader command allows running SQL scripts
before the load, and even before the creation of the target schema when
pgloader handles it, which is nice for other use cases but doesn't help here.
Here we implement a new `ALTER TABLE` rule that one can set in the pgloader
command in order to change storage parameters at CREATE TABLE time:
ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40')
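With such a rule in place, the generated DDL would look along these lines (a
sketch; table and columns hypothetical):
    CREATE TABLE public.foo
     (
       id   bigint,
       name text
     )
     WITH (fillfactor='40');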
Fix #516.
We cast the MS SQL "int" type to "integer" in PostgreSQL, so add an entry in
our type name mapping where the two are known to be equivalent, to avoid
WARNINGs about the situation in DATA ONLY loads.
When reading table names from PostgreSQL, we might find some that need
systematic quoting (such as names that begin with a digit). In that case,
when later comparing the catalogs to match source database table names
against PostgreSQL catalog table names, we need to unquote the PostgreSQL
table name we are using.
In passing, force the *identifier-case* to :none when reading object names
from the PostgreSQL catalogs.
Avoid double quoting the schema names when used in PostgreSQL catalog
queries, where the identifiers are used as literal values and need to be
single-quoted.
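That is, in a catalog query the schema name sits in literal position, not
identifier position (a sketch, schema name hypothetical):
    -- wrong: double quotes denote an identifier, not a literal
    --   WHERE nspname = "MySchema"
    -- right: literal values are single-quoted
    SELECT nspname FROM pg_namespace WHERE nspname = 'MySchema';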
Fix #476, again.
See #476 where it would have been helpful to see the PostgreSQL catalog
queries with `--log-min-messages sql` in the bug report. Also more
generally useful.
This sits between NOTICE and INFO, allowing for a complete log of the SQL
queries sent to the server while avoiding the very verbose traffic of the
DEBUG log level.
See #498.
A PostgreSQL index is always created in the same schema as the table it
is defined against, and the CREATE INDEX command doesn't accept schema
qualified index names.
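For example (names hypothetical):
    DROP INDEX public.foo_idx;                -- qualification accepted here
    CREATE INDEX foo_idx ON public.foo (bar); -- but not here: the index is
                                              -- created in foo's schema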
When the option "drop indexes" is in use in loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.
The CREATE is done in parallel, and we create an lparallel kernel for that.
The kernel must have a worker-count of at least 1, and we were not
considering the case of 0 indexes on the target table.
Fix #484.
We added some confusion about who's responsible for quoting the SQL object
names between src/utils/quoting.lisp and src/pgsql/pgsql-ddl.lisp, and as a
result some migrations from MySQL with the identifier case set to quote were
broken, as in #439.
To fix, remove any use of the format directive ~s in the PostgreSQL DDL
output methods: we consider that the quoting done by ~s is to be decided in
apply-identifier-case. We then use ~a instead of ~s.
Fix #439.
In cases where we have a WITH include drop option, we are generating lots of
SQL DROP statements. We may be running against an empty target database or in
other situations where the target object of the DROP command might not exist.
Add support for that case.
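One way to support that, sketched here with a hypothetical table name, is the
IF EXISTS spelling:
    DROP TABLE IF EXISTS foo CASCADE;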
Calling a -with-timing from within a with-stats-collection macro is redundant
and would have the numbers counted twice. That didn't happen in this case
because the stats label was manually copied, but one of the copies was then
borked by a typo.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of the indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.
We used to use the CASCADE option when dropping the indexes, which hides
a bug: if we exclude from the load tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.
To prevent that from happening, pgloader now queries the PostgreSQL pg_depend
system catalog to list the “missing” foreign keys and adds them to our
internal catalog representation, from which we know to DROP then CREATE the
SQL objects at the proper times.
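A much simplified sketch of that kind of pg_depend lookup (the real query has
more to it; the index name is hypothetical):
    -- list foreign key constraints that depend on a given unique index
    SELECT c.conname
      FROM pg_depend d
      JOIN pg_constraint c ON c.oid = d.objid
     WHERE d.classid = 'pg_constraint'::regclass
       AND d.refclassid = 'pg_class'::regclass
       AND d.refobjid = 'public.foo_pkey'::regclass
       AND c.contype = 'f';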
See #400 as this was an oversight in fixing this issue.
When we do have a condef (a constraint definition, in PostgreSQL catalog
slang), use it rather than trying to reinvent it from bits and pieces.
See #400, which it actually fixes now...
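The catalogs readily provide that definition, along these lines (a sketch):
    SELECT conname, pg_get_constraintdef(oid) AS condef
      FROM pg_constraint
     WHERE contype = 'f';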
Also known as the ORM case: it happens that other tools are used to create
the target schema. In that case pgloader's job is to fill in the existing
target tables with the data from the source tables.
We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.
This behavior is activated when using the “create no tables” option as
in the following test-case setup:
with create no tables, include drop, truncate
Fixes #400, for which I got a test-case to play with!
Replace the ad-hoc code that was used before in the load-from-file code path
with our full internal catalog representation, and adjust APIs to that end.
The goal is to use catalogs everywhere in the PostgreSQL target API, allowing
us to reason explicitly about source and target catalogs; see #400 for the
main use case.
First, add indexes and foreign keys to the list of objects supported by the
shared catalog facility, where they were previously only found in the pgsql
schema-specific package for historical reasons.
Then also add to our catalog internal structures the notion of a trigger
and a stored procedure, allowing for cleaner advanced default values
support in the MySQL cast functions.
Now that we have a proper and complete catalog, review the pgsql module DDL
output functions in terms of the catalog, and rewrite the schema creation
support so that it takes direct benefit of our internal catalog
representation.
In passing, clean up the code organisation of the pgsql target support module
to be easier to work with.
The next step consists of getting rid of src/pgsql/queries.lisp: that
facility should be replaced by fetching a target catalog the usual way,
thanks to the new src/pgsql/pgsql-schema.lisp file and its list-all-*
functions.
That will in turn allow for an explicit step of merging the pre-existing
PostgreSQL catalog when it's been created by other tools than pgloader,
that is when migrating with the help of an ORM. See #400 for details.
The other user-provided names (schema and table) were already quoted using
the quote_ident() PostgreSQL function, but the column names (attname in the
catalogs) were not.
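A sketch of the fix in catalog-query terms (simplified; table name
hypothetical):
    SELECT quote_ident(attname)    -- was a bare attname before
      FROM pg_attribute
     WHERE attrelid = 'public.foo'::regclass
       AND attnum > 0 AND NOT attisdropped;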
Blind attempt to fix #425.
By default, pgloader will start as many parallel CREATE INDEX commands
as the maximum number of indexes you have on any single table that takes
part in the load.
As this number might be so great as to exhaust the target PostgreSQL server
(e.g. maintenance_work_mem), we add an option to limit it to something
reasonable when the source schema isn't.
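The limit is exposed as a load-command option along these lines (a sketch):
    with max parallel create index = 4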
Fix #386, in which 150 indexes are found on a single source table.
It's always been possible to set application_name to anything really,
making it easier to follow the PostgreSQL queries made by pgloader.
Force that setting to 'pgloader' by default.
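The effect is the same as issuing the following on the connection (a sketch):
    SET application_name TO 'pgloader';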
Fix #387.
It turns out recent changes broke the SQLite index support (from adding
support for MS SQL partial/filtered indexes), so fix it by using the
pgsql-index structure rather than the specific sqlite-idx one.
In passing, improve detection of PRIMARY KEY indexes, which was still
lacking. This work showed that the introspection done by pgloader was wrong:
it's way crazier than we thought, so adjust the code to loop over PRAGMA
calls for each object we inspect.
While adding PRAGMA calls, add support for foreign keys too: we now have the
code infrastructure that makes it easy.
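For reference, the PRAGMA calls involved look like this (table and index
names hypothetical):
    PRAGMA index_list(foo);        -- indexes defined on table foo
    PRAGMA index_info(foo_idx);    -- columns of index foo_idx
    PRAGMA foreign_key_list(foo);  -- foreign keys declared on table foo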
Make it work on the second run, when the triggers and functions have already
been deployed, by doing the DROP of the function and trigger before we CREATE
the table, then CREATE them again: we need to split the list again.
The first error of a batch was lost somewhere in the recent changes. My
current best guess is that the rewrite of the copy-batch function made the
handler-bind form set up by the handling-pgsql-notices macro ineffective, but
I can't see why that is.
See #85.
We should not block any processing just because we can't parse an index. The
best we can do just tonight is to try creating the index without the filter;
ideally we would skip building the index entirely. That's for a later effort
though, it's running late here.
See #365.
MS SQL has a notion of a "filtered index" that matches the notion of a
PostgreSQL partial index: the index only applies to the rows matching
the index WHERE clause, or filter.
The WHERE clause in both cases is limited to simple expressions over a single
row of the base table at a time, so we implement a limited WHERE clause
parser for MS SQL filters and a transformation routine to rewrite the clause
in PostgreSQL slang.
In passing, we transform the filter constants using the same
transformation functions as in the CAST rules, so that e.g. a MS SQL
bit(1) value that got transformed into a PostgreSQL boolean is properly
translated, as in the following example:
MS SQL: "([deleted]=(0))" (that's from the catalogs)
PostgreSQL: deleted = 'f'
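The rewritten clause then ends up in a partial index definition along these
lines (a sketch; index, table, and column names hypothetical):
    CREATE INDEX idx_foo_deleted ON foo (bar) WHERE deleted = 'f';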
Of course the parser is still very badly tested, let's see what happens
in the wild now.
(Should) Fix #365.
The implementation uses the dynamic binding *on-error-stop*, so it's also
available when pgloader is used as a Common Lisp library.
The (not-all-that-) recent changes made to the error handling make that
implementation straightforward enough, so let's finally do it!
Fix #85.
The PostgreSQL search_path allows multiple schemas, and one might even need
that to be able to reference types and other tables. Allow setting more than
one schema by using the fact that PostgreSQL schema names don't need to be
individually quoted, and by passing the exact content of the SET search_path
value down to PostgreSQL.
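The whole value then ends up in a single command, as in (schema names
hypothetical):
    SET search_path TO utils, public;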
Fix #359.
Have a pretty-print option where we try to be nice for the reader, and
don't use it in the CAST debug messages. Also allow working with the
real maximum length of column names rather than hardcoding 22 cols...
Having been given a test instance of a MS SQL database allows quickly fixing
a series of assorted bugs related to the schema handling of MS SQL databases.
As it's the only source with a proper notion of schema that pgloader
currently supports, it's no surprise we had them.
Fix #343. Fix #349. Fix #354.
The new ALTER TABLE facility allows acting on the tables found in the MySQL
database before the migration happens. In this patch the only actions
provided are RENAME TO and SET SCHEMA, which fixes #224.
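For instance (a sketch in the syntax of the new facility; names
hypothetical):
    ALTER TABLE NAMES MATCHING 'foo' RENAME TO 'bar',
    ALTER TABLE NAMES MATCHING ~/./ SET SCHEMA 'archive'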
In order to be able to provide the same option for MS SQL users, we will have
to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and
modify the internal schema-struct so that the schema slot of our table
instances holds a schema instance rather than its name.
Lacking an MS SQL test database and instance, the facility is not yet
provided for that source type.
The PostgreSQL COPY protocol requires an explicit initialization phase
that may fail, and in this case the Postmodern driver transaction is
already dead, so there's no way we can even send ABORT to it.
Review the error handling of our copy-batch function to cope with that
fact, and add some logging of non-retryable errors we may have.
Also improve the thread error reporting when using a binary image, from where
it might be difficult to open an interactive debugger, while still keeping
the full-blown Common Lisp debugging experience for the project developers.
Add a test case for a missing column as in issue #339.
Fix #339, see #337.
The decision to use lots of different packages in pgloader has quite strong
downsides at times, and the manual management of dependencies is one of them,
in particular how to avoid circular ones.
Rather than trying hard to have PostgreSQL fully qualify the index name with
tricks around the search_path setting at the time ::regclass is executed,
simply join against pg_namespace to retrieve the schema into a new slot of
our pgsql-index structure, so that we can then reuse it when needed.
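A simplified sketch of that join:
    -- fetch each index together with the schema it lives in
    SELECT n.nspname, c.relname
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relkind = 'i';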
Also add a test case for the scenario, including both a UNIQUE
constraint and a classic index, because the DROP and CREATE/ALTER
instructions differ.
More than the syntax and API tweaks, this patch also makes it so that a
multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be
loaded with several files of the group in parallel.
To that effect, tweak again the md-connection and md-copy
implementations.
Have PostgreSQL always fully qualify the index related objects and SQL
definition statements when fetching the list of indexes of a table, by
playing with an empty search_path.
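One way to get that effect is the trick pg_dump itself uses (a sketch):
    SELECT pg_catalog.set_config('search_path', '', false);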
Also improve the whole index creation by passing the table object as the
context from which to derive the table-name, so that schema-qualified tables
are taken into account properly.
Thanks to the Common Lisp character data type, it's easy for pgloader to
enforce always speaking to PostgreSQL in utf-8, and that's what has been done
from the beginning, actually.
Now, without any good reason, the first example of a SET clause added to the
docs was about how to set client_encoding, which should NOT be done.
Fix that at the user level by removing the bad example from the docs and
adding a WARNING whenever client_encoding is set to a known bad value. It's a
WARNING because we then simply force 'utf-8' anyway.
Also, completely review the format-vector-row function to avoid doing double
work with the Postmodern facilities we piggyback on: that was only done
halfway through, and the utf-8 conversion was actually done twice.
In order to share more code between the different source types, finally have
a go at the quite horrible mess of anonymous data structures floating around.
Having catalog and schema instances not only allows for code cleanup, but
will also allow implementing some bug fixes and wishlist items, such as
mapping tables from one schema to another.
Also, supporting database sources with a notion of "schema" (in between
"catalog" and "table") should get easier, including getting on par with MySQL
in the MS SQL support (materialized views have been asked for already).
See #320, #316, #224 for references and a notion of progress being made.
In passing, also clean up the copy-databases methods for the database source
types, so that they all use a fetch-metadata generic function, a
prepare-pgsql-database generic function, and a complete-pgsql-database
generic function. Actually, a single method does the job here.
The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn calls the specialized list-all-columns and friends
implementations. Once the catalog has been fetched, an explicit CAST call is
then needed before we can continue.
Finally, the fields/columns/transforms slots in the copy objects are
still being used by the operative code, so the internal catalog
representation is only used up to starting the data copy step, where the
copy class instances are then all that's used.
This might be refactored again in a follow-up patch.
Now that we can have several threads doing COPY, each of them needs to know
about the *pgsql-reserved-keywords* list. Make sure that's the case, and in
passing fix some call sites of apply-identifier-case.
Also, more disturbingly, fix the code so that TRUNCATE is called from the
main thread before giving control to the COPY threads, rather than having two
concurrent threads doing the TRUNCATE twice. It's rather strange that we got
no complaints from the field on that part...
It turns out the summary write times included time spent waiting for batches
to be ready, which isn't fair to the PostgreSQL COPY implementation, and
moreover doesn't help in figuring out the bottlenecks...
Make batches of raw data straight from the reader output (map-rows) and
have the transformation worker focus on changing the batch content from
raw rows to copy strings.
Also review the organisation of responsibilities in the code, allowing us to
move queue.lisp into utils/batch.lisp, renaming it as its scope has been
reduced to only caring about preparing batches.
This came out of trying to have multiple workers concurrently process the
batches from the reader and feed the hardcoded 2 COPY workers, but that
failed for multiple reasons. All that's left as of now is this cleanup, which
seems to be on the faster side of things, which is always good.
It's been proven by Andres Freund's benchmarks that the best number of
parallel COPY threads concurrently active against a single table is 2, as of
PostgreSQL's current development version (up to 9.5 stable; it might still
apply to 9.6 depending on how we solve the problem).
Henceforth, hardcode 2 COPY threads in pgloader. This also has the advantage
that in the presence of lots of bad rows, we should sustain a better
throughput and not stall completely.
Finally, also improve the MySQL setup to use 8 threads by default, that is,
to be able to load two tables concurrently, each with 2 COPY workers, a
reader, and a transformer thread.
It's all still experimental as far as performance goes; the next patches
should bring the capability to configure the desired parallelism from the
command line and the load command, though.
Also, other source types will want to benefit from the same capabilities; it
just happens that it's easier to play with MySQL first for various reasons.