pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2025-08-14 02:06:59 +02:00

Author	SHA1	Message	Date
Dimitri Fontaine	fe3601b04c	Fix SQLite index support, add foreign keys support. It turns out recent changes broke the SQLite index support (from adding support for MS SQL partial/filtered indexes), so fix it by using the pgsql-index structure rather than the specific sqlite-idx one. In passing, improve detection of PRIMARY KEY indexes, which was still lacking. This work showed that the introspection done by pgloader was wrong, it's way more crazy than we thought, so adjust the code to loop over PRAGMA calls for each object we inspect. While adding PRAGMA calls, add support for foreign keys too, we have the code infrastructure that makes it easy now.	2016-03-27 20:39:13 +02:00
Dimitri Fontaine	c724018840	Implement ALTER TABLE clause for MySQL migrations. The new ALTER TABLE facility allows to act on tables found in the MySQL database before the migration happens. In this patch the only provided actions are RENAME TO and SET SCHEMA, which fixes #224. In order to be able to provide the same option for MS SQL users, we will have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and modify the internal schema-struct so that the schema slot of our table instances are a schema instance rather than its name. Lacking MS SQL test database and instance, the facility is not yet provided for that source type.	2016-03-06 21:51:33 +01:00
Dimitri Fontaine	486be8c068	SQLite integer default values might be quoted. Fix #351 by having a new transformation function to process SQLite integers, that may be quoted...	2016-03-03 14:59:27 +01:00
Dimitri Fontaine	62edd5a2c8	Register "nocase" as a SQLite noise word. SQLite types include "text nocase" apparently, so add "nocase" as one of the managed noise words. It might be time we handle those the other way round, with a whitelist of expected tokens somewhere in the type definition rather than a blacklist of unknown words to exclude... Anyway, fix #350.	2016-03-03 00:21:43 +01:00
Dimitri Fontaine	bfdbb2145b	Fix with drop index option, fix #323. Have PostgreSQL always fully qualify the index related objects and SQL definition statements when fetching the list of indexes of a table, by playing with an empty search_path. Also improve the whole index creation by passing the table object as the context where to derive the table-name from, so that schema qualified tables are taken into account properly.	2016-01-15 15:04:07 +01:00
Dimitri Fontaine	9e4938cea4	Implement PostgreSQL catalogs data structure. In order to share more code in between the different source types, finally have a go at the quite horrible mess of anonymous data structures floating around. Having a catalog and schema instances not only allows for code cleanup, but will also allow to implement some bug fixes and wishlist items such as mapping tables from a schema to another one. Also, supporting database sources having a notion of "schema" (in between "catalog" and "table") should get easier, including getting on-par with MySQL in the MS SQL support (materialized views has been asked for already). See #320, #316, #224 for references and a notion of progress being made. In passing, also clean up the copy-databases methods for database source types, so that they all use a fetch-metadata generic function and a prepare-pgsql-database and a complete-pgsql-database generic function. Actually, a single method does the job here. The responsibility of introspecting the source to populate the internal catalog/schema representation is now held by the fetch-metadata generic function, which in turn will call the specialized versions of list-all-columns and friends implementations. Once the catalog has been fetched, an explicit CAST call is then needed before we can continue. Finally, the fields/columns/transforms slots in the copy objects are still being used by the operative code, so the internal catalog representation is only used up to starting the data copy step, where the copy class instances are then all that's used. This might be refactored again in a follow-up patch.	2015-12-30 21:53:01 +01:00
Dimitri Fontaine	2dd7f68a30	Fix index completion management in MySQL and SQLite. We used to wait for the wrong number of workers, meaning the rest of the code began running before the indexes were all available. A user report where one of the indexes takes a very long time to compute made it obvious. In passing, also improve reporting of those rendez-vous sections.	2015-11-29 17:29:57 +01:00
Dimitri Fontaine	e23de0ce9f	Improve SQLite table names filtering. Filter the list of tables we migrate directly from the SQLite query, avoiding to return useless data. To do that, use the LIKE pattern matching supported by SQLite, where the REGEX operator is only available when extra features are loaded apparently. See #310 where filtering out the view still caused errors in the loading.	2015-11-22 22:10:26 +01:00
Dimitri Fontaine	a81f017222	Review SQLite integration with recent changes. The current way to do parallelism in pgloader was half baked in the SQLite source implementation, get it up to speed again.	2015-11-22 21:30:20 +01:00
Dimitri Fontaine	88bb4e0b95	Register "auto_increment" as a SQLite noise word. As seen in #302 it's possible to define a SQLite column of type "integer auto_increment". In my testing tho, it doesn't mean a thing. Worse than that, apparently when an integer column is created that is also used as the primary key of the table, the notation "integer auto_increment primary key" disables the rowid behavior that is certainly expected. Let's not yet mark the bug as fixed as I suppose we will have to do something about this rowid mess. Thanks again SQLite.	2015-10-22 21:55:34 +02:00
Dimitri Fontaine	633067a0fd	Allow more parallelism in database migrations. The newly added statistics are showing that read+write times are not enough to explain how long we wait for the data copying, so it must be the workers setup rather than the workers themselves. From there, let lparallel work its magic in scheduling the work we do in parallel in pgloader: rather than doing blocking receive-result calls for each table, only receive-result at the end of the whole copy-database processing. On test data here on the laptop we go from 6s to 3s to migrate the sakila database from MySQL to PostgreSQL: that's because we have lots of very small tables, so the cost of waiting after each COPY added up quite quickly. In passing, stop sharing the same connection object in between parallel workers that used to be controlled active in-sequence, see the new API clone-connection (which takes over new-pgsql-connection).	2015-10-20 22:15:55 +02:00
Dimitri Fontaine	41e9eebd54	Rationalize common generic API implementation. When devising the common API, the first step has been to implement specific methods for each generic function of the protocol. It now appears that in some cases we don't need the extra level of flexibility: each change of the API has been systematically reported to all the specific methods, so just use a single generic definition where possible. In particular, introduce new intermediate class for COPY subclasses allowing to share more common code in the methods implementation, rather than having to copy/paste and maintain several versions of the same code. It would be good to be able to centralize more code for the database sources and how they are organized around metadata/import-data/complete schema, but it doesn't look obvious how to do it just now.	2015-10-05 21:25:21 +02:00
Dimitri Fontaine	0d9c2119b1	Send one update-stats message per batch. Update the stats used to be a quite simple incf and doing it once per read row was good enough, but now that it involves sending a message to the monitor thread let's only send a message per batch, reducing the communication load here.	2015-10-05 18:04:08 +02:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that gets passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command were not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	75727df72f	Quote table names when migrating from SQLite, fix #281. Apparently I just forgot to apply any smartness whatsoever to SQLite identifiers and just copied them as they are to PostgreSQL. Change that by calling apply-identifier-case.	2015-08-25 01:13:19 +02:00
Dimitri Fontaine	ea35eb575d	Implement --dry-run option, fix #264. The dry run option will currently only check database connections, but as that happens after having correctly parsed the load file, it also allows checking that the command file is correct for the parser. Note that the list load-data API isn't subject to the dry-run method. In passing, we add some more API entry points to the connection objects and we should actually clean the code base to use the new QUERY generic all over the place. It's for another patch tho.	2015-08-22 16:23:47 +02:00
Dimitri Fontaine	56a89e9b53	Cleanup schema data structure building. As reported by clisp maintainer (thanks jackdaniel!) when trying to load pgloader, we had redundant labels function names in places. Get rid of those by pushing the new columns found directly at the end of the list, avoiding the bulky code to then reverse the complex anonymous data structure. The Real Fix™ would be to define proper structures where to hold all those database catalogs representation, but that's an invasive patch and now isn't a good time to write it. At least pgloader should load and run with clisp now.	2015-08-15 23:54:45 +02:00
Dimitri Fontaine	b55ded11e0	Fix read counters when reading data from SQLite.	2015-06-16 23:14:10 +02:00
Dimitri Fontaine	ff78ebf048	Improve SQLite values parsing, fix #231. It turns out that SQLite3 data type handling is back to kick us wherever it hurts, this time by the driver deciding to return blob data (a vector of unsigned bytes) when we expect properly encoded text data. In the wikipedia data test case used to reproduce the bug, we're lucky enough that the byte vectors actually map to properly encoded strings. Of course doing the proper thing costs some performance. I'd like to be able to decide if I should blame the SQLite driver or the whole product on this one. The per-value data type handling still is a disaster in my book, tho, which means it's crucially important for pgloader to get it right and allow users to seamlessly migrate away from using such a system.	2015-05-14 21:08:19 +02:00
Dimitri Fontaine	3848ad6ae5	SQLite integers can host bigints, fix #227.	2015-04-30 18:17:13 +02:00
Dimitri Fontaine	ebc0dcda4f	Allow for empty-string SQLite column types, fix #220 again.	2015-04-30 17:18:14 +02:00
Dimitri Fontaine	5759ae50bb	Handle SQLite typemod in type name normalisation. Should fix #220.	2015-04-28 21:33:25 +02:00
Dimitri Fontaine	48f451bdbc	Implement the option to disable triggers when loading data. This option is dangerous and allows to skip ALL triggers when loading data against PostgreSQL. This includes foreign key constraints definitions and will allow loading data out of order. When using both the options "create no table" and "disable triggers" it will be possible to load data into a schema prepared by your favorite external tool, at the cost of not validating FK constraints. Use with care. Fix #167.	2015-02-19 15:05:10 +01:00
Dimitri Fontaine	cd46b6cbed	Clean up common code for sources. Only move code around, creating a src/sources/common directory with several files in there so as to split the too big src/sources.lisp.	2015-01-08 23:17:40 +01:00
Dimitri Fontaine	25c39b05e2	Tidying up some more.	2014-12-27 00:30:10 +01:00
Dimitri Fontaine	302a7d402b	Refactor connection handling, and clean-up many things. That's the big refactoring patch I've been sitting on for too long. First, refactor connection handling to use a uniformed "connection" concept (class and generic functions API) everywhere, so that the COPY derived objects just use that in their :source-db and :target-db slots. Given that, we don't need no messing around with pgconn and myconn- and other special variables at all anywhere in the tree. Second, clean up some oddities accumulated over time, where some parts of the code didn't get the memo when new API got into place. Third, fix any other oddity or missing part found while doing those first two activities, it was long overdue anyway...	2014-12-26 21:50:29 +01:00
Dimitri Fontaine	5b87b1a85e	Refactor identifier-case option into a dynamic binding. That makes it much easier to use from about anywhere in the code, which is what is needed. In passing, fix #129.	2014-11-21 23:32:02 +01:00
Dimitri Fontaine	ca325ba799	Refactor the SQLite source files.	2014-11-09 22:59:30 +01:00

28 Commits