pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2025-08-08 07:16:58 +02:00

Author	SHA1	Message	Date
Conor McLaughlin	bc4df4f28c	debug(mssql): output values	2024-02-20 01:22:14 +08:00
Dimitri Fontaine	86b6a5cb80	We COPY the MS SQL data in the MS SQL column ordering. Fix #1124.	2020-04-05 15:12:46 +02:00
Dimitri Fontaine	b8da7dd2e9	Generic Function API for Materialized Views support. (#970 ) Implement a generic-function API to discover the source database schema and populate pgloader internal version of the catalogs. Cut down three copies of about the same code-path down to a single shared one, thanks to applying some amount of OOP to the code.	2019-05-20 19:28:38 +02:00
Dimitri Fontaine	c83a0375a0	Fix glitch in Materialized Views support for MSSQL. Thanks to @sorix6 for a bug report with a fix! Fixes #928.	2019-04-14 00:23:37 +02:00
Dimitri Fontaine	c019c16113	Implement MATERIALIZE VIEWS support for MS SQL, and distribute. The latter is not tested yet, but should have no impact if not used. Given how rare it is that I get a chance to play around with a MS SQL instance anyway, it might be better to push blind changes for it when it doesn't impact existing features…	2018-12-19 01:25:27 +01:00
Dimitri Fontaine	5c10f12a07	Fix duplicate package names. In a previous commit we re-used the package name pgloader.copy for the now separated implementation of the COPY protocol, but this package was already in use for the implementation of the COPY file format as a pgloader source. Oops. And CCL was happily doing its magic anyway, so that I've been blind to the problem. To fix, rename the new package pgloader.pgcopy, and to avoid having to deal with other problems of the same kind in the future, rename every source package pgloader.source.<format>, so that we now have pgloader.source.copy and pgloader.pgcopy, two visibily different packages to deal with. This light refactoring came with a challenge tho. The split in between the pgloader.sources API and the rest of the code involved some circular depencendies in the namespaces. CL is pretty flexible here because it can reload code definitions at runtime, but it was still a mess. To untangle it, implement a new namespace, the pgloader.load package, where we can use the pgloader.sources API and the pgloader.connection and pgloader.pgsql APIs too. A little problem gave birth to quite a massive patch. As it happens when refactoring and cleaning-up the dirt in any large enough project, right? See #748.	2018-02-24 19:24:22 +01:00
Dimitri Fontaine	a799cd5f5f	Improve error handling for MS SQL. In particular, implement more solid handling of poorly encoded data or badly setup connections, by reporting the error and continuing the load.	2017-01-28 17:47:44 +01:00
Dimitri Fontaine	35155654df	Allow to ALTER TABLE ... IN SCHEMA. That brings the ALTER TABLE feature to MS SQL source.	2016-03-26 20:50:05 +01:00
Dimitri Fontaine	fcc6e8f813	Implement ALTER SCHEMA ... RENAME TO... That's only available for MS SQL as of now, as it's the only source database we have where the notion of a schema makes sense. Fix #224.	2016-03-26 20:25:03 +01:00
Dimitri Fontaine	f1fe9ab702	Assorted fixes to MS SQL support. Having been given a test instance of a MS SQL database allows to quickly fix a series of assorted bugs related to schema handling of MS SQL databases. As it's the only source with a proper notion of schema that pgloader supports currently, it's not a surprise we had them. Fix #343. Fix #349. Fix #354.	2016-03-16 21:43:04 +01:00
Dimitri Fontaine	b026a860c1	Fix MS SQL fetch metadata function. It should return the fetched catalog rather than the count of objects, which is only used for statistics purposes. Fix #349. This problem once again shows that we lack proper testing environment for MS SQL source :/	2016-03-02 16:20:55 +01:00
Dimitri Fontaine	9e4938cea4	Implement PostgreSQL catalogs data structure. In order to share more code in between the different source types, finally have a go at the quite horrible mess of anonymous data structures floating around. Having a catalog and schema instances not only allows for code cleanup, but will also allow to implement some bug fixes and wishlist items such as mapping tables from a schema to another one. Also, supporting database sources having a notion of "schema" (in between "catalog" and "table") should get easier, including getting on-par with MySQL in the MS SQL support (materialized views has been asked for already). See #320, #316, #224 for references and a notion of progress being made. In passing, also clean up the copy-databases methods for database source types, so that they all use a fetch-metadata generic function and a prepare-pgsql-database and a complete-pgsql-database generic function. Actually, a single method does the job here. The responsibility of introspecting the source to populate the internal catalog/schema representation is now held by the fetch-metadata generic function, which in turn will call the specialized versions of list-all-columns and friends implementations. Once the catalog has been fetched, an explicit CAST call is then needed before we can continue. Finally, the fields/columns/transforms slots in the copy objects are still being used by the operative code, so the internal catalog representation is only used up to starting the data copy step, where the copy class instances are then all that's used. This might be refactored again in a follow-up patch.	2015-12-30 21:53:01 +01:00
Dimitri Fontaine	633067a0fd	Allow more parallelism in database migrations. The newly added statistics are showing that read+write times are not enough to explain how long we wait for the data copying, so it must be the workers setup rather than the workers themselves. From there, let lparallel work its magic in scheduling the work we do in parallel in pgloader: rather than doing blocking receive-result calls for each table, only receive-result at the end of the whole copy-database processing. On test data here on the laptop we go from 6s to 3s to migrate the sakila database from MySQL to PostgreSQL: that's because we have lots of very small tables, so the cost of waiting after each COPY added up quite quickly. In passing, stop sharing the same connection object in between parallel workers that used to be controlled active in-sequence, see the new API clone-connection (which takes over new-pgsql-connection).	2015-10-20 22:15:55 +02:00
Dimitri Fontaine	41e9eebd54	Rationalize common generic API implementation. When devising the common API, the first step has been to implement specific methods for each generic function of the protocol. It now appears that in some cases we don't need the extra level of flexibility: each change of the API has been systematically reported to all the specific methods, so just use a single generic definition where possible. In particular, introduce new intermediate class for COPY subclasses allowing to share more common code in the methods implementation, rather than having to copy/paste and maintain several versions of the same code. It would be good to be able to centralize more code for the database sources and how they are organized around metadata/import-data/complete schema, but it doesn't look obvious how to do it just now.	2015-10-05 21:25:21 +02:00
Dimitri Fontaine	0d9c2119b1	Send one update-stats message per batch. Update the stats used to be a quite simple incf and doing it once per read row was good enough, but now that it involves sending a message to the monitor thread let's only send a message per batch, reducing the communication load here.	2015-10-05 18:04:08 +02:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that get passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command where not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	92d27f4f98	Allow quote/downcase identifiers option for MS SQL. As seen in #287 the previous decision to force quoting to :none is wrong, because index names in MS SQL source database might contain spaces, and then need to be quoted. Let's see what happens if we do it the usual way for MS SQL too, and allow users to control the quoting behaviour of pgloader here.	2015-09-03 23:34:25 +02:00
Dimitri Fontaine	6fc40c4844	Implement MS SQL option to skip creating schemas, fix #263 . Allow the user to control whether pgloader should create the same set of schema as found on the MS SQL database.	2015-08-15 16:10:15 +02:00
Dimitri Fontaine	48f451bdbc	Implement the option to disable triggers when loading data. This option is dangerous and allows to skip ALL triggers when loading data against PostgreSQL. This includes foreign key constraints definitions and will allow loading data out of order. When using both the options "create no table" and "disable triggers" it will be possible to load data into a schema prepared by your favorite external tool, at the cost of not validating FK constraints. Use with care. Fix #167.	2015-02-19 15:05:10 +01:00
Pascal Borreli	1a18b5cfac	Fixed typos	2015-02-18 23:17:16 +00:00
Dimitri Fontaine	13faf3e4f8	Blind try at fixing #158 . The call to format-pgsql-create-fkey was passing the fkey cons rather than just the fk definition structure as an argument.	2015-01-23 19:59:04 +01:00
Dimitri Fontaine	302a7d402b	Refactor connection handling, and clean-up many things. That's the big refactoring patch I've been sitting on for too long. First, refactor connection handling to use a uniformed "connection" concept (class and generic functions API) everywhere, so that the COPY derived objects just use that in their :source-db and :target-db slots. Given that, we don't need no messing around with pgconn and myconn- and other special variables at all anywhere in the tree. Second, clean up some oddities accumulated over time, where some parts of the code didn't get the memo when new API got into place. Third, fix any other oddity or missing part found while doing those first two activities, it was long overdue anyway...	2014-12-26 21:50:29 +01:00

22 Commits