In passing, refactor the *pgconn- dynamic bindings in favor of using the
connection property list straight from the connection string parser,
processing it when necessary. That makes it simple to add an internal
:use-ssl property.
First, index names in MS SQL, as in MySQL, only need to be unique per
table, whereas they need to be globally unique (per schema) in
PostgreSQL. So reuse the infrastructure we already had for MySQL here.
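A minimal sketch of the idea, with a made-up helper name and naming scheme
(the actual code reuses the existing MySQL machinery):

    (defun build-pgsql-index-name (table-name index-name)
      "Make INDEX-NAME unique per schema by prefixing it with TABLE-NAME."
      (format nil "idx_~a_~a" table-name index-name))

    ;; (build-pgsql-index-name "orders" "ix_customer")
    ;; => "idx_orders_ix_customer"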
Second, the way we trick table names into the index and fkey structures
means we have already quoted the names and we don't want to quote them
again, so add a new possible *identifier-case* value to handle the case
where nothing is to be done, pretty please.
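A minimal sketch of what that new value looks like, assuming it is spelled
:none (the name and the existing values are illustrative here):

    (defvar *identifier-case* :downcase
      "How to map identifiers to PostgreSQL: :downcase, :quote or :none.")

    (defun apply-identifier-case (identifier)
      "Return IDENTIFIER as it should be sent to PostgreSQL."
      (ecase *identifier-case*
        (:downcase (string-downcase identifier))
        (:quote    (format nil "\"~a\"" identifier))
        ;; new case: the name has already been quoted upstream, leave it be
        (:none     identifier)))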
Rather than doing ALTER TABLE directly, use CREATE UNIQUE INDEX in the
all-in-parallel concurrent index build per table, and only in the end
game "upgrade" that unique index into a PRIMARY KEY using ALTER TABLE.
Doing it that way avoids taking an ACCESS EXCLUSIVE lock at ALTER TABLE
time, which would otherwise kill our index build concurrency.
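A sketch of the two steps, with made-up table and index names (the
ALTER TABLE ... ADD PRIMARY KEY USING INDEX form needs PostgreSQL 9.1
or later):

    (defun primary-key-statements (table-name index-name columns)
      "Return the SQL for the concurrent build then the final upgrade."
      (list (format nil "CREATE UNIQUE INDEX ~a ON ~a (~{~a~^, ~});"
                    index-name table-name columns)
            (format nil "ALTER TABLE ~a ADD PRIMARY KEY USING INDEX ~a;"
                    table-name index-name)))

    ;; (primary-key-statements "foo" "foo_pkey_idx" '("id"))
    ;; => ("CREATE UNIQUE INDEX foo_pkey_idx ON foo (id);"
    ;;     "ALTER TABLE foo ADD PRIMARY KEY USING INDEX foo_pkey_idx;")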
That patch is not a principled approach to fixing the problem, but it
should avoid messing up fully qualified table names.
A proper way to do it would be to have a pgsql object name structure
composed of the catalog, the schema and the name as separate entries,
with an accompanying API to print that object properly. That's for another day
though.
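For illustration only, since it's not part of this patch, such a structure
could look like the following sketch:

    (defstruct pgsql-object-name catalog schema name)

    (defun print-qualified-name (object)
      "Print OBJECT as a quoted, schema-qualified name."
      (format nil "~@[\"~a\".~]\"~a\""
              (pgsql-object-name-schema object)
              (pgsql-object-name-name object)))

    ;; (print-qualified-name
    ;;  (make-pgsql-object-name :schema "public" :name "foo"))
    ;; => "\"public\".\"foo\""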
When adding the CONTEXT message parsing I totally forgot that PostgreSQL
provides a nice error message translation capability. The code now copes
better with the situation, using a more advanced regular expression.
We could inline the known translations in the matching, but that would
be tedious to maintain, so we just use loose matching rules here.
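For illustration, a loose-matching sketch (not the exact regular expression
used): rely only on the comma separators and the digits, never on the
possibly translated words around them.

    (defun parse-context-line-number (context)
      "Return the line number found in a COPY CONTEXT error message."
      (cl-ppcre:register-groups-bind ((#'parse-integer line-number))
          ("^[^,]+,\\D*(\\d+)" context)
        line-number))

    ;; (parse-context-line-number "CONTEXT: COPY errors, line 3, column b: ...")
    ;; => 3, and a translated message parses the same way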
When in :data logging mode we log the whole data set as we read and then
write it, which is quite a lot of data. Our current logging system works
by filling up a queue that the cl-log lib is then fed from, and sending
lots of data through that queue is quite expensive, so stop doing that.
Hopefully we don't need to revisit the logs more than that; the other
messages should be few enough not to matter much when doing a full load.
The code completely forgot that MySQL column name references in foreign
key definitions have to follow the identifier case rules; this patch
fixes that.
To be able to do that, we need to parse the GROUP_CONCAT() result that
lists the FK columns, as there apparently are no arrays in MySQL. The
problem here is that almost any character is allowed in column names when
`quoted`, so splitting on a comma might prove fragile later.
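A hedged sketch of that parsing, reusing the apply-identifier-case function
sketched earlier; the comma split is exactly the fragile part mentioned
above:

    (defun parse-fk-columns (group-concat-result)
      "Split the GROUP_CONCAT() column list and apply identifier case rules."
      (mapcar (lambda (name)
                (apply-identifier-case (string-trim "` " name)))
              (cl-ppcre:split "," group-concat-result)))

    ;; (parse-fk-columns "customer_id,Order_Date")
    ;; => ("customer_id" "order_date") when *identifier-case* is :downcase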
The truncate command is only sent to PostgreSQL when we didn't just
CREATE TABLE beforehand. Some refactoring would be necessary to fit the
TRUNCATE command within the same transaction as the CREATE TABLE
command, for better PostgreSQL performance.
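A sketch of that rule, with hypothetical flag names:

    (defun maybe-truncate-sql (table-name &key just-created)
      "Return a TRUNCATE statement unless we just created the table."
      (unless just-created
        (format nil "TRUNCATE ~a;" table-name)))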
This patch has been tested with MySQL and SQLite sources. The trick is
that testing it requires first doing a full import (creating the target
tables), so the tests are not modified yet.
The patch from pull request #30 was hard-coding the PostgreSQL side
quoting; we now use the quote_ident() function instead, as it's available
in every PostgreSQL production release (8.4 included).
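A sketch of the idea, assuming an open Postmodern connection:

    ;; let the server apply its own quoting rules
    (postmodern:query "select quote_ident($1)" "My Table" :single)
    ;; => "\"My Table\""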
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before the *copy-batch-rows*
limit) when crossing the byte count threshold.
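A minimal sketch of the rule, with an illustrative row limit:

    (defvar *copy-batch-rows* 25000 "Illustrative row limit.")
    (defvar *copy-batch-size* nil   "Byte threshold, nil disables it.")

    (defun batch-full-p (row-count byte-count)
      "Close the batch at the row limit, or earlier on the byte threshold."
      (or (<= *copy-batch-rows* row-count)
          (and *copy-batch-size*
               (<= *copy-batch-size* byte-count))))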
When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and in the testing done there's no measurable cost when
*copy-batch-size* is set to nil (its default value).
This patch is published without any way to tune the values from the command
language yet; that's the next step, once it's been proven effective.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes
one batch at a time into the communication channel: a lparallel.queue object.
Before that, the raw vectors were pushed directly into the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but getting in the way of the Garbage Collector: data still
in the queue could not be collected even when it was not needed anymore.
The new model also uses less memory and allows better control over how
much data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.
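A sketch of the mechanism, assuming *concurrent-batches* is used as the
queue capacity (the default value here is illustrative):

    (defvar *concurrent-batches* 10 "Illustrative default.")

    (defun make-batch-queue ()
      "A bounded queue: the reader blocks once it gets too far ahead."
      (lparallel.queue:make-queue :fixed-capacity *concurrent-batches*))

    ;; reader thread:  (lparallel.queue:push-queue batch queue)
    ;; writer thread:  (lparallel.queue:pop-queue queue)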
The intent is to offer a way for users to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.
Preliminary tests show no noticeable impact on performance from this patch,
and even some improvements in some cases.
This message has the line number where the erroneous data was found on the
server, and given the pre-processing we have already done at that point,
it's easy to convert that number into an index into the current batch,
which is an array.
To do so, we need Postmodern to expose the CONTEXT error message and we
need to parse it. The following pull request takes care of the Postmodern
side of things:
https://github.com/marijnh/Postmodern/pull/46
The parsing is done as simply as possible, only assuming that the error
message uses comma separators and has the line number in the second
position. The parsing as done here should still work with localized
message strings.
CONTEXT: COPY errors, line 3, column b: "2006-13-11"
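A sketch of that parsing and of the conversion into a batch index (the
server reports a 1-based line number, the batch is a 0-based array):

    (defun batch-index-from-context (context)
      "Return a zero-based index into the current batch, or nil."
      (let* ((second-field (second (cl-ppcre:split "," context)))
             (digit-start  (and second-field
                                (position-if #'digit-char-p second-field)))
             (line-number  (and digit-start
                                (parse-integer second-field
                                               :start digit-start
                                               :junk-allowed t))))
        (when line-number (1- line-number))))

    ;; (batch-index-from-context
    ;;  "CONTEXT: COPY errors, line 3, column b: \"2006-13-11\"")
    ;; => 2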
This change should significantly reduce the cost of error processing.
Revert "Shorten column names in the application to bypass a postmodern bug (or something)."
This reverts commit 240574a1a5f71edefc19a4b0f35f37862bdfeacc.