When parsing table names in the target URI, we are careful to split the
schema and table names and store them as a cons in that case. Not all
source methods got the memo; clean that up.
See #182 and #186, a pull request I am now going to be able to accept.
Also see #287, which should be helped by being able to apply #186.
We used to parse qualified table names as a simple string, which breaks
attempts to be smart about how to quote identifiers. Some sources are
known to accept dots in quoted table names, and we need to be able to
process that properly without tripping on qualified table names too
late.
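A minimal sketch of the idea, with a hypothetical helper name, and
ignoring the quoted-identifier case the real code has to handle:

    ;; sketch: split an unquoted qualified name into (schema . table),
    ;; keep a plain string otherwise
    (defun parse-table-name (name)
      (let ((dot (position #\. name)))
        (if dot
            (cons (subseq name 0 dot) (subseq name (1+ dot)))
            name)))
    ;; (parse-table-name "public.foo") => ("public" . "foo")
    ;; (parse-table-name "foo")        => "foo"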
The current code might not be the best approach, as it just uses either
a cons or a string for table names internally rather than defining a
proper data structure with schema and name slots.
Well, that's for a later cleanup patch; I happen to be lazy tonight.
This option is dangerous: it allows skipping ALL triggers when loading
data into PostgreSQL, including the ones enforcing foreign key
constraints, and thus allows loading data out of order.
When using both the "create no table" and "disable triggers" options, it
becomes possible to load data into a schema prepared by your favorite
external tool, at the cost of not validating FK constraints. Use with
care.
Fix #167.
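For reference, a sketch of what disabling triggers amounts to on the
PostgreSQL side (the exact statements pgloader issues may differ); note
that ALTER TABLE ... DISABLE TRIGGER ALL also disables the internal
triggers enforcing foreign keys:

    ;; sketch only, issued here through Postmodern; table name illustrative
    (postmodern:execute "ALTER TABLE public.foo DISABLE TRIGGER ALL")
    ;; ... COPY the data in ...
    (postmodern:execute "ALTER TABLE public.foo ENABLE TRIGGER ALL")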
That's the big refactoring patch I've been sitting on for too long.
First, refactor connection handling to use a uniform "connection"
concept (a class and a generic functions API) everywhere, so that the
COPY derived objects just use that in their :source-db and :target-db
slots. Given that, there is no more messing around with *pgconn*,
*myconn-* and other special variables anywhere in the tree.
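Roughly, and with illustrative names rather than the exact ones in the
tree, the shape of that API is:

    ;; sketch of the uniform connection API (names illustrative)
    (defclass connection ()
      ((handle :accessor conn-handle :initform nil)))

    (defgeneric open-connection (connection &key))
    (defgeneric close-connection (connection))

    ;; COPY derived objects then just carry two connection instances
    (defclass copy ()
      ((source-db :initarg :source-db :accessor source-db)
       (target-db :initarg :target-db :accessor target-db)))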
Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when a new API got into place.
Third, fix any other oddities or missing parts found while doing the
first two; it was long overdue anyway...
When adding the CONTEXT message parsing I totally forgot that PostgreSQL
provides a nice error message translation capability. The code now copes
better with the situation, using a more advanced regular expression.
We could inline the known translations in the matching, but that would
be tedious to maintain, so we just use loose matching rules here.
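Something along these lines, where the pattern leans on punctuation and
digits rather than on the possibly-translated keywords (the helper name
is illustrative):

    ;; sketch: extract the line number without matching the localized
    ;; "COPY"/"line" keywords themselves
    (defun context-line-number (context)
      (cl-ppcre:register-groups-bind ((#'parse-integer line))
          ("^[^,]+,\\D*(\\d+)" context)
        line))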
When in :data logging mode we log the whole data set as we read and then
write it, which is quite a lot of data. Our current logging system works
by filling up a queue that the cl-log lib is then fed from, and sending
lots of data through that queue is way expensive, so stop doing that.
Hopefully we don't need to revisit the logging more than that; the other
messages should be few enough not to count for much when doing a full load.
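For context, a sketch of the logging arrangement described above, with
illustrative names and assuming an lparallel.queue for the channel; the
point is that every :data message means copying the row into that queue
first:

    ;; sketch (names illustrative): workers push log entries onto a
    ;; queue, a logger thread pops them and hands them to cl-log
    (defvar *log-queue* (lparallel.queue:make-queue))

    (defun enqueue-log (category fmt &rest args)
      ;; note: the logged data gets copied into the queue right here
      (lparallel.queue:push-queue
       (cons category (apply #'format nil fmt args)) *log-queue*))

    (defun logger-loop ()
      (loop for (category . message) = (lparallel.queue:pop-queue *log-queue*)
            do (cl-log:log-message category message)))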
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before the *copy-batch-rows*
limit is reached) when crossing the byte count threshold.
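In effect the batch-closing test becomes something like this (a sketch;
the predicate name is illustrative):

    ;; sketch: close the current batch when either limit is reached;
    ;; a nil *copy-batch-size* (the default) disables the byte threshold
    (defun batch-full-p (row-count byte-count)
      (or (<= *copy-batch-rows* row-count)
          (and *copy-batch-size*
               (<= *copy-batch-size* byte-count))))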
When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and in the testing done so far there's no measurable cost when
*copy-batch-size* is set to nil (its default value).
This patch is published without any way to tune the values from the command
language yet; that's the next step, once it's been proven effective.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes
one batch at a time into the communication channel: an lparallel.queue
object.
Before that, the raw vectors were pushed directly into the queue, which
offered more flexibility to adjust to the reader and writer IO rates and
capabilities, but got in the way of the Garbage Collector: data still in
the queue was not collected even when it was not needed anymore.
The new model also uses less memory and allows better control over how
much data stays in memory. The new *concurrent-batches* parameter should
be key to being able to process huge rows.
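A sketch of the channel setup under that model, with illustrative names
(format-batch and write-batch stand for the massaging/formatting and
COPY steps mentioned above, and *concurrent-batches* is assumed to be
defined elsewhere):

    ;; sketch: a bounded queue holding at most *concurrent-batches*
    ;; batches; push-queue blocks the reader when the writer lags
    ;; behind, which is what bounds the memory footprint
    (defvar *batch-queue*
      (lparallel.queue:make-queue :fixed-capacity *concurrent-batches*))

    ;; reader thread: massage rows and publish a whole batch at a time
    (lparallel.queue:push-queue (format-batch rows) *batch-queue*)

    ;; writer thread: pop one COPY-formatted batch and send it
    (write-batch (lparallel.queue:pop-queue *batch-queue*))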
The intent is to offer a way for users to tune *concurrent-batches* down
to 1 for sources with a massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without
spending too much time counting the bytes we're batching.
Preliminary tests show no noticeable performance impact from this patch,
and even some improvements in some cases.
This message has the line number where the erroneous data was found on the
server, and given the pre-processing we have already done at that point,
it's easy to convert that number into an index into the current batch,
which is an array.
To do so, we need Postmodern to expose the CONTEXT error message, and we
need to parse it. The following pull request takes care of the Postmodern
side of things:
https://github.com/marijnh/Postmodern/pull/46
The parsing is done as simply as possible, only assuming that the error
message uses comma separators and has the line number in second position.
The parsing as done here should still work with localized message strings.
CONTEXT: COPY errors, line 3, column b: "2006-13-11"
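Given a message like the one above, getting back to the offending row
then boils down to an index computation into the batch array (a sketch,
name illustrative):

    ;; sketch: the server reports 1-based line numbers, the batch is a
    ;; 0-based array of already formatted rows
    (defun batch-row-for-context-line (batch line-number)
      (aref batch (1- line-number)))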
This change should significantly reduce the cost of error processing.