Commit Graph

17 Commits

Author SHA1 Message Date
Dimitri Fontaine
51b9618cf6 Fix a call to truncate-tables which didn't get the memo.
In passing, have a default identifier-case of :downcase.
2014-05-26 10:59:35 +02:00
Dimitri Fontaine
a8b0f91f37 Allow optional control of batch memory footprint, see #16 and #22.
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before *copy-batch-rows* limit)
when crossing the byte count threshold.

When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.

This patch is published without any way to tune the values from the command
language yet, that's the next step once its been proven effective.
2014-01-26 23:22:18 +01:00
Dimitri Fontaine
ca0d25d3b2 Provide a new log level, :data, activated when both --debug and --verbose are used. 2014-01-26 17:49:20 +01:00
Dimitri Fontaine
db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formating into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes a
batch at a time in the communication channel: a lparallel.queue object.

Before that, the raw vectors where pushed directly in the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but impeding the ability of the Garbage Collector: data still
in the queue was not collected even if not needed anymore.

The new model also uses less memory, and allows a better control over what
amount of data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.

The intent is to offering a way for the users to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.

Preliminary tests show no sensible impact on performances from this patch,
even some improvements in cases.
2014-01-25 23:54:49 +01:00
Dimitri Fontaine
c50164e53d Manage the whole class of "integrity errors" also when retrying a batch... 2014-01-24 15:10:03 +01:00
Dimitri Fontaine
4cbe4b3218 Manage the whole class of "integrity violation" errors. 2014-01-23 14:59:46 +01:00
Dimitri Fontaine
9455752805 Review and simplify batch retry processing. 2014-01-23 00:15:57 +01:00
Dimitri Fontaine
7c238f45f2 Fix batch retry handling, broken in previous refactoring. Fixes #22. 2014-01-22 22:52:18 +01:00
Dimitri Fontaine
a5c661dd4a Cleanup error recovery logging. 2014-01-22 17:57:56 +01:00
Dimitri Fontaine
6e4a3e2165 Fix parsing COPY error message without column information, see issue #22. 2014-01-20 17:19:41 +01:00
Dimitri Fontaine
ef358c0b7d Take benefits of PostgreSQL COPY error CONTEXT.
This message has the line number where the erroneous data was found on the
server, and given the pre-processing we already done at that point, it's
easy to convert that number into an index into the current batch, an array.

To do do, we need Postmodern to expose the CONTEXT error message and we need
to parse it. The following pull request cares about the Postmodern side of
things:

  https://github.com/marijnh/Postmodern/pull/46

The parsing is done as simply as possible, only assuming that the error
message is using comma separators and having the line number in second
position. The parsing as done here should still work with localized message
strings.

  CONTEXT: COPY errors, line 3, column b: "2006-13-11"

This change should significantly reduce the cost of error processing.
2013-12-25 21:43:22 +01:00
Dimitri Fontaine
fe302af221 Refactor the dbname API to feed from the connection string directly. 2013-12-20 17:24:02 +01:00
Dimitri Fontaine
3a5d618cc8 Switch to using vectors for representing rows, minimize consing. 2013-12-18 23:30:11 +01:00
Dimitri Fontaine
82c4bc9e9e Switch the pgsql batch implementation to using arrays to reduce consing. 2013-12-18 18:31:06 +01:00
Dimitri Fontaine
d99b859c3f Improve and fix COPY error handling, transactions, connections, and GUCs. 2013-11-13 23:54:41 +01:00
Dimitri Fontaine
fe75f3df54 Improve the set of options for loading from databases, assorted fixes. 2013-11-05 23:07:03 +01:00
Dimitri Fontaine
0a38195853 Refactoring the API with a real definition of it, and reorg the source tree. 2013-11-04 13:21:45 +01:00