Commit Graph

63 Commits

Author SHA1 Message Date
Dimitri Fontaine
352f4adc8d Implement support for MySQL SET parameters.
pgloader had support for PostgreSQL SET parameters (gucs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters, and allow pgloader users to control them.

See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
2017-06-27 10:00:47 +02:00
Dimitri Fontaine
e11ccf7bb7 Fix on-error-stop signaling.
To properly handle on-error-stop condition, make it a specific pgloader
condition with a specific handling behavior. In passing add some more log
messages for surprising conditions.

Fix #546.
2017-06-17 19:02:05 +02:00
Mikael Sand
3c4e64ed26 Fix spelling error in Windows-only code path (#545)
Fix spelling error for uiop:make-pathname* key by changing :direction to :directory
2017-04-30 19:53:53 +02:00
Dimitri Fontaine
37fc4ba550 Back to development mode. 2016-12-04 14:09:36 +01:00
Dimitri Fontaine
ac202dc70e Prepare release 3.3.2. 2016-12-03 17:38:52 +01:00
Elias Pipping
af46dc280f Use uiop:run-program ... :directory ... (#473) 2016-11-06 22:40:12 +01:00
Dimitri Fontaine
7070f82976 Back to development mode, not a release anymore.
The next version is going to be either 3.3.2 or 3.4.0.50 depending on
whether we have mainly bug fixes or new features.
2016-08-30 23:17:03 +02:00
Dimitri Fontaine
cb30891fbb Release pgloader v3.3.1. 2016-08-28 20:31:31 +02:00
Dimitri Fontaine
70572a2ea7 Implement support for existing target databases.
Also known as the ORM case, it happens that other tools are used to
create the target schema. In that case pgloader job is to fill in the
exiting target tables with the data from the source tables.

We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.

This behavior is activated when using the “create no tables” option as
in the following test-case setup:

  with create no tables, include drop, truncate

Fixes #400, for which I got a test-case to play with!
2016-08-06 20:19:15 +02:00
Dimitri Fontaine
1ed07057fd Implement --on-error-stop command line option.
The implementation uses the dynamic binding *on-error-stop* so it's also
available when pgloader is used as Common Lisp librairy.
The (not-all-that-) recent changes made to the error handling make that
implementation straightforward enough, so let's finally do it!

Fix #85.
2016-03-21 20:52:50 +01:00
Dimitri Fontaine
029ea0027a Upgrade version string.
We just tagged the repository as version 3.3.0.50 to be able to release
an experimental pgloader bundle, and we did tag the repository. The
first commit after that should then change the version string.
2016-01-31 21:49:43 +01:00
Dimitri Fontaine
d1a2e3f46b Improve the Dockerfile and the versioning.
When building from sources within the git environement, the version
number is ok, but it was wrong when building in the docker image. Fix
the version number to 3.3.0.50 to show that we're talking about a
development snapshot that is leading to version 3.3.1.

Yeah, 4 parts version numbers. That happens, apparently.
2016-01-07 10:21:52 +01:00
Dimitri Fontaine
b4bfa18877 Fix more table name quoting, fix #163 again.
Now that we can have several threads doing COPY, each of them need to
know about the *pgsql-reserved-keywords* list. Make sure that's the case
and in passing fix some call sites to apply-identifier-case.

Also, more disturbingly, fix the code so that TRUNCATE is called from
the main thread before giving control to the COPY threads, rather than
having two concurrent threads doing the TRUNCATE twice. It's rather
strange that we got no complaint from the field on that part...
2015-12-08 11:52:43 +01:00
Dimitri Fontaine
478d24f865 Fix root-dir initialization for ccl, see #303.
When using Clozure Common Lisp apparently a :absolute directory
component for make-pathname is supposed to contain a single path
component, fix by using parse-native-namestring instead.

In case it's needed, the following spelling seems portable enough:

  CL-USER> (uiop:merge-pathnames*
            (uiop:make-pathname* :directory '(:relative "pgloader"))
            (uiop:make-pathname* :directory '(:absolute "tmp")))
  #P"/tmp/pgloader/"
2015-10-24 22:22:24 +02:00
Dimitri Fontaine
96a33de084 Review the stats and reporting code organisation.
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that get passed log messages so that the
output buffer is not subject to race conditions, extend its use to also
deal with statistics messages.

In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.

Also, as a nice side effect of the code simplification and refactoring
this fixes #283 wherein the before/after sections of individual CSV
files within an ARCHIVE command where not counted in the reporting.
2015-10-05 01:46:29 +02:00
Dimitri Fontaine
04aa743eb7 Cleanup file based "connections".
When the notion of a connection class with a generic set of method was
invented, the very flexible specification formats available for the file
based sources where not integrated into the new connection system.

This patch provides a new connection class md-connection with a specific
sub-protocol (after opening a connection, the caller is supposed to loop
around open-next-stream) so that it's possible to both properly fit into
the connection concept and to better share the code in between our three
implementation (csv, copy, fixed).
2015-08-24 16:33:00 +02:00
Dimitri Fontaine
ea35eb575d Implement --dry-run option, fix #264.
The dry run option will currently only check database connections, but
as that happens after having correctly parsed the load file, it allows
to also check that the command file is correct for the parser.

Note that the list load-data API isn't subject to the dry-run method.

In passing, we add some more API entry points to the connection objects
and we should actually clean the code base to use the new QUERY generic
all over the place. It's for another patch tho.
2015-08-22 16:23:47 +02:00
Dimitri Fontaine
3af99051d2 Fix the preserve index names option.
MySQL names its primary keys "PRIMARY" and we need to always uniquify
this name even when the used asked pgloader to preserve index names.

Also, the create-indexes-again function now needs to ask for index names
to be preserved specifically.
2015-07-18 23:39:32 +02:00
Dimitri Fontaine
30511b4d4d Master's branch is now 3.2.git-hash. 2015-01-16 09:56:25 +01:00
Dimitri Fontaine
290916b0f0 Attempt at fixing --self-upgrade.
The option currently only works within the same build environment where
the image was first build, as noted in #133. This is an attempt at
convincing ASDF not to load systems that pgloader depends on in order to
be able to load only the new pgloader definition.

While it looks sound in principle, I failed to have it work in the lab.
Given that previous to this patch nothing works at all, it's not a
regression, let's push it as is makes the code saner.

Also, it looks like asdf::*immutable-systems* is what we want here, but
that's asdf 3.1.x and we're not there yet.
2015-01-14 20:54:11 +01:00
Dimitri Fontaine
302a7d402b Refactor connection handling, and clean-up many things.
That's the big refactoring patch I've been sitting on for too long.

First, refactor connection handling to use a uniformed "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.

Given that, we don't need no messing around with *pgconn* and *myconn-*
and other special variables at all anywhere in the tree.

Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when new API got into place.

Third, fix any other oddity or missing part found while doing those
first two activities, it was long overdue anyway...
2014-12-26 21:50:29 +01:00
Dimitri Fontaine
65c2043694 Improve pgloader usage from the command line.
Make it so that the following command line usages are accepted when
using pgloader without a command file:

 ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

 ./build/bin/pgloader --set "search_path='sakila'"  \
                      mysql://root@localhost/sakila \
                 postgresql:///sakila

 ./build/bin/pgloader --type csv                             \
                      --field id --field field               \
                      --with truncate                        \
                      --with "fields terminated by ','"      \
                      ./test/data/matching-1.csv             \
                      postgres:///pgloader?matching

It's now possible in most cases to just use command-line options, which
should make the entry bar to pgloader much lower.
2014-12-23 02:40:13 +01:00
Dimitri Fontaine
47c22776f2 Cleanup and fix yesterday's refactoring of pgconn parameters. 2014-12-17 11:35:10 +01:00
Dimitri Fontaine
073f012d1a Add support for SSL modes in the PG connection string, fix #137.
In passing, refactor the *pgconn- dynamic bindings in favor of directly
using the connection property list straight from the connection string
parser, processing it when necessary. That allows to make it simple to
add an internal :use-ssl property.
2014-12-16 18:45:43 +01:00
Dimitri Fontaine
5ab0831d4e Allow building pgloader even when the git command fails, fix #125. 2014-11-22 00:32:29 +01:00
Dimitri Fontaine
5b87b1a85e Refactor *identifier-case* option into a dynamic binding.
That makes it much easier to use from about anywhere in the code, which
is what is needed. In passing, fix #129.
2014-11-21 23:32:02 +01:00
Dimitri Fontaine
5bc5a52582 Some more initial steps towards MS SQL support. 2014-11-09 22:30:02 +01:00
Dimitri Fontaine
ed853a7bea Allow pgloader to work on windows. 2014-11-06 22:12:20 +01:00
Dimitri Fontaine
a5e9cb172c Follow-up to the merge, we're not a release branch. 2014-11-05 16:41:19 +01:00
Dimitri Fontaine
8f1fc2a7a6 Release version 3.1.1. 2014-11-05 16:20:14 +01:00
Dimitri Fontaine
c4141c464d Now that 3.1.0 is released, we're preparing 3.1.1 here. 2014-09-10 23:03:42 +02:00
Dimitri Fontaine
0a3b4af290 Release pgloader 3.1.0! 2014-09-10 23:02:27 +02:00
Dimitri Fontaine
de4ff30acc Implement --summary to copy the output to a file, fix #68.
Given than redirecting a tty such as *terminal-io* isn't easy enough,
let's provide a way to copy the summary output to a file. Another way to
solve it would have been to output the summary to the main logs, but
that could have made the logs parsing more difficult that necessary.

Let's see how users like it...
2014-06-14 23:31:11 +02:00
Dimitri Fontaine
b4dac6b684 Fix archive filename matching, recent regression.
The census test didn't pass anymore because I broke the archive filename
matching in b17383fa90, where the special
variable *csv-path-root* stoped being authoritative in the archive case.

To fix, initialize that variable to nil and give its value priority as
soon as it's non-nil, such as the archive case.
2014-06-03 10:33:43 +02:00
Dimitri Fontaine
f34017d023 Improve version strings.
Have the abbreviated git hash appear in the version string when not
using a released version of pgloader.
2014-05-03 15:21:32 +02:00
Dimitri Fontaine
a5a29407f0 Release pgloader version 3.0.99. 2014-04-29 13:59:33 +02:00
Dimitri Fontaine
3a9bc9db0f Switch the default memory watch to on. 2014-04-22 17:13:36 +02:00
Dimitri Fontaine
a8b0f91f37 Allow optional control of batch memory footprint, see #16 and #22.
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before *copy-batch-rows* limit)
when crossing the byte count threshold.

When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.

This patch is published without any way to tune the values from the command
language yet, that's the next step once its been proven effective.
2014-01-26 23:22:18 +01:00
Dimitri Fontaine
db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formating into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes a
batch at a time in the communication channel: a lparallel.queue object.

Before that, the raw vectors where pushed directly in the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but impeding the ability of the Garbage Collector: data still
in the queue was not collected even if not needed anymore.

The new model also uses less memory, and allows a better control over what
amount of data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.

The intent is to offering a way for the users to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.

Preliminary tests show no sensible impact on performances from this patch,
even some improvements in cases.
2014-01-25 23:54:49 +01:00
Dimitri Fontaine
e8fcb15c27 Fix another hasty commit erroneously containing a for-tests change. 2014-01-23 23:29:27 +01:00
Dimitri Fontaine
b374d4bc8b The current retry method has no need for *copy-batch-split*. 2014-01-23 23:28:25 +01:00
Dimitri Fontaine
59e87b84a0 Release Candidate 8. 2014-01-23 00:26:08 +01:00
Dimitri Fontaine
80b6c46aae Version 3.0.97. 2014-01-15 22:53:43 +01:00
Dimitri Fontaine
d5890f7779 Release 3.0.96, debian packaging included. 2013-12-27 12:22:03 +01:00
Dimitri Fontaine
e31c5cbe83 Set the default batch size (in rows) back to 25000. 2013-12-23 13:17:04 +01:00
Dimitri Fontaine
fe302af221 Refactor the dbname API to feed from the connection string directly. 2013-12-20 17:24:02 +01:00
Dimitri Fontaine
0a6608061c Due to the bug fixed and the new batch implementation, version is now 3.0.95. 2013-12-18 18:56:12 +01:00
Dimitri Fontaine
82c4bc9e9e Switch the pgsql batch implementation to using arrays to reduce consing. 2013-12-18 18:31:06 +01:00
Dimitri Fontaine
f58b5960cd Prepare a debian package, and make it pgloader 3.0.94. 2013-12-09 13:04:29 +01:00
Dimitri Fontaine
157587476b Call it version 3.0.93 now that we did fix many bugs. 2013-11-26 16:59:44 +01:00