Commit Graph

1568 Commits

Author SHA1 Message Date
Dimitri Fontaine
3f7853491f Refactor PostgreSQL error handling.
The code was too complex and the transaction / connection handling wasn't
good enough, too many reconnections when a ROLLBACK; is all we need to be
able to continue our processing.

Also fix some stats counters about errors handled, and improve error message
by adding PostgreSQL explicitely, and the name of the table where the error
comes from.
2017-07-04 01:41:08 +02:00
Dimitri Fontaine
3eab88b144 Add a new "drop indexes" option for databases.
This allows to use a combination of "data only, drop indexes" so that when
the target database already exists, pgloader will use the existing schema
and still DROP INDEX before loading the data and do the CREATE INDEX dance
in parallel and all at the end of it.

Also, as I couldn't reproduce neither #539 (which is good, it's supposed to
be fixed now) nor #550 (that was open due to a regression): fixes #550.
2017-07-04 00:15:58 +02:00
Dimitri Fontaine
fc01c7acc9 Fix DBF handling of "empty" date strings.
Blind code a fix for an error when parsing empty date strings in a DBF file.
The small amount of information is surprising, I can't quite figure out
which input string can produce " - - " with the previous coding of
db3-date-to-pgsql-date.

Anyway, it seems easy enough to add some checks to a very optimistic
function and return nil when our checks aren't met.

Fixes #589, hopefully.
2017-06-30 13:37:36 +02:00
Dimitri Fontaine
1e436555a8 Refactor PostgreSQL conditions.
Use a single deftype postgresql-unavailable rather than copy/pasting the
same list of conditions in several places.
2017-06-29 14:08:52 +02:00
Dimitri Fontaine
60c1146e18 Assorted fixes.
Refrain from killing the Common Lisp image when doing interactive regression
testing if we typo'ed the regression test file name...
2017-06-29 12:35:40 +02:00
Dimitri Fontaine
cea82a6aa8 Reconnect to PostgreSQL in case of connection lost.
It may happen that PostgreSQL is restarted while pgloader is running, or
that for some other reason we lose the connection to the server, and in most
cases we know how to gracefully reconnect and retry, so just do so.

Fixes #546 initial report.
2017-06-29 12:34:34 +02:00
Dimitri Fontaine
f0d1f4ef8c Fix reduce usage with max function.
The (reduce #'max ...) requires an initial value to be provided, as the max
function wants at least 1 argument, as we can see here:

CL-USER> (handler-case (reduce #'max nil) (condition (e) (format t "~a" e)))
Too few arguments in call to #<Compiled-function MAX #x300000113C2F>:
0 arguments provided, at least 1 required.
2017-06-28 16:37:27 +02:00
Dimitri Fontaine
17a63e18ed Review "main" error handling.
The "main" function only gets used at the command line, and errors where not
cleanly reported to the users. Mainly because I almost never get to play
with pgloader that way, prefering a load command file and the REPL
environment, but that's not even acceptable as an excuse.

Now the binary program should be able to exit cleanly in all situations. In
testing, it may happens on unexpected erroneous situations that we quit
before printing all the messages in the monitoring queue, but at least now
we quit cleanly and with a non-zero exit status.

Fix #583.
2017-06-28 16:36:08 +02:00
Dimitri Fontaine
0549e74f6d Implement multiple reader per table for MySQL.
Experiment with the idea of splitting the read work in several concurrent
threads, where each reader is reading portions of the target table, using a
WHERE id <= x and id > y clause in its SELECT query.

For this to kick-in a number of conditions needs to be met, as described in
the documentation. The main interest might not be faster queries to overall
fetch the same data set, but better concurrency with as many readers as
writters and each couple its own dedicated queue.
2017-06-28 16:23:18 +02:00
Dimitri Fontaine
6d66280fa5 Review parallelism and memory behavior.
The previous patch made format-vector-row allocate its memory in one go
rather than byte after byte with vector-push-extend. In this patch we review
our usage of batches and parallelism.

Now the reader pushes each row directly to the lparallel queue and writers
concurrently consume from it, cook batches in COPY format, and then send
that chunk of data down to PostgreSQL. When looking at runtime profiles, the
time spent writing in PostgreSQL is a fraction of the time spent reading
from MySQL, so we consider that the writing thread has enough time to do the
data mungling without slowing us down.

The most interesting factor here is the memory behavor of pgloader, which
seems more stable than before, and easier to cope with for SBCL's GC.

Note that batch concurrency is no more, replaced by prefetch rows: the
reader thread no longer build batches and the count of items in the reader
queue is now a number a rows, not of batches of them.

Anyway, with this patch in I can't reproduce the following issues:

Fixes #337, Fixes #420.
2017-06-27 23:10:33 +02:00
Dimitri Fontaine
7f737a5f55 Reduce memory allocation in format-vector-row.
This function is used on every bit of data we send down to PostgreSQL, so I
have good hopes of reducing its memory allocation having an impact on
loading times. In particular for sizeable data sets.
2017-06-27 15:31:49 +02:00
Dimitri Fontaine
46d6f339df Add a user friendly message about what's happening...
Still in the abnormal termination case. pgloader might get stuck and if the
user knows it's waiting for threads to complete, they might be less worried
about the situation and opportunity to kill pgloader...
2017-06-27 11:19:07 +02:00
Dimitri Fontaine
2341ef195d Review abnormal termination code path.
In case of an exceptional condition leading to termination of the pgloader
program we tried to use log-message after the monitor should have been
closed. Also the 0.3s delay to let latests messages out looks like a poor
design.

This patch attempts to remedy both the situation: refrain from using a
closed down monitoring thread, and properly wait until it's done before
returning to the shell.

See #583.
2017-06-27 10:45:54 +02:00
Dimitri Fontaine
352f4adc8d Implement support for MySQL SET parameters.
pgloader had support for PostgreSQL SET parameters (gucs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters, and allow pgloader users to control them.

See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
2017-06-27 10:00:47 +02:00
Otheus
b5a593af14 Update INSTALL.md (#585)
Add instructions for redhat/centos7
2017-06-21 21:27:27 +02:00
Dimitri Fontaine
a222a82f66 Improve docs on pgloader.io.
In the SQLite and MySQL cases, expand on the simple case before detailing
the command language. With our solid defaults, most times a single command
line with the source and target connection strings are going to be all you
need.
2017-06-20 16:24:25 +02:00
Dimitri Fontaine
cae86015a0 Let's be more specific about the license.
Upon GitHub's suggestion, add a LICENSE file to make it clear we are using
The PostgreSQL License. Assign the copyright to The PostgreSQL Global
Development Group as it's done for PostgreSQL, as it seems to be the right
thing to do.
2017-06-17 19:21:33 +02:00
Dimitri Fontaine
e11ccf7bb7 Fix on-error-stop signaling.
To properly handle on-error-stop condition, make it a specific pgloader
condition with a specific handling behavior. In passing add some more log
messages for surprising conditions.

Fix #546.
2017-06-17 19:02:05 +02:00
Dimitri Fontaine
5faf8605ce Fix corner cases and how we log them.
In the prepare-pgsql-database method we were logging too much details, such
as DDL warnings on if-not-exists for successful queries. And those logs are
to be found in PostgreSQL server logs anyway.

Also fix trying to create or drop a "nil" schema.
2017-06-17 18:16:18 +02:00
Dimitri Fontaine
6c931975de Refrain from pushing too much logging trafic.
In this patch we hard-code some cases when we know the log message won't be
displayed anywhere so as to avoid sending it to the monitor thread. It
certainly is a modularity violation, but given the performance impact...
2017-06-17 18:12:33 +02:00
Dimitri Fontaine
422fab646a Typo-level fix.
Using ~s with extra quotes is quite disturbing at this place (logs only, but
still).
2017-06-17 17:23:44 +02:00
Dimitri Fontaine
7f55b21044 Improve support for http(s) resources.
The code used to take into account content-length HTTP header to load that
number of bytes in memory from the remote server. Not only it's better to
use a fixed size allocated-once buffer for that (now 4k), but also doing so
allows downloading content that you don't know the content-length of.

In passing tell the HTTP-URI parser rule that we also accept https:// as a
prefix, not just http://.

This allows running pgloader in such cases:

  $ pgloader https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite pgsql:///chinook

And it just works!
2017-06-17 16:48:15 +02:00
Dimitri Fontaine
b301aa9394 Review create-schemas default behavior.
Get back in line with what the documentation says, and also fix the case for
default MySQL migrations now that we target a PostgreSQL schema with the
same name as the MySQL database name.

Open question yet: should we also register the new schema on the search_path
by default?

  ALTER DATABASE ... SET search_path TO public, newschema, ...;

Is it more of a POLA violation to alter the search_path or to not do it?

Fix #582.
2017-06-16 09:01:18 +02:00
Peter Matseykanets
cd16faee8a brew insist on capital --HEAD
Reword our documentation to use the uppercase variant so that users may copy and paste and benefit.

Fix #581.
2017-06-16 08:43:54 +02:00
Dimitri Fontaine
b3cb7b256d Travis: let's actually use the new EXTRA_OPTS. 2017-06-14 21:32:41 +02:00
Dimitri Fontaine
c02defa5f0 Travis: explicitely pass down the CL variable.
It seems that the test/Makefile didn't get the memo.
2017-06-14 21:25:18 +02:00
Dimitri Fontaine
1469789ede Try to get more information from CCL in testing.
The “magic” options --batch and --heap-reserve will be processed by CCL
itself before pgloader gets to see them, so try that in the testing
environment.
2017-06-14 21:12:54 +02:00
Dimitri Fontaine
de9b43c332 Add support for the MS-SYBDATE datatype.
Fixes #568, thanks to a test case being provided!
2017-06-14 21:02:00 +02:00
Dimitri Fontaine
2c644d55f2 Add --batch to CCL run options.
This option provides lots of information when it crashes, and should help us
with understanding Travis and DockerHub errors with CCL.
2017-06-14 11:49:22 +02:00
Adrian Vondendriesch
90a33b4b4c MSSQL: Add ON UPDATE / DELETE support for fkeys (#580)
The former query to find foreign key constraints doesn't consider ON
UPDATE and ON DELETE rules.
2017-06-13 12:03:51 +02:00
Adrian Vondendriesch
d966d37579 MSSQL: Fix Default value translation for getdate() (#576) (#577)
Currently the default value getdate() is replaced by 'now' which creates
statements like:
  CREATE TABLE ... created timestamp 'now' ...
which leads to table definitions like:
  default '2017-06-12 17:54:04.890129'::timestamp without time zone

This is because 'now' is evaluated when creating the table.

This commit fixes the issue by using CURRENT_TIMESTAMP as default
instead of 'now'.
2017-06-12 21:42:35 +02:00
Chris Bandy
b59421cb48 Multiple Travis CI jobs (#575)
* Test with multiple versions of PostgreSQL in Travis

* Test with multiple implementations of Lisp in Travis
2017-06-11 13:25:48 +02:00
Chris Bandy
1a5194de1d Repair Travis CI builds (#574)
* Uninstall PostgreSQL during Travis setup

Recent build images start with PostgreSQL running. Uninstall it so we
can install and run a specific version.

* Skip authentication in Travis builds

The wrong combination of connection parameters will cause psql and other
utilities to prompt for a password, stalling the Travis build
indefinitely.

* Move PostgreSQL setup for Travis builds into script

* Use a newer version of SBCL in Travis builds

Recent versions of bordeaux-threads require ASDF >= 3.1
2017-06-10 12:06:40 +02:00
Adrian Vondendriesch
1f3659941e MSSQL: fix typmod conversion for "max" typemods (#573)
The previous commit makes it possible to convert typemods for various
text types. In MSSQL it's possible to create a column like varchar(max).
Internally this is reported as varchar(-1) which results in a CREATE
TABLE statement that contains e.g. varchar(-1).

This patch drops the typemod if it's -1 (max).

It's based on Dimitris patch slightly modified by myself.
2017-06-09 16:12:17 +02:00
Adrian Vondendriesch
d0c273a512 MSSQL: Add support for typemods of various text types (#572)
If a custom CAST rule is defined (e.g CAST type varchar to varchar) the
original typemod get's lost. This commit is based on Dimitris patch from
571 and adds typmod support for "char", "nchar", "varchar", "nvarchar"
and "binary".
2017-06-09 11:25:02 +02:00
Dimitri Fontaine
355aedfd72 Fix "drop default" casting rule.
The previous coding would discard any work done at the apply-casting-rules
step when adding source specific smarts about handling default, because of
what looks like negligence and bad tests. A test case scenario exists but
was not exercized :(

Fix that by defaulting the default value to the one given back at the
apply-casting-rules stage, where we apply the "drop default" clause.
2017-06-08 21:39:06 +02:00
Dimitri Fontaine
d74c9625a3 Clarify the code.
(unless (remove-if-not ...)) reads better as (when (notany ...)) so rewrite
the test that way. For reference and context see #563.
2017-06-08 13:39:38 +02:00
Dimitri Fontaine
776f8fcf6f Add support for SQLite implicit pkeys.
It turns out that in some cases SQLite will not list its primary key indexes
in the pragma index_list(), and that's related to an implementation detail
about using its internal btree structure when rowid can be exploited as the
integer pkey.

Reverse engineer that bahavior in pgloader so that the PostgreSQL schema
contains the primary keys even when they are implicit. In doing that we must
be careful that the pkeys might also appear explicitely, in which case we
refrain from declaring them twice.

SQLite catalogs are not my favorite, not to say hackish as hell.

Fixes #563.
2017-06-08 12:11:38 +02:00
Dimitri Fontaine
25e5ea9ac3 Refactor error handling in complete-pgsql-database.
Given new SQLite test case from issue #563 we see that pgloader doesn't
handle errors gracefully in post-copy stage. That's because the API were not
properly defined, we should use pgsql-execute-with-timing rather than other
construct here, because it allows the "on error resume next" behavior we
want with after load DDL statements.

See #563.
2017-06-08 12:09:11 +02:00
Dimitri Fontaine
9fb37bf513 Fix SQLite usage of sqlite_sequence catalog.
It turns out that this catalog table in SQLite may or may not exists
depending on whether the current schema actually uses the feature. So we
need to learn to query the catalogs about the catalogs before querying the
catalogs to learn about the current schema. Thanks SQLite.
2017-06-02 22:34:58 +02:00
Dimitri Fontaine
57a7353a94 Add a log entry before executing SQL blocks.
That seems helpful.
2017-06-02 22:34:25 +02:00
Dimitri Fontaine
c6b634caad Provide "on error stop" as a WITH option.
As seen in #546 it would be easier to be able to specify the option in the
load command directly rather than only at the command line. Here we go!
2017-06-01 16:43:09 +02:00
Dimitri Fontaine
45719645da Implement GumRoad payment for the Moral License.
Let's make it easy for interested parties to actually buy a license, and
offer several levels of “partnership” with the project. Be a Sponsor, a
Partner or even consider having pgloader custom development made for you.
Oracle™ support anyone?
2017-05-31 15:27:06 +02:00
Dimitri Fontaine
21208980fd Improve readme content about building from source.
As reported we still have some complex situations to deal with when buildind
from sources in some environments: windows and MacOXS macports are not in a
very good shape at the moment.

Add some good content in the README to better help people in those
environments get their build.

Fix #444.
2017-05-29 13:01:40 +02:00
Dimitri Fontaine
8f92cc5a7d Fix our SQL parser.
In order to support custom SQL files with several queries and psql like
advanced \i feature, we have our own internal SQL parser in pgloader. The
PostgreSQL variant of SQL is pretty complex and allows dollar-quoting and
other subtleties that we need to take care of.

Here we fix the case when we have a dollar sign ($) as part of a single
quoted text (such as a regexp), so both not a new dollar-quoting tag and a
part of a quoted text being read.

In passing we also fix reading double-quoted identifiers, even when they
contain a dollar sign. After all the following is totally supported by
PostgreSQL:

  create table dollar("$id" serial, foo text);
  select "$id", foo from dollar;

Fix #561.
2017-05-29 12:42:12 +02:00
Dimitri Fontaine
1f9a0b6391 Handle SQLite Sequences (auto_increment).
It turns out that the SQLite catalogs have a way of registering if a column
is an auto_increment. You have to look into the sqlite_sequence catalog for
the name of the table the column is found into, and when there's a non-zero
entry there and the table as a single pk-id column of integer type then it's
safe to assume that we have an "auto_increment" column.

The current way to handle this property is suboptimal in pgloader too where
we need to attach an "extra" slot to our column representation with a value
of the "auto_increment" string. But at least it's easy to hack the code
here... anyway.

Fix #556.
2017-05-21 18:43:10 +02:00
Dimitri Fontaine
3b4af49e22 Implement ALTER SCHEMA and ALTER TABLE for SQLite.
It turns out we forgot to add support for internal catalog munging clauses
to SQLite support. The catalogs being normalized means there's no extra work
here other than allowing the parser to accept those clauses and then pass
them over to our generic `copy-database' method implementation.

It is to be noted that SQLite has no support for schemas as per the standard
and PostgreSQL, so that when we inspect the database schema we create a nil
entry here. It's then not possible to ALTER SCHEMA nil RENAME TO 'target';
unfortunately, but it's easy enough to SET search_path to 'target' anyway,
as shown in the modified test case.

Fix #552.
2017-05-14 20:47:01 +02:00
Dimitri Fontaine
65cdc48c1e Fix error conditions for unrecognized source.
When the source is not recognized, pgloader would not always output a useful
error message with the hint about using --type, as per #248. Here we fix the
conditions applying to this error message.
2017-05-12 13:08:30 +02:00
Dimitri Fontaine
fe2e1ee956 Fix comments handling in IXF support.
There's no need to add empty string comments, guard against doing that.
2017-05-06 15:10:40 +02:00
Dimitri Fontaine
e7afe993fa Review database migration logic and options again.
In #539 it appears that we fail to honor some actions, and it seems to be
because of sloppy reasonning in how to make sense of all of them. It could
well be that we have too many options for them to make sense in all possible
combinations, but for now it looks safe to bet on an easy fix...
2017-05-06 14:45:05 +02:00