1288 Commits

Author SHA1 Message Date
Dimitri Fontaine
d4737a39ca Leave ssl lib alone in src/hooks.lisp.
That means we no longer eagerly load it when we think we will need it,
and also refrain from unloading it from the binary at image saving time.

In my local tests, doing so fixes #330 by avoiding the error entirely in
the docker image, where obviously the libs found at build-time are found
again at the same place at run time.
2016-03-05 22:45:59 +01:00
Dimitri Fontaine
68aa205db5 Also commit SQLite test case changes.
See #351 for context, this adds a proper test case.
2016-03-03 14:59:57 +01:00
Dimitri Fontaine
486be8c068 SQLite integer default values might be quoted.
Fix #351 by having a new transformation function to process SQLite
integers, that may be quoted...
2016-03-03 14:59:27 +01:00
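A minimal sketch of the kind of transformation 486be8c068 describes, written in Python rather than pgloader's Common Lisp. The function name `unquote_integer_default` is hypothetical, and the set of quote characters handled is an assumption, not taken from the commit.

```python
def unquote_integer_default(value):
    """Strip one pair of matching quotes around an integer default, if any."""
    if value is None:
        return None
    text = value.strip()
    # Assumption: the default may be wrapped in single or double quotes.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in ("'", '"'):
        text = text[1:-1]
    # int() raises ValueError when what remains is not an integer.
    return int(text)
```

Raising on non-integer input keeps bad defaults visible instead of silently passing them through.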
Dimitri Fontaine
62edd5a2c8 Register "nocase" as a SQLite noise word.
SQLite types include "text nocase" apparently, so add "nocase" as one of
the managed noise words. It might be time we handle those the other way
round, with a whitelist of expected tokens somewhere in the type
definition rather than a blacklist of unknown words to exclude...

Anyway, fix #350.
2016-03-03 00:21:43 +01:00
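The trade-off raised in 62edd5a2c8 can be sketched as follows, in Python rather than pgloader's Common Lisp. Both token sets and both function names are hypothetical and only illustrate the two approaches.

```python
# Hypothetical token sets, for illustration only.
NOISE_WORDS = {"nocase"}
KNOWN_TYPE_TOKENS = {"text", "integer", "varchar"}

def strip_noise(type_definition):
    """Blacklist approach: drop only the tokens known to be noise."""
    tokens = type_definition.lower().split()
    return " ".join(t for t in tokens if t not in NOISE_WORDS)

def keep_known(type_definition):
    """Whitelist approach: keep only the tokens known to be meaningful."""
    tokens = type_definition.lower().split()
    return " ".join(t for t in tokens if t in KNOWN_TYPE_TOKENS)
```

With the blacklist, every new unknown word needs a new entry; with the whitelist, unknown words are dropped without any change to the list.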
Dimitri Fontaine
b026a860c1 Fix MS SQL fetch metadata function.
It should return the fetched catalog rather than the count of objects,
which is only used for statistics purposes. Fix #349.

This problem once again shows that we lack proper testing environment
for MS SQL source :/
2016-03-02 16:20:55 +01:00
Dimitri Fontaine
eaa5807244 Adapt to CURRENT_TIMESTAMP(x) default values.
We target CURRENT_TIMESTAMP as the PostgreSQL default value for columns
when it was different before on the grounds that the type casting in
PostgreSQL is doing the job, as in the following example:

    pgloader# create table test_ts(ts timestamptz(6) not null default CURRENT_TIMESTAMP);
    CREATE TABLE
    pgloader# insert into test_ts VALUES(DEFAULT);
    INSERT 0 1
    pgloader# table test_ts;
                  ts
    -------------------------------
     2016-02-24 18:32:22.820477+01
    (1 row)

    pgloader# drop table test_ts;
    DROP TABLE
    pgloader# create table test_ts(ts timestamptz(0) not null default CURRENT_TIMESTAMP);
    CREATE TABLE
    pgloader# insert into test_ts VALUES(DEFAULT);
    INSERT 0 1
    pgloader# table test_ts;
               ts
    ------------------------
     2016-02-24 18:32:44+01
    (1 row)

Fix #341.
2016-02-24 18:30:16 +01:00
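A minimal sketch of the normalization described in eaa5807244, in Python rather than pgloader's Common Lisp. The function name `normalize_default` and the exact pattern are assumptions; the idea is only that any `CURRENT_TIMESTAMP(x)` default becomes a plain `CURRENT_TIMESTAMP`, leaving the precision to the column type.

```python
import re

# Matches CURRENT_TIMESTAMP with an optional "(precision)" suffix.
_CURRENT_TS = re.compile(
    r"^\s*current_timestamp\s*(\(\s*\d*\s*\))?\s*$", re.IGNORECASE
)

def normalize_default(default):
    """Map CURRENT_TIMESTAMP(x) to CURRENT_TIMESTAMP, leave the rest alone."""
    if default is not None and _CURRENT_TS.match(default):
        return "CURRENT_TIMESTAMP"
    return default
```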
Dimitri Fontaine
40c1581794 Review transaction and error handling in COPY.
The PostgreSQL COPY protocol requires an explicit initialization phase
that may fail, and in this case the Postmodern driver transaction is
already dead, so there's no way we can even send ABORT to it.

Review the error handling of our copy-batch function to cope with that
fact, and add some logging of non-retryable errors we may have.

Also improve the thread error reporting when using a binary image from
where it might be difficult to open an interactive debugger, while still
having the full blown Common Lisp debugging experience for the project
developers.

Add a test case for a missing column as in issue #339.

Fix #339, see #337.
2016-02-21 15:56:06 +01:00
Dimitri Fontaine
9512ab187e Fix the fix, see #343.
Someday I should either stop working on pgloader in between other things
or have a better test suite, including MS SQL and all. Probably both.
And read compiler notes and warnings too, while at that...
2016-02-20 14:15:13 +01:00
Dimitri Fontaine
197258951c Improve MS SQL usage of the schema structs.
The function qualify-name is not in use anymore, but the MSSQL parts
didn't get the memo... fix #343.
2016-02-19 17:55:54 +01:00
Dimitri Fontaine
765bbb70aa Fix auto_increment support in cast rules.
This fixes #141 again when users are forcing MySQL bigint(20) into
PostgreSQL bigint types so that foreign keys can be installed. To this
effect, a cast rule such as the following is needed:

   cast type bigint when (= 20 precision) to bigint drop typemod

Before this patch, this user provided cast rule would also match against
MySQL types "with extra auto_increment", and it should not.

If you're having the problem that this patch fixes on an older pgloader
that you can't or won't upgrade, consider the following user provided
set of cast rules to achieve the same effect:

   cast type bigint with extra auto_increment to bigserial drop typemod,
        type bigint when (= 20 precision) to bigint drop typemod
2016-02-05 21:26:31 +01:00
Dimitri Fontaine
c108b85290 Allow package prefix in CAST ... USING clause.
Also, in passing, add a new transformation function for MySQL that
transforms varbinary to text.
2016-02-04 16:09:22 +01:00
Dimitri Fontaine
782561fd4e Handle default value transforms errors, fix #333.
It turns out that the MySQL catalog always stores default values as strings
even when the column itself is of type bytea. In some cases, it's then
impossible to transform the expected bytea from a string.

In passing, move some code around to fix dependencies and make it
possible to issue log warnings from the default value printing code.
2016-02-03 12:27:58 +01:00
Dimitri Fontaine
e7771ff3d8 Remove platform specific tar options.
2016-02-02 15:28:00 +01:00
Dimitri Fontaine
029ea0027a Upgrade version string.
We just tagged the repository as version 3.3.0.50 to be able to release
an experimental pgloader bundle, and we did tag the repository. The
first commit after that should then change the version string.
2016-01-31 21:49:43 +01:00
Dimitri Fontaine
1280ae0b8c Add a bundle distribution.
Using Quicklisp bundle facility it is possible to prepare a
self-contained archive of all the code needed to build pgloader.

Doing that should allow users to easily build pgloader when they are
behind a restrictive proxy, and packagers to work from a source tarball
that has very limited build dependencies.
2016-01-31 21:47:14 +01:00
Dimitri Fontaine
76668c2626 Review package dependencies.
The decision to use lots of different packages in pgloader has quite
strong downsides at times, and the manual management of dependencies is
one of them, in particular how to avoid circular ones.
2016-01-31 18:42:01 +01:00
Dimitri Fontaine
64ab4d28dc Error out when using ignored options.
On the theory that it's a better service to the user to refuse to do
anything at all rather than ignore his/her commands, print out FATAL
errors when options are used that are incompatible with a load command
file.

See #327 for a case where this did happen.

In passing, tweak our report code to avoid printing the footer when we
didn't print anything at all previously.
2016-01-25 11:46:36 +01:00
Dimitri Fontaine
4e36bd3c55 Improve threads error handling.
See #328 where we are lacking useful stack trace in a --debug run
because of the previous talk-handler-bind coding, that was there to
avoid sinking the users into too many details. Let's try another
approach here.
2016-01-24 21:43:46 +01:00
Dimitri Fontaine
b2ec66c84b Force external-format of the logs files, see #328.
In the issue #328 the --debug level output is not helpful because of an
encoding error in the logfile. Let's see about forcing the log file
external format to utf-8 then.
2016-01-20 21:53:13 +01:00
Dimitri Fontaine
4c84954a0d Merge pull request #329 from maksimf/patch-1
Fix typo in documentation
2016-01-20 21:38:23 +01:00
Maxim Filippov
6d02591e9c Fix typo in documentation
2016-01-20 12:41:50 +03:00
Dimitri Fontaine
327745110a MySQL bytea default value can be "". Fix #291.
Thanks to a reproducible test case we can see that MySQL default for a
varbinary column is an empty string, so tweak the transform function
byte-vector-to-bytea in order to cope with that.
2016-01-18 21:55:01 +01:00
Dimitri Fontaine
d9d9e06c0f Another attempt at fixing #323.
Rather than trying hard to have PostgreSQL fully qualify the index name
with tricks around search_path setting at the time ::regclass is
executed, simply join on pg_namespace to retrieve that schema in a new
slot in our pgsql-index structure so that we can then reuse it when
needed.

Also add a test case for the scenario, including both a UNIQUE
constraint and a classic index, because the DROP and CREATE/ALTER
instructions differ.
2016-01-17 01:54:36 +01:00
Dimitri Fontaine
7dd69a11e1 Implement concurrency and workers for files sources.
More than the syntax and API tweaks, this patch also makes it so that a
multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be
loaded with several files in the group in parallel.

To that effect, tweak again the md-connection and md-copy
implementations.
2016-01-16 22:53:55 +01:00
Dimitri Fontaine
aa8b756315 Fix when to create indexes.
In the recent refactoring and improvements of parallelism the indexes
creation would kick in before we know that the data is done being copied
over to the target table.

Fix that by maintaining a writers-count hashtable and only starting to
create indexes when that count reaches zero, meaning all the concurrent
tasks started to handle the COPY of the data are now done.
2016-01-16 19:50:21 +01:00
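The counting scheme from aa8b756315 can be sketched as below, in Python rather than pgloader's Common Lisp. The class `WritersCount`, the function `load_table` and the `create_indexes` callback are hypothetical names; the sketch only shows a per-table counter that triggers index creation when the last writer finishes.

```python
import threading

class WritersCount:
    """Per-table count of writers still running, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def register(self, table, writers):
        with self._lock:
            self._counts[table] = writers

    def writer_done(self, table):
        """Decrement the count; return True if this was the last writer."""
        with self._lock:
            self._counts[table] -= 1
            return self._counts[table] == 0

def load_table(table, writers, create_indexes):
    counts = WritersCount()
    counts.register(table, writers)

    def writer():
        # The COPY work for one writer would happen here.
        if counts.writer_done(table):
            create_indexes(table)  # runs exactly once, after the last writer

    threads = [threading.Thread(target=writer) for _ in range(writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Doing the decrement and the zero check under the same lock is what guarantees that only one writer sees the count reach zero.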
Dimitri Fontaine
dcc8eb6d61 Review api around worker-count.
It was worker-count and it's now exposed as the worker in the WITH
clause, but we can actually keep it as worker-count in the internal API,
and it feels better that way.
2016-01-16 19:49:52 +01:00
Dimitri Fontaine
eb45bf0338 Expose concurrency settings to the end users.
Add the workers and concurrency settings to the LOAD commands for
database sources so that users can tweak them now, and add mentions of
them in the documentation too.

From the documentation string of the copy-from method as found in
src/sources/common/methods.lisp:

   We allow WORKER-COUNT simultaneous workers to be active at the same time
   in the context of this COPY object. A single unit of work consists of
   several kinds of workers:

     - a reader getting raw data from the COPY source with `map-rows',
     - N transformers preparing raw data for PostgreSQL COPY protocol,
     - N writers sending the data down to PostgreSQL.

   The N here is set to the CONCURRENCY parameter: with a CONCURRENCY of
   2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we
   start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be
   active simultaneously.

Those options should find their way in the remaining sources, that's for
a follow-up patch tho.
2016-01-15 23:22:32 +01:00
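The arithmetic in the docstring quoted in eb45bf0338 can be written out as a small Python sketch. The function name `task_counts` is hypothetical; the formula (one reader, N transformers, N writers, capped by WORKER-COUNT) is the one stated in the quoted text.

```python
def task_counts(concurrency, worker_count):
    """Return (tasks started, tasks that may be active at once)."""
    # One reader, plus CONCURRENCY transformers and CONCURRENCY writers.
    started = 1 + concurrency + concurrency
    # Only WORKER-COUNT of them may be active simultaneously.
    active = min(started, worker_count)
    return started, active
```

With a worker count of 8, `task_counts(2, 8)` gives `(5, 5)` and `task_counts(4, 8)` gives `(9, 8)`.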
Dimitri Fontaine
fb40a472ab Simplify database WITH option handling.
Share more code by having a common flattening function as a semantic
predicate in the grammar.
2016-01-15 22:34:27 +01:00
Dimitri Fontaine
bfdbb2145b Fix with drop index option, fix #323.
Have PostgreSQL always fully qualify the index related objects and SQL
definition statements when fetching the list of indexes of a table, by
playing with an empty search_path.

Also improve the whole index creation by passing the table object as the
context where to derive the table-name from, so that schema qualified
tables are taken into account properly.
2016-01-15 15:04:07 +01:00
Dimitri Fontaine
1ff204c172 Typo fix.
2016-01-15 14:45:19 +01:00
Dimitri Fontaine
44a2bd14d4 Fix custom CAST rules with expressions, fix #322.
In a previous commit the typemod matching code had been broken, and we
failed to notice that until now. Thanks to bug report #322 we just got
the memo...

Add a test case in the local-only MySQL database.

The regression testing facilities should be improved to be able to test
a full database, and then to dynamically create said database from code
or something to ease test coverage of those cases.
2016-01-12 14:55:17 +01:00
Dimitri Fontaine
2c200f5747 Improve error handling for pkeys creation.
When creating the primary keys on top of the unique indexes, we might
still have errors (e.g. with NULL values). Make it so that a failure in
one pkey doesn't fail every other one, by having them all run within a
single connection rather than a single transaction.
2016-01-12 14:53:42 +01:00
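The idea in 2c200f5747, one connection but one transaction per statement, can be sketched in Python with the standard `sqlite3` module standing in for PostgreSQL. The function name `run_independently` is hypothetical, and the use of SQLite is an assumption made only so the sketch is self-contained.

```python
import sqlite3

def run_independently(conn, statements):
    """Run each statement in its own transaction; return the failures."""
    failures = []
    for sql in statements:
        try:
            # The connection context manager commits on success and
            # rolls back on error, so one failure does not undo the rest.
            with conn:
                conn.execute(sql)
        except sqlite3.Error as err:
            failures.append((sql, str(err)))
    return failures
```

Had all statements shared a single transaction, the first failure would have rolled back the indexes that were created successfully.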
Dimitri Fontaine
133028f58d Desultory review code indentation.
2016-01-12 14:52:44 +01:00
Dimitri Fontaine
ee69b8d4ce Randomly tweak batch sizes.
In order to avoid all concurrently prepared batches of rows to get sent
to PostgreSQL COPY command at the same time exactly, randomly vary the
size of each batch between -30% and +30% of the batch rows parameter.
2016-01-11 21:29:29 +01:00
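A minimal sketch of the jitter described in ee69b8d4ce, in Python rather than pgloader's Common Lisp. The function name `randomized_batch_size` and its parameters are hypothetical; only the range of -30% to +30% comes from the commit message.

```python
import random

def randomized_batch_size(batch_rows, spread=0.3, rng=random):
    """Pick a batch size within +/- spread of the batch rows parameter."""
    low = round(batch_rows * (1 - spread))
    high = round(batch_rows * (1 + spread))
    # randint is inclusive on both ends.
    return rng.randint(low, high)
```

Because each batch draws its own size, batches prepared concurrently are unlikely to fill up, and reach the COPY command, at exactly the same moment.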
Dimitri Fontaine
f256e12a4f Review load parallelism settings.
pgloader parallel workload is still hardcoded, but at least the code now
uses clear parameters as input so that it will be possible in a later
patch to expose them to the end-user.

The notions of workers and concurrency are now handled as follows:

  - concurrency is how many tasks are allowed to happen at once, by
    default we have a reader thread, a transformer thread and a COPY
    thread all active for each table being loaded,

  - worker-count is how many parallel threads are allowed to run
    simultaneously and defaults to 8 currently, which means that in a
    typical migration from a database source and given default
    concurrency of 1 (3 threads), we might be loading up to 3 different
    tables at any time.

The idea is to expose those settings to the user in the load file and as
command line options (such as --jobs) and see what it gives us. It might
help e.g. use more cores in loading a single CSV file.

As of this patch, there can still be only one reader thread and the
number of transformer threads must be the same as the number of COPY
threads.

Finally, the CSV-like files user-defined projections are now handled in
the transformation threads rather than in the reader thread...
2016-01-11 01:43:38 +01:00
Dimitri Fontaine
94ef8674ec Typo fix (of sorts)
Some API didn't get the table-name to table memo...
2016-01-11 01:42:18 +01:00
Dimitri Fontaine
a3fd22acd3 Review pgloader encoding story.
Thanks to Common Lisp character data type, it's easy for pgloader to
enforce always speaking to PostgreSQL in utf-8, and that's what has been
done from the beginning actually.

Now, without good reason for that, the first example of a SET clause
that has been added to the docs was about how to set client_encoding,
which should NOT be done.

Fix that at the user level by removing the bad example from the docs and
adding a WARNING whenever the client_encoding is set to a known bad
value. It's a WARNING because we then simply force 'utf-8' anyway.

Also, review completely the format-vector-row function to avoid doing
double work with the Postmodern facilities we piggyback on. This was
done halfway through and the utf-8 conversion was actually done twice.
2016-01-11 01:27:36 +01:00
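The behaviour described in a3fd22acd3, warn and then force utf-8 anyway, can be sketched in Python as follows. The function name `enforce_utf8` and the logger name are hypothetical, and treating every value other than utf-8 as a bad value is an assumption; the commit only speaks of "a known bad value".

```python
import logging

log = logging.getLogger("encoding-sketch")

def enforce_utf8(settings):
    """Return a copy of the SET settings with client_encoding forced to utf-8."""
    fixed = dict(settings)
    requested = fixed.get("client_encoding")
    if requested is not None:
        canonical = requested.lower().replace("-", "").replace("_", "")
        # Assumption: anything other than utf-8 counts as a bad value.
        if canonical != "utf8":
            log.warning("client_encoding %r ignored, forcing 'utf-8'", requested)
    fixed["client_encoding"] = "utf-8"
    return fixed
```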
Dimitri Fontaine
cf73a0e6c0 Merge pull request #318 from richardkmichael/detect-sbcl-core-compression
Detect sbcl core compression and Makefile gardening.
2016-01-10 17:53:48 +01:00
Richard Michael
6dcdf4711b Easier install by detecting SBCL core-compression.
Various Linux distributions provide SBCL without core-compression
enabled. On the other hand, Mac OSX (at least via `homebrew`) provides SBCL with
core-compression enabled.  To make installation easier, teach the make
process to detect core-compression, and use it if possible.
2016-01-09 22:17:02 -05:00
Dimitri Fontaine
d60b64c03b Implement MS SQL newsequentialid() default value.
We convert the default value call to newsequentialid() into a call to
the PostgreSQL uuid-ossp uuid_generate_v1() which seems like the
equivalent function.

The extension "uuid-ossp" needs to be installed in the target database.

(Blind) Fix #246.
2016-01-08 22:43:38 +01:00
Dimitri Fontaine
8a596ca933 Move connection into utils.
There's no reason why this file should be in the src/ top-level.
2016-01-07 16:42:43 +01:00
Dimitri Fontaine
d1a2e3f46b Improve the Dockerfile and the versioning.
When building from sources within the git environment, the version
number is ok, but it was wrong when building in the docker image. Fix
the version number to 3.3.0.50 to show that we're talking about a
development snapshot that is leading to version 3.3.1.

Yeah, 4 parts version numbers. That happens, apparently.
2016-01-07 10:21:52 +01:00
Dimitri Fontaine
ee2a68f924 Improve Dockerfile.
It was quite idiotic to RUN a git clone rather than just use the files
from the docker context...
2016-01-05 11:28:19 +01:00
Dimitri Fontaine
286a39f6e6 Proof read of the README.md file.
Some advice was pretty ancient, and we should now mention debian
packaging support and the docker hub image.
2016-01-04 23:22:52 +01:00
Dimitri Fontaine
f8cb7601c5 Implement a Dockerfile.
Apparently it's quite common nowadays for people to use docker to build
and run software in a contained way, so provide users with the facility
they need in order to do that.
2016-01-04 21:05:46 +01:00
Dimitri Fontaine
1bbbf96ba7 Fix minor API glitch/typo.
2016-01-04 21:01:15 +01:00
Dimitri Fontaine
a7291e9b4b Simplify copy-database implementation further.
Following-up to the recent refactoring effort, the IXF and DB3 source
classes didn't get the memo that they could piggyback on the generic
copy-database implementation. This patch implements that.

In passing, also simplify the instanciate-table-copy-object method for
copy subclasses that need specialization here, by using change-class and
call-next-method so as to reuse the generic code as much as possible.
2016-01-01 14:28:09 +01:00
Dimitri Fontaine
24cd0de9f7 Install the :create-schemas option back.
In the previous refactoring patch that option mistakenly went away,
although it is still needed for MS SQL and it is planned to make use of
it in the other source types too...

See #316 for reference.
2016-01-01 13:35:35 +01:00
Dimitri Fontaine
9e4938cea4 Implement PostgreSQL catalogs data structure.
In order to share more code in between the different source types,
finally have a go at the quite horrible mess of anonymous data
structures floating around.

Having catalog and schema instances not only allows for code cleanup,
but will also make it possible to implement some bug fixes and wishlist
items such as mapping tables from one schema to another.

Also, supporting database sources having a notion of "schema" (in
between "catalog" and "table") should get easier, including getting
on-par with MySQL in the MS SQL support (materialized views have been
asked for already).

See #320, #316, #224 for references and a notion of progress being made.

In passing, also clean up the copy-databases methods for database source
types, so that they all use a fetch-metadata generic function and a
prepare-pgsql-database and a complete-pgsql-database generic function.
Actually, a single method does the job here.

The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn will call the specialized versions of
list-all-columns and friends implementations. Once the catalog has been
fetched, an explicit CAST call is then needed before we can continue.

Finally, the fields/columns/transforms slots in the copy objects are
still being used by the operative code, so the internal catalog
representation is only used up to starting the data copy step, where the
copy class instances are then all that's used.

This might be refactored again in a follow-up patch.
2015-12-30 21:53:01 +01:00
Dimitri Fontaine
d84ec3f808 Add SQLite test case for before/after load commands.
See bug #321, this change should have been part of previous commit.
2015-12-23 21:58:56 +01:00