Commit Graph

1488 Commits

Author SHA1 Message Date
Dimitri Fontaine
6c80404249 Implement support for Redshift "identity" columns.
At this stage we don't even parse the details of the Redshift identity, such
as the seed and step values, and instead treat them the same as a MySQL
auto_increment extra description field.

Fixes #860 (again).
2018-11-09 22:41:14 +01:00
Dimitri Fontaine
794bc7fc64 Improve redshift support: string_agg() doesn't exist there.
Neither do array_agg(), unnest() and other very useful PostgreSQL
functions. Redshift dates back to the PostgreSQL 8.0 era, so do things the old
way: parse the index definition that we get from calling pg_get_indexdef().
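A minimal Python sketch of that old-fashioned approach, assuming the text returned by PostgreSQL's pg_get_indexdef() has the usual `CREATE [UNIQUE] INDEX name ON table USING method (columns)` shape. The `parse_index_def` helper is hypothetical and is not pgloader's (Common Lisp) code; expression indexes, partial indexes and quoted identifiers containing commas are out of scope.

```python
import re

# Matches e.g. 'CREATE UNIQUE INDEX users_pkey ON public.users USING btree (id)'.
# Assumption: a plain column list; no expressions, no WHERE clause handling.
INDEX_DEF_RE = re.compile(
    r"CREATE\s+(?P<unique>UNIQUE\s+)?INDEX\s+(?P<name>\S+)\s+"
    r"ON\s+(?P<table>\S+)\s+(?:USING\s+(?P<method>\w+)\s+)?"
    r"\((?P<cols>[^)]*)\)",
    re.IGNORECASE,
)


def parse_index_def(index_def: str) -> dict:
    """Split an index definition string into its main parts (hypothetical helper)."""
    m = INDEX_DEF_RE.match(index_def.strip())
    if m is None:
        raise ValueError(f"unrecognized index definition: {index_def!r}")
    return {
        "name": m.group("name"),
        "table": m.group("table"),
        "unique": m.group("unique") is not None,
        "method": m.group("method"),
        "columns": [c.strip() for c in m.group("cols").split(",")],
    }


print(parse_index_def("CREATE UNIQUE INDEX users_pkey ON public.users USING btree (id, email)"))
```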

For that, this patch introduces the notion of SQL queries that depend on the
PostgreSQL major version. If no major-version specific query is found in the
pgloader source tree, then we use the generic one.
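The lookup-with-fallback rule can be sketched in Python as follows. The directory layout (a `pg8/` subdirectory next to the generic queries), the file name and the `find_sql_query` helper are all hypothetical illustrations, not the actual layout of the pgloader source tree.

```python
from pathlib import Path


def find_sql_query(sql_dir: Path, name: str, major_version: int) -> str:
    """Return the text of query `name`, preferring a major-version specific file.

    Hypothetical layout: sql/pg8/list-indexes.sql overrides sql/list-indexes.sql
    when the server reports major version 8.
    """
    specific = sql_dir / f"pg{major_version}" / name
    if specific.is_file():
        return specific.read_text()
    # No major-version specific query found: fall back to the generic one.
    return (sql_dir / name).read_text()
```

Keeping the generic query as the default means only the old or unusual servers (such as Redshift) need an override file.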

Fixes #860.
2018-11-07 21:23:56 +01:00
Dimitri Fontaine
207cd82726 Improve SQLite type names parsing.
Allow spaces in more random places, as SQLite doesn't seem to normalize the
user input. Fixes #548 again.
2018-11-07 11:01:06 +01:00
Dimitri Fontaine
f8460c1705 Allow usernames and dbnames starting with digits (again).
It turns out that the rules about the names of users and databases are more
lax than pgloader assumed, so it might be a good move for our DSN parsing
to accept more values and then let the source/target systems complain
when something goes wrong.

See #230, which got broken again somewhere.
2018-10-20 19:28:19 +02:00
Jason Rigby
6e7ea90806 add cl-ironclad and cl-babel dependencies to docker builds (#854) 2018-10-18 18:56:40 +02:00
Larry Gebhardt
0e6f599282 Add Docker build instructions (#853) 2018-10-18 18:55:56 +02:00
Dimitri Fontaine
7b487ddaca Add a Citus distribution test case, from the citus tutorial. 2018-10-18 15:42:17 +02:00
Dimitri Fontaine
d3b21ac54d Implement automatic discovery of the Citus distribution rules.
With this patch, the following distribution rule

   distribute companies using id

is equivalent to the following distribution rule set, given foreign keys in
the source schema:

   distribute companies using id
   distribute campaigns using company_id
   distribute ads using company_id from campaigns
   distribute clicks using company_id from ads, campaigns
   distribute impressions using company_id from ads, campaigns

With this patch, pgloader walks the foreign-key dependency tree and knows
how to automatically derive distribution rules from a single rule and the
foreign keys.
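A minimal Python sketch of that derivation, assuming an acyclic foreign-key graph where each table has at most one relevant foreign key. The `derive_distribution_rules` function, the `{child: (parent, fk_column)}` mapping and the `campaign_id`/`ad_id` column names are hypothetical, not pgloader's internal representation. On a schema shaped like the example above, it reproduces the five rules listed.

```python
from collections import deque


def derive_distribution_rules(root, key, foreign_keys):
    # foreign_keys: {child_table: (parent_table, fk_column)}; assumes one
    # relevant foreign key per table (a simplification for this sketch).
    children = {}
    for child, (parent, column) in foreign_keys.items():
        children.setdefault(parent, []).append((child, column))

    rules = [f"distribute {root} using {key}"]
    seen = {root}
    # Queue entries: (table, distribution column, tables to JOIN through).
    queue = deque()
    for child, column in children.get(root, []):
        # Direct children already hold the distribution key as their FK column.
        seen.add(child)
        queue.append((child, column, []))
    while queue:
        table, dist_col, path = queue.popleft()
        rule = f"distribute {table} using {dist_col}"
        if path:
            rule += " from " + ", ".join(path)
        rules.append(rule)
        for child, _fk_column in children.get(table, []):
            if child in seen:  # guard against cycles in the FK graph
                continue
            seen.add(child)
            # Deeper tables inherit the key and reach it by joining via `table`.
            queue.append((child, dist_col, [table] + path))
    return rules


fks = {
    "campaigns": ("companies", "company_id"),
    "ads": ("campaigns", "campaign_id"),
    "clicks": ("ads", "ad_id"),
    "impressions": ("ads", "ad_id"),
}
for rule in derive_distribution_rules("companies", "id", fks):
    print(rule)
```

The breadth-first walk lists each table's path back to the first-level table, which is exactly the `from` list needed to JOIN back to the distribution key.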
2018-10-18 15:31:29 +02:00
Dimitri Fontaine
8112a9b54f Improve Citus Distribution Support.
With this patch it's now actually possible to backfill the data on the fly
when using the new "distribute" commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
2018-10-16 18:53:41 +02:00
Dimitri Fontaine
760763be4b Use the constraint name when we have it.
That's important for Citus, which doesn't know how to ADD a constraint
without a name.
2018-10-10 15:44:21 -07:00
Dimitri Fontaine
381ac9d1a2 Add initial support for Citus distribution from pgloader.
The idea is for pgloader to tweak the schema from a description of the
sharding model, the distribute clause. Here's an example of such a clause:

   distribute company using id
   distribute campaign using company_id
   distribute ads using company_id from campaign
   distribute clicks using company_id from ads, campaign

Given such commands, pgloader adds the distribution key to the table when
needed, to the primary key definition of the table, and also to the foreign
keys that point to the changed primary key.

Then when SELECTing the data from the source database, the idea is for
pgloader to automatically JOIN the base table with the source table where
the distribution key can be found, in case it was just added to the schema.

Finally, pgloader also calls the following Citus commands:

  SELECT create_distributed_table('company', 'id');
  SELECT create_distributed_table('campaign', 'company_id');
  SELECT create_distributed_table('ads', 'company_id');
  SELECT create_distributed_table('clicks', 'company_id');
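The mapping from distribute clauses to those calls is mechanical: the table name and the `using` column become the two arguments, while the `from` list only matters for the data JOIN. A hypothetical Python sketch (the `citus_calls` helper is illustrative, and its regular expression only accepts plain unquoted identifiers):

```python
import re

# "distribute <table> using <column> [from <table>, ...]"
RULE_RE = re.compile(r"distribute\s+(\w+)\s+using\s+(\w+)(?:\s+from\s+[\w\s,]+)?$")


def citus_calls(rules):
    calls = []
    for rule in rules:
        m = RULE_RE.match(rule.strip())
        if m is None:
            raise ValueError(f"unsupported distribute clause: {rule!r}")
        table, column = m.groups()
        # The "from" tables are ignored here: they only drive the JOIN.
        calls.append(f"SELECT create_distributed_table('{table}', '{column}');")
    return calls


for call in citus_calls([
    "distribute company using id",
    "distribute campaign using company_id",
    "distribute ads using company_id from campaign",
    "distribute clicks using company_id from ads, campaign",
]):
    print(call)
```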
2018-10-10 14:35:12 -07:00
Dimitri Fontaine
344d0ca61b Implement AFTER SCHEMA sql code blocks.
This allows pgloader users to run SQL commands in between pgloader's schema
creation and the actual loading of the data.
2018-10-10 11:08:28 -07:00
Jon Snell
0957bd0efa Fix pgloader bug #844 by adding support for mssql real types (#845) 2018-10-05 12:47:54 +02:00
Dimitri Fontaine
d356bd501b Accept even more ragged date format input.
When parsing a date string from a date format, accept that the ms or us part
be completely missing, rather than just missing some digits.
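The idea can be sketched in Python with a fixed `YYYY-MM-DD HH:MM:SS[.fraction]` layout; this format and the `parse_ragged_timestamp` helper are assumptions for illustration, not pgloader's date format language. A short fraction is right-padded with zeros, and a missing one counts as zero.

```python
import re
from datetime import datetime

# Hypothetical layout: 'YYYY-MM-DD HH:MM:SS' plus an optional 1 to 6 digit fraction.
TS_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?")


def parse_ragged_timestamp(text: str) -> datetime:
    m = TS_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"cannot parse {text!r}")
    *fields, fraction = m.groups()
    # Missing fraction means zero; a short one is padded: '5' -> 500000 us.
    micros = int(fraction.ljust(6, "0")) if fraction else 0
    return datetime(*map(int, fields), micros)
```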

Fixed #828.
2018-09-10 19:37:36 +02:00
Dimitri Fontaine
5119d864f4 Assorted bug fixes in the context of Redshift support as a source.
The catalog queries used in pgloader have to be adjusted for Redshift
because this thing forked PostgreSQL 8.0, which is a long time ago now.
Also, we had a couple of bugs here and there that were not really related to
Redshift support but showed up in that context.

Fixes #813.
2018-09-04 11:49:21 +02:00
Dimitri Fontaine
0f58a3c84d Assorted fixes: catalogs SQLtypes and MySQL decoding as.
While debugging "decoding as", the SQL type listing support in sqltype-list
was found to be broken, so this patch fixes it. It then goes on to fix the
DECODING AS filters support: we had switched it to the better
regexp-or-string filter struct but forgot to update the matching code
accordingly.

Fixes #665.
2018-08-31 22:51:41 -07:00
Dimitri Fontaine
4fbfd9e522 Refrain from using the regexp_match() function, introduced in Pg10.
Instead use the substring() function which has been there all along.

See #813.
2018-08-22 10:52:01 +02:00
Dimitri Fontaine
c9b905b7ac Simplify our ASDF system definition by using :serial t.
This allows us to drop the manually maintained list of file dependencies,
implying them instead from the order in which we list the files.
2018-08-20 11:55:47 +02:00
Dimitri Fontaine
cb633aa092 Refrain from some introspections on non-PGDG PostgreSQL variants.
When dealing with PostgreSQL protocol compatible databases, often enough
they don't support the same catalogs as PostgreSQL itself. Redshift for
instance lacks foreign key support.
2018-08-20 11:52:59 +02:00
Dimitri Fontaine
d3bfb1db31 Bugfix previous commit: filter list format changed.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
2018-08-20 11:50:50 +02:00
Dimitri Fontaine
fc3a1949f7 Add support for PostgreSQL as a source database.
It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user defined
cast rules at COPY time, or just moving from one hosted solution to another.
2018-08-20 11:09:52 +02:00
Dimitri Fontaine
1ee389d121 Fix parsing empty hostname fields in pgpass.
Fixes #823.
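A minimal Python sketch of a pgpass line parser that keeps empty fields, following the `hostname:port:database:username:password` format and the backslash escaping of `:` and `\` described in the libpq documentation. The `parse_pgpass_line` helper is hypothetical (not pgloader's parser), and wildcard matching is out of scope.

```python
def parse_pgpass_line(line: str) -> list[str]:
    """Split one pgpass line into its five fields, keeping empty ones."""
    fields, current = [], []
    chars = iter(line.rstrip("\n"))
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character literally (handles \: and \\).
            current.append(next(chars, ""))
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return fields
```

With this approach an empty hostname simply yields an empty first field instead of a parse error.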
2018-08-14 10:07:05 +03:00
uniquestring
34cc25383a Improved Dockerfiles/docker image size (#821)
* Add dockerfiles to .dockerignore

Otherwise changes in the dockerfiles would invalidate the cache

* Rewrite Dockerfile

- Fix deprecated MAINTAINER instruction
- Move maintainer label to the bottom (improving cache)
- Tidy up apt-get
- Use COPY instead of ADD
  see https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy
- Remove WORKDIR instruction (we don't really need this)
- Combine remaining RUN layers to reduce layer count
- Move final binary instead of copying (reduce image size)

* Use -slim image and multistage build

Reduce size by using multistage builds and the -slim image.
Use debian:stable instead of a specific code name (future-proof).

* [cosmetic] indent Dockerfile instructions

Make it easier to see where a new build stage begins

* Rewrite Dockerfile.ccl

Apply the same changes to Dockerfile.ccl as we did for Dockerfile
2018-08-11 01:08:00 +02:00
Christoph Berg
1a811707c6 releasing package pgloader version 3.5.2-3 2018-07-31 16:24:41 +02:00
alexknips
5ca3ee8aad Fix documentation of default MySQL cast rules (#815)
The default rule is `type int to bigint    when  (>= 10 precision)`.
2018-07-20 14:38:06 +02:00
Dimitri Fontaine
46d14af0d3 Add more default rules to MySQL datetime handling.
Given the variety of ways to setup default behavior for datetime and
timestamp data types in MySQL, we need yet more default casting rules. It
might be time to think about a more principled way to solve the problem, but
on the other hand, this ad-hoc one also comes with full overriding
flexibility for the end user.

Fixes #811.
2018-07-08 20:37:06 +02:00
Christoph Berg
1844823bce Rename regress test to ssl
And move ca-certificates dependency to correct test
2018-07-05 21:52:54 +02:00
Christoph Berg
a199db1ae4 Debian: Make cl-pgloader test depend on ca-certificates
Make cl-pgloader test depend on ca-certificates so the snakeoil certificate is
recognized as a valid CA. (Needs the /etc/ssl/certs/*.0 file.)
2018-07-05 19:07:56 +02:00
Dimitri Fontaine
1b150182dc Fix cl-csv delimiter type.
Travis spotted a bug that I failed to see, which apparently happens with
Clozure-CL (CCL) but not with SBCL:

2018-07-03T21:04:11.053795Z FATAL The value "\\\"", derived from the initarg :DELIMITER, can not be used to set the value of the slot CL-CSV::DELIMITER in #<CL-CSV::READ-DISPATCH-TABLE-ENTRY #x30200143DDCD>, because it is not of type (VECTOR (OR (MEMBER T NIL) CHARACTER)).

To fix, prefer the syntax #(#\\ #\") rather than "\\\"".
2018-07-04 01:32:40 +02:00
Christoph Berg
4eb8c7367f releasing package pgloader version 3.5.2-2 2018-07-03 22:53:02 +02:00
Christoph Berg
852b3bc888 debian: Test cl-pgloader through sbcl --eval. 2018-07-03 22:49:27 +02:00
Christoph Berg
647bf4cb86 debian/rules: invoke help2man without path 2018-07-03 22:22:01 +02:00
Christoph Berg
d46c3b8c59 debian/rules: Properly format buildapp invocation 2018-07-03 20:16:29 +02:00
Christoph Berg
bba850479b debian: Skip building and manpage generation in arch-indep builds. 2018-07-03 20:13:15 +02:00
Christoph Berg
ded148228d debian: Install pgloader.asd into correct location. (Closes: #857226) 2018-07-03 18:48:04 +02:00
Christoph Berg
4f5e426fc7 debian: #864309 was fixed in 3.5.2-1 2018-07-03 18:27:51 +02:00
Christoph Berg
8263e587f0 debian/source/options: Ignore changes in src/params.lisp (release vs non-release). 2018-07-03 17:17:28 +02:00
Christoph Berg
906fd96bf1 debian: Build manpage using help2man. 2018-07-03 17:13:35 +02:00
Christoph Berg
b4fae61d41 debian/copyright: syntax fixups 2018-07-03 17:13:35 +02:00
Dimitri Fontaine
8537bd661f Back to not being a release.
Maybe I should find a way to avoid this extra back-and-forth commit.
Someday.
2018-07-03 17:11:38 +02:00
Dimitri Fontaine
63af7e7373 Release 3.5.2.
This release fixes Debian packaging, includes support for Redshift as a
target, and also fixes some bugs.
2018-07-03 16:58:55 +02:00
Christoph Berg
cb528c2e19 All included test data has been verified as free, stop building a +dfsg tarball. 2018-07-03 16:37:43 +02:00
Christoph Berg
f19e301c81 debian: Build sphinx docs
While we are at it, remove built docs on clean
2018-06-25 15:02:32 +02:00
Christoph Berg
7a974d712e docs: Remove sidebar_collapse: false
Sphinx's alabaster module on Debian stretch doesn't support
sidebar_collapse yet; remove the setting so the docs build everywhere
2018-06-25 14:48:29 +02:00
Christoph Berg
a1d42028a3 Build and install new sphinx docs instead. 2018-06-25 12:47:20 +02:00
Dimitri Fontaine
9661c5874d Fix previous patch.
It's easy to avoid the warning about an unused lexical variable with the
proper declaration, which I failed to install before because of a syntax
error. Let's fix it now that I realise what was wrong.
2018-06-23 00:50:35 +02:00
Dimitri Fontaine
8930734bea Ensure unquoted file names for logs and data.
The previous code could create files with unhelpful names such as the
following: \"errors\"/\"err\".\"errors\".log.

Fix #808.
2018-06-22 23:02:07 +02:00
Christoph Berg
ee44f19815 debian: Enable SSL in src/hooks.lisp. 2018-06-22 14:35:59 +02:00
Christoph Berg
2160d0abb2 debian: force SSL usage in test via PGSSLMODE 2018-06-22 14:25:12 +02:00
Dimitri Fontaine
047cf84341 Add support for PGSSLMODE environment variable.
PostgreSQL supports many environment variables to drive its connection
behavior, as documented at the following reference:

  https://www.postgresql.org/docs/current/static/libpq-envars.html

We don't support all of them yet; we are adding them one at a time.
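A hypothetical Python sketch of reading PGSSLMODE with validation: the accepted values and the "prefer" default are those documented for libpq's sslmode, while the `sslmode_from_env` helper, and how pgloader maps these values onto its own SSL options, are assumptions not shown in this commit.

```python
import os

# sslmode values documented for libpq; "prefer" is libpq's default.
VALID_SSLMODES = ("disable", "allow", "prefer", "require", "verify-ca", "verify-full")


def sslmode_from_env(environ=None, default="prefer"):
    """Return the SSL mode requested through PGSSLMODE, or `default`."""
    env = os.environ if environ is None else environ
    mode = env.get("PGSSLMODE", default)
    if mode not in VALID_SSLMODES:
        raise ValueError(f"invalid PGSSLMODE value: {mode!r}")
    return mode
```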
2018-06-22 14:13:15 +02:00