Commit Graph

1568 Commits

Author SHA1 Message Date
Dimitri Fontaine
eab1cbf326 More docs improvements.
Explain the feature list of pgloader better for improving discoverability of
what can be achieved with our nice little tool.
2018-12-19 22:40:32 +01:00
Dimitri Fontaine
ec071af0ad Add a Feature Matrix to the documentation.
That helps having both an overview of what pgloader is capable of doing with
a database migration, and also documenting that some sources don't have the
full support for some features yet.
2018-12-19 15:31:25 +01:00
Dimitri Fontaine
2cafa8360c Document newly added MATERIALIZE VIEWS for new sources.
Now it's possible to use this clause with a PostgreSQL or an MS SQL database
source.

Fixes #817.
2018-12-19 10:51:04 +01:00
Dimitri Fontaine
c019c16113 Implement MATERIALIZE VIEWS support for MS SQL, and distribute.
The latter is not tested yet, but should have no impact if not used. Given
how rare it is that I get a chance to play around with a MS SQL instance
anyway, it might be better to push blind changes for it when it doesn't
impact existing features…
2018-12-19 01:25:27 +01:00
Dimitri Fontaine
bda06f8ac0 Implement Citus support from a MySQL database. 2018-12-17 16:31:47 +01:00
Dimitri Fontaine
290ad68d61 Implement materialize views in PostgreSQL source support. 2018-12-16 23:17:37 +01:00
Dimitri Fontaine
007003647d Improve Redshift support documentation. 2018-12-14 18:21:34 +09:00
Dimitri Fontaine
f72afeeae7 Switch the documentation to the ReadTheDocs template. 2018-12-12 09:34:20 +09:00
Dimitri Fontaine
b6de8f1ead Improve Citus documentation. 2018-12-12 09:34:05 +09:00
Dimitri Fontaine
56d24de67a Update documentation with new features.
We have a lot of new features to document. This is a first patch about that,
some more work is to be done. That said, it's better than nothing already.
2018-12-11 14:25:08 +09:00
Dimitri Fontaine
af2995b918 Apply quoting rules to SQLite index column names.
The previous fix was wrong for missing the point: rather than unquote column
names in the table definition when matching the column names in the index
definition, we should in the first place have quoted the index column names
when needed.

Fixes #872 for real this time.
2018-12-02 00:17:26 +01:00
Dimitri Fontaine
a939d20dff Unquote names when searching for an index column name in its table.
If the source database is using a keyword (such as "order") as a column
name, then pgloader is going to quote this column name in its internal
catalogs. In that case, unquote the column in the pgloader catalogs when
matching it against the unquoted column name we have in the index
definition.

Fixes #872.
2018-12-01 21:27:26 +01:00
Dimitri Fontaine
ab2cadff24 Simplify the regular expresion parsing the PostgreSQL version string.
The debian/Ubuntu packaging would defeat the quite simple regexp parsing
PostgreSQL version string that we have in pgloader. To make it more robust,
make it more open to unforeseen strings.

See #800, see #810.
2018-11-30 15:39:27 +01:00
Dimitri Fontaine
801d8a6e09 Add support for MS SQL time data type.
As for the other datetime types we have to use CONVERT at the SQL level in
order to get a format that PostgreSQL understands. This time the magic
number for it is 114.
2018-11-23 10:43:58 +01:00
Dimitri Fontaine
6e325f67e0 Implement the save.lisp idea for the bundle.
This should make it easier to build pgloader with CCL rather than SBCL, all
from the bundle distribution, and also easier to support windows.

In passing, add a new file in the bundle distribution: version.sexp should
contain a CL string containing the pgloader version string.
2018-11-21 21:44:56 +01:00
Dimitri Fontaine
18bcf10903 Blind fix for a strange use-case.
A user reported a case where pgloader fails to find the table an index has
been created on in pgloader catalogs. That's a weird case. For now, just
issue a warning about the situation and skip the index.
2018-11-21 18:17:34 +01:00
Dimitri Fontaine
4ab26e5387 Handle other conditions in process-catalogs.
It might be that some random condition is signaled during process-catalogs,
causing the errors reported so far and that I can't reproduce. Let's add
some handler-case protection to have more clues about what could be
happening.

See #865, #800, #810, #859, #824.
2018-11-21 17:31:11 +01:00
Dimitri Fontaine
743769d750 Improve handling of errors when fetching the source catalogs.
We might have MS SQL failures at this stage, or even Redshift or other
PostgreSQL variants failing to execute our catalog queries. Handle
conditions by cleanly logging them and returning from copy-database without
doing anything. That's the best we can do here.

Fixes #605, fixes #757.
2018-11-21 10:38:19 +01:00
Dimitri Fontaine
1c18b41cd7 Implement a new way of building pgloader: make save.
This time we directly call into the save-lisp-and-die feature of the
implementation. As pgloader only supports SBCL and CCL at the time being,
doing things without an abstraction layer is easy enough.

This needs more testing and a special version for the bundle case too. One
step at a time, etc.
2018-11-20 22:59:43 +01:00
Dimitri Fontaine
3f2f10eef1 Finish implementation of CAST rules for PostgreSQL source databases.
Add a link to the table from the internal catalogs for columns so that we
can match table-source-name in cast rules when migrating from PostgreSQL.
2018-11-19 19:33:37 +01:00
Dimitri Fontaine
aa8ae159e2 Improve error handling when applying Citus distribution rules.
Make it so that we generate a proper error message to the user when failing
to figure out the PATH to the distribution key, rather than failing with an
internal error about The value NIL is not of type PGLOADER.CATALOG:TABLE.
2018-11-18 18:21:51 +01:00
Dimitri Fontaine
f07ac61269 Fix default/serial handling of pgsql as a source.
In the recent patch that added support for Redshift "identity" columns, we
broke support for PostgreSQL sequences. Unbreak that.
2018-11-18 17:46:41 +01:00
Dimitri Fontaine
1fd0576ace Fix Citus support related debug print instructions. 2018-11-16 00:08:27 +01:00
Dimitri Fontaine
8b1acbae87 Make sure the image knows how to print circular data structures.
Our catalogs representation is designed to be circular, which helps
navigating the graph from anywhere when processing it. This means that we
need to have *print-circle* set to t in the pgloader image, otherwise we
might run into Control stack exhausted when trying to print out debug
information...

Fixes #865, #800, #810, #859, #824.
2018-11-16 00:03:31 +01:00
Dimitri Fontaine
e291c502ba Install a call to cl+ssl:reload at image startup time, again.
Testing shows that it's not just debian which needs it, it's always
necessary. Just re-add our tweak now.

See #866, see #816, see #807, #794.
2018-11-15 23:59:51 +01:00
Dimitri Fontaine
16dda01f37 Deal with SSL verify error the wrong way.
This patch adds an option --no-ssl-cert-verification that allows bypassing
OpenSSL server certificate verification. It's hopefully a temporary measure
that we set up in order to make progress when confronted to:

  SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY

The real solution is of course to install the SSL certificates at a place
where pgloader will look for them, which defaults to
~/.postgresql/postgresql.crt at the moment. It's not clear what the story is
with the defaults from /etc/ssl, or how to make things happen in a better
way.

See #648, See #679, See #768, See #748, See #775.
2018-11-15 00:13:21 +01:00
Dimitri Fontaine
5ecf04acb9 Implement null if support as a WITH option.
This gives a default "null if" option to all the input columns at once, and
it's still possible to override the default per column.

In passing, fix project-fields declarations that SBCL now complains about
when they're not true, such as declaring a vector when we might have :null
or nil. As a result, remove the (declare (optimize speed)) in the generated
field processing code.
2018-11-13 21:41:27 +01:00
Dimitri Fontaine
a6ef7a56a9 Implement ipv6 hostname support in .pgpass rules.
An hostname could be written [::1] in .pgass, without having to escape the
colon characters, and with a proper enclosing in square brackets, as common
for ipv6 addresses.

Fixes #837.
2018-11-10 21:01:30 +01:00
Dimitri Fontaine
656bf85075 Review field to column projection code emitted.
The code emitted by pgloader to transform input fields into PostgreSQL
column values was using too many optimization declarations, some of them
that SBCL failed to follow through for lack of type marking in the generated
code.

As SBCL doesn't have enough information to be optimizing anyway, at least we
can make it so that we don't have a warning about it. The new code does that.

Fixes #803.
2018-11-10 20:22:04 +01:00
Dimitri Fontaine
6eaad0621b Desultory code maintenance for MS SQL identity support.
The code expects the keyword :auto-increment rather than a string nowadays
in order to process an extra column bits of information as meaning that we
want to cast to a serial/bigserial datatype.
2018-11-09 22:42:31 +01:00
Dimitri Fontaine
6c80404249 Implement support for Redshift "identity" columns.
At this stage we don't even parse the details of the Redshift identity such
as the seed and step values and consider them the same as a MySQL
auto_increment extra description field.

Fixes #860 (again).
2018-11-09 22:41:14 +01:00
Dimitri Fontaine
794bc7fc64 Improve redshift support: string_agg() doesn't exist there.
Neither does array_agg(), unnest() and other very useful PostgreSQL
functions. Redshift is from 8.0 times, so do things the old way: parse the
output of the index definition that get from calling pg_index_def().

For that, this patch introduces the notion of SQL support that depends on
PostgreSQL major version. If no major-version specific query is found in the
pgloader source tree, then we use the generic one.

Fixes #860.
2018-11-07 21:23:56 +01:00
Dimitri Fontaine
207cd82726 Improve SQLite type names parsing.
Allow spaces in more random places, as SQLite doesn't seem to normalize the
user input. Fixes #548 again.
2018-11-07 11:01:06 +01:00
Dimitri Fontaine
f8460c1705 Allow usernames and dbnames starting with digits (again).
It turns out that the rules about the names of users and databases are more
lax than pgloader would know, so it might be a good move for our DSN parsing
to accept more values and then let the source/target systems to complain
when something goes wrong.

See #230 which got broke again somewhere.
2018-10-20 19:28:19 +02:00
Jason Rigby
6e7ea90806 add cl-ironclad and cl-babel dependencies to docker builds (#854) 2018-10-18 18:56:40 +02:00
Larry Gebhardt
0e6f599282 Add Docker build instructions (#853) 2018-10-18 18:55:56 +02:00
Dimitri Fontaine
7b487ddaca Add a Citus distribution test case, from the citus tutorial. 2018-10-18 15:42:17 +02:00
Dimitri Fontaine
d3b21ac54d Implement automatic discovery of the Citus distribution rules.
With this patch, the following distribution rule

   distribute companies using id

is equivalent to the following distribution rule set, given foreign keys in
the source schema:

   distribute companies using id
   distribute campaigns using company_id
   distribute ads using company_id from campaigns
   distribute clicks using company_id from ads, campaigns
   distribute impressions using company_id from ads, campaigns

In the current code (of this patch) pgloader walks the foreign-keys
dependency tree and knows how to automatically derive distribution rules
from a single rule and the foreign keys.
2018-10-18 15:31:29 +02:00
Dimitri Fontaine
8112a9b54f Improve Citus Distribution Support.
With this patch it's now actually possible to backfill the data on the fly
when using the "distribute" new commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
2018-10-16 18:53:41 +02:00
Dimitri Fontaine
760763be4b Use the constraint name when we have it.
That's important for Citus, which doesn't know how to ADD a constraint
without a name.
2018-10-10 15:44:21 -07:00
Dimitri Fontaine
381ac9d1a2 Add initial support for Citus distribution from pgloader.
The idea is for pgloader to tweak the schema from a description of the
sharding model, the distribute clause. Here's an example of such a clause:

   distribute company using id
   distribute campaign using company_id
   distribute ads using company_id from campaign
   distribute clicks using company_id from ads, campaign

Given such commands, pgloader adds the distibution key to the table when
needed, to the primary key definition of the table, and also to the foreign
keys that are pointing to the changed primary key.

Then when SELECTing the data from the source database, the idea is for
pgloader to automatically JOIN the base table with the source table where to
find the distribution key, in case it was just added in the schema.

Finally, pgloader also calls the following Citus commands:

  SELECT create_distributed_table('company', 'id');
  SELECT create_distributed_table('campaign', 'company_id');
  SELECT create_distributed_table('ads', 'company_id');
  SELECT create_distributed_table('clicks', 'company_id');
2018-10-10 14:35:12 -07:00
Dimitri Fontaine
344d0ca61b Implement AFTER SCHEMA sql code blocks.
This allows pgloader users to run SQL commands in between pgloader's schema
creation and the actual loading of the data.
2018-10-10 11:08:28 -07:00
Jon Snell
0957bd0efa Fix pgloader bug #844 by adding support for mssql real types (#845) 2018-10-05 12:47:54 +02:00
Dimitri Fontaine
d356bd501b Accept even more ragged date format input.
When parsing a date string from a date format, accept that the ms or us part
be completely missing, rather than just missing some digits.

Fixed #828.
2018-09-10 19:37:36 +02:00
Dimitri Fontaine
5119d864f4 Assorted bug fixes in the context of Redshift support as a source.
The catalog queries used in pgloader have to be adjusted for Redshift
because this thing forked PostgreSQL 8.0, which is a long time ago now.
Also, we had a couple bugs here and there that were not really related to
Redshift support but were shown in that context.

Fixes #813.
2018-09-04 11:49:21 +02:00
Dimitri Fontaine
0f58a3c84d Assorted fixes: catalogs SQLtypes and MySQL decoding as.
It turns out that when trying to debug "decoding as" the SQLtype listing
support in sqltype-list was found broken, so this patch fixes it. Then goes
on to fix the DECODING AS filters support, which we have switched to using
the better regexp-or-string filter struct but forgot to update the matching
code accordingly.

Fixes #665.
2018-08-31 22:51:41 -07:00
Dimitri Fontaine
4fbfd9e522 Refrain from using regexp_match() function, introduced in Pg10.
Instead use the substring() function which has been there all along.

See #813.
2018-08-22 10:52:01 +02:00
Dimitri Fontaine
c9b905b7ac Simplify our ASD system definition by using :serial t.
This allows to drop manually maintained list of files dependencies, instead
implying them by the order in which we list the files.
2018-08-20 11:55:47 +02:00
Dimitri Fontaine
cb633aa092 Refrain from some introspections on non-PGDG PostgreSQL variants.
When dealing with PostgreSQL protocol compatible databases, often enough
they don't support the same catalogs as PostgreSQL itself. Redshift for
instance lacks foreign key support.
2018-08-20 11:52:59 +02:00
Dimitri Fontaine
d3bfb1db31 Bugfix previous commit: filter list format changed.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
2018-08-20 11:50:50 +02:00