218 Commits

Author SHA1 Message Date
Dimitri Fontaine
2147a1d07b Implement ALTER TABLE ... SET TABLESPACE ... as a pgloader clause.
This allows creating tables in any target tablespace rather than the default
one, and is supported for the various sources having support for the ALTER
TABLE clause already.
2019-01-08 22:50:24 +01:00
Dimitri Fontaine
f28f8e577d Review log-level for stored procedures.
Some MySQL schema level features (on update current_timestamp) are migrated
to stored procedures and triggers. We would log the CREATE PROCEDURE
statements as LOG level entries instead of SQL level entries, most likely a
stray devel/debug choice.
2019-01-08 22:44:07 +01:00
Dimitri Fontaine
bda06f8ac0 Implement Citus support from a MySQL database. 2018-12-17 16:31:47 +01:00
Dimitri Fontaine
290ad68d61 Implement materialized views in PostgreSQL source support. 2018-12-16 23:17:37 +01:00
Dimitri Fontaine
af2995b918 Apply quoting rules to SQLite index column names.
The previous fix was wrong for missing the point: rather than unquote column
names in the table definition when matching the column names in the index
definition, we should in the first place have quoted the index column names
when needed.

Fixes #872 for real this time.
2018-12-02 00:17:26 +01:00
Dimitri Fontaine
a939d20dff Unquote names when searching for an index column name in its table.
If the source database is using a keyword (such as "order") as a column
name, then pgloader is going to quote this column name in its internal
catalogs. In that case, unquote the column in the pgloader catalogs when
matching it against the unquoted column name we have in the index
definition.

Fixes #872.
2018-12-01 21:27:26 +01:00
Dimitri Fontaine
ab2cadff24 Simplify the regular expression parsing the PostgreSQL version string.
Debian/Ubuntu packaging would defeat the quite simple regexp that pgloader
uses to parse the PostgreSQL version string. To make it more robust, make it
more open to unforeseen strings.

See #800, see #810.
2018-11-30 15:39:27 +01:00
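The lenient version parsing described in ab2cadff24 can be sketched as follows. This is a minimal illustration in Python, not pgloader's actual Common Lisp code; the pattern, the function name `parse_pg_version`, and the sample strings are assumptions made for the example.

```python
import re

# Hypothetical pattern: only require the product name followed by a number,
# and ignore whatever packaging suffix comes afterwards.
VERSION_RE = re.compile(r"PostgreSQL\s+(\d+)(?:\.(\d+))?")


def parse_pg_version(version_string):
    """Return (major, minor) from a version() string, or None if unrecognized."""
    match = VERSION_RE.search(version_string)
    if match is None:
        return None
    major = int(match.group(1))
    # A missing minor part (as in "11beta1") is treated as 0 in this sketch.
    minor = int(match.group(2)) if match.group(2) else 0
    return major, minor
```

Keeping the pattern anchored on as little as possible is what makes it tolerant of distribution-specific suffixes.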
Dimitri Fontaine
3f2f10eef1 Finish implementation of CAST rules for PostgreSQL source databases.
Add a link to the table from the internal catalogs for columns so that we
can match table-source-name in cast rules when migrating from PostgreSQL.
2018-11-19 19:33:37 +01:00
Dimitri Fontaine
16dda01f37 Deal with SSL verify error the wrong way.
This patch adds an option --no-ssl-cert-verification that allows bypassing
OpenSSL server certificate verification. It's hopefully a temporary measure
that we set up in order to make progress when confronted with:

  SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY

The real solution is of course to install the SSL certificates at a place
where pgloader will look for them, which defaults to
~/.postgresql/postgresql.crt at the moment. It's not clear what the story is
with the defaults from /etc/ssl, or how to make things happen in a better
way.

See #648, See #679, See #768, See #748, See #775.
2018-11-15 00:13:21 +01:00
Dimitri Fontaine
6c80404249 Implement support for Redshift "identity" columns.
At this stage we don't even parse the details of the Redshift identity such
as the seed and step values and consider them the same as a MySQL
auto_increment extra description field.

Fixes #860 (again).
2018-11-09 22:41:14 +01:00
Dimitri Fontaine
794bc7fc64 Improve redshift support: string_agg() doesn't exist there.
Neither does array_agg(), unnest() and other very useful PostgreSQL
functions. Redshift is from 8.0 times, so do things the old way: parse the
output of the index definition that we get from calling pg_get_indexdef().

For that, this patch introduces the notion of SQL support that depends on
PostgreSQL major version. If no major-version specific query is found in the
pgloader source tree, then we use the generic one.

Fixes #860.
2018-11-07 21:23:56 +01:00
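The fallback rule described in 794bc7fc64 (use a major-version specific query when one exists, else the generic one) can be sketched as below. This is an illustration only: pgloader looks queries up in its source tree, whereas this sketch uses an in-memory dictionary, and the names `find_query` and `GENERIC` are hypothetical.

```python
# Marker used as the "version" of the generic query (an assumption of this sketch).
GENERIC = None


def find_query(queries, name, major_version):
    """Pick the query for this major version, falling back to the generic one.

    `queries` maps (query name, major version or GENERIC) to SQL text.
    """
    specific = queries.get((name, major_version))
    if specific is not None:
        return specific
    return queries[(name, GENERIC)]
```

With this shape, supporting an older server only requires adding one more entry; callers do not change.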
Dimitri Fontaine
d3b21ac54d Implement automatic discovery of the Citus distribution rules.
With this patch, the following distribution rule

   distribute companies using id

is equivalent to the following distribution rule set, given foreign keys in
the source schema:

   distribute companies using id
   distribute campaigns using company_id
   distribute ads using company_id from campaigns
   distribute clicks using company_id from ads, campaigns
   distribute impressions using company_id from ads, campaigns

In the current code (of this patch) pgloader walks the foreign-keys
dependency tree and knows how to automatically derive distribution rules
from a single rule and the foreign keys.
2018-10-18 15:31:29 +02:00
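The derivation described in d3b21ac54d can be sketched as a breadth-first walk over the foreign keys. This is a simplified Python illustration, not pgloader's implementation; the function names and the foreign-key columns used in the usage example (`campaign_id`, `ad_id`) are assumptions, since the commit message does not show the source schema.

```python
from collections import deque


def derive_rules(root_table, root_key, foreign_keys):
    """Derive distribution rules from one rule and the foreign keys.

    foreign_keys: list of (child_table, child_column, parent_table, parent_column).
    Returns {table: (distribution_key, tables_to_join_through)}.
    """
    rules = {root_table: (root_key, [])}
    queue = deque([root_table])
    while queue:
        parent = queue.popleft()
        parent_key, parent_path = rules[parent]
        for child, child_col, ref_table, ref_col in foreign_keys:
            if ref_table != parent or child in rules:
                continue
            if ref_col == parent_key:
                # The child already references the distribution key itself.
                rules[child] = (child_col, [])
            else:
                # The key must be fetched through the parent (and its own path).
                rules[child] = (parent_key, [parent] + parent_path)
            queue.append(child)
    return rules


def format_rule(table, key, path):
    """Render a rule in the 'distribute ... using ... from ...' form."""
    rule = f"distribute {table} using {key}"
    if path:
        rule += " from " + ", ".join(path)
    return rule
```

Given foreign keys campaigns.company_id → companies.id, ads.campaign_id → campaigns.id, clicks.ad_id → ads.id, and impressions.ad_id → ads.id, calling `derive_rules("companies", "id", ...)` yields the five rules listed in the commit message.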
Dimitri Fontaine
8112a9b54f Improve Citus Distribution Support.
With this patch it's now actually possible to backfill the data on the fly
when using the "distribute" new commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
2018-10-16 18:53:41 +02:00
Dimitri Fontaine
760763be4b Use the constraint name when we have it.
That's important for Citus, which doesn't know how to ADD a constraint
without a name.
2018-10-10 15:44:21 -07:00
Dimitri Fontaine
381ac9d1a2 Add initial support for Citus distribution from pgloader.
The idea is for pgloader to tweak the schema from a description of the
sharding model, the distribute clause. Here's an example of such a clause:

   distribute company using id
   distribute campaign using company_id
   distribute ads using company_id from campaign
   distribute clicks using company_id from ads, campaign

Given such commands, pgloader adds the distribution key to the table when
needed, to the primary key definition of the table, and also to the foreign
keys that are pointing to the changed primary key.

Then when SELECTing the data from the source database, the idea is for
pgloader to automatically JOIN the base table with the source table that holds
the distribution key, in case it was just added to the schema.

Finally, pgloader also calls the following Citus commands:

  SELECT create_distributed_table('company', 'id');
  SELECT create_distributed_table('campaign', 'company_id');
  SELECT create_distributed_table('ads', 'company_id');
  SELECT create_distributed_table('clicks', 'company_id');
2018-10-10 14:35:12 -07:00
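The mapping from a distribute clause to the Citus call listed in 381ac9d1a2 can be sketched as below. This is an illustrative Python helper under stated assumptions: the function name is hypothetical, the clause is split on whitespace rather than parsed with pgloader's real grammar, and identifiers are not escaped.

```python
def create_distributed_table_sql(rule):
    """Turn 'distribute <table> using <column> [from ...]' into the Citus call."""
    words = rule.split()
    if len(words) < 4 or words[0] != "distribute" or words[2] != "using":
        raise ValueError(f"not a distribute rule: {rule!r}")
    table, column = words[1], words[3]
    # The optional "from ..." part only matters for backfilling, not for this call.
    return f"SELECT create_distributed_table('{table}', '{column}');"
```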
Dimitri Fontaine
5119d864f4 Assorted bug fixes in the context of Redshift support as a source.
The catalog queries used in pgloader have to be adjusted for Redshift
because this thing forked PostgreSQL 8.0, which is a long time ago now.
Also, we had a couple bugs here and there that were not really related to
Redshift support but were shown in that context.

Fixes #813.
2018-09-04 11:49:21 +02:00
Dimitri Fontaine
4fbfd9e522 Refrain from using regexp_match() function, introduced in Pg10.
Instead use the substring() function which has been there all along.

See #813.
2018-08-22 10:52:01 +02:00
Dimitri Fontaine
cb633aa092 Refrain from some introspections on non-PGDG PostgreSQL variants.
When dealing with PostgreSQL protocol compatible databases, often enough
they don't support the same catalogs as PostgreSQL itself. Redshift for
instance lacks foreign key support.
2018-08-20 11:52:59 +02:00
Dimitri Fontaine
d3bfb1db31 Bugfix previous commit: filter list format changed.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
2018-08-20 11:50:50 +02:00
Dimitri Fontaine
fc3a1949f7 Add support for PostgreSQL as a source database.
It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user defined
cast rules at COPY time, or just moving from one hosted solution to another.
2018-08-20 11:09:52 +02:00
Dimitri Fontaine
a0bac47101 Refrain from TRUNCATE'ing an empty list of tables.
Fixed #789.
2018-06-15 17:46:31 +02:00
Dimitri Fontaine
3db3ecf81b Review Redshift data type dumb-down choices.
It's a little more involved than what was done previously. In particular we
need to pay attention to MySQL varchar(x) and transform them into something
big enough when counting bytes rather than chars, like varchar(3x).

Then there's the "text" datatype to take into account, and some more.
2018-05-23 13:43:28 +02:00
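The varchar widening mentioned in 3db3ecf81b can be sketched as follows. This is a Python illustration only; the function name, the regular expression, and the default factor of 3 (taken from the "varchar(3x)" wording above) are assumptions, and the sketch handles nothing but plain `varchar(n)` types.

```python
import re

VARCHAR_RE = re.compile(r"^varchar\((\d+)\)$", re.IGNORECASE)


def redshift_varchar(mysql_type, bytes_per_char=3):
    """Widen varchar(x) to varchar(3x) because the target counts bytes, not chars."""
    match = VARCHAR_RE.match(mysql_type.strip())
    if match is None:
        # Any other type is left untouched in this sketch.
        return mysql_type
    return f"varchar({int(match.group(1)) * bytes_per_char})"
```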
Dimitri Fontaine
d4dc4499a8 Add schema migration support for Redshift as a target.
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.

When connected to Redshift, we then dumb-down our target catalogs to the
subset of data types that Redshift actually does support.

Also, some catalog queries can't be done in Redshift, and 8.0 didn't have a
fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.

COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also doesn't support the COPY FROM STDIN form. COPY sources
are limited, and another patch will have to be cooked to prepare the data
from pgloader into a format and location that Redshift knows how to handle.

At least, it's possible to migrate a database schema to Redshift already.
2018-05-19 19:16:58 +02:00
Dimitri Fontaine
48af01dbbc Fix implementation of foreign keys in data only mode.
In data-only mode, the foreign keys parameter (which defaults to True) means
something special: we remove the fkey definitions prior to the data only
load then re-install the fkeys.

This got broken in a previous commit, the WITH clause option being processed
like the other DDL ones that only make sense when creating the schema. While
fixing the setting in copy-database, we have to also fix a nesting bug in
complete-pgsql-database that would prevent fkeys from being installed again at
the end of the load.

This patch not only fixes that choice, but also reviews the implementation of
the drop-pgsql-fkeys support function to use a more modern internal API,
preparing a list of SQL statements to be sent to the psql-execute level.

Fixes #745.
2018-02-19 22:07:43 +01:00
Dimitri Fontaine
e129e77eb6 Fix SQL execute counters maintenance. 2018-02-19 22:06:51 +01:00
Dimitri Fontaine
4fed8c5eca Fix support for newid() from MS SQL.
Several places in the code are involved to deal with the default values from
MS SQL. The catalog query is dealing with strange quoting rules on the
source side and used to fill in directly the PostgreSQL expected value. But
then the quoting of a function call wasn't properly handled.

Rather than coping with the quoting rules here, have the catalog query
return a pgloader specific placeholder "GENERATE_UUID". Then the MS SQL
specific code can normalize that to the symbol :generate_uuid. Then the
generic PostgreSQL DDL code can implement the proper replacement for that
symbol, not having to know where it comes from.

Fix #742.
2018-02-17 00:25:33 +01:00
Dimitri Fontaine
5e3acbb462 When merging catalogs, "float" and "double precision" are the same type.
PostgreSQL understands both spellings of the data type name and implements
float as being a double precision value, so we should refrain from any
warning about that non-discrepancy when doing a data-only load.

Should fix #746.
2018-02-16 23:42:46 +01:00
Dimitri Fontaine
4612e68435 Implement support for new casting rules guards and actions.
Namely the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allows a definition such as the following:

      type timestamp with extra on update current timestamp
        to "timestamp with time zone" drop extra

The effect of such a cast rule would be to ignore the MySQL extra
definition and then prevent pgloader from creating the PostgreSQL triggers
that implement the same behavior.

Fix #735.
2018-01-31 15:17:05 +01:00
Dimitri Fontaine
5ba42edb0c Review misleading error message with schema not found.
It might be that the schema exists but we didn't find what we expected to
in there, so that it didn't make it to pgloader's internal catalogs. Be
friendly to the user with a better error message.

Fix #713.
2018-01-25 23:29:36 +01:00
Dimitri Fontaine
adf03c47ad Clean up source code organisation.
The copy format and batch facilities are no longer the meat of our
PostgreSQL support in the src/pgsql directory, so have them live in their
own space.
2018-01-23 19:52:13 +01:00
Dimitri Fontaine
3bb128c5db Review format-vector-row.
This function prepares the data to be sent down to PostgreSQL as a clean
COPY text with unicode handled correctly. This commit is mainly a clean-up
of the function, and also adds some smarts to try and make it faster.

In testing, the function is now marginally faster than before, but not by
much. The hope here is that it's now easier to optimize it.
2018-01-22 21:37:14 +01:00
Dimitri Fontaine
c05183fcba Implement support for Foreign Tables and Partitioned Tables.
Due to the way pgloader queries the PostgreSQL catalogs, it restricted the
target table to be “ordinary” tables, as per the relkind description in the
https://www.postgresql.org/docs/current/static/catalog-pg-class.html
PostgreSQL documentation.

Extend this to support relkind of 'r', 'f' and 'p'.

Fixes #587, fixes #690.
2017-12-01 22:13:47 +01:00
Dimitri Fontaine
6964764fb4 Find schema names unquoted.
When doing a MySQL to PostgreSQL migration in data only mode, pgloader
matches schema names found on both source and target database, and much like
with table names must do so ensuring unquoted schema names.

Otherwise we fail to find the schema name again, because one spelling has
the quotes, but not the other one, when using the “quote identifiers”
option.

Fix #659, at least some forms of it.
2017-11-19 17:12:21 +01:00
Dimitri Fontaine
db7a91d6c4 Add the MySQL target schema to the search_path.
In the next release, pgloader defaults to targeting a new schema named the
same as the MySQL database, because that's what makes more sense. But people
are used to having 'public' in the search_path and everything in there.

So when creating our target schema, when migrating from MySQL, arrange it so
that the new schema is in the search_path by issuing a command like:

  ALTER DATABASE plop SET search_path TO public, f1db;

And make this command visible in verbose (NOTICE) mode too, so that the user can
see what happens.

Fix #654. I think.
2017-11-02 12:40:21 +01:00
Dimitri Fontaine
0a88645eb5 Fix time measurements of the write activity.
When using --verbose or more detailed log messages, the summary prints
timings for both read and write operations separately. The write summary
timing took into account only the PostgreSQL batch activity, discarding the
formatting of the data done by pgloader.

As this formatting is quite heavy at the moment, the results are pretty
misleading without that information.
2017-10-21 21:04:55 +02:00
Dimitri Fontaine
8a361a0ff8 Add support for multiple on update columns per table.
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several columns with that special
treatment, so adapt pgloader to migrate that too.

What pgloader does is recognize that several columns are to receive the same
pre-update processing, and creates a single function that handles all of
them, as in the following example, from pgloader logs in a test case:

    CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
      RETURNS trigger
      LANGUAGE plpgsql
      AS
    $$
    BEGIN
       NEW.update_date = now();
       NEW.calc_date = now();
       RETURN NEW;
    END;
    $$;
    CREATE TRIGGER on_update_current_timestamp
            BEFORE UPDATE ON mysql.onupdate
          FOR EACH ROW
      EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();

Fixes #629.
2017-09-15 01:04:57 +02:00
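The generation step described in 8a361a0ff8 (one trigger function covering every "on update" column of a table) can be sketched as below. This is a Python illustration modelled on the log output shown above, not pgloader's own code; the function name is hypothetical and identifiers are emitted without quoting.

```python
def on_update_trigger_sql(schema, table, columns):
    """Build one trigger function that refreshes every listed column."""
    function = f"{schema}.on_update_current_timestamp_{table}"
    # One assignment per column, all inside the same function body.
    assignments = "\n".join(f"   NEW.{column} = now();" for column in columns)
    return (
        f"CREATE OR REPLACE FUNCTION {function}()\n"
        "  RETURNS trigger\n"
        "  LANGUAGE plpgsql\n"
        "  AS\n"
        "$$\n"
        "BEGIN\n"
        f"{assignments}\n"
        "   RETURN NEW;\n"
        "END;\n"
        "$$;\n"
        "CREATE TRIGGER on_update_current_timestamp\n"
        f"  BEFORE UPDATE ON {schema}.{table}\n"
        "  FOR EACH ROW\n"
        f"  EXECUTE PROCEDURE {function}();"
    )
```

Calling `on_update_trigger_sql("mysql", "onupdate", ["update_date", "calc_date"])` produces a single function with two assignments, matching the shape of the example above.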
Dimitri Fontaine
a498313074 Implement support for MySQL FULLTEXT indexes.
PostgreSQL btree indexes are limited in the size of the values they can
index: values must fit in an index page (8kB). So when porting a MySQL full
text index over full documents, we might get into an error like the
following:

  index row size 2872 exceeds maximum 2712 for index "idx_5199509_search"

To fix, query MySQL for the index type which is FULLTEXT rather than BTREE
in those cases, and port it over to a PostgreSQL Full Text index with a
hard-coded 'simple' configuration, as in the following test case:

  CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload));

Of course users might want to use a better configuration, including a proper
dictionary for the documents. When using PostgreSQL each document may have
its own configuration attached and yet they can all get indexed into the
same index, so that's a task for the application developers, not for
pgloader.

In passing, fix the list-typenames-without-btree-support.sql query to return
separate entries for each index type rather than an {array,representation}
of the result, as Postmodern won't turn the PostgreSQL array into a Common
Lisp array by default. I keep wondering how it worked before.

Fix #569.
2017-09-14 15:40:34 +02:00
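The index-type switch described in a498313074 can be sketched as below. This is a Python illustration for a single-column index; the function name and the plain-index fallback form are assumptions, while the FULLTEXT branch follows the test case quoted in the commit message.

```python
def create_index_sql(index_name, table, column, mysql_index_type):
    """Port a MySQL index; FULLTEXT becomes a GIN index over to_tsvector()."""
    if mysql_index_type.upper() == "FULLTEXT":
        # The 'simple' configuration is hard-coded, as in the commit above.
        return (
            f"CREATE INDEX {index_name} ON {table} "
            f"USING gin(to_tsvector('simple', {column}));"
        )
    # Any other index type is emitted with PostgreSQL's default access method.
    return f"CREATE INDEX {index_name} ON {table} ({column});"
```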
Dimitri Fontaine
987c0703ad Some default values come properly quoted from MariaDB now.
Adjust the default value formatting to check if the default value is already
single-quoted and only add new 'single quotes' when it's not the case.

Apparently ENUM default values in MariaDB 10 are now properly single quoted.
2017-09-14 15:39:04 +02:00
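The check described in 987c0703ad can be sketched as below. This is a Python illustration with a hypothetical function name; doubling embedded single quotes is standard SQL literal escaping added for completeness here, and is not something the commit message mentions.

```python
def quote_default(value):
    """Single-quote a default value unless it already arrives single-quoted."""
    if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
        # Already quoted by the source (e.g. ENUM defaults from MariaDB 10).
        return value
    # Assumption of this sketch: escape embedded quotes by doubling them.
    return "'" + value.replace("'", "''") + "'"
```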
Dimitri Fontaine
72c58306ba Fix the previous fix.
See #614. Again. Should be ok now.
2017-08-25 01:56:34 +02:00
Dimitri Fontaine
f20a5a0667 Fix schema name comparing with quoted schema names.
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as soon as you always
double-quote the schema name, which pgloader does.

Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that quoted schema name to the raw schema name from
PostgreSQL, they won't match, and pgloader tries to create the schema again:

  ERROR Database error 42P06: schema "my sql" already exists

Fix the comparison to use unquoted schema names, fix #614 again: the
previous fix would only work the first time.
2017-08-25 01:47:49 +02:00
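The comparison fixed in f20a5a0667 can be sketched as below: strip the double quotes from both spellings before comparing. This is a Python illustration; the helper names `unquote` and `same_schema` are hypothetical.

```python
def unquote(identifier):
    """Remove surrounding double quotes, undoing doubled inner quotes."""
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        return identifier[1:-1].replace('""', '"')
    return identifier


def same_schema(catalog_name, postgres_name):
    """Compare a possibly quoted catalog name with the raw name from PostgreSQL."""
    return unquote(catalog_name) == unquote(postgres_name)
```

With this check, the quoted catalog entry `"my sql"` matches the raw name `my sql`, so the schema is not created a second time.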
Dimitri Fontaine
4fcb24f448 Reintroduce manual Garbage Collect in SBCL.
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375kB (up to
about 3 MB per datum), it allows much larger batch size and prefetch rows
settings without entering ldb.
2017-08-23 16:27:14 +02:00
Dimitri Fontaine
4f9eb8c06b Track bytes sent to PostgreSQL.
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per second).
2017-08-23 11:55:49 +02:00
Dimitri Fontaine
1f242cd29e Fix comment support to schema qualify target tables. 2017-08-23 11:26:08 +02:00
Dimitri Fontaine
28db6b9f13 Desultory cleanup of a useless declaim. 2017-08-21 16:46:32 +02:00
Dimitri Fontaine
03a8d57a50 Review --verbose log message.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time while indexes are being built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
2017-08-21 15:27:13 +02:00
Dimitri Fontaine
952e7da191 Bug fix CREATE TYPE in schema (previous patch).
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
2017-08-10 21:19:25 +02:00
Dimitri Fontaine
5a65da2147 Create new types in the proper schema.
Prior to this patch, pgloader wouldn't care about which schema it
creates extra types in. Extra types are mainly ENUM and SET support from
MySQL. Now, pgloader creates those extra PostgreSQL ENUM types in the same
schema as the table using them, which is a more sound default.
2017-08-10 18:57:09 +02:00
Dimitri Fontaine
5c1c4bf3ff Fix MySQL Enum parsing.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single-quote (') and uses
two of them ('') for escaping single-quotes when found in the ENUM value
itself.

Fixes #597.
2017-08-01 18:40:27 +02:00
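The quoting rules described in 5c1c4bf3ff can be sketched with a CSV parser configured for single quotes and doubled-quote escaping. This Python illustration uses the standard `csv` module rather than pgloader's own CSV parser; the function name and the `enum(...)` input shape are assumptions.

```python
import csv


def parse_mysql_enum(column_type):
    """Extract the labels from a MySQL column type such as enum('a','b')."""
    prefix, suffix = "enum(", ")"
    if not (column_type.lower().startswith(prefix) and column_type.endswith(suffix)):
        raise ValueError(f"not an enum type: {column_type!r}")
    body = column_type[len(prefix):-len(suffix)]
    # Single quote as the quote character; '' inside a value means one quote.
    reader = csv.reader([body], quotechar="'", doublequote=True)
    return next(reader)
```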
Dimitri Fontaine
dfe5c38185 Fix quoting policy in PostgreSQL ddl formatting.
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format string: refrain from using ~s.
2017-07-06 09:47:48 +02:00
Dimitri Fontaine
9da012ca51 Fix identifiers quoting when reading PostgreSQL catalogs.
We sure can trust PostgreSQL to use names it knows how to handle. Still, it
will be happy to store in its catalogs names containing upper case, and in
that case we must quote them.
2017-07-06 03:16:06 +02:00