This allows creating tables in any target tablespace rather than the default
one, and is supported for the various sources having support for the ALTER
TABLE clause already.
Some MySQL schema level features (on update current_timestamp) are migrated
to stored procedures and triggers. We would log the CREATE PROCEDURE
statements as LOG level entries instead of SQL level entries, most likely a
stray devel/debug choice.
The previous fix was wrong for missing the point: rather than unquote column
names in the table definition when matching the column names in the index
definition, we should in the first place have quoted the index column names
when needed.
Fixes #872 for real this time.
If the source database is using a keyword (such as "order") as a column
name, then pgloader is going to quote this column name in its internal
catalogs. In that case, unquote the column in the pgloader catalogs when
matching it against the unquoted column name we have in the index
definition.
Fixes #872.
The Debian/Ubuntu packaging would defeat the quite simple regexp that pgloader
uses to parse the PostgreSQL version string. To make it more robust, make it
more open to unforeseen strings.
See #800, see #810.
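As a rough illustration of what "more open to unforeseen strings" can mean, here is a minimal Python sketch (pgloader itself is written in Common Lisp). The function name and the sample strings are assumptions made up for this example, not pgloader code: the idea is to pick up only the leading numbers after "PostgreSQL" and ignore any packaging suffix.

```python
import re

# Hypothetical pattern: a major number, optionally followed by ".minor".
# Whatever the packaging appends afterwards is simply ignored.
_VERSION_RE = re.compile(r"PostgreSQL\s+(\d+)(?:\.(\d+))?")


def parse_pg_version(version_string):
    """Return (major, minor) as integers, or None when nothing matches."""
    match = _VERSION_RE.search(version_string)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    return (major, minor)


# Illustrative inputs only; real version() output varies by packaging.
samples = [
    "PostgreSQL 10.1 (Ubuntu 10.1-1.pgdg16.04+1) on x86_64-pc-linux-gnu",
    "PostgreSQL 11beta2 (Debian 11~beta2-1) on x86_64-pc-linux-gnu",
    "PostgreSQL 9.6.8 on x86_64-pc-linux-gnu",
]
parsed = [parse_pg_version(s) for s in samples]
```

Returning None rather than raising leaves the caller free to decide how to handle a string that still does not match.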
This patch adds an option --no-ssl-cert-verification that allows bypassing
OpenSSL server certificate verification. It's hopefully a temporary measure
that we set up in order to make progress when confronted with:
    SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
The real solution is of course to install the SSL certificates at a place
where pgloader will look for them, which defaults to
~/.postgresql/postgresql.crt at the moment. It's not clear what the story is
with the defaults from /etc/ssl, or how to make things happen in a better
way.
See #648, See #679, See #768, See #748, See #775.
At this stage we don't even parse the details of the Redshift identity such
as the seed and step values and consider them the same as a MySQL
auto_increment extra description field.
Fixes #860 (again).
Neither does array_agg(), unnest() and other very useful PostgreSQL
functions. Redshift is from 8.0 times, so do things the old way: parse the
output of the index definition that we get from calling pg_get_indexdef().
For that, this patch introduces the notion of SQL support that depends on
PostgreSQL major version. If no major-version specific query is found in the
pgloader source tree, then we use the generic one.
Fixes #860.
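The lookup described above, a version-specific query when one exists and the generic one otherwise, can be sketched in Python as follows. The dictionary layout, the query name and the SQL text are all placeholders invented for the example; pgloader keeps its queries as files in its source tree.

```python
def pick_query(queries, name, major_version):
    """Return the version-specific SQL text when present, else the generic one.

    `queries` maps (name, major_version) to SQL text; the generic variant
    is stored under (name, None). This layout is an assumption of the sketch.
    """
    specific = queries.get((name, major_version))
    if specific is not None:
        return specific
    return queries[(name, None)]


# Placeholder SQL text, for illustration only.
queries = {
    ("list-all-indexes", None): "-- generic query",
    ("list-all-indexes", "8.0"): "-- query written for 8.0",
}

redshift_sql = pick_query(queries, "list-all-indexes", "8.0")
modern_sql = pick_query(queries, "list-all-indexes", "10")
```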
With this patch, the following distribution rule
    distribute companies using id
is equivalent to the following distribution rule set, given foreign keys in
the source schema:
    distribute companies using id
    distribute campaigns using company_id
    distribute ads using company_id from campaigns
    distribute clicks using company_id from ads, campaigns
    distribute impressions using company_id from ads, campaigns
In the current code (of this patch) pgloader walks the foreign-keys
dependency tree and knows how to automatically derive distribution rules
from a single rule and the foreign keys.
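To make the derivation concrete, here is a small Python sketch of one way to walk a foreign-key tree breadth-first from the single rule. It is not pgloader's implementation: the tuple layout and the foreign key column names other than company_id (campaign_id, ad_id) are assumptions for the example.

```python
from collections import deque


def derive_rules(root_table, root_key, foreign_keys):
    """Derive (table, key_column, from_tables) rules from one root rule.

    `foreign_keys` is a list of (table, column, referenced_table) tuples.
    Tables referencing the root directly already carry the key; tables
    further away get it through the chain of tables leading to the root.
    """
    key_col = {root_table: root_key}
    path = {root_table: []}
    queue = deque([root_table])
    while queue:
        parent = queue.popleft()
        for table, column, ref_table in foreign_keys:
            if ref_table != parent or table in key_col:
                continue
            if parent == root_table:
                key_col[table] = column
                path[table] = []
            else:
                key_col[table] = key_col[parent]
                path[table] = [parent] + path[parent]
            queue.append(table)
    return [(table, key_col[table], path[table]) for table in key_col]


def format_rule(rule):
    table, key, from_tables = rule
    text = f"distribute {table} using {key}"
    if from_tables:
        text += " from " + ", ".join(from_tables)
    return text


# Hypothetical foreign keys matching the example schema.
fkeys = [
    ("campaigns", "company_id", "companies"),
    ("ads", "campaign_id", "campaigns"),
    ("clicks", "ad_id", "ads"),
    ("impressions", "ad_id", "ads"),
]
rules = [format_rule(r) for r in derive_rules("companies", "id", fkeys)]
```

The sketch ignores composite foreign keys and tables reachable through several paths; it keeps the first path found.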
With this patch it's now actually possible to backfill the data on the fly
when using the new "distribute" commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
The idea is for pgloader to tweak the schema from a description of the
sharding model, the distribute clause. Here's an example of such a clause:
    distribute company using id
    distribute campaign using company_id
    distribute ads using company_id from campaign
    distribute clicks using company_id from ads, campaign
Given such commands, pgloader adds the distribution key to the table when
needed, to the primary key definition of the table, and also to the foreign
keys that are pointing to the changed primary key.
Then when SELECTing the data from the source database, the idea is for
pgloader to automatically JOIN the base table with the source table that
holds the distribution key, in case it was just added to the schema.
Finally, pgloader also calls the following Citus commands:
    SELECT create_distributed_table('company', 'id');
    SELECT create_distributed_table('campaign', 'company_id');
    SELECT create_distributed_table('ads', 'company_id');
    SELECT create_distributed_table('clicks', 'company_id');
The catalog queries used in pgloader have to be adjusted for Redshift
because this thing forked PostgreSQL 8.0, which is a long time ago now.
Also, we had a couple bugs here and there that were not really related to
Redshift support but were shown in that context.
Fixes #813.
When dealing with PostgreSQL protocol compatible databases, often enough
they don't support the same catalogs as PostgreSQL itself. Redshift for
instance lacks foreign key support.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user defined
cast rules at COPY time, or just moving from one hosted solution to another.
It's a little more involved than what was done previously. In particular we
need to pay attention to MySQL varchar(x) and transform them into something
big enough when counting bytes rather than chars, like varchar(3x).
Then there's the "text" datatype to take into account, and some more.
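The reason for the varchar(3x) widening is that a single character may take several bytes once encoded. A minimal Python sketch, assuming UTF-8 and a factor of 3 (enough for characters in the Basic Multilingual Plane; characters outside it need 4 bytes), with a hypothetical helper name:

```python
import re

_VARCHAR_RE = re.compile(r"varchar\((\d+)\)", re.IGNORECASE)


def widen_varchar(type_name, factor=3):
    """Turn varchar(x) into varchar(factor * x); leave other types alone."""
    match = _VARCHAR_RE.fullmatch(type_name.strip())
    if match is None:
        return type_name
    return f"varchar({int(match.group(1)) * factor})"


# Three characters, nine bytes once encoded as UTF-8.
text = "日本語"
char_count = len(text)
byte_count = len(text.encode("utf-8"))

widened = widen_varchar("varchar(10)")
```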
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.
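A minimal sketch of that detection in Python, assuming that the version() output of Redshift mentions "Redshift" somewhere in the string. The variant names and the sample strings are illustrative assumptions, not pgloader's actual values.

```python
def detect_variant(version_string):
    """Return "redshift" when version() mentions it, else "postgresql"."""
    if "redshift" in version_string.lower():
        return "redshift"
    return "postgresql"


# Illustrative strings only, shaped like version() output.
redshift_like = "PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC, Redshift 1.0.2058"
postgres_like = "PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc"

variants = (detect_variant(redshift_like), detect_variant(postgres_like))
```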
When connected to Redshift, we then dumb-down our target catalogs to the
subset of data types that Redshift actually does support.
Also, some catalog queries can't be done in Redshift, and 8.0 didn't have
a fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.
COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also doesn't support the COPY FROM STDIN form. COPY sources
are limited, and another patch will have to be cooked to prepare the data
from pgloader into a format and location that Redshift knows how to handle.
At least, it's possible to migrate a database schema to Redshift already.
In data-only mode, the foreign keys parameter (which defaults to True) means
something special: we remove the fkey definitions prior to the data only
load then re-install the fkeys.
This got broken in a previous commit, the WITH clause option being processed
like the other DDL ones that only make sense when creating the schema. While
fixing the setting in copy-database, we have to also fix a nesting bug in
complete-pgsql-database that would prevent fkeys from being installed again
at the end of the load.
This patch not only fixes that choice, but also reviews the implementation of
the drop-pgsql-fkeys support function to use a more modern internal API,
preparing a list of SQL statements to be sent to the psql-execute level.
Fixes #745.
Several places in the code are involved in dealing with the default values
from MS SQL. The catalog query is dealing with strange quoting rules on the
source side and used to directly fill in the PostgreSQL expected value. But
then the quoting of a function call wasn't properly handled.
Rather than coping with the quoting rules here, have the catalog query
return a pgloader specific placeholder "GENERATE_UUID". Then the MS SQL
specific code can normalize that to the symbol :generate_uuid. Then the
generic PostgreSQL DDL code can implement the proper replacement for that
symbol, not having to know where it comes from.
Fix #742.
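The three steps above (placeholder from the catalog query, source-specific normalization, generic rendering) can be sketched in Python like this. The tuple representation of the symbol and the rendered expression uuid_generate_v4() (from the uuid-ossp extension) are assumptions of this sketch; the commit message does not say which replacement the DDL code emits.

```python
PLACEHOLDER = "GENERATE_UUID"


def normalize_default(raw_default):
    """Source-specific step: map the catalog placeholder to a symbol."""
    if raw_default == PLACEHOLDER:
        return ("symbol", "generate_uuid")
    return ("literal", raw_default)


def render_default(default):
    """Generic DDL step: it only knows the symbol, not where it came from."""
    kind, value = default
    if kind == "symbol" and value == "generate_uuid":
        # Assumption: render as the uuid-ossp function call.
        return "uuid_generate_v4()"
    return value


rendered_uuid = render_default(normalize_default("GENERATE_UUID"))
rendered_plain = render_default(normalize_default("0"))
```

Keeping the symbol opaque is what lets the rendering step stay unaware of MS SQL quoting rules.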
PostgreSQL understands both spellings of the data type name and implements
float as being a double precision value, so we should refrain from any
warning about that non-discrepancy when doing a data-only load.
Should fix #746.
Namely the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allows such a definition as the following:
    type timestamp with extra on update current timestamp
      to "timestamp with time zone" drop extra
The effect of such a cast rule would be to ignore the MySQL extra
definition and then keep pgloader from creating the PostgreSQL triggers
that implement the same behavior.
Fix #735.
It might be that the schema exists but we didn't find what we expected to
in there, so that it didn't make it to pgloader's internal catalogs. Be
friendly to the user with a better error message.
Fix #713.
The copy format and batch facilities are no longer the meat of our
PostgreSQL support in the src/pgsql directory, so have them live in their
own space.
This function prepares the data to be sent down to PostgreSQL as a clean
COPY text with unicode handled correctly. This commit is mainly a clean-up
of the function, and also adds some smarts to try and make it faster.
In testing, the function is now marginally faster than before, but not by
much. The hope here is that it's now easier to optimize it.
Due to the way pgloader queries the PostgreSQL catalogs, it restricted
target tables to “ordinary” tables, as per the relkind description in the
https://www.postgresql.org/docs/current/static/catalog-pg-class.html
PostgreSQL documentation.
Extend this to support relkind of 'r' (ordinary table), 'f' (foreign table)
and 'p' (partitioned table).
Fixes #587, fixes #690.
When doing a MySQL to PostgreSQL migration in data only mode, pgloader
matches schema names found on both source and target database and, much like
with table names, must compare them unquoted.
Otherwise we fail to find the schema name again, because one spelling has
the quotes, but not the other one, when using the “quote identifiers”
option.
Fix #659, at least some forms of it.
In the next release, pgloader defaults to targeting a new schema named the
same as the MySQL database, because that's what makes more sense. But people
are used to having 'public' in the search_path and everything in there.
So when creating our target schema, when migrating from MySQL, arrange it so
that the new schema is in the search_path by issuing a command like:
    ALTER DATABASE plop SET search_path TO public, f1db;
And make this command visible in verbose (NOTICE) mode too, so that users can
see what happens.
Fix #654. I think.
When using --verbose or more detailed log messages, the summary prints
timings for both read and write operations separately. The write summary
timing took into account only the PostgreSQL batch activity, discarding the
formatting of the data done by pgloader.
As this formatting is quite heavy at the moment, the results are pretty
misleading without that information.
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several columns with that special
treatment, so adapt pgloader to migrate that too.
What pgloader does is recognize that several columns are to receive the same
pre-update processing, and create a single function that handles all of
them, as in the following example, from pgloader logs in a test case:
    CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
      RETURNS trigger
      LANGUAGE plpgsql
      AS
    $$
    BEGIN
       NEW.update_date = now();
       NEW.calc_date = now();
       RETURN NEW;
    END;
    $$;
    CREATE TRIGGER on_update_current_timestamp
      BEFORE UPDATE ON mysql.onupdate
      FOR EACH ROW
      EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();
Fixes #629.
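The grouping of several columns into one trigger function can be sketched as a small SQL generator in Python. This is an illustration under assumptions, not pgloader's code: the helper name is made up and identifiers are emitted without any quoting.

```python
def on_update_trigger_sql(schema, table, columns):
    """Build one trigger function that refreshes every listed column."""
    func = f"{schema}.on_update_current_timestamp_{table}"
    assignments = "\n".join(f"   NEW.{column} = now();" for column in columns)
    return (
        f"CREATE OR REPLACE FUNCTION {func}()\n"
        "  RETURNS trigger\n"
        "  LANGUAGE plpgsql\n"
        "  AS\n"
        "$$\n"
        "BEGIN\n"
        f"{assignments}\n"
        "   RETURN NEW;\n"
        "END;\n"
        "$$;\n"
        "CREATE TRIGGER on_update_current_timestamp\n"
        f"  BEFORE UPDATE ON {schema}.{table}\n"
        "  FOR EACH ROW\n"
        f"  EXECUTE PROCEDURE {func}();\n"
    )


sql = on_update_trigger_sql("mysql", "onupdate", ["update_date", "calc_date"])
```

However many columns are listed, a single function and a single trigger are produced for the table.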
PostgreSQL btree indexes are limited in the size of the values they can
index: an index row must fit in about a third of an index page (8kB). So
when porting a MySQL full text index over full documents, we might get into
an error like the following:
    index row size 2872 exceeds maximum 2712 for index "idx_5199509_search"
To fix, query MySQL for the index type which is FULLTEXT rather than BTREE
in those cases, and port it over to a PostgreSQL Full Text index with a
hard-coded 'simple' configuration, such as the following test case:
    CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload));
Of course users might want to use a better configuration, including proper
dictionaries for the documents. When using PostgreSQL each document may have
its own configuration attached and yet they can all get indexed into the
same index, so that's a task for the application developers, not for
pgloader.
In passing, fix the list-typenames-without-btree-support.sql query to return
separate entries for each index type rather than an {array,representation}
of the result, as Postmodern won't turn the PostgreSQL array into a Common
Lisp array by default. I keep wondering how it worked before.
Fix #569.
Adjust the default value formatting to check if the default value is already
single-quoted and only add new 'single quotes' when it's not the case.
Apparently ENUM default values in MariaDB 10 are now properly single quoted.
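A minimal Python sketch of that check, with a hypothetical function name. Doubling embedded single quotes when adding new ones is an addition of this sketch, not something the commit message states.

```python
def quote_default(value):
    """Add single quotes around a default value unless already quoted."""
    if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
        return value
    # Assumption: embedded single quotes are doubled, SQL style.
    return "'" + value.replace("'", "''") + "'"


bare = quote_default("small")
already_quoted = quote_default("'small'")
```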
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as soon as you always
double-quote the schema name, which pgloader does.
Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that quoted schema name to the raw schema name from
PostgreSQL, they won't match, and pgloader tries to create the schema again:
    ERROR Database error 42P06: schema "my sql" already exists
Fix the comparison to use the unquoted schema name, fix #614 again: the
previous fix would only work the first time.
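The comparison on unquoted names can be sketched in Python as follows; the helper names are made up for the example and the logic is a simplification of whatever pgloader does internally.

```python
def unquote(identifier):
    """Strip one level of surrounding double quotes, if present."""
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        return identifier[1:-1].replace('""', '"')
    return identifier


def same_schema(catalog_name, pg_name):
    """Compare a possibly quoted catalog name with a raw name from PostgreSQL."""
    return unquote(catalog_name) == unquote(pg_name)


found = same_schema('"my sql"', "my sql")
```

With that comparison the existing schema is found on later runs too, so no second CREATE SCHEMA is attempted.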
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375kB (up to
about 3 MB per datum), it allows much larger batch size and prefetch rows
settings without entering ldb.
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per second).
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes being built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
Prior to this patch, pgloader wouldn't care about which schema it
creates extra types in. Extra types are mainly ENUM and SET support from
MySQL. Now, pgloader creates those extra PostgreSQL ENUM types in the same
schema as the table using them, which is a more sound default.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single-quote (') and uses
two of them ('') for escaping single-quotes when found in the ENUM value
itself.
Fixes #597.
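The same quoting convention can be reproduced with Python's standard csv module, by setting the quote character to a single quote and keeping quote doubling enabled. The function name and the sample type definition are assumptions for the example; pgloader uses its own CSV parser in Common Lisp.

```python
import csv


def parse_enum_values(column_type):
    """Extract the values from a MySQL type such as enum('a','b''c')."""
    inner = column_type[column_type.index("(") + 1 : column_type.rindex(")")]
    # quotechar="'" with doublequote=True reads '' as one literal quote.
    reader = csv.reader([inner], quotechar="'", doublequote=True)
    return next(reader)


values = parse_enum_values("enum('small','medium','it''s large')")
```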
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format string: refrain from using ~s.
We sure can trust PostgreSQL to use names it knows how to handle. Still, it
will be happy to store in its catalogs names containing upper case, and in
that case we must quote them.
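A minimal sketch of that rule in Python: leave a name alone when it is made only of lower-case letters, digits and underscores, and double-quote it otherwise. The helper name is hypothetical and the sketch deliberately ignores reserved keywords.

```python
import re

# Names PostgreSQL would store unchanged without quoting (keywords aside).
_SAFE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def maybe_quote(name):
    """Double-quote a name only when it would not survive unquoted."""
    if _SAFE_NAME.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'


names = [maybe_quote(n) for n in ("users", "UserAccounts", "my sql")]
```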