Some MySQL schema level features (on update current_timestamp) are migrated
to stored procedures and triggers. We would log the CREATE PROCEDURE
statements as LOG level entries instead of SQL level entries, most likely a
stray devel/debug choice.
With this patch it's now actually possible to backfill the data on the fly
when using the "distribute" new commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user defined
cast rules at COPY time, or just moving from an hosted solution to another.
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.
When connected to Redshift, we then dumb-down our target catalogs to the
subset of data types that Redshift actually does support.
Also, some catalog queries can't be done in Redshift, and 8.0 didn't have
fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.
COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also don't support the COPY FROM STDIN form. COPY sources
are limited, and another patch will have to be cooked to prepare the data
from pgloader into a format and location that Redshift knows how to handle.
At least, it's possible to migrate a database schema to Redshift already.
In data-only mode, the foreign keys parameter (which defaults to True) means
something special: we remove the fkey definitions prior to the data only
load then re-install the fkeys.
This got broken in a previous commit, the WITH clause option being processed
like the other DDL ones that only make sense when creating the schema. While
fixing the setting in copy-database, we have to also fix a nesting bug in
complete-pgsql-database that would prevent fkey to be installed again at the
end of the load.
This patch not only fix that choice, but also review the implementation of
the drop-pgsql-fkeys support function to use more modern internal API,
preparing a list of SQL statements to be sent to the psql-execute level.
Fixes#745.
Namely the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allow such a definition as the following:
type timestamp with extra on update current timestamp
to "timestamp with time zone" drop extra
The effect of such as cast rule would be to ignore the MySQL extra
definition and then refrain pgloader from creating the PostgreSQL triggers
that implement the same behavior.
Fix#735.
In the next release, pgloader defaults to targetting a new schema named the
same as the MySQL database, because that's what makes more sense. But people
are used to having 'public' in the search_path and everything in there.
So when creating our target schema, when migrating from MySQL, arrange it so
that the new schema is in the search_path by issuing a command like:
ALTER DATABASE plop SET search_path TO public, f1db;
And make this command visible in verbose (NOTICE) mode too, so that user can
see what happens.
Fix#654. I think.
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as soon as you always
double-quote the schema name, which pgloader does.
Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that schema names with quotes to the raw schema name from
PostgreSQL, they won't match, and pgloader tries to create the schema again:
ERROR Database error 42P06: schema "my sql" already exists
Fix the comparing to compare unquoted schema name, fix#614 again: the
previous fix would only work the first time.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes being built. keep the CREATE INDEX lines. Also
keep the ALTER TABLE both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
In the complete PostgreSQL schema step, an error would be logged as you
expect but poorly handled: it would have the whole transaction rolled back,
meaning that a single Primary Key definition failure would cancel all the
others, plus the foreign keys, and also the triggers and comments.
It happens that other systems allow a primary column to contain NULL values,
which is forbidden in the standard and enforced by PostgreSQL, so that's not
a theoritical concern here.
In the prepare-pgsql-database method we were logging too much details, such
as DDL warnings on if-not-exists for successful queries. And those logs are
to be found in PostgreSQL server logs anyway.
Also fix trying to create or drop a "nil" schema.
Given new SQLite test case from issue #563 we see that pgloader doesn't
handle errors gracefully in post-copy stage. That's because the API were not
properly defined, we should use pgsql-execute-with-timing rather than other
construct here, because it allows the "on error resume next" behavior we
want with after load DDL statements.
See #563.
When the option "drop indexes" is in use in loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.
The CREATE is done in parallel, and we create an lparallel kernel for
that. The kernel must have a worker-count of at least 1, and we where
not considering the case of 0 indexes on the target table.
Fix#484.
In cases where we have a WITH include drop option, we are generating
lots of SQL DROP statements. We may be running an empty target database
or in other situations where the target object of the DROP command might
not exists. Add support for that case.
Calling a -with-timing from within a with-stats-collection macro is
redundant and will have the numbers counted twice. Which in this case
didn't happen because the stats label was manually copied, but borked
with a typo in one copy.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of the indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.
We used to use the CASCADE option when dropping the indexes, which hides
a bug: if we exclude from the load tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.
To prevent that from happening, pgloader now query the PostgreSQL
pg_depend system catalog to list the “missing” foreign keys and add them
to our internal catalog representation, from which we know to DROP then
CREATE the SQL object at the proper times.
See #400 as this was an oversight in fixing this issue.
Also known as the ORM case, it happens that other tools are used to
create the target schema. In that case pgloader job is to fill in the
exiting target tables with the data from the source tables.
We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.
This behavior is activated when using the “create no tables” option as
in the following test-case setup:
with create no tables, include drop, truncate
Fixes#400, for which I got a test-case to play with!