It's easy to avoid the warning about an unused lexical variable with the
proper declaration, which I failed to install before because of a syntax
error in my first attempt. Let's fix it now that I realise what was wrong.
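For the record, the declaration in question is presumably Common Lisp's
ignore or ignorable; a minimal sketch with made-up names:

    (defun summarize (rows &optional extra)
      ;; tell the compiler this binding may legitimately go unused,
      ;; which silences the unused-lexical-variable style warning
      (declare (ignorable extra))
      (length rows))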
When the LOAD command does not provide the name of the target table for a
DBF source, we can get the name of the table from the DBF file itself. That
feature got broken; here's a fix.
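The idea, sketched in Lisp terms (not pgloader's actual code, and assuming
the file name is what the DBF source provides):

    ;; illustration only: derive a target table name from the data file
    (defun default-dbf-table-name (dbf-file)
      (string-downcase (pathname-name (pathname dbf-file))))

    ;; (default-dbf-table-name "data/Sample.dbf") => "sample"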
Fix #805.
Instead, Redshift needs to parse CSV files. On the other hand, as we don't
have to implement the COPY protocol from within pgloader for Redshift
(because it uses S3 as the data source, not STDIN), we don't need the level
of control we have when talking to a normal PostgreSQL.
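For context, the Redshift-side load then boils down to a statement of
roughly this shape, held as a plain string on our side (bucket and
credentials made up):

    ;; illustration of the Redshift COPY form, not something pgloader
    ;; emits verbatim
    "COPY target_table
          FROM 's3://my-bucket/pgloader/data.csv'
          CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
          CSV"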
It's a little more involved than what was done previously. In particular we
need to pay attention to MySQL varchar(x) columns and transform them into
something big enough when counting bytes rather than chars, like varchar(3x).
Then there's the "text" datatype to take into account, and some more.
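A sketch of the sizing rule described above (illustration, not pgloader's
code):

    (defun redshift-varchar (char-length &key (bytes-per-char 3))
      ;; Redshift counts varchar length in bytes where MySQL counts
      ;; characters, so leave room for multi-byte utf-8 content
      (format nil "varchar(~d)" (* bytes-per-char char-length)))

    ;; (redshift-varchar 80) => "varchar(240)"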
Due to how type casting matching is implemented in pgloader, we need to add
two more MySQL casting rules to the default pgloader set to handle
specifically the case when a datetime or timestamp column in MySQL has the
"extra" bit of information "on update current timestamp".
That's because for a casting rule to match, both the type definition and the
casting rule must have the :on-update-current-timestamp property set the
same way, so the existing default rules would not apply here.
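Roughly, the shape of such a rule as quoted data (keys approximated; the
real definitions live with the other MySQL default cast rules):

    '(:source (:type "datetime" :on-update-current-timestamp t)
      :target (:type "timestamptz")
      :using zero-dates-to-null)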
Also add some schema-level support: disable our usual index and constraint
handling when the target is Redshift, because it doesn't support those
parts of SQL.
The S3 parameters are read from either the process environment variables or
from the AWS configuration files in ~/.aws.
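A sketch of the environment-variable side of that lookup (not pgloader's
code; the variable names are the standard AWS ones):

    (defun s3-credentials-from-env ()
      ;; fall through to the ~/.aws files when these are unset
      (let ((key    (uiop:getenv "AWS_ACCESS_KEY_ID"))
            (secret (uiop:getenv "AWS_SECRET_ACCESS_KEY")))
        (when (and key secret)
          (list :access-key key :secret-key secret))))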
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.
When connected to Redshift, we then dumb down our target catalogs to the
subset of data types that Redshift actually supports.
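The detection idea, sketched (pgloader's actual parsing may differ): the
version() banner mentions both the old PostgreSQL version and Redshift, so
a substring test is enough.

    (defun redshift-banner-p (version-output)
      ;; e.g. "PostgreSQL 8.0.2 on i686-pc-linux-gnu, ..., Redshift ..."
      (and (search "Redshift" version-output) t))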
Also, some catalog queries can't be done in Redshift, and 8.0 didn't have a
fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.
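In SQL terms the workaround looks like this, with the statements kept as
quoted data (table and values made up):

    '("CREATE TEMP TABLE tmp_vals (name text)"
      "INSERT INTO tmp_vals (name) VALUES ('foo')"
      "INSERT INTO tmp_vals (name) VALUES ('bar')"
      "SELECT ... FROM tmp_vals")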
COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also doesn't support the COPY FROM STDIN form. COPY
sources are limited, and another patch will have to be cooked to prepare
the data from pgloader into a format and location that Redshift knows how
to handle.
At least, it's possible to migrate a database schema to Redshift already.
Some path computation didn't work when trying to regression test the
produced bundle.
Also, the bundle building steps would use the pgloader system definition and
dependencies from what's currently available in Quicklisp rather than from
the local pgloader.asd being built.
It used to be that extra clauses were forced to be parsed before guards,
but there's no reason why a user wouldn't think to write their clauses the
other way round, so add support for that as well.
See #779.
Apparently cl+ssl needs to be reloaded in a very specific way at image
startup time, and provides a function to do just that. Let's try and use
this piece of magic rather than calling cffi:load-foreign-library directly.
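A sketch, assuming an image-restore hook is the right place to call it:

    ;; cl+ssl:reload re-initializes the foreign libraries after a saved
    ;; image restarts, replacing the manual cffi:load-foreign-library call
    (uiop:register-image-restore-hook
     (lambda () (cl+ssl:reload))
     nil)   ; register only, don't call it in the build-time image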
The MySQL connection string parameter for SSL usage is useSSL, so map that
option name (e.g. mysql://user@host/db?useSSL=true) to our expected values
for sslmode in database connection strings.
See #748.
The default logfile location seems to be `/tmp/pgloader/pgloader.log`,
not `/tmp/pgloader.log` as currently documented. This is observable in
practice and also in [the source
code](5b227200a9/src/main.lisp (L110)).
We would hard-code the schema name into the table's name in the DB3 case,
on the grounds that a db3/dbf file doesn't have a notion of a schema. But
when the user wants to add data into an existing target table, we merge the
catalogs and must keep the given target schema and table name.
Fix #701.
Accept empty password lines in ~/.pgpass files, and when pgloader otherwise
fails to parse or process the file, log a warning and return a nil password.
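A sketch of the lenient parse (escaping of colons ignored here; the
assumption is that an empty last field is the password):

    (defun pgpass-fields (line)
      (uiop:split-string line :separator ":"))

    ;; (pgpass-fields "localhost:5432:mydb:alice:")
    ;; => ("localhost" "5432" "mydb" "alice" "")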
See #748.
In a previous commit we re-used the package name pgloader.copy for the now
separated implementation of the COPY protocol, but this package was already
in use for the implementation of the COPY file format as a pgloader source.
Oops.
And CCL was happily doing its magic anyway, so I've been blind to the
problem.
To fix, rename the new package to pgloader.pgcopy, and to avoid having to
deal with other problems of the same kind in the future, rename every
source package to pgloader.source.<format>, so that we now have
pgloader.source.copy and pgloader.pgcopy, two visibly different packages to
deal with.
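Sketched, the naming scheme now reads (the real definitions export much
more):

    (defpackage #:pgloader.source.copy   ; the COPY *file format* source
      (:use #:cl))

    (defpackage #:pgloader.pgcopy        ; the PostgreSQL COPY *protocol*
      (:use #:cl))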
This light refactoring came with a challenge though. The split between the
pgloader.sources API and the rest of the code involved some circular
dependencies in the namespaces. CL is pretty flexible here because it can
reload code definitions at runtime, but it was still a mess. To untangle it,
implement a new namespace, the pgloader.load package, where we can use the
pgloader.sources API and the pgloader.connection and pgloader.pgsql APIs
too.
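A sketch of the new namespace, with its use-list approximated:

    (defpackage #:pgloader.load
      (:use #:cl
            #:pgloader.sources      ; the sources API
            #:pgloader.connection   ; the connection API
            #:pgloader.pgsql))      ; the PostgreSQL API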
A little problem gave birth to quite a massive patch. As it happens when
refactoring and cleaning up the dirt in any large enough project, right?
See #748.
In data-only mode, the foreign keys parameter (which defaults to True) means
something special: we remove the fkey definitions prior to the data-only
load, then re-install the fkeys.
This got broken in a previous commit, the WITH clause option being processed
like the other DDL ones that only make sense when creating the schema. While
fixing the setting in copy-database, we also have to fix a nesting bug in
complete-pgsql-database that would prevent fkeys from being installed again
at the end of the load.
This patch not only fixes that, but also reviews the implementation of the
drop-pgsql-fkeys support function to use a more modern internal API,
preparing a list of SQL statements to be sent to the psql-execute level.
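The modern shape, sketched with a stand-in structure (pgloader's real fkey
objects and API differ):

    (defstruct fkey table name)

    (defun drop-fkey-statements (fkeys)
      ;; build the statements first, hand the whole list to the execute
      ;; layer afterwards
      (loop :for fk :in fkeys
            :collect (format nil "ALTER TABLE ~a DROP CONSTRAINT ~a"
                             (fkey-table fk) (fkey-name fk))))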
Fixes #745.
Not all error paths are counted correctly at this point; this commit
improves the situation in passing. A thorough review should probably be
planned sometime.