836 Commits

Author SHA1 Message Date
Dimitri Fontaine
0f58a3c84d Assorted fixes: catalogs SQLtypes and MySQL decoding as.
While trying to debug "decoding as", the SQL type listing support in
sqltype-list turned out to be broken, so this patch fixes it. It then goes
on to fix the DECODING AS filters support, which we had switched to the
better regexp-or-string filter struct without updating the matching code
accordingly.

Fixes #665.
2018-08-31 22:51:41 -07:00
Dimitri Fontaine
4fbfd9e522 Refrain from using regexp_match() function, introduced in Pg10.
Instead use the substring() function which has been there all along.

See #813.
2018-08-22 10:52:01 +02:00
Dimitri Fontaine
cb633aa092 Refrain from some introspections on non-PGDG PostgreSQL variants.
PostgreSQL protocol compatible databases often don't support the same
catalogs as PostgreSQL itself. Redshift, for instance, lacks foreign key
support.
2018-08-20 11:52:59 +02:00
Dimitri Fontaine
d3bfb1db31 Bugfix previous commit: filter list format changed.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
2018-08-20 11:50:50 +02:00
Dimitri Fontaine
fc3a1949f7 Add support for PostgreSQL as a source database.
It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user defined
cast rules at COPY time, or just moving from one hosted solution to another.
2018-08-20 11:09:52 +02:00
Dimitri Fontaine
1ee389d121 Fix parsing empty hostname fields in pgpass.
Fixes #823.
2018-08-14 10:07:05 +03:00
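A minimal Python sketch of the field splitting this fix is about, following the escaping rules documented for libpq's password file (a backslash escapes `:` and `\`). The function name `split_pgpass_line` is hypothetical; pgloader's own parser is written in Common Lisp. The key point is that an empty hostname field is kept as an empty string rather than dropped.

```python
def split_pgpass_line(line: str) -> list[str]:
    """Split one ~/.pgpass line into hostname, port, database, username, password.

    A backslash escapes ':' and '\\'. Empty fields (such as an empty hostname)
    are kept as empty strings instead of being dropped.
    """
    fields, current, escaped = [], [], False
    for ch in line.rstrip("\n"):
        if escaped:
            current.append(ch)  # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":
            fields.append("".join(current))  # may be "" for an empty field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return fields
```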
Dimitri Fontaine
46d14af0d3 Add more default rules to MySQL datetime handling.
Given the variety of ways to set up default behavior for datetime and
timestamp data types in MySQL, we need yet more default casting rules. It
might be time to think about a more principled way to solve the problem, but
on the other hand, this ad-hoc one also gives the end user full overriding
flexibility.

Fixes #811.
2018-07-08 20:37:06 +02:00
Dimitri Fontaine
1b150182dc Fix cl-csv delimiter type.
Travis spotted a bug that I failed to see, which apparently happens with
Clozure-CL (CCL) but not with SBCL:

2018-07-03T21:04:11.053795Z FATAL The value "\\\"", derived from the initarg :DELIMITER, can not be used to set the value of the slot CL-CSV::DELIMITER in #<CL-CSV::READ-DISPATCH-TABLE-ENTRY #x30200143DDCD>, because it is not of type (VECTOR (OR (MEMBER T NIL) CHARACTER)).

To fix, prefer the syntax #(#\\ #\") rather than "\\\"".
2018-07-04 01:32:40 +02:00
Dimitri Fontaine
8537bd661f Back to not being a release.
Maybe I should find a way to avoid this extra back-and-forth commit.
Someday.
2018-07-03 17:11:38 +02:00
Dimitri Fontaine
63af7e7373 Release 3.5.2.
This release fixes Debian packaging, includes support for Redshift as a
target, and also fixes some bugs.
2018-07-03 16:58:55 +02:00
Dimitri Fontaine
9661c5874d Fix previous patch.
The warning about an unused lexical variable is easy to avoid with the
proper declaration, which I failed to install before because of a syntax
error. Let's fix it now that I realise what was wrong.
2018-06-23 00:50:35 +02:00
Dimitri Fontaine
8930734bea Ensure unquoted file names for logs and data.
The previous code could create files with unhelpful names such as the
following: \"errors\"/\"err\".\"errors\".log.

Fix #808.
2018-06-22 23:02:07 +02:00
Dimitri Fontaine
047cf84341 Add support for PGSSLMODE environment variable.
PostgreSQL supports many environment variables to drive its connection
behavior, as documented at the following reference:

  https://www.postgresql.org/docs/current/static/libpq-envars.html

We don't support all of them yet, and are adding them one piece at a time.
2018-06-22 14:13:15 +02:00
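A sketch of how a PGSSLMODE fallback might be resolved, written in Python for illustration. It assumes libpq's precedence (an sslmode given in the connection string wins over the environment variable) and libpq's list of sslmode values with its "prefer" default; pgloader itself may accept only a subset of these. The function name `sslmode_from_env` is hypothetical.

```python
import os

# sslmode values documented by libpq; pgloader may support only some of them.
LIBPQ_SSLMODES = ("disable", "allow", "prefer", "require", "verify-ca", "verify-full")


def sslmode_from_env(explicit=None, environ=None, default="prefer"):
    """Pick an sslmode: an explicit connection-string value wins, then PGSSLMODE,
    then the default."""
    environ = os.environ if environ is None else environ
    mode = explicit or environ.get("PGSSLMODE") or default
    if mode not in LIBPQ_SSLMODES:
        raise ValueError(f"unknown sslmode: {mode!r}")
    return mode
```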
Dimitri Fontaine
a0bac47101 Refrain from TRUNCATE'ing an empty list of tables.
Fixed #789.
2018-06-15 17:46:31 +02:00
Dimitri Fontaine
8c2cda75e5 Allow more punctuation signs in the parsers: dollar and percent.
For whatever reason, some people might use those characters in their
connection strings, as part of a username or such.

Fixes #809.
2018-06-15 17:26:51 +02:00
Dimitri Fontaine
dfedce2aba Fix support for discovery of DBF target table name.
When the LOAD command does not provide the name of the target table for a
DBF source, we can get the name of the table from the DBF file itself. That
feature got broken; here's a fix.

Fix #805.
2018-06-01 11:23:51 -04:00
Dimitri Fontaine
bcf9cf9bf4 Redshift doesn't have support for the COPY format.
Instead, it needs to parse CSV files. On the other hand, as we don't have to
implement the COPY protocol from within pgloader for Redshift (because it's
using S3 as the data source, and not STDIN), we don't need the level of
control that we are using when talking to a normal PostgreSQL.
2018-05-23 13:45:16 +02:00
Dimitri Fontaine
3db3ecf81b Review Redshift data type dumb-down choices.
It's a little more involved than what was done previously. In particular we
need to pay attention to MySQL varchar(x) and transform it into something
big enough when counting bytes rather than chars, like varchar(3x).

Then there's the "text" data type to take into account, and a few more.
2018-05-23 13:43:28 +02:00
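The varchar(x) to varchar(3x) rule above can be sketched in Python as follows. The 3x factor comes from the commit message; the cap at 65535 bytes is an added assumption based on Redshift's documented maximum VARCHAR length, and `redshift_varchar` is a hypothetical name, not a pgloader function.

```python
import re

# Assumption: Redshift's maximum VARCHAR length, counted in bytes.
REDSHIFT_MAX_VARCHAR_BYTES = 65535


def redshift_varchar(mysql_type: str) -> str:
    """Turn MySQL varchar(n), counted in characters, into a Redshift varchar
    counted in bytes, allowing 3 bytes per character."""
    m = re.fullmatch(r"\s*varchar\s*\(\s*(\d+)\s*\)\s*", mysql_type, re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a varchar(n) type: {mysql_type!r}")
    n_bytes = 3 * int(m.group(1))
    return f"varchar({min(n_bytes, REDSHIFT_MAX_VARCHAR_BYTES)})"
```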
Dimitri Fontaine
05b4c7c978 Fix default MySQL casting rules for on update current timestamp.
Due to how type casting matching is implemented in pgloader, we need to add
two more MySQL casting rules in the default pgloader set to handle
specifically the case when a datetime or timestamp column in MySQL has the
"extra" bit of information "on update current timestamp".

That's because for a casting rule to match, both the type definition and the
casting rule must have the :on-update-current-timestamp property set the
same way, so the existing default rules would not apply.
2018-05-23 10:34:34 +02:00
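To see why two extra default rules were needed, here is a small Python model of the matching rule described above: a rule applies only when both the type name and the on-update-current-timestamp flag agree. The class and function names are hypothetical, and the "timestamptz" target is illustrative rather than a quote of pgloader's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnType:
    name: str
    on_update_current_timestamp: bool = False


@dataclass(frozen=True)
class CastRule:
    source_type: str
    target: str
    on_update_current_timestamp: bool = False

    def matches(self, col: ColumnType) -> bool:
        # Both the type name and the property must agree for the rule to apply.
        return (self.source_type == col.name
                and self.on_update_current_timestamp == col.on_update_current_timestamp)


def cast(col: ColumnType, rules):
    """Return the target of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(col):
            return rule.target
    return None


DEFAULT_RULES = [
    CastRule("datetime", "timestamptz"),
    # Extra rule for columns declared "on update current timestamp".
    CastRule("datetime", "timestamptz", on_update_current_timestamp=True),
]
```

With only the first rule, a datetime column carrying the "on update current timestamp" extra matches nothing, which is the situation the commit fixes.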
Dimitri Fontaine
9ac400b623 Implement copying data through S3 for Redshift.
Also add some schema-level support by disabling our usual index and
constraint support when the target is Redshift, because it doesn't support
those parts of SQL.

The S3 parameters are read from either the process environment variables or
from the AWS configuration files in ~/.aws.
2018-05-21 21:22:15 +02:00
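A Python sketch of the lookup order described above: environment variables first, then the AWS credentials file. It uses the standard AWS names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and the `aws_access_key_id` / `aws_secret_access_key` keys of `~/.aws/credentials`); pgloader's exact precedence and its handling of region or profile settings are not stated in the commit, so treat those details and the `s3_credentials` name as assumptions.

```python
import configparser
import os
from pathlib import Path


def s3_credentials(environ=None, aws_dir=None, profile="default"):
    """Return (access_key_id, secret_access_key) or None.

    Environment variables are preferred; otherwise the [profile] section of
    <aws_dir>/credentials (default ~/.aws/credentials) is used.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret

    aws_dir = Path(aws_dir) if aws_dir else Path.home() / ".aws"
    parser = configparser.ConfigParser()
    parser.read(aws_dir / "credentials")  # a missing file is silently skipped
    if parser.has_section(profile):
        section = parser[profile]
        key = section.get("aws_access_key_id")
        secret = section.get("aws_secret_access_key")
        if key and secret:
            return key, secret
    return None
```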
Dimitri Fontaine
d4dc4499a8 Add schema migration support for Redshift as a target.
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.

When connected to Redshift, we then dumb-down our target catalogs to the
subset of data types that Redshift actually does support.

Also, some catalog queries can't be done in Redshift, and 8.0 didn't have
a fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.

COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also doesn't support the COPY FROM STDIN form. COPY sources
are limited, and another patch will have to be cooked to prepare the data
from pgloader into a format and location that Redshift knows how to handle.

At least, it's possible to migrate a database schema to Redshift already.
2018-05-19 19:16:58 +02:00
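Detecting Redshift from the version() output could look like the Python sketch below. It assumes Redshift reports itself as "PostgreSQL 8.0.2 ..." followed by a trailing `Redshift <version>` token; the function name `parse_pg_version` is hypothetical and pgloader's actual parsing is done in Common Lisp.

```python
import re


def parse_pg_version(version_string: str):
    """Return (server_version, redshift_version_or_None) from SELECT version()."""
    pg = re.match(r"PostgreSQL (\d+(?:\.\d+)*)", version_string)
    if pg is None:
        raise ValueError(f"unrecognized version() output: {version_string!r}")
    # Assumption: Redshift appends "Redshift x.y.z" to the version string.
    rs = re.search(r"Redshift (\d+(?:\.\d+)*)", version_string)
    return pg.group(1), (rs.group(1) if rs else None)
```

A caller would then switch to the dumbed-down Redshift catalogs whenever the second element is not None.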
Dimitri Fontaine
8fce6c84fc Move all typemod functions at the same place.
Having the parse-column-typemod function in the pgloader.transforms package
makes it available from everywhere in the pgloader code base.
2018-05-19 19:15:30 +02:00
Dimitri Fontaine
1f354131d0 Release pgloader 3.5.1.
Lots of bug fixes happened, time to release.
2018-05-17 10:41:40 +02:00
Dimitri Fontaine
f30f596eca Review bundle and regression test facilities.
Some path computation didn't work when trying to regression test the
produced bundle.

Also, the bundle building steps would use the pgloader system definition and
dependencies from what's currently available in Quicklisp rather than from
the local pgloader.asd being built.
2018-05-17 10:39:32 +02:00
Dimitri Fontaine
a392328dad Allow any ordering of guards and extra cast rule clauses.
Extra clauses used to be forced to be parsed before guards, but there's no
reason why users wouldn't think to write their clauses the other way round,
so add support for that as well.

See #779.
2018-04-29 19:00:20 +02:00
Dimitri Fontaine
01f877bad7 Testing a change in the way we load CL+SSL.
Apparently cl+ssl needs to be reloaded in a very specific way at image
startup time, and provides a function to do just that. Let's try and use
this piece of magic rather than calling cffi:load-foreign-library directly.
2018-04-16 15:46:16 +02:00
Dimitri Fontaine
cb9e01f4d9 Code review for previous commit.
See #771.
2018-03-27 14:55:31 +02:00
Goo
c6271506ab Add a new transformation function: hex-to-dec
Closes #771
2018-03-27 14:51:34 +02:00
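For readers unfamiliar with the transform, here is a Python analogue of what a hex-to-dec conversion does. pgloader's real transform is a Common Lisp function; passing NULL (None) through unchanged and returning the decimal value as text are assumptions made for this sketch.

```python
def hex_to_dec(hex_string):
    """Convert a hexadecimal string such as "ff" to its decimal text "255".

    None (SQL NULL) passes through unchanged. An optional "0x" prefix is
    accepted because int(..., 16) allows it.
    """
    if hex_string is None:
        return None
    return str(int(hex_string, 16))
```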
Dimitri Fontaine
e4dca1a086 Implement support for MySQL useSSL=true|false option.
The MySQL connection string parameter for SSL usage is useSSL, so map that
option name and its values to our expected values for sslmode in database
connection strings.

See #748.
2018-03-16 16:41:40 +01:00
Dimitri Fontaine
3112adea6f Fix date-with-no-separator transform.
The expected string length was hard-coded, which is not a good idea given
the support for custom date formats.
2018-03-07 23:07:00 +01:00
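The idea of the fix, deriving the expected length from the date format instead of hard-coding it, can be sketched in Python. The field names and widths below (`YYYY`, `MM`, `DD`, `HH24`, `MI`, `SS`) are hypothetical stand-ins for pgloader's format specification, and returning None on a length mismatch is an assumption.

```python
from datetime import datetime

# Hypothetical field widths; a format is a sequence of these field names.
FIELD_WIDTHS = {"YYYY": 4, "MM": 2, "DD": 2, "HH24": 2, "MI": 2, "SS": 2}


def parse_date_without_separator(text, fmt=("YYYY", "MM", "DD", "HH24", "MI", "SS")):
    """Parse e.g. "20180307230700" according to fmt.

    The expected length is computed from the format rather than hard-coded,
    so shorter formats such as ("YYYY", "MM", "DD") work too.
    """
    expected = sum(FIELD_WIDTHS[field] for field in fmt)
    if text is None or len(text) != expected:
        return None
    values, pos = {}, 0
    for field in fmt:
        width = FIELD_WIDTHS[field]
        values[field] = int(text[pos:pos + width])
        pos += width
    return datetime(values.get("YYYY", 1), values.get("MM", 1), values.get("DD", 1),
                    values.get("HH24", 0), values.get("MI", 0), values.get("SS", 0))
```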
Dimitri Fontaine
42c9ccfbb3 DB3: pick user's choice of schema name when given.
We would hard-code the schema name into the table's name in the DB3 case on
the grounds that a db3/dbf file doesn't have a notion of a schema. But when
the user wants to add data into an existing target table, then we merge the
catalogs and must keep the given target schema and table name.

Fix #701.
2018-02-25 23:39:52 +01:00
Dimitri Fontaine
784aff6ed5 Handle parsing errors in pgpass gracefully.
Accept empty password lines in ~/.pgpass files, and whenever pgloader
otherwise fails to parse or process the file, log a warning and return a nil
password.

See #748.
2018-02-25 00:12:06 +01:00
Dimitri Fontaine
5c10f12a07 Fix duplicate package names.
In a previous commit we re-used the package name pgloader.copy for the now
separated implementation of the COPY protocol, but this package was already
in use for the implementation of the COPY file format as a pgloader source.

Oops.

And CCL was happily doing its magic anyway, so I had been blind to the
problem.

To fix, rename the new package pgloader.pgcopy, and to avoid having to deal
with other problems of the same kind in the future, rename every source
package pgloader.source.<format>, so that we now have pgloader.source.copy
and pgloader.pgcopy, two visibly different packages to deal with.

This light refactoring came with a challenge though. The split between the
pgloader.sources API and the rest of the code involved some circular
dependencies in the namespaces. CL is pretty flexible here because it can
reload code definitions at runtime, but it was still a mess. To untangle it,
implement a new namespace, the pgloader.load package, where we can use the
pgloader.sources API and the pgloader.connection and pgloader.pgsql APIs
too.

A little problem gave birth to quite a massive patch. As it happens when
refactoring and cleaning-up the dirt in any large enough project, right?

See #748.
2018-02-24 19:24:22 +01:00
Dimitri Fontaine
48af01dbbc Fix implementation of foreign keys in data only mode.
In data-only mode, the foreign keys parameter (which defaults to True) means
something special: we remove the fkey definitions prior to the data only
load then re-install the fkeys.

This got broken in a previous commit, the WITH clause option being processed
like the other DDL ones that only make sense when creating the schema. While
fixing the setting in copy-database, we also have to fix a nesting bug in
complete-pgsql-database that would prevent the fkeys from being installed
again at the end of the load.

This patch not only fixes that choice, but also reviews the implementation of
the drop-pgsql-fkeys support function to use a more modern internal API,
preparing a list of SQL statements to be sent to the psql-execute level.

Fixes #745.
2018-02-19 22:07:43 +01:00
Dimitri Fontaine
e129e77eb6 Fix SQL execute counters maintenance.
2018-02-19 22:06:51 +01:00
Dimitri Fontaine
957c975b9b Improve summary reporting of errors.
Not all error paths are counted correctly at this point, this commit
improves the situation in passing. A thorough review should probably be
planned sometime.
2018-02-19 22:05:53 +01:00
Dimitri Fontaine
4fed8c5eca Fix support for newid() from MS SQL.
Several places in the code are involved in dealing with default values from
MS SQL. The catalog query deals with strange quoting rules on the source
side and used to directly fill in the value PostgreSQL expects. But then the
quoting of a function call wasn't properly handled.

Rather than coping with the quoting rules here, have the catalog query
return a pgloader specific placeholder "GENERATE_UUID". Then the MS SQL
specific code can normalize that to the symbol :generate_uuid. Then the
generic PostgreSQL DDL code can implement the proper replacement for that
symbol, not having to know where it comes from.

Fix #742.
2018-02-17 00:25:33 +01:00
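The placeholder design, where the source-specific code normalizes a marker and the generic DDL code renders it, can be modeled in Python as below. All names are hypothetical, and the `uuid_generate_v1()` expression (from the uuid-ossp extension) is an assumption about the generated default, not necessarily what pgloader emits.

```python
from enum import Enum


class DefaultValue(Enum):
    GENERATE_UUID = "generate_uuid"


def normalize_mssql_default(raw):
    """Source-specific step: map the catalog query's placeholder to a symbol."""
    if raw == "GENERATE_UUID":
        return DefaultValue.GENERATE_UUID
    return raw


def pgsql_default_expression(default):
    """Generic DDL step: render a normalized default for PostgreSQL,
    without knowing which source database it came from."""
    if default is DefaultValue.GENERATE_UUID:
        # Assumption: uuid-ossp's uuid_generate_v1(); gen_random_uuid()
        # would be another possible choice.
        return "uuid_generate_v1()"
    return default
```

Keeping the quoting problem out of the catalog query is the point: only the final rendering step decides how the function call is spelled.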
Dimitri Fontaine
5e3acbb462 When merging catalogs, "float" and "double precision" are the same type.
PostgreSQL understands both spellings of the data type name and implements
float as being a double precision value, so we should refrain from any
warning about that non-discrepancy when doing a data-only load.

Should fix #746.
2018-02-16 23:42:46 +01:00
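A Python sketch of the type comparison this implies: normalize the PostgreSQL spellings of the 8-byte floating point type before comparing catalogs. The helper names are hypothetical, and the sketch deliberately ignores float(p) with an explicit precision, which PostgreSQL maps to real or double precision depending on p.

```python
# PostgreSQL spellings that all denote the 8-byte floating point type
# (float without a precision is double precision).
DOUBLE_ALIASES = {"float", "float8", "double precision"}


def canonical_type(name: str) -> str:
    """Lower-case, collapse whitespace, and fold double precision aliases."""
    t = " ".join(name.lower().split())
    return "double precision" if t in DOUBLE_ALIASES else t


def same_type(a: str, b: str) -> bool:
    return canonical_type(a) == canonical_type(b)
```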
Dimitri Fontaine
67a1b1d408 Fix SQLite SQL queries.
Some copy-paste errors made their way to those queries and prevented usage
of pgloader, but I missed that because I was using a previous version of the
query text files in my interactive environment.

Also, SQLite doesn't like the queries finishing with a semi-colon, so remove
them.

Fixes #747.
2018-02-16 17:51:58 +01:00
Dimitri Fontaine
ea6c91b429 Fix "drop default" casting rules for all databases.
The support for drop default in (user defined) casting rules was completely
broken in SQLite, because the code didn't even bother looking at what's
returned after applying the casting rules.

This patch fixes the code so that it uses the pgcol instance's default
value as computed after applying casting rules. The bug also existed in a
subtle form for MySQL and MS SQL, but would only show up there when the
default value is spelled using a known variation of “current timestamp”.
2018-02-08 23:33:51 +01:00
Dimitri Fontaine
29506e6fa6 Assorted fixes for SQLite.
First review the `sqlite_sequence` support so that we can still work with
databases that don't have this catalog, which doesn't always exist -- it
might depend on the SQLite version though.

Then while at it use the sql macro to host the SQLite “queries” in their own
files, enhancing the hackability of the system to some degree. Not that
much, because we have to use a lot of PRAGMA commands and then the column
output isn't documented with the query text itself.
2018-02-08 22:55:15 +01:00
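The existence check can be illustrated with Python's sqlite3 module. In SQLite, the sqlite_sequence table is created automatically the first time an ordinary table using AUTOINCREMENT is created, so a database without such tables has no sqlite_sequence at all. The function name `sequence_values` is hypothetical; pgloader's own queries live in SQL files.

```python
import sqlite3


def sequence_values(conn: sqlite3.Connection) -> dict:
    """Return {table: current sequence value}, or {} when the database has
    no sqlite_sequence catalog."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sqlite_sequence'"
    ).fetchone()
    if exists is None:
        return {}
    # Each row is a (name, seq) pair, so the cursor can feed dict() directly.
    return dict(conn.execute("SELECT name, seq FROM sqlite_sequence"))
```

Usage: on a fresh database with only `INTEGER PRIMARY KEY` tables the function returns an empty dict instead of failing on a missing table.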
Dimitri Fontaine
20d7858e27 Implement SQLite casting rule for “decimal”.
Fix #739.
2018-02-07 20:47:47 +01:00
Dimitri Fontaine
976e4c1c1d Fix SQLite processing of columns with a sequence attached.
The handling of the SQLite catalogs was fixed in a previous patch, but
either it's been broken in between or it never actually worked (oops).

Moreover, the recent patch about :on-update-current-timestamp changed the
casting rules matching code, and we should set :auto-increment from the
SQLite module rather than "auto_increment" as before. That's better, but
wasn't done.

Fix #563 again, tested with a provided test-case (thanks!).
2018-01-31 22:49:10 +01:00
Dimitri Fontaine
4612e68435 Implement support for new casting rules guards and actions.
Namely the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allows such a definition as the following:

      type timestamp with extra on update current timestamp
        to "timestamp with time zone" drop extra

The effect of such a cast rule would be to ignore the MySQL extra
definition and keep pgloader from creating the PostgreSQL triggers that
implement the same behavior.

Fix #735.
2018-01-31 15:17:05 +01:00
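A Python model of the guard and action semantics described above: the guard restricts which columns a rule applies to, and the "keep extra" / "drop extra" action decides whether a trigger emulating "on update current timestamp" is still needed. The classes and the `apply_cast` function are hypothetical illustrations, not pgloader's internal API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MySQLColumn:
    type: str
    extra: Optional[str] = None  # e.g. "on update current timestamp"


@dataclass(frozen=True)
class CastRule:
    type: str
    target: str
    guard_extra: Optional[str] = None  # the "with extra ..." guard, if any
    keep_extra: bool = True            # "keep extra" (True) or "drop extra" (False)


def apply_cast(col: MySQLColumn, rules):
    """Return (pg_type, needs_trigger) for the first matching rule, else None."""
    for rule in rules:
        # The guard must agree with the column's extra for the rule to match.
        if rule.type == col.type and rule.guard_extra == col.extra:
            needs_trigger = col.extra is not None and rule.keep_extra
            return rule.target, needs_trigger
    return None
```

With the "drop extra" rule from the example above, a matching timestamp column is cast without any trigger being planned.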
Dimitri Fontaine
5ecd03ceba Don't push-row a nil value.
In case of a failure to pre-process or transform values in the row that has
been read, we need to refrain from pushing the row into our next batch.

See #726, which got hit by the recent bug in the middle of something else
entirely.
2018-01-25 23:53:11 +01:00
Dimitri Fontaine
25152f6054 Add a restart-case for interactive debugging.
When dealing with MATERIALIZING VIEWS test cases and failing in the middle
of them, as happens when fixing bugs, it was tedious (to say the least) to
manually clean up the view each time.

That said, for end-users, doing it automatically would risk cleaning-up the
wrong view definition if they had a typo in their pgloader command, say.

Common Lisp helps a lot here: we simply create a restart that is only
available interactively for the developers of pgloader!
2018-01-25 23:38:59 +01:00
Dimitri Fontaine
7b08b6e3d3 Refrain from creating tables in “data only” operations.
We forgot that rule in the case of creating the target tables for the
materializing views commands, which led to surprising and wrong behavior.

Fix #721, and add a new test case while at it.
2018-01-25 23:32:31 +01:00
Dimitri Fontaine
5ba42edb0c Review misleading error message with schema not found.
It might be that the schema exists but we didn't find what we expected to
in there, so that it didn't make it to pgloader's internal catalogs. Be
friendly to the user with a better error message.

Fix #713.
2018-01-25 23:29:36 +01:00
Dimitri Fontaine
a603cd8882 Step back on (safety 0) optimization.
It doesn't appear worth it at this time yet, too risky.
2018-01-24 23:26:37 +01:00
Dimitri Fontaine
f86371970f Review the pgloader COPY implementation further.
Refactor the file organisation further to allow for adding a “direct stream”
option when the on-error-stop behavior has been selected. This is currently
the default for database sources.

Introduce the new WITH option “on error resume next” which forces the
classic behavior of pgloader. The option “on error stop” already existed,
its implementation is new.

When this new behavior is activated, the data is sent to PostgreSQL
directly, without intermediate batches being built. It means that the whole
operation fails at the first error, and we don't have any information in
memory to try replaying any COPY of the data. It's gone.

This behavior should be fine for database migrations as you don't usually
want to fix the data manually in intermediate files, you want to fix the
problem at the source database and do the whole dance all-over again, up
until your casting rules are perfect.

This patch might also bring some performance benefits in terms of both
timing and memory usage, though local testing didn't show much of
anything for the moment.
2018-01-24 22:45:23 +01:00
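The trade-off between the two behaviors can be sketched in Python. With "on error stop", rows go straight to the target and the first error aborts everything, with nothing kept in memory for a retry. With "on error resume next", rows are kept in batches so a failing batch can be replayed row by row to isolate the bad rows. The function, the `send_batch` callback, the batch size, and the use of ValueError as a stand-in for a COPY error are all hypothetical.

```python
def copy_rows(rows, send_batch, on_error_stop=True, batch_size=3):
    """Send rows with send_batch(list_of_rows); return the rejected rows.

    on_error_stop=True: stream each row directly; the first error propagates
    and nothing is kept for a retry.
    on_error_stop=False: build in-memory batches; when a batch fails, replay
    it one row at a time and collect the rows that still fail.
    """
    if on_error_stop:
        for row in rows:
            send_batch([row])  # any exception propagates immediately
        return []

    rejected, batch = [], []

    def flush():
        try:
            send_batch(batch)
        except ValueError:
            # Replay the batch row by row to isolate the bad rows.
            for row in batch:
                try:
                    send_batch([row])
                except ValueError:
                    rejected.append(row)
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            flush()
    if batch:
        flush()
    return rejected
```

The batching mode pays for its error recovery with memory held per batch, which is why the direct stream suits database migrations where the fix belongs in the source data or the casting rules.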