Commit Graph

1568 Commits

Author SHA1 Message Date
Dimitri Fontaine
62b45e4d16 Fix log type output for summary files.
A summary file could be asked for which is not of either csv, json or copy
format. Then use the text format.

Fix #695.
2017-12-06 20:57:19 +01:00
Dimitri Fontaine
b7d87a9eb1 Fix MySQL bit(1) casting function.
When this function was written, pgloader would get an array of numbers over
the wire, nowadays it looks like it's receiving an array of characters
instead (in other words, a string).

Improve the `bits-to-boolean` function to accept either input, and raise an
error in another case.

My theory is that something changed either in MySQL (with version 10) or in
the Qmynd driver somehow... but tonight we just go easy and fix the bug
locally rather than try and understand where it might be coming from.

Fixes #684.
2017-12-03 23:06:54 +01:00
Dimitri Fontaine
c05183fcba Implement support for Foreign Tables and Partitionned Tables.
Due to the way pgloader queries the PostgreSQL catalogs, it restricted the
target table to be “ordinary” tables, as per the relkind description in the
https://www.postgresql.org/docs/current/static/catalog-pg-class.html
PostgreSQL documentation.

Extend this to support relkind of 'r', 'f' and 'p'.

Fixes #587, fixes #690.
2017-12-01 22:13:47 +01:00
Dimitri Fontaine
52f13456d9 Rewrite the SQLite type name parsing.
SQLite being very very liberal in type names (I think it accepts anything
and everything actually), our simple approach of tokenizing the input and
discarding noise words is not enough.

In this patch, we implement a new light parser for the SQLite type names to
better cope with noise words and random spacing of the catalog values that
SQLite failed to normalize. Well it didn't attempt, apparently.

Fix #548.
2017-11-28 18:19:12 +01:00
Dimitri Fontaine
2b861a3e96 New SQLite test cases. 2017-11-25 16:31:42 -08:00
Dimitri Fontaine
87f35e8852 Refrain from loading incomplete foreign key references in SQLite.
Given INCLUDING and EXCLUDING support it might be possible that we migrate a
table from SQLite without having selecting tables pointed to by foreign
keys. In that case, pgloader should still be able to load the data
definition and content fine, just skipping the incomplete fkey definitions.

That's implemented in this patch, which has been tested thanks to a
reproducible data set being made available!

Fixes #681.
2017-11-25 16:31:41 -08:00
Olleg Samoylov
62d776f5e8 Uppercase the SQL queries for MS SQL
In cases when the MS SQL database is setup with a case sensitive collation, then it would not find the catalog objects referenced from the query. To fix, just use UPPERCASE names, as they do work in both case insensitive and case sensitive collations.

In passing, add `system-index.txt` to `.gitignore` (generated by make).

Fixes #651.
2017-11-25 02:23:25 +01:00
Dimitri Fontaine
d69b72053a Implement default unsigned casting rules for MySQL.
The following casting rules are now the default for MySQL:

  - type tinyint when unsigned to smallint   drop typemod
  - type smallint when unsigned to integer  drop typemod
  - type mediumint when unsigned to integer  drop typemod
  - type integer when unsigned to bigint    drop typemod

Fixes #678.
2017-11-22 10:29:11 -08:00
Dimitri Fontaine
5c60f8c35c Implement a new type casting guard: unsigned.
MySQL allows using unsigned data types and pgloader should then target a
signed type of a larger capacity so that values can fit. For example, the
data definition “smallint(5) unsigned” should be casted to “integer”.

This patch allows user defined cast rules to be written against “unsigned”
data types as per their MySQL catalog representation.

See #678.
2017-11-22 10:26:03 -08:00
Dimitri Fontaine
6964764fb4 Find schema names unquoted.
When doing a MySQL to PostgreSQL migration in data only mode, pgloader
matches schema names found on both source and target database, and much like
with table names must do so ensuring unquoted schema names.

Otherwise we fail to find the schema name again, because one spelling has
the quotes, but not the other one, when using the “quote identifiers”
option.

Fix #659, at least some forms of it.
2017-11-19 17:12:21 +01:00
Dimitri Fontaine
1d7706c045 Fix the MySQL encoding error handling.
The error handling would try and read past the error buffer in some cases,
when the BABEL lib would give a position that's after the buffer read.

Fix #661.
2017-11-13 11:27:47 +01:00
Christoph Berg
5c4a64197d Run regression tests via autopkgtest 2017-11-12 21:05:48 +01:00
Christoph Berg
78df9c314a Sync Depends to cl-pgloader.
* Sync Depends to cl-pgloader.
* Priority: optional, move cl-pgloader to Section: lisp.
* Update S-V.
2017-11-11 17:14:18 +01:00
Christoph Berg
3002f4d30e Add new B-D cl-mustache and cl-yason. 2017-11-11 16:47:08 +01:00
Christoph Berg
28ea825d85 Run wrap-and-sort -st. 2017-11-11 16:39:19 +01:00
Dimitri Fontaine
db7a91d6c4 Add the MySQL target schema to the search_path.
In the next release, pgloader defaults to targetting a new schema named the
same as the MySQL database, because that's what makes more sense. But people
are used to having 'public' in the search_path and everything in there.

So when creating our target schema, when migrating from MySQL, arrange it so
that the new schema is in the search_path by issuing a command like:

  ALTER DATABASE plop SET search_path TO public, f1db;

And make this command visible in verbose (NOTICE) mode too, so that user can
see what happens.

Fix #654. I think.
2017-11-02 12:40:21 +01:00
Dimitri Fontaine
6b6c1c7d34 Add log entries for connection strings.
It helps a lot to debug what's happening, and it seems that we lost the
information when cleaning up the log levels in recent efforts to unclutter
the default output.
2017-11-02 12:38:45 +01:00
Dimitri Fontaine
501762d2f5 Update the website with the new Gumroad id. 2017-11-01 16:38:54 +01:00
Dimitri Fontaine
dd401c57f3 Fix a latent bug discovered in local testing with CCL.
It turns out that when using *print-pretty* in CCL we then have CL reader
references in the output, such as in the following example:

  QUERY: comment on table mysql.base64 is $#1=DXIDC_EMLAQ$Test decoding base64 documents$#1#$

Of course that's wrong, so prevent this from happening by
forcing *print-pretty* to nil in a top-level function. We still turn this on
in the monitor thread when printing error messages as those might contain
recursive data structures.
2017-10-21 21:06:35 +02:00
Dimitri Fontaine
0a88645eb5 Fix time measurements of the write activity.
When using --verbose or more detailed log messages, the summary prints
timings for both read and write operations separately. The write summary
timing took into account only the PostgreSQL batch activity, discarding the
formatting of the data done by pgloader.

As this formatting is quite heavy at the moment, the results are pretty
misleading without that information.
2017-10-21 21:04:55 +02:00
Dimitri Fontaine
a9afddf8ed Accept quoted namestrings as target type names for cast rules.
This allows passing "double precision" rather than float8, for example.

Fix #650.
2017-10-21 21:03:58 +02:00
Dimitri Fontaine
a28e9b3556 Prevent evaluating unused arguments in log-message.
A stop-gap has been installed to prevent sending too much trafic to the
monitor, but the log-message arguments were still evaluated, and the :data
level output from format-row-in-batch is pretty costly.
2017-10-16 17:26:07 +02:00
Dimitri Fontaine
b36f36b74e Add a (local) test case. 2017-10-16 17:25:44 +02:00
Dimitri Fontaine
9b80d2914c List files to load for system.
Install a new function in the hooks file. This function might help fix
--self-upgrade later, we keep it around for when we'll have time to see
about that.
2017-10-16 17:24:47 +02:00
Dimitri Fontaine
52720a5e6f Prefer QL overrides to ASDF setup.
The ql:*local-project-directories* is a much better facility for us to load
pgloader from the local PWD rather than from the QL distribution. It looks
like the previous method worked by accident, for once, and also downloaded
pgloader from QL, unnecessarily (we have the sources locally).
2017-10-03 13:47:48 +02:00
Dimitri Fontaine
5b227200a9 Fix error handling at monitor thread startup.
Errors such as failing to open the log file (maybe because of bad
permissions) weren't correctly handled. This fixes the problem by handling
the conditions at the lparallel task handler level and signaling a brand new
condition up to the main outside handler.

Fixes #638.
2017-10-03 01:23:59 +02:00
Dimitri Fontaine
2595ddaae3 Fix total-line in reporting.
We did it correctly for the bytes, and we need to apply the same logic to
other metric: the relevant information in the total summary line is the sum
from the data parts, not the sum from the postload parts.
2017-09-19 12:28:24 +02:00
Dimitri Fontaine
460fe6cc77 Fix quoting of default values for MariaDB 10 support.
The default values quoting changed in MariaDB 10, and we need to adjust in
pgloader: extra '' chars could defeat the default matching logic:

  "'0000-00-00'" is different from "0000-00-00"
2017-09-19 11:29:53 +02:00
Dan
62991bd5c5 Add missing column to GROUP BY. (#633) 2017-09-16 21:15:11 +02:00
Dimitri Fontaine
8a361a0ff8 Add support for multiple on update columns per table.
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several column with that special
treatment, so adapt pgloader to migrate that too.

What pgloader does is recognize that several columns are to receive the same
pre-update processing, and creates a single function that does the both of
them, as in the following example, from pgloader logs in a test case:

    CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
      RETURNS trigger
      LANGUAGE plpgsql
      AS
    $$
    BEGIN
       NEW.update_date = now();
       NEW.calc_date = now();
       RETURN NEW;
    END;
    $$;
    CREATE TRIGGER on_update_current_timestamp
            BEFORE UPDATE ON mysql.onupdate
          FOR EACH ROW
      EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();

Fixes #629.
2017-09-15 01:04:57 +02:00
Dimitri Fontaine
b7347a567c Add test cases for MySQL.
At the moment it's a very manual process, and it might get automated
someday. Meanwhile it's still useful to have.

See #569 for an issue that got a test case added.
2017-09-14 15:59:10 +02:00
Dimitri Fontaine
a498313074 Implement support for MySQL FULLTEXT indexes.
PostgreSQL btree indexes are limited in the size of the values they can
index: values must fit in an index page (8kB). So when porting a MySQL full
text index over full documents, we might get into an error like the
following:

  index row size 2872 exceeds maximum 2712 for index "idx_5199509_search"

To fix, query MySQL for the index type which is FULLTEXT rather than BTREE
in those cases, and port it over to a PostgreSQL Full Text index with an
hard-coded 'simple' configuration, such as the following test case:

  CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload));

Of course users might want to use a better configuration, including proper
dictionnary for the documents. When using PostgreSQL each document may have
its own configuration attached and yet they can all get indexed into the
same index, so that's a task for the application developpers, not for
pgloader.

In passing, fix the list-typenames-without-btree-support.sql query to return
separate entries for each index type rather than an {array,representation}
of the result, as Postmodern won't turn the PostgreSQL array into a Common
Lisp array by default. I'm kept wondering how it worked before.

Fix #569.
2017-09-14 15:40:34 +02:00
Dimitri Fontaine
987c0703ad Some default values come properly quoted from MariaDB now.
Adjust the default value formating to check if the default value is already
single-quoted and only add new 'single quotes' when it's not the case.

Apparently ENUM default values in MariaDB 10 are now properly single quoted.
2017-09-14 15:39:04 +02:00
Dimitri Fontaine
dfac729daa Refrain from querying the catalogs again.
When we already have the information in the pgloader internal catalogs,
don't issue another MySQL query. In this case, it's been used to fetch the
list of columns and their data types so that we can choose to send either
`colname` or maybe astext(`colname`) as `colname` for some geographic types.

That's one less MySQL query per table.
2017-09-14 15:35:45 +02:00
Dimitri Fontaine
181f344159 Add support for current_timestamp() default spelling.
That's new in MariaDB 10 apparently.
2017-09-14 15:33:18 +02:00
Dimitri Fontaine
f921658866 Remove useless noise in the logs.
The individual CAST decisions are visible in the CREATE TABLE statements
that are logged a moment later. Also, calling `format-create-sql' on a
column definition that's not finished to be casted will process default
values before they get normalized, and issue WARNING to the poor user.

Not helpful. Bye.
2017-09-14 15:30:29 +02:00
Dimitri Fontaine
dbadab9e9e Implement a new “snake_case” quoting rule.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.

Fix #631.
2017-09-13 22:55:10 +02:00
Dimitri Fontaine
d2d4be2ed0 Fix test/csv-guess.load for old PostgreSQL.
In travis environment we still test with PostgreSQL 9.1 and 9.6, and there's
no reason for this test to use a modern spelling of create schema, after
all.

It works because the test/csv-before-after.load creates the schema and is
ran before test/csv-guess.load. That's good enough for now.
2017-09-09 00:59:39 +02:00
Dimitri Fontaine
38712d98e0 Fix regression testing.
Previous patch made regression failures obvious that were hidden by strange
bugs with CCL.

One such regression was introduced in commit
ab7e77c2d0 where we played with the complex
code generation for field projection, where the following two cases weren't
cleanly processed anymore:

  column text using "constant"
  column text using "field-name"

In the first case we want to load a user-defined constant in the column, in
the second case we want to load the value of the field "field-name" in the
column --- we just have different source and target names.

Another regression was introduced in the recent commit
01e5c23763 where the create-table function was
called too early, before we have fetched *pgsql-reserved-keywords*. As a
consequence table names weren't always properly quoted as shown in the
test/csv-header.load file which targets a table named "group".

Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
2017-09-09 00:51:07 +02:00
Dimitri Fontaine
ebf9f7a6a9 Review and cleanup the logging monitor thread.
Due to errors in regression testing when using CCL, review this part of
pgloader. It turns out that cl-log:stop-messenger on a text-stream-messenger
closes the stream, which isn't a good idea when given *standard-output*.

At least it makes CCL chokes when it then wants to output something of its
own, such as when running in --batch mode (which is nice because it outputs
more diagnostic information).

To solve that problem, initialize the text-stream-messenger with a broadcast
stream made from *standard-output*, which we now may close at will.
2017-09-08 23:03:41 +02:00
Dimitri Fontaine
e7f6505d7d Review compile time dependencies.
The parser files don't depend on the sources, it's the other way round
nowadays. Also, the responsability to decipher the *sections* context should
be restricted to the monitor.lisp file, which is now the case.

And this time, fix #628 for real.
2017-09-08 15:38:32 +02:00
Dimitri Fontaine
9be130cdbe Fix symbol export hacks to execute at load time.
It seems that when compiling with CCL in “batch” mode, that is using
buildapp, the local symbol exporting facility didn't work at all. It needs
to be run at load time so that the compiler sees the symbols.

Fix #628.
2017-09-08 12:33:24 +02:00
Dimitri Fontaine
a9e8bfd4d7 Support for colon characters in PostgreSQL socket path.
Google Cloud SQL instances are now using the following format for the name
of their socket <PROJECT-ID>:<REGION>:<INSTANCE_NAME>. We do that by
allowing to escape a colon in the socket directory name by doubling it, as
in the username field. It also allows to accept any character in the socket
directory name, which is a good cleanup.

Fix #621.
2017-08-30 15:22:42 +02:00
Dimitri Fontaine
d5072d11e5 Implement support for a pgpass file.
The implementation follows PostgreSQL specifications as closely as possible,
with the escaping rules and the matching rules. The default path where to
find the .pgpass (or pgpass.conf on windows) are as documented in PostgreSQL
too. Only missing are the file permissions check.

Fix #460.
2017-08-29 03:16:35 +02:00
Dimitri Fontaine
bcc934d7aa Cleanup.
Some code was pasted twice in src/api.lisp, and a defstruct with no slots
isn't spelled the way I did in previous patches. We use a defstruct with no
slots for defining a hierarchy on which to dispatch our pretty-print
function.
2017-08-26 20:31:24 +02:00
Dimitri Fontaine
33ab9bcdd5 Typo Fix. oops. 2017-08-25 22:21:34 +02:00
Dimitri Fontaine
01e5c23763 Add support for explicit TARGET TABLE clause in load commands.
It used to be that you would give the target table name as an option to the
PostgreSQL connection string, which is untasteful:

   load ... into pgsql://user@host/dbname?tablename=foo.bar ...

Or even, for backwards compatibility:

   load ... into pgsql://user@host/dbname?foo.bar ...

The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:

   load ... into pgsql://user@host/dbname target table foo.bar ...

Which is much better, in particular when used together with the target
columns clause.

Implementing this seemingly quite small feature had impact on many parsing
related features of pgloader, such as the regression testing facility. So
much so that some extra refactoring got into its way here, around the
lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.

While at it, this patch simplifies a lot the `load-data' function by making
a good use of &allow-other-keys and :allow-other-keys t.

Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, tho. Just in another file for clarity.
2017-08-25 01:57:54 +02:00
Dimitri Fontaine
72c58306ba Fix the previous fix.
See #614. Again. Should be ok now.
2017-08-25 01:56:34 +02:00
Dimitri Fontaine
f20a5a0667 Fix schema name comparing with quoted schema names.
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as soon as you always
double-quote the schema name, which pgloader does.

Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that schema names with quotes to the raw schema name from
PostgreSQL, they won't match, and pgloader tries to create the schema again:

  ERROR Database error 42P06: schema "my sql" already exists

Fix the comparing to compare unquoted schema name, fix #614 again: the
previous fix would only work the first time.
2017-08-25 01:47:49 +02:00
Dimitri Fontaine
9d4743f598 Allow database names to contain spaces.
Then they must be quoted (single or double quotes accepted), of course.

Fix #614.
2017-08-24 23:05:26 +02:00