The handling of the SQLite catalogs was fixed in a previous patch, but
either it's been broken since or it never actually worked (oops).
Moreover, the recent patch about :on-update-current-timestamp changed the
casting-rule matching code, and the SQLite module should now set
:auto-increment rather than the "auto_increment" string it used before.
That's better, but it hadn't been done yet.
Fix#563 again, tested with a provided test-case (thanks!).
Namely, the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allows a definition such as the following:
type timestamp with extra on update current timestamp
to "timestamp with time zone" drop extra
The effect of such a cast rule is to ignore the MySQL extra definition,
which keeps pgloader from creating the PostgreSQL triggers that implement
the same behavior.
Fix#735.
In case of a failure to pre-process or transform values in a row that has
been read, we need to refrain from pushing the row into our next batch.
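A minimal sketch of the logic, where apply-transforms and push-row are
hypothetical helpers standing in for the real code:

(defun process-and-batch-row (copy batch row)
  ;; only push the row when every pre-processing step succeeds;
  ;; apply-transforms and push-row are hypothetical helpers
  (handler-case
      (push-row batch (apply-transforms copy row))
    (condition (e)
      ;; on error, log and skip the row instead of batching it
      (log-message :error "~a" e))))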
See #726, which got hit by this recent bug in the middle of something else
entirely.
When dealing with MATERIALIZE VIEWS test cases and failing in the middle
of them, as happens when fixing bugs, it was tedious (to say the least) to
clean up the views manually each time.
That said, for end users, doing it automatically would risk cleaning up the
wrong view definition if they had a typo in their pgloader command, say.
Common Lisp helps a lot here: we simply create a restart that is only
available interactively for the developers of pgloader!
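A minimal sketch of the technique, with hypothetical create-matview and
drop-matview helpers:

(defun create-view-with-restart (pgconn view-definition)
  ;; create-matview and drop-matview are hypothetical helpers
  (restart-case
      (create-matview pgconn view-definition)
    (drop-and-recreate ()
      :report "Drop the materialized view and create it again."
      (drop-matview pgconn view-definition)
      (create-matview pgconn view-definition))))

Nothing in the code ever invokes the restart, so it's only offered
interactively in the debugger, which is exactly what we want here.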
We forgot that rule when creating the target tables for the MATERIALIZE
VIEWS commands, which led to surprising and wrong behavior.
Fix#721, and add a new test case while at it.
It might be that the schema exists but we didn't find what we expected in
there, so it didn't make it into pgloader's internal catalogs. Be friendly
to the user with a better error message.
Fix#713.
Refactor the file organisation further to allow adding a “direct stream”
option when the on-error-stop behavior has been selected. This currently
happens by default for database sources.
Introduce the new WITH option “on error resume next”, which forces the
classic behavior of pgloader. The option “on error stop” already existed;
its implementation is new.
When this new behavior is activated, the data is sent to PostgreSQL
directly, without intermediate batches being built. It means that the whole
operation fails at the first error, and we don't have any information in
memory to try replaying any COPY of the data. It's gone.
This behavior should be fine for database migrations, as you don't usually
want to fix the data manually in intermediate files; you want to fix the
problem at the source database and do the whole dance all over again, until
your casting rules are perfect.
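And when the classic behavior is wanted anyway, it can be spelled out in
the load command, as in this sketch (connection strings are placeholders):

load database
     from mysql://user@localhost/dbname
     into postgresql:///dbname
with on error resume next;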
This patch might also bring some performance benefits in terms of both
timing and memory usage, though local testing didn't show much of anything
for the moment.
Copy some code over from cl-postgres-trivial-utf-8 and add support for
PostgreSQL COPY escaping right at the same place, allowing us to allocate
our formatted utf-8 buffer only once, with the escaping already applied.
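A simplified sketch of the idea; the real code emits utf-8 bytes into a
single preallocated buffer, whereas this version uses a string stream for
brevity:

(defun escape-copy-text (string)
  ;; apply COPY text-format escaping in the same pass that walks the
  ;; input, instead of a separate post-processing step over the buffer
  (with-output-to-string (out)
    (loop :for char :across string
          :do (case char
                (#\\       (write-string "\\\\" out))
                (#\Newline (write-string "\\n" out))
                (#\Return  (write-string "\\r" out))
                (#\Tab     (write-string "\\t" out))
                (t         (write-char char out))))))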
This patch was expected to be more about performance, but it seems to be
mostly a code clean-up, as it doesn't make a big difference in the testing
I could do here.
That said, getting rid of one intermediate buffer should be nice in terms of
memory management.
The COPY format and batch facilities are no longer the meat of our
PostgreSQL support in the src/pgsql directory, so have them live in their
own space.
This function prepares the data to be sent down to PostgreSQL as clean
COPY text with unicode handled correctly. This commit is mainly a clean-up
of the function, and also adds some smarts to try and make it faster.
In testing, the function is now somewhat faster than before, but not by
much. The hope here is that it's now easier to optimize.
We now have a qmynd-impl::decoding-error condition to deal with, which has
very good error reporting, so we don't need to poke into babel details
anymore. The error message adds the column name, type and collation to the
output, too.
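A minimal sketch of the new handler, with a hypothetical fetch-rows call
standing in for the actual qmynd query:

(defun query-with-decoding-guard (mysql-connection sql)
  ;; fetch-rows is a hypothetical stand-in for the qmynd query call
  (handler-case (fetch-rows mysql-connection sql)
    (qmynd-impl::decoding-error (e)
      ;; the condition formats the offending bytes nicely already;
      ;; we re-report it with column name, type and collation added
      (log-message :error "~a" e))))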
We keep the babel handlers for a while until people have all migrated to
using the patch in qmynd.
With the fix to Qmynd, Fix#716.
The previous patch introduced parser conflicts and we couldn't parse some
expressions any more, such as the following:
fields escaped by '\',
It's now possible to represent a single quote as either '''', '\'', or
'0x27', and we can still parse '\' as being a single backslash character.
See #705.
The option "fields optionally enclosed by" was missing a way to easily
specify a single quote as the quoting character. Add '\'' to the existing
solution '0x27' which isn't as friendly.
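So both of the following spellings now work:

fields optionally enclosed by '\''
fields optionally enclosed by '0x27'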
See #705.
The query for concurrency-support didn't get the memo that we should ignore
PostgreSQL identifier-case when querying the source MySQL database. Fix the
query string to include column names as given by the MySQL catalogs.
In bug report #703, the problem is found in PostgreSQL queries. This has
been fixed before already. Trying to reproduce the bug produced an error in
the concurrency-support query instead, so let's fix this one.
Fix#703.
The website is moving to pgloader.org and readthedocs.io is going to be
integrated. Let's see what happens. The docs build fine locally with the
sphinx tools and the docs/Makefile.
Having separate files for the documentation should help ease maintenance
and the addition of new topics, such as Common Lisp hacker-level docs,
which are currently missing.
When this function was written, pgloader would get an array of numbers over
the wire; nowadays it looks like it's receiving an array of characters
instead (in other words, a string).
Improve the `bits-to-boolean` function to accept either input, and raise an
error in any other case.
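A minimal sketch of the improved function; pgloader's exact code may
differ:

(defun bits-to-boolean (bit-vector)
  ;; accept either a vector of numbers or a string for a MySQL bit(1)
  ;; value; etypecase signals an error for anything else
  (when (and bit-vector (= 1 (length bit-vector)))
    (let ((bit (aref bit-vector 0)))
      (etypecase bit
        (integer   (if (zerop bit) "f" "t"))
        (character (if (zerop (char-code bit)) "f" "t"))))))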
My theory is that something changed either in MariaDB (with version 10) or
in the Qmynd driver somehow... but tonight we just go easy and fix the bug
locally rather than try to understand where it's coming from.
Fixes#684.
Due to the way pgloader queries the PostgreSQL catalogs, it restricted the
target table to being an “ordinary” table, as per the relkind description
in the PostgreSQL documentation:
https://www.postgresql.org/docs/current/static/catalog-pg-class.html
Extend this to support relkind of 'r', 'f' and 'p'.
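In the catalog query, the restriction now looks something like this (a
sketch, not the exact query string):

-- 'r' is an ordinary table, 'f' a foreign table, 'p' a partitioned table
and c.relkind in ('r', 'f', 'p')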
Fixes#587, fixes#690.
SQLite being very, very liberal in type names (I think it actually accepts
anything and everything), our simple approach of tokenizing the input and
discarding noise words is not enough.
In this patch, we implement a new light parser for the SQLite type names to
better cope with noise words and the random spacing of catalog values that
SQLite doesn't normalize. Well, it doesn't even attempt to, apparently.
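A gross simplification of what the parser has to cope with, namely random
spacing around the type name and its typemod; the real parser also knows
about multi-word type names and noise words:

(defun parse-sqlite-type-name (type-name)
  ;; "DECIMAL ( 10 , 2 )" => values "decimal" and "10 , 2"
  (let* ((name   (string-downcase (string-trim " " type-name)))
         (lparen (position #\( name)))
    (if lparen
        (values (string-trim " " (subseq name 0 lparen))
                (string-trim " ()" (subseq name lparen)))
        (values name nil))))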
Fix#548.
Given INCLUDING and EXCLUDING support, it's possible that we migrate a
table from SQLite without having selected the tables pointed to by its
foreign keys. In that case, pgloader should still be able to load the data
definition and content fine, just skipping the incomplete fkey definitions.
That's implemented in this patch, which has been tested thanks to a
reproducible data set being made available!
Fixes#681.
In cases where the MS SQL database is set up with a case-sensitive
collation, pgloader would not find the catalog objects referenced from its
queries. To fix, just use UPPERCASE names, as they work in both
case-insensitive and case-sensitive collations.
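With a case-sensitive collation, only the uppercase spelling of the
catalog views is guaranteed to resolve, as in this sketch of a catalog
query:

select TABLE_SCHEMA, TABLE_NAME
  from INFORMATION_SCHEMA.TABLES;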
In passing, add `system-index.txt` to `.gitignore` (generated by make).
Fixes#651.
The following casting rules are now the default for MySQL:
- type tinyint when unsigned to smallint drop typemod
- type smallint when unsigned to integer drop typemod
- type mediumint when unsigned to integer drop typemod
- type integer when unsigned to bigint drop typemod
Fixes#678.
MySQL allows using unsigned data types, and pgloader should then target a
signed type of larger capacity so that all values can fit. For example, the
data definition “smallint(5) unsigned” should be cast to “integer”.
This patch allows user-defined cast rules to be written against “unsigned”
data types as per their MySQL catalog representation.
See #678.
When doing a MySQL to PostgreSQL migration in data-only mode, pgloader
matches schema names found on both the source and target databases, and
much like with table names, it must do so using unquoted schema names.
Otherwise, when using the “quote identifiers” option, we fail to find the
schema name again, because one spelling has the quotes and the other one
doesn't.
Fix#659, at least some forms of it.
The error handling would try to read past the error buffer in some cases,
when the BABEL lib gives a position that's after the end of the buffer
read.
Fix#661.
In the next release, pgloader defaults to targeting a new schema named the
same as the MySQL database, because that's what makes more sense. But
people are used to having 'public' in the search_path and everything in
there.
So when creating our target schema, when migrating from MySQL, arrange it so
that the new schema is in the search_path by issuing a command like:
ALTER DATABASE plop SET search_path TO public, f1db;
And make this command visible in verbose (NOTICE) mode too, so that users
can see what happens.
Fix#654. I think.
It helps a lot to debug what's happening, and it seems that we lost the
information when cleaning up the log levels in recent efforts to unclutter
the default output.
It turns out that when using *print-pretty* in CCL we then have CL reader
references in the output, such as in the following example:
QUERY: comment on table mysql.base64 is $#1=DXIDC_EMLAQ$Test decoding base64 documents$#1#$
Of course that's wrong, so prevent this from happening by
forcing *print-pretty* to nil in a top-level function. We still turn this on
in the monitor thread when printing error messages as those might contain
recursive data structures.
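The fix boils down to establishing the binding once, at the top level, as
in this sketch (top-level and run-commands are hypothetical names):

(defun top-level (argv)
  ;; disable pretty printing for the whole run, so the printer never
  ;; emits #1= reader references in our formatted SQL
  (let ((*print-pretty* nil))
    (run-commands argv)))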
When using --verbose or more detailed log messages, the summary prints
timings for both read and write operations separately. The write summary
timing took into account only the PostgreSQL batch activity, discarding the
formatting of the data done by pgloader.
As this formatting is quite heavy at the moment, the results are pretty
misleading without that information.
A stop-gap has been installed to prevent sending too much traffic to the
monitor, but the log-message arguments were still being evaluated, and the
:data level output from format-row-in-batch is pretty costly.
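A minimal sketch of the idea, with a hypothetical log-level-enabled-p
predicate: wrap log-message in a macro so the costly arguments aren't
evaluated when the level is filtered out anyway.

(defmacro log-message* (level fmt &rest args)
  ;; expand to a guard that skips evaluating the format arguments
  ;; entirely when the message wouldn't be sent to the monitor
  `(when (log-level-enabled-p ,level)
     (log-message ,level ,fmt ,@args)))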
Install a new function in the hooks file. This function might help fix
--self-upgrade later; we keep it around for when we'll have time to see
about that.
The ql:*local-project-directories* mechanism is a much better facility for
loading pgloader from the local PWD rather than from the QL distribution.
It looks like the previous method worked by accident, for once, and also
downloaded pgloader from QL unnecessarily (we have the sources locally).
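The build scripts can then do something along these lines (a sketch):

;; make the current checkout visible to Quicklisp, then load it
(push (uiop:getcwd) ql:*local-project-directories*)
(ql:register-local-projects)
(ql:quickload "pgloader")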
Errors such as failing to open the log file (maybe because of bad
permissions) weren't correctly handled. This fixes the problem by handling
the conditions at the lparallel task handler level and signaling a brand new
condition up to the main outside handler.
Fixes#638.
We did it correctly for the bytes, and we need to apply the same logic to
the other metrics: the relevant information in the total summary line is
the sum from the data parts, not the sum from the postload parts.
The default values quoting changed in MariaDB 10, and we need to adjust in
pgloader: extra '' chars could defeat the default matching logic:
"'0000-00-00'" is different from "0000-00-00"
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several columns with that special
treatment, so adapt pgloader to migrate that too.
What pgloader does is recognize that several columns are to receive the
same pre-update processing, and create a single function that handles all
of them, as in the following example, from pgloader logs in a test case:
CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
 RETURNS trigger
 LANGUAGE plpgsql
AS $$
BEGIN
  NEW.update_date = now();
  NEW.calc_date = now();
  RETURN NEW;
END;
$$;

CREATE TRIGGER on_update_current_timestamp
  BEFORE UPDATE ON mysql.onupdate
  FOR EACH ROW
  EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();
Fixes#629.