It used to be that the extra clause was forced to be parsed before guards, but
there's no reason why a user wouldn't think to write the clauses the other
way round, so add support for that as well.
See #779.
The MySQL connection string parameter for SSL usage is useSSL, so map that
option name to our expected sslmode values in database connection strings.
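For instance, a connection string such as the following (user and host names
made up) is now understood:
mysql://user@host/dbname?useSSL=true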
See #748.
The support for drop default in (user defined) casting rules was completely
broken in SQLite, because the code didn't even bother looking at what was
returned after applying the casting rules.
This patch fixes the code so that it uses the pgcol instance's default
value as set after applying the casting rules. The bug also existed in a subtle
form for MySQL and MS SQL, but would only show up there when the default
value is spelled using a known variation of “current timestamp”.
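As an illustration (spelling as per the usual cast rule syntax), a rule such
as the following now sees its drop default clause honored with SQLite
sources too:
type datetime to timestamptz drop default drop not null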
First, review the `sqlite_sequence` support so that we can still work with
databases that don't have this catalog, which doesn't always exist -- it
might depend on the SQLite version.
Then while at it, use the sql macro to host the SQLite “queries” in their own
files, enhancing the hackability of the system to some degree. Not by much,
though, because we have to use a lot of PRAGMA commands, and then the column
output isn't documented with the query text itself.
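Those PRAGMA calls are of the following kind (real SQLite commands, made-up
table name), where the result columns are documented in the SQLite manual
rather than alongside the query text:
PRAGMA table_info(mytable);
PRAGMA foreign_key_list(mytable);
PRAGMA index_list(mytable);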
Namely, the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allows a definition such as the following:
type timestamp with extra on update current timestamp
to "timestamp with time zone" drop extra
The effect of such a cast rule is to ignore the MySQL extra definition and
to prevent pgloader from creating the PostgreSQL triggers that implement the
same behavior.
Fix#735.
We forgot that rule when creating the target tables for the materialized
views commands, which led to surprising and wrong behavior.
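The fix covers clauses such as the following sketch, with made-up view names:
MATERIALIZE VIEWS v_foo, v_bar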
Fix#721, and add a new test case while at it.
The query for concurrency-support didn't get the memo that we should ignore
PostgreSQL identifier-case when querying the source MySQL database. Fix the
query string to include column names as given by the MySQL catalogs.
In bug report #703, the problem is found in PostgreSQL queries; that part had
already been fixed. Trying to reproduce the bug produced an error in
the concurrency-support query instead, so let's fix that one.
Fix#703.
When this function was written, pgloader would get an array of numbers over
the wire; nowadays it looks like it's receiving an array of characters
instead (in other words, a string).
Improve the `bits-to-boolean` function to accept either input, and raise an
error in any other case.
My theory is that something changed either in MySQL (with version 10) or in
the Qmynd driver somehow... but tonight we just go easy and fix the bug
locally rather than try and understand where it might be coming from.
Fixes#684.
SQLite being very, very liberal in type names (I think it actually accepts
anything and everything), our simple approach of tokenizing the input and
discarding noise words is not enough.
In this patch, we implement a new light parser for the SQLite type names to
better cope with noise words and the random spacing of catalog values that
SQLite fails to normalize -- well, that it doesn't even attempt to normalize,
apparently.
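Hypothetical examples of the kind of spellings the new parser copes with:
integer (11)
DOUBLE  PRECISION
varchar ( 255 )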
Fix#548.
MySQL allows using unsigned data types, and pgloader should then target a
signed type of larger capacity so that the values can fit. For example, the
data definition “smallint(5) unsigned” should be cast to “integer”.
This patch allows user defined cast rules to be written against “unsigned”
data types as per their MySQL catalog representation.
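A hypothetical rule taking advantage of this follows; the exact spelling to
match is the one found in the MySQL catalogs, and may differ from this sketch:
type "smallint unsigned" to integer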
See #678.
The error handling would try and read past the error buffer in some cases,
when the BABEL lib would report a position that's past the end of the buffer
read.
Fix#661.
The quoting of default values changed in MariaDB 10, and we need to adjust in
pgloader: extra '' chars could defeat the default matching logic:
"'0000-00-00'" is different from "0000-00-00"
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several columns with that special
treatment, so adapt pgloader to migrate that too.
What pgloader does is recognize that several columns are to receive the same
pre-update processing, and create a single function that handles all of
them, as in the following example from pgloader logs in a test case:
CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
 RETURNS trigger
 LANGUAGE plpgsql
AS $$
BEGIN
  NEW.update_date = now();
  NEW.calc_date = now();
  RETURN NEW;
END;
$$;
CREATE TRIGGER on_update_current_timestamp
 BEFORE UPDATE ON mysql.onupdate
 FOR EACH ROW
 EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();
Fixes#629.
At the moment it's a very manual process, and it might get automated
someday. Meanwhile it's still useful to have.
See #569 for an issue that got a test case added.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.
Fix#631.
In the Travis environment we still test with PostgreSQL 9.1 and 9.6, and
there's no reason for this test to use a modern spelling of create schema,
after all.
It works because test/csv-before-after.load creates the schema and is
run before test/csv-guess.load. That's good enough for now.
The previous patch made obvious some regression failures that had been hidden
by strange bugs with CCL.
One such regression was introduced in commit
ab7e77c2d00decce64ab739d0eb3d2ca5bdb6a7e, which played with the complex
code generation for field projection; the following two cases weren't
cleanly processed anymore:
column text using "constant"
column text using "field-name"
In the first case we want to load a user-defined constant in the column, in
the second case we want to load the value of the field "field-name" in the
column --- we just have different source and target names.
Another regression was introduced in the recent commit
01e5c2376390749c2b7041b17b9a974ee8efb6b2 where the create-table function was
called too early, before we had fetched *pgsql-reserved-keywords*. As a
consequence table names weren't always properly quoted as shown in the
test/csv-header.load file which targets a table named "group".
Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
It used to be that you would give the target table name as an option to the
PostgreSQL connection string, which is distasteful:
load ... into pgsql://user@host/dbname?tablename=foo.bar ...
Or even, for backwards compatibility:
load ... into pgsql://user@host/dbname?foo.bar ...
The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:
load ... into pgsql://user@host/dbname target table foo.bar ...
Which is much better, in particular when used together with the target
columns clause.
Implementing this seemingly quite small feature had an impact on many
parsing-related features of pgloader, such as the regression testing
facility. So much so that some extra refactoring got done along the way,
around the lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.
While at it, this patch greatly simplifies the `load-data' function by making
good use of &allow-other-keys and :allow-other-keys t.
Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, though -- it just lives in another file for clarity.
This allows bypassing SSL when you don't need it, for instance over
localhost. The option takes the same syntax as the PostgreSQL sslmode
connection string parameter.
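For instance, one can now write the following (user and database names made
up):
load ... into pgsql://user@localhost/dbname?sslmode=disable ...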
In this commit we fail the guess faster, allowing us to test against a much
larger sample. The sample size is still hard-coded, but this time to 1000
lines.
Also add a test case, see #618.
This feature has been asked for several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.
To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.
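As a sketch, with made-up variable names, a command file can now contain the
following and have the values filled in from the OS environment:
load database
     from mysql://root@localhost/{{MYSQL_DATABASE}}
     into pgsql:///{{PGDATABASE}};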
Fixes#555, Fixes#609.
See #500, #477, #278.
The spelling in SQLite for the default value is "current_date", so instruct
pgloader about that. This commit also adds a test case in our sqlite.db
unit tests database.
Fixes#607.
Experiment with the idea of splitting the read work into several concurrent
threads, where each reader is reading portions of the target table, using a
WHERE id <= x and id > y clause in its SELECT query.
For this to kick in, a number of conditions need to be met, as described in
the documentation. The main benefit might not be faster queries to fetch the
same overall data set, but better concurrency, with as many readers as
writers and each pair having its own dedicated queue.
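Each reader then runs a query along those lines, with made-up table, column
and boundary values:
SELECT id, payload FROM source_table WHERE id > 500000 AND id <= 1000000;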
pgloader has had support for PostgreSQL SET parameters (GUCs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters, so allow pgloader users to control them too.
See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
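A sketch of the new clause, with made-up timeout values:
set mysql parameters net_read_timeout  = '600',
                     net_write_timeout = '600'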
The “magic” options --batch and --heap-reserve will be processed by CCL
itself before pgloader gets to see them, so try that in the testing
environment.
The previous coding would discard any work done at the apply-casting-rules
step when adding source-specific smarts about handling defaults, because of
what looks like negligence and bad tests. A test case scenario existed but
was not exercised :(
Fix that by defaulting the default value to the one given back at the
apply-casting-rules stage, where we apply the "drop default" clause.
Given the new SQLite test case from issue #563, we see that pgloader doesn't
handle errors gracefully in the post-copy stage. That's because the API was
not properly defined: we should use pgsql-execute-with-timing rather than
another construct here, because it allows the "on error resume next" behavior
we want with after-load DDL statements.
See #563.
It turns out we forgot to add support for the internal catalog munging
clauses to SQLite. The catalogs being normalized means there's no extra work
here other than allowing the parser to accept those clauses and then pass
them over to our generic `copy-database' method implementation.
It is to be noted that SQLite has no support for schemas as per the standard
and PostgreSQL, so when we inspect the database schema we create a nil
entry here. It's then not possible to ALTER SCHEMA nil RENAME TO 'target',
unfortunately, but it's easy enough to SET search_path TO 'target' anyway,
as shown in the modified test case.
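A sketch of that workaround in a load command, with made-up paths and names:
load database
     from sqlite:///path/to/source.db
     into pgsql:///dbname
set search_path to 'target';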
Fix#552.
In advanced projections it could be that we call the transformation function
for some input fields twice. This is a bug that manifests in particular when
the output of the transformation can't be used/parsed again by the same
function, as shown in the bug report.
Fix#523.
In PostgreSQL it is possible at CREATE TABLE time to set some extra storage
parameters, the most useful of them in the context of pgloader being the
FILLFACTOR. For the setting to be useful, it needs to be positioned at
CREATE TABLE time, before we load the data.
The BEFORE LOAD clause of the pgloader command allows running SQL scripts
that will be executed before the load, and even before the creation of the
target schema when pgloader does that, which is nice for other use cases.
Here we implement a new `ALTER TABLE` rule that one can set in the pgloader
command in order to change storage parameters at CREATE TABLE time:
ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40')
Fix#516.
Avoid double quoting the schema names when used in PostgreSQL catalog
queries, where the identifiers are used as literal values and need to be
single-quoted.
Fix#476, again.
This pgloader command allows migrating tables while changing the schema
they are found in between their MySQL source database and their
PostgreSQL target database.
This changes the default behavior of pgloader with MySQL from always
targeting the 'public' schema to targeting by default a schema named
the same as the MySQL database. You can revert to the old behavior by
adding a rule:
ALTER SCHEMA 'dbname' RENAME TO 'public'
We might want to add a patch to re-install the default behavior later.
Also see #489, where it used not to be possible to rename the schema at
migration time, causing strange errors (you need to spot NIL as the
schema name in the "failed to find target table" messages).
When the option "drop indexes" is in use when loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.
The CREATE is done in parallel, and we create an lparallel kernel for
that. The kernel must have a worker-count of at least 1, and we were
not considering the case of 0 indexes on the target table.
Fix#484.
As shown in #476, it is sometimes needed to be able to quote the
identifier names even when loading from a file, that is, when specifying
the target table name in the database URI.
To that end, allow the option "identifier case" to be used in the file
based cases too. Fixes#476.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of the indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.
We used to use the CASCADE option when dropping the indexes, which hides
a bug: if we exclude from the load tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.
To prevent that from happening, pgloader now queries the PostgreSQL
pg_depend system catalog to list the “missing” foreign keys and adds them
to our internal catalog representation, from which we know to DROP then
CREATE the SQL objects at the proper times.
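That catalog query is in the spirit of the following simplified sketch (not
the actual pgloader query; the index name is made up), listing the foreign
keys that depend on a given unique index:
select c.conname, c.conrelid::regclass
  from pg_depend d
       join pg_constraint c on c.oid = d.objid
 where d.classid = 'pg_constraint'::regclass
   and d.refobjid = 'some_unique_index'::regclass
   and c.contype = 'f';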
See #400 as this was an oversight in fixing this issue.
We used to force overly strict rules for a quoted field name in a CSV
load file; now we accept any character but a quote as part of the field
name.
Fixes#416.
Also known as the ORM case: it happens that other tools are used to
create the target schema. In that case pgloader's job is to fill in the
existing target tables with the data from the source tables.
We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.
This behavior is activated when using the “create no tables” option as
in the following test-case setup:
with create no tables, include drop, truncate
Fixes#400, for which I got a test-case to play with!