It used to be that the extra clause was forced to be parsed before guards, but
there's no reason why a user wouldn't think to write the clauses the other
way round, so add support for that as well.
See #779.
The MySQL connection string parameter for SSL usage is useSSL, so map that
option name to our expected sslmode values in database connection strings.
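For instance, a connection string such as the following (user and host names
made up) is now understood:
mysql://user@host/dbname?useSSL=true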
See #748.
The support for drop default in (user defined) casting rules was completely
broken in SQLite, because the code didn't even bother looking at what was
returned after applying the casting rules.
This patch fixes the code so that it uses the pgcol instance's default
value as set after applying the casting rules. The bug also existed in a subtle
form for MySQL and MS SQL, but would only show up there when the default
value is spelled using a known variation of “current timestamp”.
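As an illustration (spelling as per the usual cast rule syntax), a rule such
as the following now sees its drop default clause honored with SQLite
sources too:
type datetime to timestamptz drop default drop not null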
First, review the `sqlite_sequence` support so that we can still work with
databases that don't have this catalog, which doesn't always exist -- it
might depend on the SQLite version.
Then while at it, use the sql macro to host the SQLite “queries” in their own
files, enhancing the hackability of the system to some degree. Not by much,
though, because we have to use a lot of PRAGMA commands, and then the column
output isn't documented with the query text itself.
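Those PRAGMA calls are of the following kind (real SQLite commands, made-up
table name), where the result columns are documented in the SQLite manual
rather than alongside the query text:
PRAGMA table_info(mytable);
PRAGMA foreign_key_list(mytable);
PRAGMA index_list(mytable);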
Namely, the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allows a definition such as the following:
type timestamp with extra on update current timestamp
to "timestamp with time zone" drop extra
The effect of such a cast rule is to ignore the MySQL extra definition and
to prevent pgloader from creating the PostgreSQL triggers that implement the
same behavior.
Fix#735.
We forgot that rule when creating the target tables for the materialized
views commands, which led to surprising and wrong behavior.
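The fix covers clauses such as the following sketch, with made-up view names:
MATERIALIZE VIEWS v_foo, v_bar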
Fix#721, and add a new test case while at it.
The query for concurrency-support didn't get the memo that we should ignore
PostgreSQL identifier-case when querying the source MySQL database. Fix the
query string to include column names as given by the MySQL catalogs.
In bug report #703, the problem is found in PostgreSQL queries; that part had
already been fixed. Trying to reproduce the bug produced an error in
the concurrency-support query instead, so let's fix that one.
Fix#703.
When this function was written, pgloader would get an array of numbers over
the wire; nowadays it looks like it's receiving an array of characters
instead (in other words, a string).
Improve the `bits-to-boolean` function to accept either input, and raise an
error in any other case.
My theory is that something changed either in MySQL (with version 10) or in
the Qmynd driver somehow... but tonight we just go easy and fix the bug
locally rather than try and understand where it might be coming from.
Fixes#684.
SQLite being very, very liberal in type names (I think it actually accepts
anything and everything), our simple approach of tokenizing the input and
discarding noise words is not enough.
In this patch, we implement a new light parser for the SQLite type names to
better cope with noise words and the random spacing of catalog values that
SQLite fails to normalize -- well, that it doesn't even attempt to normalize,
apparently.
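Hypothetical examples of the kind of spellings the new parser copes with:
integer (11)
DOUBLE  PRECISION
varchar ( 255 )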
Fix#548.
MySQL allows using unsigned data types, and pgloader should then target a
signed type of larger capacity so that the values can fit. For example, the
data definition “smallint(5) unsigned” should be cast to “integer”.
This patch allows user defined cast rules to be written against “unsigned”
data types as per their MySQL catalog representation.
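A hypothetical rule taking advantage of this follows; the exact spelling to
match is the one found in the MySQL catalogs, and may differ from this sketch:
type "smallint unsigned" to integer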
See #678.
The error handling would try and read past the error buffer in some cases,
when the BABEL lib would report a position that's past the end of the buffer
read.
Fix#661.
The quoting of default values changed in MariaDB 10, and we need to adjust in
pgloader: extra '' chars could defeat the default matching logic:
"'0000-00-00'" is different from "0000-00-00"
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several columns with that special
treatment, so adapt pgloader to migrate that too.
What pgloader does is recognize that several columns are to receive the same
pre-update processing, and create a single function that handles all of
them, as in the following example from pgloader logs in a test case:
CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
 RETURNS trigger
 LANGUAGE plpgsql
AS $$
BEGIN
  NEW.update_date = now();
  NEW.calc_date = now();
  RETURN NEW;
END;
$$;
CREATE TRIGGER on_update_current_timestamp
 BEFORE UPDATE ON mysql.onupdate
 FOR EACH ROW
 EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();
Fixes#629.
At the moment it's a very manual process, and it might get automated
someday. Meanwhile it's still useful to have.
See #569 for an issue that got a test case added.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.
Fix#631.
In the Travis environment we still test with PostgreSQL 9.1 and 9.6, and
there's no reason for this test to use a modern spelling of create schema,
after all.
It works because test/csv-before-after.load creates the schema and is
run before test/csv-guess.load. That's good enough for now.
The previous patch made obvious some regression failures that had been hidden
by strange bugs with CCL.
One such regression was introduced in commit
ab7e77c2d00decce64ab739d0eb3d2ca5bdb6a7e, which played with the complex
code generation for field projection; the following two cases weren't
cleanly processed anymore:
column text using "constant"
column text using "field-name"
In the first case we want to load a user-defined constant in the column, in
the second case we want to load the value of the field "field-name" in the
column --- we just have different source and target names.
Another regression was introduced in the recent commit
01e5c2376390749c2b7041b17b9a974ee8efb6b2 where the create-table function was
called too early, before we had fetched *pgsql-reserved-keywords*. As a
consequence table names weren't always properly quoted as shown in the
test/csv-header.load file which targets a table named "group".
Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
It used to be that you would give the target table name as an option to the
PostgreSQL connection string, which is distasteful:
load ... into pgsql://user@host/dbname?tablename=foo.bar ...
Or even, for backwards compatibility:
load ... into pgsql://user@host/dbname?foo.bar ...
The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:
load ... into pgsql://user@host/dbname target table foo.bar ...
Which is much better, in particular when used together with the target
columns clause.
Implementing this seemingly quite small feature had an impact on many
parsing-related features of pgloader, such as the regression testing
facility. So much so that some extra refactoring got done along the way,
around the lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.
While at it, this patch greatly simplifies the `load-data' function by making
good use of &allow-other-keys and :allow-other-keys t.
Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, though -- it just lives in another file for clarity.
This allows bypassing SSL when you don't need it, for instance over
localhost. The option takes the same syntax as the PostgreSQL sslmode
connection string parameter.
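For instance, one can now write the following (user and database names made
up):
load ... into pgsql://user@localhost/dbname?sslmode=disable ...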
In this commit we fail the guess faster, allowing us to test against a much
larger sample. The sample size is still hard-coded, but this time to 1000
lines.
Also add a test case, see #618.
This feature has been asked for several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.
To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.
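As a sketch, with made-up variable names, a command file can now contain the
following and have the values filled in from the OS environment:
load database
     from mysql://root@localhost/{{MYSQL_DATABASE}}
     into pgsql:///{{PGDATABASE}};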
Fixes#555, Fixes#609.
See #500, #477, #278.
The spelling in SQLite for the default value is "current_date", so instruct
pgloader about that. This commit also adds a test case in our sqlite.db
unit tests database.
Fixes#607.
Experiment with the idea of splitting the read work into several concurrent
threads, where each reader is reading portions of the target table, using a
WHERE id <= x and id > y clause in its SELECT query.
For this to kick in, a number of conditions need to be met, as described in
the documentation. The main benefit might not be faster queries to fetch the
same overall data set, but better concurrency, with as many readers as
writers and each pair having its own dedicated queue.
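Each reader then runs a query along those lines, with made-up table, column
and boundary values:
SELECT id, payload FROM source_table WHERE id > 500000 AND id <= 1000000;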
pgloader has had support for PostgreSQL SET parameters (GUCs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters, so allow pgloader users to control them too.
See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
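A sketch of the new clause, with made-up timeout values:
set mysql parameters net_read_timeout  = '600',
                     net_write_timeout = '600'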
The “magic” options --batch and --heap-reserve will be processed by CCL
itself before pgloader gets to see them, so try that in the testing
environment.
The previous coding would discard any work done at the apply-casting-rules
step when adding source-specific smarts about handling defaults, because of
what looks like negligence and bad tests. A test case scenario existed but
was not exercised :(
Fix that by defaulting the default value to the one given back at the
apply-casting-rules stage, where we apply the "drop default" clause.
Given the new SQLite test case from issue #563, we see that pgloader doesn't
handle errors gracefully in the post-copy stage. That's because the API was
not properly defined: we should use pgsql-execute-with-timing rather than
another construct here, because it allows the "on error resume next" behavior
we want with after-load DDL statements.
See #563.
It turns out we forgot to add support for the internal catalog munging
clauses to SQLite. The catalogs being normalized means there's no extra work
here other than allowing the parser to accept those clauses and then pass
them over to our generic `copy-database' method implementation.
It is to be noted that SQLite has no support for schemas as per the standard
and PostgreSQL, so when we inspect the database schema we create a nil
entry here. It's then not possible to ALTER SCHEMA nil RENAME TO 'target',
unfortunately, but it's easy enough to SET search_path TO 'target' anyway,
as shown in the modified test case.
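A sketch of that workaround in a load command, with made-up paths and names:
load database
     from sqlite:///path/to/source.db
     into pgsql:///dbname
set search_path to 'target';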
Fix#552.
In advanced projections it could be that we call the transformation function
for some input fields twice. This is a bug that manifests in particular when
the output of the transformation can't be used/parsed again by the same
function, as shown in the bug report.
Fix#523.
In PostgreSQL it is possible at CREATE TABLE time to set some extra storage
parameters, the most useful of them in the context of pgloader being the
FILLFACTOR. For the setting to be useful, it needs to be positioned at
CREATE TABLE time, before we load the data.
The BEFORE LOAD clause of the pgloader command allows running SQL scripts
that will be executed before the load, and even before the creation of the
target schema when pgloader does that, which is nice for other use cases.
Here we implement a new `ALTER TABLE` rule that one can set in the pgloader
command in order to change storage parameters at CREATE TABLE time:
ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40')
Fix#516.
Avoid double quoting the schema names when used in PostgreSQL catalog
queries, where the identifiers are used as literal values and need to be
single-quoted.
Fix#476, again.
This pgloader command allows migrating tables while changing the schema
they are found in between their MySQL source database and their
PostgreSQL target database.
This changes the default behavior of pgloader with MySQL from always
targeting the 'public' schema to targeting by default a schema named
the same as the MySQL database. You can revert to the old behavior by
adding a rule:
ALTER SCHEMA 'dbname' RENAME TO 'public'
We might want to add a patch to re-install the default behavior later.
Also see #489, where it used not to be possible to rename the schema at
migration time, causing strange errors (you need to spot NIL as the
schema name in the "failed to find target table" messages).
When the option "drop indexes" is in use when loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.
The CREATE is done in parallel, and we create an lparallel kernel for
that. The kernel must have a worker-count of at least 1, and we were
not considering the case of 0 indexes on the target table.
Fix#484.
As shown in #476, it is sometimes needed to be able to quote the
identifier names even when loading from a file, that is, when specifying
the target table name in the database URI.
To that end, allow the option "identifier case" to be used in the file
based cases too. Fixes#476.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of the indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.
We used to use the CASCADE option when dropping the indexes, which hides
a bug: if we exclude from the load tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.
To prevent that from happening, pgloader now queries the PostgreSQL
pg_depend system catalog to list the “missing” foreign keys and adds them
to our internal catalog representation, from which we know to DROP then
CREATE the SQL objects at the proper times.
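That catalog query is in the spirit of the following simplified sketch (not
the actual pgloader query; the index name is made up), listing the foreign
keys that depend on a given unique index:
select c.conname, c.conrelid::regclass
  from pg_depend d
       join pg_constraint c on c.oid = d.objid
 where d.classid = 'pg_constraint'::regclass
   and d.refobjid = 'some_unique_index'::regclass
   and c.contype = 'f';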
See #400 as this was an oversight in fixing this issue.
We used to force overly strict rules for a quoted field name in a CSV
load file; now we accept any character but a quote as part of the field
name.
Fixes#416.
Also known as the ORM case: it happens that other tools are used to
create the target schema. In that case pgloader's job is to fill in the
existing target tables with the data from the source tables.
We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.
This behavior is activated when using the “create no tables” option as
in the following test-case setup:
with create no tables, include drop, truncate
Fixes#400, for which I got a test-case to play with!