pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2025-08-07 14:56:59 +02:00

Author	SHA1	Message	Date
Dimitri Fontaine	eab1cbf326	More docs improvements. Explain the feature list of pgloader better for improving discoverability of what can be achieved with our nice little tool.	2018-12-19 22:40:32 +01:00
Dimitri Fontaine	ec071af0ad	Add a Feature Matrix to the documentation. That helps having both an overview of what pgloader is capable of doing with a database migration, and also documenting that some sources don't have the full support for some features yet.	2018-12-19 15:31:25 +01:00
Dimitri Fontaine	2cafa8360c	Document newly added MATERIALIZE VIEWS for new sources. Now it's possible to use this clause with a PostgreSQL or an MS SQL database source. Fixes #817.	2018-12-19 10:51:04 +01:00
Dimitri Fontaine	c019c16113	Implement MATERIALIZE VIEWS support for MS SQL, and distribute. The latter is not tested yet, but should have no impact if not used. Given how rare it is that I get a chance to play around with a MS SQL instance anyway, it might be better to push blind changes for it when it doesn't impact existing features…	2018-12-19 01:25:27 +01:00
Dimitri Fontaine	bda06f8ac0	Implement Citus support from a MySQL database.	2018-12-17 16:31:47 +01:00
Dimitri Fontaine	290ad68d61	Implement materialize views in PostgreSQL source support.	2018-12-16 23:17:37 +01:00
Dimitri Fontaine	007003647d	Improve Redshift support documentation.	2018-12-14 18:21:34 +09:00
Dimitri Fontaine	f72afeeae7	Switch the documentation to the ReadTheDocs template.	2018-12-12 09:34:20 +09:00
Dimitri Fontaine	b6de8f1ead	Improve Citus documentation.	2018-12-12 09:34:05 +09:00
Dimitri Fontaine	56d24de67a	Update documentation with new features. We have a lot of new features to document. This is a first patch about that, some more work is to be done. That said, it's better than nothing already.	2018-12-11 14:25:08 +09:00
Dimitri Fontaine	af2995b918	Apply quoting rules to SQLite index column names. The previous fix was wrong for missing the point: rather than unquote column names in the table definition when matching the column names in the index definition, we should in the first place have quoted the index column names when needed. Fixes #872 for real this time.	2018-12-02 00:17:26 +01:00
Dimitri Fontaine	a939d20dff	Unquote names when searching for an index column name in its table. If the source database is using a keyword (such as "order") as a column name, then pgloader is going to quote this column name in its internal catalogs. In that case, unquote the column in the pgloader catalogs when matching it against the unquoted column name we have in the index definition. Fixes #872.	2018-12-01 21:27:26 +01:00
Dimitri Fontaine	ab2cadff24	Simplify the regular expresion parsing the PostgreSQL version string. The debian/Ubuntu packaging would defeat the quite simple regexp parsing PostgreSQL version string that we have in pgloader. To make it more robust, make it more open to unforeseen strings. See #800, see #810.	2018-11-30 15:39:27 +01:00
Dimitri Fontaine	801d8a6e09	Add support for MS SQL time data type. As for the other datetime types we have to use CONVERT at the SQL level in order to get a format that PostgreSQL understands. This time the magic number for it is 114.	2018-11-23 10:43:58 +01:00
Dimitri Fontaine	6e325f67e0	Implement the save.lisp idea for the bundle. This should make it easier to build pgloader with CCL rather than SBCL, all from the bundle distribution, and also easier to support windows. In passing, add a new file in the bundle distribution: version.sexp should contain a CL string containing the pgloader version string.	2018-11-21 21:44:56 +01:00
Dimitri Fontaine	18bcf10903	Blind fix for a strange use-case. A user reported a case where pgloader fails to find the table an index has been created on in pgloader catalogs. That's a weird case. For now, just issue a warning about the situation and skip the index.	2018-11-21 18:17:34 +01:00
Dimitri Fontaine	4ab26e5387	Handle other conditions in process-catalogs. It might be that some random condition is signaled during process-catalogs, causing the errors reported so far and that I can't reproduce. Let's add some handler-case protection to have more clues about what could be happening. See #865, #800, #810, #859, #824.	2018-11-21 17:31:11 +01:00
Dimitri Fontaine	743769d750	Improve handling of errors when fetching the source catalogs. We might have MS SQL failures at this stage, or even Redshift or other PostgreSQL variants failing to execute our catalog queries. Handle conditions by cleanly logging them and returning from copy-database without doing anything. That's the best we can do here. Fixes #605, fixes #757.	2018-11-21 10:38:19 +01:00
Dimitri Fontaine	1c18b41cd7	Implement a new way of building pgloader: make save. This time we directly call into the save-lisp-and-die feature of the implementation. As pgloader only supports SBCL and CCL at the time being, doing things without an abstraction layer is easy enough. This needs more testing and a special version for the bundle case too. One step at a time, etc.	2018-11-20 22:59:43 +01:00
Dimitri Fontaine	3f2f10eef1	Finish implementation of CAST rules for PostgreSQL source databases. Add a link to the table from the internal catalogs for columns so that we can match table-source-name in cast rules when migrating from PostgreSQL.	2018-11-19 19:33:37 +01:00
Dimitri Fontaine	aa8ae159e2	Improve error handling when applying Citus distribution rules. Make it so that we generate a proper error message to the user when failing to figure out the PATH to the distribution key, rather than failing with an internal error about The value NIL is not of type PGLOADER.CATALOG:TABLE.	2018-11-18 18:21:51 +01:00
Dimitri Fontaine	f07ac61269	Fix default/serial handling of pgsql as a source. In the recent patch that added support for Redshift "identity" columns, we broke support for PostgreSQL sequences. Unbreak that.	2018-11-18 17:46:41 +01:00
Dimitri Fontaine	1fd0576ace	Fix Citus support related debug print instructions.	2018-11-16 00:08:27 +01:00
Dimitri Fontaine	8b1acbae87	Make sure the image knows how to print circular data structures. Our catalogs representation is designed to be circular, which helps navigating the graph from anywhere when processing it. This means that we need to have print-circle set to t in the pgloader image, otherwise we might run into Control stack exhausted when trying to print out debug information... Fixes #865, #800, #810, #859, #824.	2018-11-16 00:03:31 +01:00
Dimitri Fontaine	e291c502ba	Install a call to cl+ssl:reload at image startup time, again. Testing shows that it's not just debian which needs it, it's always necessary. Just re-add our tweak now. See #866, see #816, see #807, #794.	2018-11-15 23:59:51 +01:00
Dimitri Fontaine	16dda01f37	Deal with SSL verify error the wrong way. This patch adds an option --no-ssl-cert-verification that allows bypassing OpenSSL server certificate verification. It's hopefully a temporary measure that we set up in order to make progress when confronted to: SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY The real solution is of course to install the SSL certificates at a place where pgloader will look for them, which defaults to ~/.postgresql/postgresql.crt at the moment. It's not clear what the story is with the defaults from /etc/ssl, or how to make things happen in a better way. See #648, See #679, See #768, See #748, See #775.	2018-11-15 00:13:21 +01:00
Dimitri Fontaine	5ecf04acb9	Implement null if support as a WITH option. This gives a default "null if" option to all the input columns at once, and it's still possible to override the default per column. In passing, fix project-fields declarations that SBCL now complains about when they're not true, such as declaring a vector when we might have :null or nil. As a result, remove the (declare (optimize speed)) in the generated field processing code.	2018-11-13 21:41:27 +01:00
Dimitri Fontaine	a6ef7a56a9	Implement ipv6 hostname support in .pgpass rules. An hostname could be written [::1] in .pgass, without having to escape the colon characters, and with a proper enclosing in square brackets, as common for ipv6 addresses. Fixes #837.	2018-11-10 21:01:30 +01:00
Dimitri Fontaine	656bf85075	Review field to column projection code emitted. The code emitted by pgloader to transform input fields into PostgreSQL column values was using too many optimization declarations, some of them that SBCL failed to follow through for lack of type marking in the generated code. As SBCL doesn't have enough information to be optimizing anyway, at least we can make it so that we don't have a warning about it. The new code does that. Fixes #803.	2018-11-10 20:22:04 +01:00
Dimitri Fontaine	6eaad0621b	Desultory code maintenance for MS SQL identity support. The code expects the keyword :auto-increment rather than a string nowadays in order to process an extra column bits of information as meaning that we want to cast to a serial/bigserial datatype.	2018-11-09 22:42:31 +01:00
Dimitri Fontaine	6c80404249	Implement support for Redshift "identity" columns. At this stage we don't even parse the details of the Redshift identity such as the seed and step values and consider them the same as a MySQL auto_increment extra description field. Fixes #860 (again).	2018-11-09 22:41:14 +01:00
Dimitri Fontaine	794bc7fc64	Improve redshift support: string_agg() doesn't exist there. Neither does array_agg(), unnest() and other very useful PostgreSQL functions. Redshift is from 8.0 times, so do things the old way: parse the output of the index definition that get from calling pg_index_def(). For that, this patch introduces the notion of SQL support that depends on PostgreSQL major version. If no major-version specific query is found in the pgloader source tree, then we use the generic one. Fixes #860.	2018-11-07 21:23:56 +01:00
Dimitri Fontaine	207cd82726	Improve SQLite type names parsing. Allow spaces in more random places, as SQLite doesn't seem to normalize the user input. Fixes #548 again.	2018-11-07 11:01:06 +01:00
Dimitri Fontaine	f8460c1705	Allow usernames and dbnames starting with digits (again). It turns out that the rules about the names of users and databases are more lax than pgloader would know, so it might be a good move for our DSN parsing to accept more values and then let the source/target systems to complain when something goes wrong. See #230 which got broke again somewhere.	2018-10-20 19:28:19 +02:00
Jason Rigby	6e7ea90806	add cl-ironclad and cl-babel dependencies to docker builds (#854 )	2018-10-18 18:56:40 +02:00
Larry Gebhardt	0e6f599282	Add Docker build instructions (#853 )	2018-10-18 18:55:56 +02:00
Dimitri Fontaine	7b487ddaca	Add a Citus distribution test case, from the citus tutorial.	2018-10-18 15:42:17 +02:00
Dimitri Fontaine	d3b21ac54d	Implement automatic discovery of the Citus distribution rules. With this patch, the following distribution rule distribute companies using id is equivalent to the following distribution rule set, given foreign keys in the source schema: distribute companies using id distribute campaigns using company_id distribute ads using company_id from campaigns distribute clicks using company_id from ads, campaigns distribute impressions using company_id from ads, campaigns In the current code (of this patch) pgloader walks the foreign-keys dependency tree and knows how to automatically derive distribution rules from a single rule and the foreign keys.	2018-10-18 15:31:29 +02:00
Dimitri Fontaine	8112a9b54f	Improve Citus Distribution Support. With this patch it's now actually possible to backfill the data on the fly when using the "distribute" new commands. The schema is modified to add the distribution key where specified, and changes to the primary and foreign keys happen automatically. Then a JOIN is generated to get the data directly during the COPY streaming to the Citus cluster.	2018-10-16 18:53:41 +02:00
Dimitri Fontaine	760763be4b	Use the constraint name when we have it. That's important for Citus, which doesn't know how to ADD a constraint without a name.	2018-10-10 15:44:21 -07:00
Dimitri Fontaine	381ac9d1a2	Add initial support for Citus distribution from pgloader. The idea is for pgloader to tweak the schema from a description of the sharding model, the distribute clause. Here's an example of such a clause: distribute company using id distribute campaign using company_id distribute ads using company_id from campaign distribute clicks using company_id from ads, campaign Given such commands, pgloader adds the distibution key to the table when needed, to the primary key definition of the table, and also to the foreign keys that are pointing to the changed primary key. Then when SELECTing the data from the source database, the idea is for pgloader to automatically JOIN the base table with the source table where to find the distribution key, in case it was just added in the schema. Finally, pgloader also calls the following Citus commands: SELECT create_distributed_table('company', 'id'); SELECT create_distributed_table('campaign', 'company_id'); SELECT create_distributed_table('ads', 'company_id'); SELECT create_distributed_table('clicks', 'company_id');	2018-10-10 14:35:12 -07:00
Dimitri Fontaine	344d0ca61b	Implement AFTER SCHEMA sql code blocks. This allows pgloader users to run SQL commands in between pgloader's schema creation and the actual loading of the data.	2018-10-10 11:08:28 -07:00
Jon Snell	0957bd0efa	Fix pgloader bug #844 by adding support for mssql real types (#845 )	2018-10-05 12:47:54 +02:00
Dimitri Fontaine	d356bd501b	Accept even more ragged date format input. When parsing a date string from a date format, accept that the ms or us part be completely missing, rather than just missing some digits. Fixed #828.	2018-09-10 19:37:36 +02:00
Dimitri Fontaine	5119d864f4	Assorted bug fixes in the context of Redshift support as a source. The catalog queries used in pgloader have to be adjusted for Redshift because this thing forked PostgreSQL 8.0, which is a long time ago now. Also, we had a couple bugs here and there that were not really related to Redshift support but were shown in that context. Fixes #813.	2018-09-04 11:49:21 +02:00
Dimitri Fontaine	0f58a3c84d	Assorted fixes: catalogs SQLtypes and MySQL decoding as. It turns out that when trying to debug "decoding as" the SQLtype listing support in sqltype-list was found broken, so this patch fixes it. Then goes on to fix the DECODING AS filters support, which we have switched to using the better regexp-or-string filter struct but forgot to update the matching code accordingly. Fixes #665.	2018-08-31 22:51:41 -07:00
Dimitri Fontaine	4fbfd9e522	Refrain from using regexp_match() function, introduced in Pg10. Instead use the substring() function which has been there all along. See #813.	2018-08-22 10:52:01 +02:00
Dimitri Fontaine	c9b905b7ac	Simplify our ASD system definition by using :serial t. This allows to drop manually maintained list of files dependencies, instead implying them by the order in which we list the files.	2018-08-20 11:55:47 +02:00
Dimitri Fontaine	cb633aa092	Refrain from some introspections on non-PGDG PostgreSQL variants. When dealing with PostgreSQL protocol compatible databases, often enough they don't support the same catalogs as PostgreSQL itself. Redshift for instance lacks foreign key support.	2018-08-20 11:52:59 +02:00
Dimitri Fontaine	d3bfb1db31	Bugfix previous commit: filter list format changed. We now accept the more general string and regex match rules, but the code to generate including and excluding lists from the catalogs had not been updated.	2018-08-20 11:50:50 +02:00

1 2 3 4 5 ...

1568 Commits