pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2026-02-17 20:31:31 +01:00

Author	SHA1	Message	Date
Dimitri Fontaine	1ed07057fd	Implement --on-error-stop command line option. The implementation uses the dynamic binding on-error-stop so it's also available when pgloader is used as a Common Lisp library. The (not-all-that-) recent changes made to the error handling make that implementation straightforward enough, so let's finally do it! Fix #85.	2016-03-21 20:52:50 +01:00
Dimitri Fontaine	8476c1a359	Allow setting search_path with multiple schemas. The PostgreSQL search_path allows multiple schemas and might even need it to be able to reference types and other tables. Allow setting more than one schema by using the fact that PostgreSQL schema names don't need to be individually quoted, and passing down the exact content of the SET search_path value down to PostgreSQL. Fix #359.	2016-03-20 20:54:08 +01:00
Dimitri Fontaine	63c3b3b1c7	Fix MS SQL text values processing. The previous code required non-zero data length for all MS SQL returned values, where it makes no sense for text like values (an empty string is ok). Also, the code was trimming spaces from both ends on syb-char data, and in testing that return type is used for varchar too. Fix #366. Fix #368.	2016-03-20 20:15:02 +01:00
Dimitri Fontaine	4155d06ae5	Improve support for MS SQL multicolumn indexes. Once more we can't use an aggregate over a text column in MS SQL to build the index definition from its catalog structure, so we have to do that in the lisp part of the code. Multi-column indexes are now supported, but filtered indexes still are a problem: the WHERE clause in MS SQL is not compatible with the PostgreSQL syntax (because of [names] and type casting). For example we cast MS SQL bit to PostgreSQL boolean, so WHERE ([deleted]=(0)) should be translated to WHERE not deleted, and the code to do that is not included yet. The following documentation page offers more examples of WHERE expressions we might want to support: https://technet.microsoft.com/en-us/library/cc280372.aspx WHERE EndDate IS NOT NULL AND ComponentID = 5 AND StartDate > '01/01/2008' EndDate IN ('20000825', '20000908', '20000918') It might be worth automating the translation to PostgreSQL syntax and operators, but it's not done in this patch. See #365, where the created index will now be as follows, which is a problem because of being UNIQUE: some existing data won't reload fine. CREATE UNIQUE INDEX idx_<oid>_foo_name_unique ON dbo.foo (name, type, deleted);	2016-03-18 11:01:06 +01:00
Dimitri Fontaine	d2a1ac639f	Fix MS SQL foreign key support. Avoid registering the first column name twice in the foreign key definition.	2016-03-16 22:01:01 +01:00
Dimitri Fontaine	4cb83ec6a5	DEBUG mode should list all SQL queries sent. Even for MS SQL source.	2016-03-16 21:55:40 +01:00
Dimitri Fontaine	3e8b7df0d3	Improve column formatting. Have a pretty-print option where we try to be nice for the reader, and don't use it in the CAST debug messages. Also allow working with the real maximum length of column names rather than hardcoding 22 cols...	2016-03-16 21:46:41 +01:00
Dimitri Fontaine	f1fe9ab702	Assorted fixes to MS SQL support. Having been given a test instance of a MS SQL database allows us to quickly fix a series of assorted bugs related to schema handling of MS SQL databases. As it's the only source with a proper notion of schema that pgloader supports currently, it's not a surprise we had them. Fix #343. Fix #349. Fix #354.	2016-03-16 21:43:04 +01:00
Dimitri Fontaine	c1fc4f0879	Review MySQL foreign key introspection SQL query. It turns out sloppy SQL code made its way to pgloader wherein the GROUP BY clause of the foreign key listing wasn't referencing the whole set of non aggregated output columns. Thanks to thiagokronig for the new query, which fixes #345.	2016-03-09 18:36:44 +01:00
Dimitri Fontaine	b7a873c03f	Drop default value on bigserial CAST in MS SQL. This is a blind attempt to fix #354.	2016-03-09 18:30:18 +01:00
Dimitri Fontaine	57f7fd1d4e	Find foreign keys with #'string= by default. Blind attempt at fixing #343 and #330, which now is on at the same level.	2016-03-09 16:33:44 +01:00
Dimitri Fontaine	c724018840	Implement ALTER TABLE clause for MySQL migrations. The new ALTER TABLE facility allows acting on tables found in the MySQL database before the migration happens. In this patch the only provided actions are RENAME TO and SET SCHEMA, which fixes #224. In order to be able to provide the same option for MS SQL users, we will have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and modify the internal schema-struct so that the schema slot of our table instances is a schema instance rather than its name. Lacking a MS SQL test database and instance, the facility is not yet provided for that source type.	2016-03-06 21:51:33 +01:00
Dimitri Fontaine	d4737a39ca	Leave ssl lib alone in src/hooks.lisp. That means we no longer eagerly load it when we think we will need it, and also refrain from unloading it from the binary at image saving time. In my local tests, doing so fixes #330 by avoiding the error entirely in the docker image, where obviously the libs found at build-time are found again at the same place at run time.	2016-03-05 22:45:59 +01:00
Dimitri Fontaine	486be8c068	SQLite integer default values might be quoted. Fix #351 by having a new transformation function to process SQLite integers, that may be quoted...	2016-03-03 14:59:27 +01:00
Dimitri Fontaine	62edd5a2c8	Register "nocase" as a SQLite noise word. SQLite types include "text nocase" apparently, so add "nocase" as one of the managed noise words. It might be time we handle those the other way round, with a whitelist of expected tokens somewhere in the type definition rather than a blacklist of unknown words to exclude... Anyway, fix #350.	2016-03-03 00:21:43 +01:00
Dimitri Fontaine	b026a860c1	Fix MS SQL fetch metadata function. It should return the fetched catalog rather than the count of objects, which is only used for statistics purposes. Fix #349. This problem once again shows that we lack proper testing environment for MS SQL source :/	2016-03-02 16:20:55 +01:00
Dimitri Fontaine	eaa5807244	Adapt to CURRENT_TIMESTAMP(x) default values. We target CURRENT_TIMESTAMP as the PostgreSQL default value for columns when it was different before on the grounds that the type casting in PostgreSQL is doing the job, as in the following example: pgloader# create table test_ts(ts timestamptz(6) not null default CURRENT_TIMESTAMP); CREATE TABLE pgloader# insert into test_ts VALUES(DEFAULT); INSERT 0 1 pgloader# table test_ts; ts ------------------------------- 2016-02-24 18:32:22.820477+01 (1 row) pgloader# drop table test_ts; DROP TABLE pgloader# create table test_ts(ts timestamptz(0) not null default CURRENT_TIMESTAMP); CREATE TABLE pgloader# insert into test_ts VALUES(DEFAULT); INSERT 0 1 pgloader# table test_ts; ts ------------------------ 2016-02-24 18:32:44+01 (1 row) Fix #341.	2016-02-24 18:30:16 +01:00
Dimitri Fontaine	40c1581794	Review transaction and error handling in COPY. The PostgreSQL COPY protocol requires an explicit initialization phase that may fail, and in this case the Postmodern driver transaction is already dead, so there's no way we can even send ABORT to it. Review the error handling of our copy-batch function to cope with that fact, and add some logging of non-retryable errors we may have. Also improve the thread error reporting when using a binary image from where it might be difficult to open an interactive debugger, while still having the full blown Common Lisp debugging experience for the project developers. Add a test case for a missing column as in issue #339. Fix #339, see #337.	2016-02-21 15:56:06 +01:00
Dimitri Fontaine	9512ab187e	Fix the fix, see #343 . Someday I should either stop working on pgloader in between other things or have a better test suite, including MS SQL and all. Probably both. And read compiler notes and warnings too, while at that...	2016-02-20 14:15:13 +01:00
Dimitri Fontaine	197258951c	Improve MS SQL usage of the schema structs. The function qualify-name is not in use anymore, but the MSSQL parts didn't get the memo... fix #343.	2016-02-19 17:55:54 +01:00
Dimitri Fontaine	765bbb70aa	Fix auto_increment support in cast rules. This fixes #141 again when users are forcing MySQL bigint(20) into PostgreSQL bigint types so that foreign keys can be installed. To this effect, a cast rule such as the following is needed: cast type bigint when (= 20 precision) to bigint drop typemod Before this patch, this user provided cast rule would also match against MySQL types "with extra auto_increment", and it should not. If you're having the problem that this patch fixes on an older pgloader that you can't or won't upgrade, consider the following user provided set of cast rules to achieve the same effect: cast type bigint with extra auto_increment to bigserial drop typemod, type bigint when (= 20 precision) to bigint drop typemod	2016-02-05 21:26:31 +01:00
Dimitri Fontaine	c108b85290	Allow package prefix in CAST ... USING clause. Also, in passing, add a new transformation function for MySQL allowing to transform from varbinary to text.	2016-02-04 16:09:22 +01:00
Dimitri Fontaine	782561fd4e	Handle default value transforms errors, fix #333. It turns out that the MySQL catalog always stores default values as strings even when the column itself is of type bytea. In some cases, it's then impossible to transform the expected bytea from a string. In passing, move some code around to fix dependencies and make it possible to issue log warnings from the default value printing code.	2016-02-03 12:27:58 +01:00
Dimitri Fontaine	029ea0027a	Upgrade version string. We just tagged the repository as version 3.3.0.50 to be able to release an experimental pgloader bundle, and we did tag the repository. The first commit after that should then change the version string.	2016-01-31 21:49:43 +01:00
Dimitri Fontaine	76668c2626	Review package dependencies. The decision to use lots of different packages in pgloader has quite strong downsides at times, and the manual management of dependencies is one of them, in particular how to avoid circular ones.	2016-01-31 18:42:01 +01:00
Dimitri Fontaine	64ab4d28dc	Error out when using ignored options. In the theory that it's a better service to the user to refuse doing anything at all rather than ignore his/her commands, print out FATAL errors when options are used that are incompatible with a load command file. See #327 for a case where this did happen. In passing, tweak our report code to avoid printing the footer when we didn't print anything at all previously.	2016-01-25 11:46:36 +01:00
Dimitri Fontaine	4e36bd3c55	Improve threads error handling. See #328 where we are lacking useful stack trace in a --debug run because of the previous talk-handler-bind coding, that was there to avoid sinking the users into too many details. Let's try another approach here.	2016-01-24 21:43:46 +01:00
Dimitri Fontaine	b2ec66c84b	Force external-format of the log files, see #328. In the issue #328 the --debug level output is not helpful because of an encoding error in the logfile. Let's see about forcing the log file external format to utf-8 then.	2016-01-20 21:53:13 +01:00
Dimitri Fontaine	327745110a	MySQL bytea default value can be "". Fix #291. Thanks to a reproducible test case we can see that MySQL default for a varbinary column is an empty string, so tweak the transform function byte-vector-to-bytea in order to cope with that.	2016-01-18 21:55:01 +01:00
Dimitri Fontaine	d9d9e06c0f	Another attempt at fixing #323 . Rather than trying hard to have PostgreSQL fully qualify the index name with tricks around search_path setting at the time ::regclass is executed, simply join on pg_namespace to retrieve that schema in a new slot in our pgsql-index structure so that we can then reuse it when needed. Also add a test case for the scenario, including both a UNIQUE constraint and a classic index, because the DROP and CREATE/ALTER instructions differ.	2016-01-17 01:54:36 +01:00
Dimitri Fontaine	7dd69a11e1	Implement concurrency and workers for files sources. More than the syntax and API tweaks, this patch also makes it so that a multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be loaded with several files in the group in parallel. To that effect, tweak again the md-connection and md-copy implementations.	2016-01-16 22:53:55 +01:00
Dimitri Fontaine	aa8b756315	Fix when to create indexes. In the recent refactoring and improvements of parallelism the indexes creation would kick in before we know that the data is done being copied over to the target table. Fix that by maintaining a writers-count hashtable and only starting to create indexes when that count reaches zero, meaning all the concurrent tasks started to handle the COPY of the data are now done.	2016-01-16 19:50:21 +01:00
Dimitri Fontaine	dcc8eb6d61	Review api around worker-count. It was worker-count and it's now exposed as the worker in the WITH clause, but we can actually keep it as worker-count in the internal API, and it feels better that way.	2016-01-16 19:49:52 +01:00
Dimitri Fontaine	eb45bf0338	Expose concurrency settings to the end users. Add the workers and concurrency settings to the LOAD commands for database sources so that users can tweak them now, and add mentions of them in the documentation too. From the documentation string of the copy-from method as found in src/sources/common/methods.lisp: We allow WORKER-COUNT simultaneous workers to be active at the same time in the context of this COPY object. A single unit of work consists of several kinds of workers: - a reader getting raw data from the COPY source with `map-rows', - N transformers preparing raw data for PostgreSQL COPY protocol, - N writers sending the data down to PostgreSQL. The N here is set to the CONCURRENCY parameter: with a CONCURRENCY of 2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be active simultaneously. Those options should find their way into the remaining sources, that's for a follow-up patch tho.	2016-01-15 23:22:32 +01:00
Dimitri Fontaine	fb40a472ab	Simplify database WITH option handling. Share more code by having a common flattening function as a semantic predicate in the grammar.	2016-01-15 22:34:27 +01:00
Dimitri Fontaine	bfdbb2145b	Fix with drop index option, fix #323 . Have PostgreSQL always fully qualify the index related objects and SQL definition statements when fetching the list of indexes of a table, by playing with an empty search_path. Also improve the whole index creation by passing the table object as the context where to derive the table-name from, so that schema qualified tables are taken into account properly.	2016-01-15 15:04:07 +01:00
Dimitri Fontaine	1ff204c172	Typo fix.	2016-01-15 14:45:19 +01:00
Dimitri Fontaine	44a2bd14d4	Fix custom CAST rules with expressions, fix #322 . In a previous commit the typemod matching code had been broken, and we failed to notice that until now. Thanks to bug report #322 we just got the memo... Add a test case in the local-only MySQL database. The regression testing facilities should be improved to be able to test a full database, and then to dynamically create said database from code or something to ease test coverage of those cases.	2016-01-12 14:55:17 +01:00
Dimitri Fontaine	2c200f5747	Improve error handling for pkeys creation. When creating the primary keys on top of the unique indexes, we might still have errors (e.g. with NULL values). Make it so that a failure in one pkey doesn't fail every other one, by having them all run within a single connection rather than a single transaction.	2016-01-12 14:53:42 +01:00
Dimitri Fontaine	133028f58d	Desultory review code indentation.	2016-01-12 14:52:44 +01:00
Dimitri Fontaine	ee69b8d4ce	Randomly tweak batch sizes. In order to avoid having all concurrently prepared batches of rows sent to the PostgreSQL COPY command at exactly the same time, randomly vary the size of each batch between -30% and +30% of the batch rows parameter.	2016-01-11 21:29:29 +01:00
Dimitri Fontaine	f256e12a4f	Review load parallelism settings. pgloader parallel workload is still hardcoded, but at least the code now uses clear parameters as input so that it will be possible in a later patch to expose them to the end-user. The notions of workers and concurrency are now handled as follows: - concurrency is how many tasks are allowed to happen at once, by default we have a reader thread, a transformer thread and a COPY thread all active for each table being loaded, - worker-count is how many parallel threads are allowed to run simultaneously and defaults to 8 currently, which means that in a typical migration from a database source and given default concurrency of 1 (3 threads), we might be loading up to 3 different tables at any time. The idea is to expose those settings to the user in the load file and as command line options (such as --jobs) and see what it gives us. It might help e.g. use more cores in loading a single CSV file. As of this patch, there can still be only one reader thread and the number of transformer threads must be the same as the number of COPY threads. Finally, the CSV-like files user-defined projections are now handled in the transformation threads rather than in the reader thread...	2016-01-11 01:43:38 +01:00
Dimitri Fontaine	94ef8674ec	Typo fix (of sorts) Some API didn't get the table-name to table memo...	2016-01-11 01:42:18 +01:00
Dimitri Fontaine	a3fd22acd3	Review pgloader encoding story. Thanks to Common Lisp character data type, it's easy for pgloader to enforce always speaking to PostgreSQL in utf-8, and that's what has been done from the beginning actually. Now, without good reason for that, the first example of a SET clause that has been added to the docs was about how to set client_encoding, which should NOT be done. Fix that at the user level by removing the bad example from the docs and adding a WARNING whenever the client_encoding is set to a known bad value. It's a WARNING because we then simply force 'utf-8' anyway. Also, review completely the format-vector-row function to avoid doing double work with the Postmodern facilities we piggyback on. This was done halfway through and the utf-8 conversion was actually done twice.	2016-01-11 01:27:36 +01:00
Dimitri Fontaine	d60b64c03b	Implement MS SQL newsequentialid() default value. We convert the default value call to newsequentialid() into a call to the PostgreSQL uuid-ossp uuid_generate_v1() which seems like the equivalent function. The extension "uuid-ossp" needs to be installed in the target database. (Blind) Fix #246.	2016-01-08 22:43:38 +01:00
Dimitri Fontaine	8a596ca933	Move connection into utils. There's no reason why this file should be in the src/ top-level.	2016-01-07 16:42:43 +01:00
Dimitri Fontaine	d1a2e3f46b	Improve the Dockerfile and the versioning. When building from sources within the git environment, the version number is ok, but it was wrong when building in the docker image. Fix the version number to 3.3.0.50 to show that we're talking about a development snapshot that is leading to version 3.3.1. Yeah, 4 parts version numbers. That happens, apparently.	2016-01-07 10:21:52 +01:00
Dimitri Fontaine	1bbbf96ba7	Fix minor API glitch/typo.	2016-01-04 21:01:15 +01:00
Dimitri Fontaine	a7291e9b4b	Simplify copy-database implementation further. Following-up to the recent refactoring effort, the IXF and DB3 source classes didn't get the memo that they could piggyback on the generic copy-database implementation. This patch implements that. In passing, also simplify the instanciate-table-copy-object method for copy subclasses that need specialization here, by using change-class and call-next-method so as to reuse the generic code as much as possible.	2016-01-01 14:28:09 +01:00
Dimitri Fontaine	24cd0de9f7	Install the :create-schemas option back. In the previous refactoring patch that option mistakenly went away, although it is still needed for MS SQL and it is planned to make use of it in the other source types too... See #316 for reference.	2016-01-01 13:35:35 +01:00
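
The task arithmetic quoted in commit eb45bf0338 (one reader, N transformers, N writers, with N set to the CONCURRENCY parameter) can be checked with a small sketch. It is written in Python rather than pgloader's Common Lisp, and the function name `task_count` is hypothetical; only the formula comes from the commit message.

```python
def task_count(concurrency: int) -> int:
    """Tasks started for one unit of work: 1 reader + N transformers + N writers.

    Hypothetical helper, not part of pgloader; it only mirrors the
    (+ 1 N N) formula from the commit message.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return 1 + concurrency + concurrency


# The commit message gives 5 tasks for CONCURRENCY 2 and 9 for CONCURRENCY 4.
for n in (1, 2, 4):
    print(f"CONCURRENCY {n}: {task_count(n)} tasks")
```

Per the same commit message, only WORKER-COUNT of those tasks may be active simultaneously; this sketch does not model that cap.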

570 Commits
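
The batch size variation described in commit ee69b8d4ce (each batch sized between -30% and +30% of the batch rows parameter) might look roughly like the following. This is a Python sketch under assumptions, not the pgloader implementation: the name `jittered_batch_size`, the uniform distribution, and the example value of 25000 rows are all illustrative choices.

```python
import random


def jittered_batch_size(batch_rows: int, spread: float = 0.3, rng=None) -> int:
    """Return a batch size within +/- `spread` of `batch_rows`.

    Hypothetical helper: a uniform draw is an assumption, the commit
    message only states the -30% to +30% range.
    """
    rng = rng or random
    factor = rng.uniform(1.0 - spread, 1.0 + spread)
    # Never return an empty batch, even for tiny batch_rows values.
    return max(1, round(batch_rows * factor))


# Each concurrently prepared batch gets its own size, so batches are less
# likely to reach the COPY command at the same instant.
rng = random.Random(42)
sizes = [jittered_batch_size(25000, rng=rng) for _ in range(4)]
print(sizes)
```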