pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2026-01-22 23:51:23 +01:00

Author	SHA1	Message	Date
Dimitri Fontaine	c2c98b8b42	Allow any character in a quoted CSV field name. We used to force overly strict rules for a quoted field name in a CSV load file, now accept any character but a quote to be part of the field name. Fixes #416.	2016-08-07 20:35:37 +02:00
Dimitri Fontaine	70572a2ea7	Implement support for existing target databases. Also known as the ORM case, it happens that other tools are used to create the target schema. In that case pgloader job is to fill in the exiting target tables with the data from the source tables. We still focus on load speed and pgloader will now DROP the constraints (Primary Key, Unique, Foreign Keys) and indexes before running the COPY statements, and re-install the schema it found in the target database once the data load is done. This behavior is activated when using the “create no tables” option as in the following test-case setup: with create no tables, include drop, truncate Fixes #400, for which I got a test-case to play with!	2016-08-06 20:19:15 +02:00
Dimitri Fontaine	42e9e521e0	Add option "max parallel create index". By default, pgloader will start as many parallel CREATE INDEX commands as the maximum number of indexes you have on any single table that takes part in the load. As this number might be so great as to exhaust the target PostgreSQL server (e.g. maintenance_work_mem), we add an option to limit that to something reasonnable when the source schema isn't. Fix #386 in which 150 indexes are found on a single source table.	2016-04-11 17:40:52 +02:00
Dimitri Fontaine	31f8b5c5f0	Set application_name to 'pgloader' by default. It's always been possible to set application_name to anything really, making it easier to follow the PostgreSQL queries made by pgloader. Force that setting to 'pgloader' by default. Fix #387.	2016-04-11 17:14:38 +02:00
Dimitri Fontaine	b1d4e94f2a	Fix integer parsing support for SQLite. The function needs to return a string to be added to the COPY stream, we still need to make sure whatever given here looks like an integer. Given the very dynamic nature of data types in SQLite, the integer-to-string function was already a default now, but failed to be published before in its fixed version, somehow.	2016-03-27 20:42:40 +02:00
Dimitri Fontaine	fe3601b04c	Fix SQLite index support, add foreign keys support. It turns out recent changes broke tne SQLite index support (from adding support for MS SQL partial/filtered indexes), so fix it by using the pgsql-index structure rather than the specific sqlite-idx one. In passing, improve detection of PRIMARY KEY indexes, which was still lacking. This work showed that the introspection done by pgloader was wrong, it's way more crazy that we though, so adjust the code to loop over PRAGMA calls for each object we inspect. While adding PRAGMA calls, add support for foreign keys too, we have the code infrastructure that makes it easy now.	2016-03-27 20:39:13 +02:00
Dimitri Fontaine	8476c1a359	Allow setting search_path with multiple schemas. The PostgreSQL search_path allows multiple schemas and might even need it to be able to reference types and other tables. Allow setting more than one schema by using the fact that PostgreSQL schema names don't need to be individually quoted, and passing down the exact content of the SET search_path value down to PostgreSQL. Fix #359.	2016-03-20 20:54:08 +01:00
Dimitri Fontaine	c724018840	Implement ALTER TABLE clause for MySQL migrations. The new ALTER TABLE facility allows to act on tables found in the MySQL database before the migration happens. In this patch the only provided actions are RENAME TO and SET SCHEMA, which fixes #224. In order to be able to provide the same option for MS SQL users, we will have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and modify the internal schema-struct so that the schema slot of our table instances are a schema instance rather than its name. Lacking MS SQL test database and instance, the facility is not yet provided for that source type.	2016-03-06 21:51:33 +01:00
Dimitri Fontaine	68aa205db5	Also commit SQLite test case changes. See #351 for context, this adds a proper test case.	2016-03-03 14:59:57 +01:00
Dimitri Fontaine	40c1581794	Review transaction and error handling in COPY. The PostgreSQL COPY protocol requires an explicit initialization phase that may fail, and in this case the Postmodern driver transaction is already dead, so there's no way we can even send ABORT to it. Review the error handling of our copy-batch function to cope with that fact, and add some logging of non-retryable errors we may have. Also improve the thread error reporting when using a binary image from where it might be difficult to open an interactive debugger, while still having the full blown Common Lisp debugging experience for the project developers. Add a test case for a missing column as in issue #339. Fix #339, see #337.	2016-02-21 15:56:06 +01:00
Dimitri Fontaine	c108b85290	Allow package prefix in CAST ... USING clause. Also, in passing, ass a new transformation function for MySQL allowing to transform from varbinary to text.	2016-02-04 16:09:22 +01:00
Dimitri Fontaine	d9d9e06c0f	Another attempt at fixing #323 . Rather than trying hard to have PostgreSQL fully qualify the index name with tricks around search_path setting at the time ::regclass is executed, simply join on pg_namespace to retrieve that schema in a new slot in our pgsql-index structure so that we can then reuse it when needed. Also add a test case for the scenario, including both a UNIQUE constraint and a classic index, because the DROP and CREATE/ALTER instructions differ.	2016-01-17 01:54:36 +01:00
Dimitri Fontaine	7dd69a11e1	Implement concurrency and workers for files sources. More than the syntax and API tweaks, this patch also make it so that a multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be loaded with several files in the group in parallel. To that effect, tweak again the md-connection and md-copy implementations.	2016-01-16 22:53:55 +01:00
Dimitri Fontaine	eb45bf0338	Expose concurrency settings to the end users. Add the workers and concurrency settings to the LOAD commands for database sources so that users can tweak them now, and add mentions of them in the documentation too. From the documentation string of the copy-from method as found in src/sources/common/methods.lisp: We allow WORKER-COUNT simultaneous workers to be active at the same time in the context of this COPY object. A single unit of work consist of several kinds of workers: - a reader getting raw data from the COPY source with `map-rows', - N transformers preparing raw data for PostgreSQL COPY protocol, - N writers sending the data down to PostgreSQL. The N here is setup to the CONCURRENCY parameter: with a CONCURRENCY of 2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be active simultaneously. Those options should find their way in the remaining sources, that's for a follow-up patch tho.	2016-01-15 23:22:32 +01:00
Dimitri Fontaine	44a2bd14d4	Fix custom CAST rules with expressions, fix #322 . In a previous commit the typemod matching code had been broken, and we failed to notice that until now. Thanks to bug report #322 we just got the memo... Add a test case in the local-only MySQL database. The regression testing facilities should be improved to be able to test a full database, and then to dynamically create said database from code or something to ease test coverage of those cases.	2016-01-12 14:55:17 +01:00
Dimitri Fontaine	d84ec3f808	Add SQLite test case for before/after load commands. See bug #321, this change should have been part of previous commit.	2015-12-23 21:58:56 +01:00
Dimitri Fontaine	b4bfa18877	Fix more table name quoting, fix #163 again. Now that we can have several threads doing COPY, each of them need to know about the pgsql-reserved-keywords list. Make sure that's the case and in passing fix some call sites to apply-identifier-case. Also, more disturbingly, fix the code so that TRUNCATE is called from the main thread before giving control to the COPY threads, rather than having two concurrent threads doing the TRUNCATE twice. It's rather strange that we got no complaint from the field on that part...	2015-12-08 11:52:43 +01:00
Dimitri Fontaine	93b6be43d4	Travis: adapt to PostgreSQL 9.1, again. We didn't have CREATE SCHEMA IF EXISTS at the time...	2015-11-23 22:09:08 +01:00
Dimitri Fontaine	973339abc8	Add a SQLite test case from #310 .	2015-11-22 22:16:05 +01:00
Dimitri Fontaine	e23de0ce9f	Improve SQLite table names filtering. Filter the list of tables we migrate directly from the SQLite query, avoiding to return useless data. To do that, use the LIKE pattern matching supported by SQLite, where the REGEX operator is only available when extra features are loaded apparently. See #310 where filtering out the view still caused errors in the loading.	2015-11-22 22:10:26 +01:00
Dimitri Fontaine	150d288d7a	Improve our regression testing facility. Next parallelism improvements will allow pgloader to use more than one COPY thread to load data, with the impact of changing the order of rows in the database. Rather than doing a copy out and `diff` of the data just loaded, load the reference data and do the diff in SQL: select * from loaded.data except select * from expected.data If such a query returns any row, we know we didn't load what was expected and the regression test is failing. This regression testing facility should also allow us to finally add support for multiple-table regression tests (sqlite, mysql, etc).	2015-11-17 17:03:08 +01:00
Dimitri Fontaine	933d1c8d6b	Add test case for #302 .	2015-10-22 22:35:32 +02:00
Dimitri Fontaine	6bf26c52ec	Implement a TimeZone option for IXF loading. The local-time:encode-timestamp function takes a default timezone and it is necessary to have control over it when loading from pgloader. Hence, add a timezone option to the IXF option list, that is now explicit and local to the IXF parser rather than shared with the DBF option list.	2015-10-05 16:46:15 +02:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that get passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command where not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	598c860cf5	Improve user code parsing, fix #297 . To be able to use "t" (or "nil") as a column name, pgloader needs to be able to generate lisp code where those symbols are available. It's simple enough in that a Common Lisp package that doesn't :use :cl fullfills the condition, so intern user symbols in a specially crafted package that doesn't :use :cl. Now, we still need to be able to run transformation code that is using the :cl package symbols and the pgloader.transforms functions too. In this commit we introduce a heuristic to pick symbols either as functions from pgloader.transforms or anything else in pgloader.user-symbols. And so that user code may use NIL too, we provide an override mechanism to the intern-symbol heuristic and use it only when parsing user code, not when producing Common Lisp code from the parsed load command.	2015-09-21 13:23:21 +02:00
Dimitri Fontaine	78c6bf097a	Fix the build again. Once more did I change a test file data and forgot to commit the changes to the expected file of the regression test.	2015-09-12 00:40:15 +02:00
Dimitri Fontaine	98f18c4877	Improve CSV date format, fix #293 . The date format wouldn't allow using colon (:) in the noise parts of it, and would also insist that milliseconds should be on 4 digits and micro seconds on 6 digits. Allow for "ragged" input and take however many digits we actually find in the input.	2015-09-12 00:35:14 +02:00
Dimitri Fontaine	a0dc59624c	Fix schema qualified table names usage again. When the list of columns of the PostgreSQL target table isn't given in the load command, pgloader will happily query the system catalogs to get that information. The list-columns query didn't get the memo about the qualified table name format and the with-schema macro... fix #288.	2015-09-11 11:53:28 +02:00
Dimitri Fontaine	e054eb3838	Travis: set PGTZ in regress.sh The TimeZone parameter should be set both for input and for output in order to match our expected result file. Let's try to set PGTZ in the shell environment...	2015-09-07 20:24:00 +02:00
Dimitri Fontaine	60d58a96b8	Travis: let's try to force client timezone. The cvs-parse-date test is failing on Travis because the server up there in the Cloud isn't using the same timezone as my local machine. Let's just force the timezone in the SET clause...	2015-09-07 20:00:05 +02:00
Dimitri Fontaine	3f539b7384	Travis: update expected output file. Forgot to update the expected output file in the previous commit, which Travis is rightfully complaining about...	2015-09-07 17:47:03 +02:00
Dimitri Fontaine	04b2779239	Allow date format parsing to support time. A useful use case for date parsing at tine input level is to parse time (hour, minutes, seconds) rather than a full date (timestamp). Improve the code so that it's possible to use the date format facility even when the data field lacks the year/month/day information. Fix #288.	2015-09-07 17:05:10 +02:00
Dimitri Fontaine	eabfbb9cc8	Fix schema qualified table names usage (more). When parsing table names in the target URI, we are careful of splitting the table and schema name and store them into a cons in that case. Not all sources methods got the memo, clean that up. See #182 and #186, a pull request I am now going to be able to accept. Also see #287 that should be helped by being able to apply #186.	2015-09-04 01:06:15 +02:00
Dimitri Fontaine	b78bb6dd31	Allow quoted field names to contain spaces, fix #285 . Given a fully quoted field name, there should be no restriction about using spaces in between the quotes, but the parser used to choke on that case.	2015-09-01 14:32:50 +02:00
Dimitri Fontaine	75727df72f	Quote table names when migrating from SQLite, fix #281 . Apparently I just forgot to apply any smartness whatsoever to SQLite identifiers and just copied them as they are to PostgreSQL. Change that by calling apply-identifier-case.	2015-08-25 01:13:19 +02:00
Dimitri Fontaine	573a63cd3a	Add a local test, per #271 .	2015-08-24 16:42:39 +02:00
Dimitri Fontaine	04ddf940d9	Left pad COPY octal chars with 0, fix #275 . The COPY TEXT format accepts non printable characters with an escaped sequence wherin pgloader can pass in the octal number for the character in its encoding. When doing that with small numbers like \6 and the non-printable character is then followed by other numbers, then it becomes e.g. \646 which might not be part of the target encoding... To fix, always left pad the character octal number with zeroes, so that we now send in \00646 which COPY knows how to read: the char at \006 then 4 then 6. Also copy the test case over to pgloader and run it in the test suite.	2015-08-20 18:17:18 +02:00
Dimitri Fontaine	833b41c23b	Fix the regression test expected values, see #266 .	2015-07-26 14:45:43 +02:00
Dimitri Fontaine	b4b36caa84	Fix parsing dates with less-than 4 digits, fix #266 . The previous coding decided to add 2000 to the year as an integer if it was below 2000, which parses 1999 as 3999. Oops. Trigger the correction only when the date is given on 2 digits only, parsing 04 as 2004. Dates given on 3 digits are kept as-is. Playing with the century special parameter allows to cancel this behavior, that maybe should be made entirely optional. It's just too common to find current years on 2 digits only, sadly.	2015-07-26 14:41:44 +02:00
Dimitri Fontaine	d2a1a5643e	Improve SQL blocks support, fix #265 . It's now possible to use several files in a BEFORE LOAD EXECUTE section, and to mix DO and EXECUTE parts, bringing lots of flexibility in the commands. Also it actually simplifies the parser.	2015-07-24 17:41:35 +02:00
Dimitri Fontaine	54e29773d7	Fix index creation reporting, see #251 . The new option 'drop indexes' reuses the existing code to build all the indexes in parallel but failed to properly account for that fact in the summary report with timings. While fixing this, also fix the SQL used to re-establish the indexes and associated constraints to allow for parallel execution, the ALTER TABLE statements would block in ACCESS EXCLUSIVE MODE otherwise and make our efforts vain.	2015-07-18 23:06:15 +02:00
Dimitri Fontaine	a98788b670	Implement drop indexes option for copy and fixed. The option doesn't seem relevant to the db3 source type which contains a table definition: pgloader will create the table from scratch and no indexes are going to be found.	2015-07-16 21:39:06 +02:00
Dimitri Fontaine	49bf7e56f2	Implement a "drop indexes" option in CSV mode, fix #251 . When loading against a table that already has index definitions, the load can be quite slow. Previous commit introduced a warning in such a case. This commit introduces the option "drop indexes" that is not used by default. When this option is used, pgloader drops the indexes before loading the data then create the indexes again with the same definitions as before. All the indexes are created again in parallel to optimize performances. Only primary key indexes can't be created in parallel, so those are created in two steps (create unique index then alter table).	2015-07-16 12:22:58 +02:00
Dimitri Fontaine	1c7de22096	Add test coverage for #80 .	2015-06-25 14:16:12 +02:00
Dimitri Fontaine	7e508374c4	Add some SQLite test cases for real type, see #249 .	2015-06-16 23:13:52 +02:00
Dimitri Fontaine	c3b5d60542	Fix type declaration to include null values, fix #238 . In passing, add a test case for NIL datetime values in our SQLite sample database.	2015-05-22 23:49:03 +02:00
Dimitri Fontaine	ba7b27b60a	Travis: actually push the right version of the expected file.	2015-05-22 12:41:29 +02:00
Dimitri Fontaine	bffec4cc63	Allow for more options in the CSV escape character, fix #38 . To allow for importing JSON one-liners as-is in the database it can be interesting to leverage the CSV parser in a compatible setup. That setup requires being able to use any separator character as the escape character.	2015-05-22 12:31:06 +02:00
Dimitri Fontaine	dc86b5e600	Travis: Fix the text case connection string.	2015-05-21 16:29:14 +02:00
Dimitri Fontaine	abbc105c41	Implement CSV headers support. Some CSV files are given with an header line containing the list of their column names, use that when given the option "csv header". Note that when both "skip header" and "csv header" options are used, pgloader first skip as many required lines and then uses the next one as the csv header. Because of temporary failure to install the `ronn` documentation tool, this patch only commits the changes to the source docs and omits to update the man page (pgloader.1). A following patch is intended to be pushed that fixed that. See #236 which is using shell tricks to retrieve the field list from the CSV file itself and motivated this patch to finally get written.	2015-05-21 12:55:23 +02:00

1 2 3 4

170 Commits