pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2025-08-13 17:56:59 +02:00

Author	SHA1	Message	Date
Dimitri Fontaine	787be7f188	Review fixed source import. The clone method was missing specific slots of fixed-copy class.	2016-03-26 18:33:04 +01:00
Dimitri Fontaine	7dd69a11e1	Implement concurrency and workers for files sources. More than the syntax and API tweaks, this patch also makes it so that a multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be loaded with several files in the group in parallel. To that effect, tweak again the md-connection and md-copy implementations.	2016-01-16 22:53:55 +01:00
Dimitri Fontaine	41e9eebd54	Rationalize common generic API implementation. When devising the common API, the first step has been to implement specific methods for each generic function of the protocol. It now appears that in some cases we don't need the extra level of flexibility: each change of the API has been systematically reported to all the specific methods, so just use a single generic definition where possible. In particular, introduce new intermediate class for COPY subclasses allowing to share more common code in the methods implementation, rather than having to copy/paste and maintain several versions of the same code. It would be good to be able to centralize more code for the database sources and how they are organized around metadata/import-data/complete schema, but it doesn't look obvious how to do it just now.	2015-10-05 21:25:21 +02:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that gets passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command were not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	04aa743eb7	Cleanup file based "connections". When the notion of a connection class with a generic set of methods was invented, the very flexible specification formats available for the file based sources were not integrated into the new connection system. This patch provides a new connection class md-connection with a specific sub-protocol (after opening a connection, the caller is supposed to loop around open-next-stream) so that it's possible to both properly fit into the connection concept and to better share the code in between our three implementations (csv, copy, fixed).	2015-08-24 16:33:00 +02:00
Dimitri Fontaine	54e29773d7	Fix index creation reporting, see #251 . The new option 'drop indexes' reuses the existing code to build all the indexes in parallel but failed to properly account for that fact in the summary report with timings. While fixing this, also fix the SQL used to re-establish the indexes and associated constraints to allow for parallel execution, the ALTER TABLE statements would block in ACCESS EXCLUSIVE MODE otherwise and make our efforts vain.	2015-07-18 23:06:15 +02:00
Dimitri Fontaine	a98788b670	Implement drop indexes option for copy and fixed. The option doesn't seem relevant to the db3 source type which contains a table definition: pgloader will create the table from scratch and no indexes are going to be found.	2015-07-16 21:39:06 +02:00
Dimitri Fontaine	48f451bdbc	Implement the option to disable triggers when loading data. This option is dangerous and allows to skip ALL triggers when loading data against PostgreSQL. This includes foreign key constraints definitions and will allow loading data out of order. When using both the options "create no table" and "disable triggers" it will be possible to load data into a schema prepared by your favorite external tool, at the cost of not validating FK constraints. Use with care. Fix #167.	2015-02-19 15:05:10 +01:00
Dimitri Fontaine	d494fbd4ca	Fix fixed and copy connection initialisation methods.	2015-01-16 10:06:51 +01:00
Dimitri Fontaine	e1bc6425e2	Implement support for PostgreSQL COPY format, fix #145 . PostgreSQL COPY format is not really CSV but something way easier to parse. Funnily enough, parsing it as CSV is not that easy, so we add here a special simple parser for the COPY format. It should be quite useful to try loading again reject data files from pgloader after manual fixing, too. It's still missing some documentation without any good excuse for that, will add soon.	2015-01-02 18:49:17 +01:00
Dimitri Fontaine	302a7d402b	Refactor connection handling, and clean-up many things. That's the big refactoring patch I've been sitting on for too long. First, refactor connection handling to use a uniformed "connection" concept (class and generic functions API) everywhere, so that the COPY derived objects just use that in their :source-db and :target-db slots. Given that, we don't need no messing around with pgconn and myconn- and other special variables at all anywhere in the tree. Second, clean up some oddities accumulated over time, where some parts of the code didn't get the memo when new API got into place. Third, fix any other oddity or missing part found while doing those first two activities, it was long overdue anyway...	2014-12-26 21:50:29 +01:00
Dimitri Fontaine	cd52654e3d	Transfer processing threads errors to the main thread, fix #130 . Handling the errors within the thread is useful when debugging pgloader interactively, but not so much when started from the command line, where it would hang the program forever with threads waiting for interactive debugger actions to be taken, with no way for the user to actually take action.	2014-11-27 00:05:12 +01:00
Dimitri Fontaine	9ddf117a90	In-passing desultory cleanup.	2014-07-14 21:54:50 +02:00
Dimitri Fontaine	55655ed927	Fix fixed-file column name quoting, as we did for CSV, fixes #70 .	2014-06-29 16:25:30 +02:00
Dimitri Fontaine	ab75e6c626	Improve ragged right code comments, follow-up to #90 .	2014-06-25 13:29:26 +02:00
Dimitri Fontaine	d1100efa28	Fix ragged lines support for fixed files, fixing #90 . It's possible for the data to stop before the end of a specified column, in which case we still want to accept whatever shortened data we have.	2014-06-24 18:58:26 +02:00
Dimitri Fontaine	88eba90776	Handle "ragged right" fixed width files, fix #82 .	2014-06-17 17:24:07 +02:00
Dimitri Fontaine	9ad98c5c2a	Handle errors when parsing fixed width files, per #82 .	2014-06-17 17:10:04 +02:00
Dimitri Fontaine	267a1cc755	The Useless Use Of Loop did strike.	2014-05-03 15:55:02 +02:00
Dimitri Fontaine	e39788e5cd	Fix some CCL warnings. Those were preventing a buildapp based build.	2014-05-03 15:36:30 +02:00
Dimitri Fontaine	429232c3de	Fix loading data from stdin: fix #53 . The stdin support really was one brick shy of a load, and in particular with-open-file was used against a stream when using that option.	2014-04-27 23:38:02 +02:00
Dimitri Fontaine	db947e1467	Rework reader and writer data exchange. With this patch, the whole data massaging and final formatting into the PostgreSQL COPY TEXT format is done by the reader thread, which publishes a batch at a time in the communication channel: a lparallel.queue object. Before that, the raw vectors were pushed directly in the queue, offering more flexibility to adjust to the reader and writer IO rates and capabilities, but impeding the ability of the Garbage Collector: data still in the queue was not collected even if not needed anymore. The new model also uses less memory, and allows a better control over what amount of data stays in memory. The new concurrent-batches parameter should be key to being able to process huge rows. The intent is to offer a way for the users to tune concurrent-batches down to 1 for sources with massive per-row memory footprint. Even better would be to find a way to automatically adjust the setting without spending too much time counting the bytes we're batching. Preliminary tests show no noticeable impact on performance from this patch, even some improvements in cases.	2014-01-25 23:54:49 +01:00
Dimitri Fontaine	7d94d4ff62	Small code cleanup.	2013-12-25 16:20:37 +01:00
Dimitri Fontaine	fe302af221	Refactor the dbname API to feed from the connection string directly.	2013-12-20 17:24:02 +01:00
Dimitri Fontaine	2019b918f0	Implement support for matching several files in a single archive clause.	2013-11-26 16:47:37 +01:00
Dimitri Fontaine	9d5dad7e3e	Implement support for FIXED COLS input files, reaching release candidate status.	2013-11-07 15:39:28 +01:00

26 Commits
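
Several of the commits above (#82, #90) concern "ragged right" fixed-width files, where a line may end before the last specified column, or partway through one. The following is a minimal Python sketch of that idea only; it is not pgloader's Common Lisp implementation. The column layout, the `parse_fixed_line` name, and the choice to return `None` for a column that is entirely missing are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical column specification: (name, start offset, length).
Column = Tuple[str, int, int]

# Illustrative layout only, not taken from pgloader.
COLUMNS: List[Column] = [("id", 0, 4), ("name", 4, 10), ("city", 14, 8)]


def parse_fixed_line(line: str, columns: List[Column]) -> List[Optional[str]]:
    """Split one fixed-width line into column values, tolerating short lines.

    A column that starts past the end of the line yields None (an assumption
    of this sketch); a column cut short yields whatever data is present.
    """
    line = line.rstrip("\r\n")
    values: List[Optional[str]] = []
    for _name, start, length in columns:
        if start >= len(line):
            # The line ended before this column began.
            values.append(None)
        else:
            # Slicing past the end of a string is safe in Python and simply
            # returns the shortened data that is available.
            values.append(line[start:start + length])
    return values
```

For example, `parse_fixed_line("0002Bob", COLUMNS)` keeps the shortened `name` value and reports the absent `city` column as `None`, rather than rejecting the line.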