pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2026-02-16 03:41:04 +01:00

Author	SHA1	Message	Date
Dimitri Fontaine	0549e74f6d	Implement multiple readers per table for MySQL. Experiment with the idea of splitting the read work in several concurrent threads, where each reader is reading portions of the target table, using a WHERE id <= x and id > y clause in its SELECT query. For this to kick in, a number of conditions need to be met, as described in the documentation. The main interest might not be faster queries to overall fetch the same data set, but better concurrency with as many readers as writers and each couple having its own dedicated queue.	2017-06-28 16:23:18 +02:00
Dimitri Fontaine	6d66280fa5	Review parallelism and memory behavior. The previous patch made format-vector-row allocate its memory in one go rather than byte after byte with vector-push-extend. In this patch we review our usage of batches and parallelism. Now the reader pushes each row directly to the lparallel queue and writers concurrently consume from it, cook batches in COPY format, and then send that chunk of data down to PostgreSQL. When looking at runtime profiles, the time spent writing in PostgreSQL is a fraction of the time spent reading from MySQL, so we consider that the writing thread has enough time to do the data munging without slowing us down. The most interesting factor here is the memory behavior of pgloader, which seems more stable than before, and easier to cope with for SBCL's GC. Note that batch concurrency is no more, replaced by prefetch rows: the reader thread no longer builds batches and the count of items in the reader queue is now a number of rows, not of batches of them. Anyway, with this patch in I can't reproduce the following issues: Fixes #337, Fixes #420.	2017-06-27 23:10:33 +02:00
Dimitri Fontaine	d7d36c5766	Review identifier case :quote. We added some confusion about who's responsible for quoting the SQL object names in between src/utils/quoting.lisp and src/pgsql/pgsql-ddl.lisp and as a result some migrations from MySQL with identifier case set to quote were broken, as in #439. To fix, remove any use of the format directive ~s in the PostgreSQL ddl output methods: we consider that the quoting of ~s is to be decided in apply-identifier-case. We then use ~a instead of ~s. Fix #439.	2016-09-17 22:45:45 +02:00
Dimitri Fontaine	f8ae9f22b9	Implement support for SSL client certificates. This fixes #308 by automatically using the PostgreSQL Client Side SSL files as documented in the following reference: http://www.postgresql.org/docs/current/static/libpq-ssl.html#LIBPQ-SSL-FILE-USAGE This uses the Postmodern special support for it. Unfortunately couldn't test it locally other than it doesn't break non-ssl connections. Pushing to have user feedback.	2015-11-09 11:32:17 +01:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that gets passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command were not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	72fdf112ff	Simplify how to compute total load time, see #283. In some cases pgloader total time computing is quite off: in the archive case because it fails to take into account per-file before and after sections, and in the general case when there's parallel work done. This patch helps by storing the start time explicitly and using it at the moment when the summary is displayed: no guessing any more. This is only used in the archive case for now because I want some feedback. On my machine with the usual test cases I have, the difference with and without this patch is line-noise, something more serious has to be done: let's push testing to the user by committing this early version of the work.	2015-08-29 23:08:22 +02:00
Dimitri Fontaine	22f4317a30	Add support for the CAST rule to SQLite sources. This allows users to benefit from the same flexible machinery when using SQLite as when using MySQL, and also allows adding some more default cast rules too.	2014-10-13 00:52:55 +02:00
Dimitri Fontaine	dfb1e9355a	Get rid of our own implementation of alexandria:read-file-into-string.	2014-10-01 23:23:46 +02:00
Dimitri Fontaine	2369a142a7	Refactor source code organisation. In passing, fix a bug in the previous commit where left-over code would cancel the whole new parsing code for advanced source fields options.	2014-10-01 23:20:24 +02:00

9 Commits
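
The range-splitting idea described in commit 0549e74f6d (each reader fetching a portion of the table with a WHERE id <= x and id > y clause) can be illustrated with a small sketch. This is not pgloader's code, which is written in Common Lisp; the function names `split_id_ranges` and `range_query`, the even split of the id span, and the `SELECT *` query shape are all assumptions made for illustration only.

```python
def split_id_ranges(min_id, max_id, readers):
    """Split the ids min_id..max_id into contiguous (lo, hi] ranges.

    Hypothetical helper: returns at most `readers` ranges, each meant to be
    used as `id > lo AND id <= hi`, so that no id is read twice or skipped.
    """
    if readers < 1:
        raise ValueError("readers must be >= 1")
    if max_id < min_id:
        return []  # empty table: nothing to read

    total = max_id - min_id + 1
    readers = min(readers, total)        # never more ranges than ids
    step, extra = divmod(total, readers)

    ranges = []
    lo = min_id - 1                      # exclusive lower bound of first range
    for i in range(readers):
        # The first `extra` ranges take one more id to absorb the remainder.
        hi = lo + step + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def range_query(table, column, lo, hi):
    """Build the per-reader SELECT text (illustrative; identifiers unquoted)."""
    return f"SELECT * FROM {table} WHERE {column} > {lo} AND {column} <= {hi}"


if __name__ == "__main__":
    for lo, hi in split_id_ranges(1, 10, 3):
        print(range_query("t", "id", lo, hi))
```

An even split of the id span only balances the work when ids are roughly uniformly distributed; with large gaps in the sequence some readers would get far fewer rows than others.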