When in :data logging mode we log the whole data set as we read and then
write it, which is quite a lot of data. The current logging system works
by filling a queue that the cl-log library is then fed from, and pushing
that much data through the queue is very expensive, so stop doing that.
Hopefully we don't need to revisit the logging more than that; the other
messages should be few enough not to count for much when doing a full load.
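As a minimal sketch of the intended guard, assuming a simple switch rather
than pgloader's actual logging setup (*log-data-p* and log-data-row are
hypothetical names, and a cl-log log manager is assumed to be configured
already), the expensive :data payload is only formatted and queued when it
is explicitly asked for:

    (defvar *log-data-p* nil
      "When true, log every row as it is read and written (expensive).")

    (defun log-data-row (row)
      "Only format and enqueue the row when :data logging is wanted."
      (when *log-data-p*
        (cl-log:log-message :data "row: ~s" row)))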
Turns out that in some cases it's not possible to call format-vector-row on
MySQL result sets, because MySQL has been sending us a vector of bytes (a
blob) while the expected data (from the table definition) clearly is text.
Handle the error as an input reading error, skipping the line and being
verbose about it in the logs. This patch fails to update the stats about
what's happening, so it might need later changes.
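A sketch of that error handling, assuming a caller that formats one row at
a time; the process-row wrapper and the one-argument format-vector-row call
are illustrative, not the actual pgloader signatures:

    (defun process-row (row)
      "Return the formatted row, or NIL when it must be skipped."
      (handler-case
          (format-vector-row row)
        (error (condition)
          ;; treat it as an input reading error: log loudly, skip the row
          (cl-log:log-message :error "Skipping row, cannot format it: ~a"
                              condition)
          nil)))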
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before the *copy-batch-rows* limit)
when crossing the byte count threshold.
When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and in the testing done so far there's no measurable cost when
*copy-batch-size* is left at its default value of nil.
This patch is published without any way to tune these values from the command
language yet; that's the next step, once it's been proven effective.
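A sketch of the resulting batch-close test; batch-full-p and the
*copy-batch-rows* default shown here are assumptions, only the two special
variables and the nil default of *copy-batch-size* come from the patch
itself:

    (defparameter *copy-batch-rows* 25000   ; assumed value, for the sketch only
      "Maximum number of rows per COPY batch.")

    (defparameter *copy-batch-size* nil
      "When set to a number of bytes, close the batch early at that size.")

    (defun batch-full-p (row-count byte-count)
      "Close the batch on the row limit, or on the byte threshold when set."
      (or (<= *copy-batch-rows* row-count)
          (and *copy-batch-size*
               (<= *copy-batch-size* byte-count))))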
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes one
batch at a time to the communication channel: a lparallel.queue object.
Before that, the raw vectors were pushed directly into the queue, which
offered more flexibility to adjust to the reader and writer IO rates and
capabilities, but got in the way of the Garbage Collector: data still sitting
in the queue could not be collected even when it was no longer needed.
The new model also uses less memory and allows better control over how much
data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.
The intent is to offer users a way to tune *concurrent-batches*
down to 1 for sources with a massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.
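As a sketch of the channel setup, assuming a bounded lparallel.queue (the
*concurrent-batches* default value and the make-batch-channel name are
illustrative, not taken from the patch):

    (defparameter *concurrent-batches* 10   ; assumed default, for the sketch
      "How many formatted batches may wait in the queue at once.")

    (defun make-batch-channel ()
      "A bounded queue: push-queue blocks the reader once the writer lags
    *concurrent-batches* batches behind, capping the memory footprint."
      (lparallel.queue:make-queue :fixed-capacity *concurrent-batches*))

    ;; reader thread: (lparallel.queue:push-queue batch channel)
    ;; writer thread: (lparallel.queue:pop-queue channel)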
Preliminary tests show no noticeable performance impact from this patch, and
even some improvements in some cases.