14 Commits

Author SHA1 Message Date
Dimitri Fontaine
9ddf117a90 In-passing desultory cleanup. 2014-07-14 21:54:50 +02:00
Dimitri Fontaine
55655ed927 Fix fixed-file column name quoting, as we did for CSV, fixes #70. 2014-06-29 16:25:30 +02:00
Dimitri Fontaine
ab75e6c626 Improve ragged right code comments, follow-up to #90. 2014-06-25 13:29:26 +02:00
Dimitri Fontaine
d1100efa28 Fix ragged lines support for fixed files, fixing #90.
It's possible for the data to stop before the end of a specified column,
in which case we still want to accept whatever shortened data we have.
2014-06-24 18:58:26 +02:00
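The ragged-right handling described in this commit message can be illustrated with a minimal sketch. It is written in Python for brevity and is not pgloader's actual Common Lisp code. The `ColumnSpec` tuple layout, the `parse_fixed_line` name, and the choice to return `None` for a column the line never reaches are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical column spec: (name, 0-based start offset, length).
ColumnSpec = Tuple[str, int, int]


def parse_fixed_line(line: str, columns: List[ColumnSpec]) -> List[Optional[str]]:
    """Split one fixed-width line, tolerating lines that end early."""
    line = line.rstrip("\r\n")
    fields: List[Optional[str]] = []
    for _name, start, length in columns:
        if start >= len(line):
            # The line stops before this column begins (assumed: no value).
            fields.append(None)
        else:
            # Slicing past the end of a string is safe in Python, so a
            # column cut short simply yields the data that is present.
            fields.append(line[start:start + length])
    return fields


columns = [("a", 0, 3), ("b", 3, 5), ("c", 8, 4)]
full = parse_fixed_line("abc12345wxyz\n", columns)
short = parse_fixed_line("abc12", columns)
```

Here `short` keeps the truncated `"12"` for column `b` rather than rejecting the line, which is the behaviour the message asks for.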
Dimitri Fontaine
88eba90776 Handle "ragged right" fixed width files, fix #82. 2014-06-17 17:24:07 +02:00
Dimitri Fontaine
9ad98c5c2a Handle errors when parsing fixed width files, per #82. 2014-06-17 17:10:04 +02:00
Dimitri Fontaine
267a1cc755 The Useless Use Of Loop did strike. 2014-05-03 15:55:02 +02:00
Dimitri Fontaine
e39788e5cd Fix some CCL warnings.
Those were preventing a buildapp based build.
2014-05-03 15:36:30 +02:00
Dimitri Fontaine
429232c3de Fix loading data from stdin: fix #53.
The stdin support really was one brick shy of a load, and in particular
with-open-file was used against a stream when using that option.
2014-04-27 23:38:02 +02:00
Dimitri Fontaine
db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes
one batch at a time in the communication channel: a lparallel.queue object.

Before that, the raw vectors were pushed directly in the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but getting in the way of the Garbage Collector: data still
in the queue was not collected even if not needed anymore.

The new model also uses less memory, and allows better control over how
much data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.

The intent is to offer users a way to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.

Preliminary tests show no noticeable impact on performance from this patch,
and even some improvements in some cases.
2014-01-25 23:54:49 +01:00
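The reader and writer exchange described in this commit message can be sketched as follows. This is a Python illustration under stated assumptions, not the pgloader implementation: `queue.Queue` stands in for the lparallel.queue object, `CONCURRENT_BATCHES` stands in for *concurrent-batches*, and `BATCH_ROWS`, `copy_text_row`, `reader`, `writer` and `load` are hypothetical names. The escaping covers only backslash, tab, newline and carriage return, with `\N` for NULL, which is a subset of the PostgreSQL COPY TEXT rules.

```python
import queue
import threading
from typing import Iterable, List, Optional, Sequence

CONCURRENT_BATCHES = 2  # hypothetical stand-in for *concurrent-batches*
BATCH_ROWS = 3          # hypothetical number of rows per batch

_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def copy_text_row(row: Sequence[object]) -> str:
    """Format one row as a COPY TEXT line: tab separated, \\N for NULL."""
    def field(value: object) -> str:
        if value is None:
            return "\\N"
        return "".join(_ESCAPES.get(ch, ch) for ch in str(value))
    return "\t".join(field(v) for v in row) + "\n"


def reader(rows: Iterable[Sequence[object]], channel: queue.Queue,
           batch_rows: int = BATCH_ROWS) -> None:
    """Format rows in the reader thread and publish one batch at a time."""
    batch: List[str] = []
    for row in rows:
        batch.append(copy_text_row(row))
        if len(batch) == batch_rows:
            channel.put("".join(batch))  # blocks while the queue is full
            batch = []
    if batch:
        channel.put("".join(batch))
    channel.put(None)  # sentinel: no more batches


def writer(channel: queue.Queue, sink: List[str]) -> None:
    """Consume ready-to-send batches; the sink list stands in for COPY."""
    while True:
        batch: Optional[str] = channel.get()
        if batch is None:
            break
        sink.append(batch)


def load(rows: Iterable[Sequence[object]],
         concurrent_batches: int = CONCURRENT_BATCHES) -> List[str]:
    # The bounded queue caps how many formatted batches sit in memory.
    channel: queue.Queue = queue.Queue(maxsize=concurrent_batches)
    sink: List[str] = []
    thread = threading.Thread(target=reader, args=(rows, channel))
    thread.start()
    writer(channel, sink)
    thread.join()
    return sink


rows = [(1, "a\tb"), (2, None), (3, "x"), (4, "y\\z")]
batches = load(rows, concurrent_batches=1)
```

With `concurrent_batches=1`, at most one finished batch waits in the queue while the reader prepares the next, which mirrors the tuning the message suggests for sources with very large rows.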
Dimitri Fontaine
7d94d4ff62 Small code cleanup. 2013-12-25 16:20:37 +01:00
Dimitri Fontaine
fe302af221 Refactor the dbname API to feed from the connection string directly. 2013-12-20 17:24:02 +01:00
Dimitri Fontaine
2019b918f0 Implement support for matching several files in a single archive clause. 2013-11-26 16:47:37 +01:00
Dimitri Fontaine
9d5dad7e3e Implement support for FIXED COLS input files, reaching release candidate status. 2013-11-07 15:39:28 +01:00