202 Commits

Author SHA1 Message Date
Dimitri Fontaine
ee498111bc Implement MySQL local (socket) connection. Fix #39.
The parser was happily parsing such a connection string as the
following, but the rest of the code didn't really know what to do about
it:

  mysql://unix:/var/run/mysqld/mysqld.sock:/main

In passing, fix bugs where the PostgreSQL unix domain socket connection
was still a brick shy of a load, omitting to consider the case where the
connection host is actually a list of '(:unix . "path/to/socket").
2014-05-02 22:48:17 +02:00
Dimitri Fontaine
182128775b Another encoding and external formats fix for portability.
Some of our internal values now depend on the implementation, and could
either be a symbol on SBCL or an external-format structure on CCL. We
could typecase our way out I suppose, but it might be that SBCL has a
different version of the external-format type, so we'd rather use #+.
2014-04-29 15:25:56 +02:00
Dimitri Fontaine
f0cc4ddef9 Fix filename matching when no match is found. 2014-04-29 14:49:55 +02:00
Dimitri Fontaine
f5f584fdf1 Fix parsing ccl:describe-character-encodings.
First, despite the documentation mentioning the function writes
to *terminal-io*, in fact it's doing (format t ...) and thus the result
is written to *standard-output*.

Second, CCL has encodings with no aliases.
2014-04-29 14:25:40 +02:00
Dimitri Fontaine
a5a29407f0 Release pgloader version 3.0.99. 2014-04-29 13:59:33 +02:00
Dimitri Fontaine
c0d9bb4d8f Allow building the pgloader image using CCL.
Too many Makefile commands were hard-coded to use SBCL, which prevented
building successfully against CCL. That's now fixed.
2014-04-29 11:47:22 +02:00
Dimitri Fontaine
40128dbd75 Fix with-monitor support of :start-logger option.
It used to still launch an extra set of threads for monitoring, and
that would confuse CCL, where it's not possible to write into a
stream from more than one thread concurrently.
2014-04-29 11:43:03 +02:00
Dimitri Fontaine
0f62751a3f Improve summary output.
Try to get a deterministic output, which apparently is still not
always the case when using SBCL, now that it's been switched to using
the explicit *terminal-io* rather than t.

This change is needed for CCL support, though, where you don't get to
write to the same stream from different threads.
2014-04-29 11:42:02 +02:00
Dimitri Fontaine
3abcfeb569 Avoid empty index definitions in SQLite, fixes #52.
I could get down to the problem here, which is that a couple of indexes
were reported to pgloader but without any SQL definition for them, and
then pgloader would wait for non-existent tasks.

It seems easier to just skip those indexes, that's what this patch does.
2014-04-28 16:00:34 +02:00
Dimitri Fontaine
9516a90d9d Fix SQLite support for filename parsing.
The code didn't get the memo about the way we now do support source
filenames and all.
2014-04-28 15:20:30 +02:00
Dimitri Fontaine
b758058208 Fix the fix for parsing quoted-filenames. 2014-04-28 15:18:18 +02:00
Dimitri Fontaine
b5c89e750c Quick review of the generic API documentation strings. 2014-04-28 14:36:15 +02:00
Dimitri Fontaine
429232c3de Fix loading data from stdin: fix #53.
The stdin support really was one brick shy of a load, and in particular
with-open-file was used against a stream when using that option.
2014-04-27 23:38:02 +02:00
Dimitri Fontaine
b5dec87915 Allow any non-quote characters in a quoted filename.
In particular, allow for a space to be used in the filename. The only
character that is not permitted anymore is the quote itself ('), it
should be easy enough to allow for escaping it as in the password field
if required.

Should probably fix #54, even though the lack of data currently reported
in that issue makes it a blind guess only.
2014-04-27 22:49:27 +02:00
Dimitri Fontaine
efd11ab759 Add user options to control pgloader batch behaviour.
The new WITH options allow the user to set values for the dynamic
variables *copy-batch-rows*, *copy-batch-size* and *concurrent-batches*.
That's needed in cases like issue #16, even with the batch size
defaulting to what looks like a proper setup.

In the longer term, a serious review of the pgloader memory usage should
be done, the numbers being way higher than the batch sizes we set up
here.
2014-04-27 22:37:17 +02:00
Dimitri Fontaine
78a988eb47 Oops, forgot to add the new file charsets.lisp. 2014-04-26 18:55:43 +02:00
Dimitri Fontaine
35ca4927e9 Get rid of some lib dependencies.
The charset business isn't worth depending on an AGPL licenced lib which
is part of a huge Quicklisp system.
2014-04-25 17:21:11 +02:00
Dimitri Fontaine
789d854799 Fix issue #49 where data could be considered as a format string. 2014-04-23 17:03:35 +02:00
Dimitri Fontaine
3a9bc9db0f Switch the default memory watch to on. 2014-04-22 17:13:36 +02:00
Dimitri Fontaine
9fa638e233 Handle NIL values in transform functions.
When declaring types of arguments (mainly done for hinting the Common
Lisp compiler into generating more efficient code), it's important to
account for the possibility of the arguments being NIL, of NULL type.

That's been made clear in the way the projection function is now
generated in src/sources/source.lisp in the project-fields function, with
all the arguments now being &optional so that we are able to cope with
ragged CSV files.

The only expected change from this patch is missing warnings in some
test cases, such as test/reformat.load, test/fixed.load and
test/archive.load.
2014-04-18 22:51:30 +02:00
Dimitri Fontaine
1af517323c Attempt to fix the OpenSSL loading situation.
For the generated binary to be really portable, we need to be able to
open openssl 1.0.1 even when we've been built against openssl 1.0.0.

A way to achieve that with SBCL is by forcing the unloading of the lib
at image saving time and register a hook to load it again at image init
time. Using the proper API, CFFI will happily load the available file
for the lib rather than insisting on loading the exact same one as
found on the build machine.
2014-04-18 22:24:11 +02:00
Dimitri Fontaine
114d2fedbc Another try at fixing #40.
The babel character-decoding-error condition is exposing both its
internal BUFFER and the current OCTETS, and it seems we should refer to
the BUFFER in our error reporting...
2014-03-04 15:54:31 +01:00
Dimitri Fontaine
654b3f5531 Fix the condition handler fix for #40.
Refrain from trying to display the character where we found a decoding
error when the error actually happens at end-of-input-in-character...
2014-03-04 14:20:27 +01:00
Dimitri Fontaine
56f3da28ed Fix #20 by skipping tables and views missing from the catalogs. 2014-03-04 14:01:04 +01:00
Dimitri Fontaine
4d6def8105 Move some MySQL old import/export functions apart... 2014-03-04 13:52:48 +01:00
Dimitri Fontaine
46fd6632f2 Fix #40 by providing a per-table forced-encoding option.
This patch takes advantage of the recent patch
62fc85a1cf
so that you will need to freshen your local Qmynd copy if you want to
test from sources.
2014-03-03 23:39:22 +01:00
Dimitri Fontaine
1461cda1c0 Improve MySQL encoding errors handling.
When it's not possible to decode a MySQL value in the proper given
encoding, automatically replace the value with nil and be quite verbose
about it by logging an error.
2014-03-02 22:44:06 +01:00
Dimitri Fontaine
42635c70bd Refrain from controlling the encoding in pgloader, qmynd now handles it. 2014-03-02 01:27:02 +01:00
Dimitri Fontaine
7fa95c1135 Fix bug #39 wherein unix domain sockets didn't make it properly to cl-postgres. 2014-02-24 17:23:17 +01:00
Dimitri Fontaine
643875a266 Improve CSV error handling, thanks to cl-csv continue restart. 2014-02-08 17:51:15 +01:00
Dimitri Fontaine
8f6915d626 Fix issue #29, using proper quoting.
The patch from pull request #30 was hard-coding the PostgreSQL side quoting;
we are using the quote_ident() function instead, as it's now available in
every PostgreSQL production release (8.4 included).
2014-02-08 17:31:59 +01:00
Dimitri Fontaine
a6e2c6364f Cleanup: the MySQL list-transform function is not used anymore. 2014-02-08 17:28:04 +01:00
Dimitri Fontaine
dbfd8cf06c Implement new CSV option "lines terminated by", fixes #23. 2014-02-04 20:58:46 +01:00
Dimitri Fontaine
1844f40ad1 Fix map-push-queue to ensure we send an :end-of-data message no matter what. 2014-01-28 21:05:37 +01:00
Dimitri Fontaine
a8b0f91f37 Allow optional control of batch memory footprint, see #16 and #22.
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before *copy-batch-rows* limit)
when crossing the byte count threshold.

When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.

This patch is published without any way to tune the values from the command
language yet, that's the next step once it's been proven effective.
2014-01-26 23:22:18 +01:00
Dimitri Fontaine
ceec4780f2 Improve log message pointing to the log file (use the true name). 2014-01-26 21:25:27 +01:00
Dimitri Fontaine
ca0d25d3b2 Provide a new log level, :data, activated when both --debug and --verbose are used. 2014-01-26 17:49:20 +01:00
Dimitri Fontaine
b60f40a5fa Fix transform function date-with-no-separator. 2014-01-26 17:48:45 +01:00
Dimitri Fontaine
db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes a
batch at a time in the communication channel: a lparallel.queue object.

Before that, the raw vectors were pushed directly in the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but impeding the ability of the Garbage Collector: data still
in the queue was not collected even if not needed anymore.

The new model also uses less memory, and allows a better control over what
amount of data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.

The intent is to offer users a way to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.

Preliminary tests show no noticeable impact on performance from this patch,
and even some improvements in some cases.
2014-01-25 23:54:49 +01:00
Dimitri Fontaine
41add15397 In passing indentation change only. 2014-01-25 23:41:10 +01:00
Dimitri Fontaine
8ac2cc4930 Skip empty lines when reading from files. 2014-01-24 15:11:15 +01:00
Dimitri Fontaine
e92f085b04 Convert --root-dir to its truename before processing it, and manage errors to do so. 2014-01-24 15:10:45 +01:00
Dimitri Fontaine
c50164e53d Manage the whole class of "integrity errors" also when retrying a batch... 2014-01-24 15:10:03 +01:00
Dimitri Fontaine
69b550a46e Make use of the new usage function... 2014-01-24 10:14:51 +01:00
Dimitri Fontaine
be4cc804c0 Show usage and help when the command line options are not recognized. 2014-01-24 09:22:02 +01:00
Dimitri Fontaine
e8fcb15c27 Fix another hasty commit erroneously containing a for-tests change. 2014-01-23 23:29:27 +01:00
Dimitri Fontaine
b374d4bc8b The current retry method has no need for *copy-batch-split*. 2014-01-23 23:28:25 +01:00
Dimitri Fontaine
d132bafc07 Refrain from parsing a non-existing command file... 2014-01-23 23:17:34 +01:00
Dimitri Fontaine
3f61c66a79 Also handle extra columns in CSV parsing. 2014-01-23 15:15:42 +01:00
Dimitri Fontaine
516ef08c37 Allow loading ragged CSV files. 2014-01-23 15:07:05 +01:00