The parser was happily parsing connection strings such as the
following, but the rest of the code didn't really know what to do with
them:
mysql://unix:/var/run/mysqld/mysqld.sock:/main
In passing, fix bugs where the PostgreSQL unix domain socket support
was still a few bricks shy of a load, omitting to consider the case
where the connection host is actually a cons such as
'(:unix . "path/to/socket").
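A minimal sketch of the dispatch, with connection-host-spec as a
hypothetical helper; the host is either a hostname string or a cons
naming a unix domain socket:

    (defun connection-host-spec (host)
      "Return the hostname or unix socket path for HOST."
      (etypecase host
        (string host)
        (cons (ecase (car host)
                (:unix (cdr host))))))

    ;; (connection-host-spec '(:unix . "/var/run/mysqld/mysqld.sock"))
    ;;   => "/var/run/mysqld/mysqld.sock"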
Some of our internal values now depend on the implementation, and could
either be a symbol on SBCL or an external-format structure on CCL. We
could typecase our way out, I suppose, but it might be that SBCL has a
different version of the external-format type, so we'd rather use #+.
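A minimal sketch of the #+ approach; the CCL accessor name used here is
an assumption:

    (defun encoding-name (external-format)
      "Normalize EXTERNAL-FORMAT to the encoding it names."
      #+sbcl external-format   ; already a symbol on SBCL
      #+ccl  (ccl:external-format-character-encoding external-format))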
First, despite the documentation mentioning that the function writes
to *terminal-io*, in fact it's doing (format t ...) and thus the result
is written to *standard-output*.
Second, CCL has encodings with no aliases.
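In Common Lisp a format destination of t means *standard-output*, so
the first two forms below are equivalent; only the third really targets
the terminal:

    (format t "hello~%")                   ; writes to *standard-output*
    (format *standard-output* "hello~%")   ; same thing, spelled out
    (format *terminal-io* "hello~%")       ; what the docs implied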
It used to still launch an extra set of threads for monitoring, and
that would confuse CCL, where it's not possible to write to a stream
from more than one thread concurrently.
Try to get a deterministic output of it; apparently that's still not
always the case when using SBCL, even now that it's been switched to
using the explicit *terminal-io* rather than t.
This change is needed for CCL support, though, where you don't get to
write to the same stream from different threads.
I could get down to the problem here, which is that a couple of indexes
were reported to pgloader without any SQL definition for them, and then
pgloader would wait for non-existing tasks. It seems easier to just
skip those indexes, and that's what this patch does.
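A minimal sketch of the skip, where indexes, index-sql and log-message
are hypothetical names used for illustration:

    (loop :for index :in indexes
          :if (index-sql index)
            :collect index
          :else
            :do (log-message :warning
                             "Skipping index ~a: no SQL definition"
                             index))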
In particular, allow for a space to be used in the filename. The only
character that is not permitted anymore is the quote itself ('); it
should be easy enough to allow for escaping it, as in the password
field, if required.
Should probably fix #54, even though the lack of data currently
reported in that issue makes it a blind guess only.
The new WITH options allow the user to set values for the dynamic
variables *copy-batch-rows*, *copy-batch-size* and *concurrent-batches*.
That's needed in cases like issue #16, even with the batch size
defaulting to what looks like a proper setup.
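A minimal sketch of what the options amount to, assuming the variables
are special and read by the batch machinery at run time; load-from-source,
source and target are hypothetical names:

    (let ((*copy-batch-rows*    25000)
          (*copy-batch-size*    (* 20 1024 1024)) ; 20 MB
          (*concurrent-batches* 2))
      (load-from-source source target))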
In the longer term, a serious review of pgloader's memory usage should
be done, the numbers being way higher than the batch sizes we set up
here.
When declaring the types of arguments (mainly done to hint the Common
Lisp compiler into generating more efficient code), it's important to
account for the possibility of the arguments being NIL, of type NULL.
That's been made clear in the way the projection function is now
generated in the project-fields function in src/sources/source.lisp,
with all the arguments now being &optional so that we are able to cope
with ragged CSV files.
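A minimal sketch, not the actual generated code: making every column
&optional means a ragged row simply binds its missing fields to NIL,
and the declared types must allow for that:

    (lambda (&optional name age city)
      (declare (type (or null simple-string) name age city))
      (list name
            (when age (parse-integer age))
            city))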
The only expected change from this patch is that some warnings go
missing in some test cases, such as test/reformat.load, test/fixed.load
and test/archive.load.
For the generated binary to be really portable, we need to be able to
load openssl 1.0.1 even when we've been built against openssl 1.0.0.
A way to achieve that with SBCL is to force the unloading of the lib
at image saving time and to register a hook to load it again at image
init time. Using the proper API, CFFI will happily load the available
file for the lib rather than insisting on loading the exact same one
as found on the build machine.
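A minimal sketch under SBCL, assuming cl+ssl names its foreign library
cl+ssl::libssl:

    (push (lambda () (cffi:close-foreign-library 'cl+ssl::libssl))
          sb-ext:*save-hooks*)
    (push (lambda () (cffi:load-foreign-library 'cl+ssl::libssl))
          sb-ext:*init-hooks*)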
The babel character-decoding-error condition is exposing both its
internal BUFFER and the current OCTETS, and it seems we should refer to
the BUFFER in our error reporting...
When it's not possible to decode a MySQL value in the given encoding,
automatically replace the value with nil and be quite verbose about it
by logging an error.
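A minimal sketch of the replacement, assuming Babel does the decoding;
log-message is a hypothetical logging helper:

    (defun decode-mysql-value (octets encoding)
      "Decode OCTETS in ENCODING, or return NIL and log an error."
      (handler-case
          (babel:octets-to-string octets :encoding encoding)
        (babel:character-decoding-error (e)
          (log-message :error "Could not decode value as ~a: ~a"
                       encoding e)
          nil)))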
The patch from pull request #30 was hard-coding the PostgreSQL-side
quoting; we are using the quote_ident() function instead, as it's now
available in every PostgreSQL production release (8.4 included).
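A minimal sketch of letting PostgreSQL do the quoting, assuming
Postmodern is the client library in use:

    (defun quoted-identifier (name)
      "Have PostgreSQL quote NAME as an identifier."
      (pomo:query "select quote_ident($1)" name :single))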
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before the *copy-batch-rows*
limit) when crossing the byte count threshold.
When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.
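A minimal sketch of the early-close test, with batch-rows, batch-bytes
and finish-batch as hypothetical names:

    (when (or (<= *copy-batch-rows* batch-rows)
              (and *copy-batch-size*
                   (<= *copy-batch-size* batch-bytes)))
      (finish-batch))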
This patch is published without any way to tune the values from the
command language yet; that's the next step, once it's been proven
effective.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which
publishes a batch at a time into the communication channel: a
lparallel.queue object. Before that, the raw vectors were pushed
directly into the queue, offering more flexibility to adjust to the
reader and writer IO rates and capabilities, but impeding the Garbage
Collector: data still in the queue could not be collected even when not
needed anymore.
The new model also uses less memory, and allows better control over how
much data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.
The intent is to offer a way for users to tune *concurrent-batches*
down to 1 for sources with a massive per-row memory footprint. Even
better would be to find a way to automatically adjust the setting
without spending too much time counting the bytes we're batching.
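A minimal sketch of the back-pressure involved, using lparallel.queue's
fixed-capacity queues; batch stands in for one formatted batch:

    (let ((queue (lparallel.queue:make-queue
                  :fixed-capacity *concurrent-batches*)))
      ;; push-queue blocks when a fixed-capacity queue is full,
      ;; pop-queue blocks when it is empty
      (lparallel.queue:push-queue batch queue)
      (lparallel.queue:pop-queue queue))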
Preliminary tests show no noticeable performance impact from this
patch, and even some improvements in some cases.