The TRUNCATE command is only sent to PostgreSQL when we didn't just
CREATE TABLE beforehand. Some refactoring would be necessary to fit the
TRUNCATE command within the same transaction as the CREATE TABLE
command, for better PostgreSQL performance.
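For reference, the kind of sequence that refactoring would enable:
PostgreSQL can skip WAL logging for a COPY when the target table was
created or truncated in the same transaction (with wal_level set to
minimal), roughly like this:

    BEGIN;
    TRUNCATE target_table;          -- same transaction as the COPY below
    COPY target_table FROM STDIN;   -- may then skip WAL entirely
    COMMIT;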
This patch has been tested with MySQL and SQLite sources; the trick is
that testing it requires first making a full import (creating the target
tables), so the tests are not modified yet.
When using SQLite 3, a blob column might return either string or byte
vector values dynamically, depending on the data itself or maybe on some
more complex parameters controlled at data insert time.
Hard-code the rule that a blob column returned as a string is in fact
base64 encoded (which looks like common practice) and decode it
automatically when needed, before sending it to byte-vector-to-bytea. It
might be a tad slow, but at least the data is properly converted.
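A minimal sketch of that rule, assuming the cl-base64 decoder; the
wrapper name here is hypothetical, byte-vector-to-bytea is the existing
transform:

    (defun maybe-decode-blob (value)
      "Hypothetical wrapper: base64 decode blob values arriving as strings."
      (byte-vector-to-bytea
       (if (stringp value)
           ;; hard-coded rule: a blob column fetched as a string is
           ;; assumed to be base64 encoded
           (base64:base64-string-to-usb8-array value)
           value)))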
In the future, that decision might come back to byte us again, at which
point it'll be necessary to consider full casting options as in the
MySQL CAST rules. For now, it seems like a big enough win if we can
avoid that.
This issue has been re-opened with blob instead of double. Semi-blindly
implement support for the blob type with an image data type.
Disturbingly enough, when tested with non-binary data, SQLite was
returning strings rather than byte vectors, tripping up the transform
function, which certainly expects byte vectors.
It turns out that in some cases it's not possible to call
format-vector-row on MySQL result sets, because MySQL has been sending
us vectors of bytes (blobs) while the expected data (from the table
definition) clearly is text.
Handle the error as an input reading error, skipping the line and being
verbose about it in the logs. This patch fails to update the stats about
what's happening, so it might need later changes.
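The shape of the error handling is roughly this (a sketch: the exact
call signature of format-vector-row and the log-message facility are
assumptions here):

    (handler-case
        (format-vector-row stream row)
      (error (condition)
        ;; be verbose in the logs, then carry on with the next line
        (log-message :error "Skipping line, could not format it: ~a"
                     condition)))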
Mac OS X users will feel at home using the usual package installer. The
binary is installed as /usr/local/bin/pgloader, and the man page is
installed too.
That allows using the same SQL files as usual when using pgloader, as it
even supports the \i and \ir psql features (and dollar quoting, etc).
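For instance, a file such as the following now works directly as a
pgloader SQL file (paths made up):

    -- include another file, as in psql
    \i /path/to/functions.sql
    -- include a file relative to this one
    \ir create-indexes.sql
    -- dollar quoting is supported too
    CREATE OR REPLACE FUNCTION one() RETURNS integer LANGUAGE sql
    AS $$ SELECT 1; $$;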
In passing, refactor the docs to avoid saying the same things all over
the place, which isn't a very good idea in a man page, at least as far
as editing it is concerned.
We need a different buildapp binary file for SBCL and for CCL, so make
it appear that way in the Makefile, and have both
./build/bin/buildapp.sbcl and ./build/bin/buildapp.ccl.
That avoids really confusing error messages when trying to build
pgloader with CCL while using the SBCL-compiled buildapp binary...
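Something like this in the Makefile (a sketch only; the CL variable name
is an assumption):

    # one buildapp binary per supported implementation
    ifeq ($(CL),ccl)
    BUILDAPP = ./build/bin/buildapp.ccl
    else
    BUILDAPP = ./build/bin/buildapp.sbcl
    endif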
There's actually no reason not to parse the command line again with the
newly loaded code, so be sure to do the self-upgrade dance first thing
and recurse into the pgloader::main function (with a guard).
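A hedged sketch of that guard, where everything but pgloader::main is
hypothetical:

    (defvar *self-upgraded* nil
      "Non-nil once the self-upgrade dance has been done.")

    (defun main (argv)
      (when (and (self-upgrade-asked-p argv)  ; hypothetical predicate
                 (not *self-upgraded*))
        (setf *self-upgraded* t)
        (load-newer-sources argv)             ; hypothetical loader
        ;; parse the command line again with the newly loaded code
        (return-from main (main argv)))
      ;; ... usual command line parsing and processing ...
      )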
From now on, to install a new version of pgloader when you have an older
one, say because there's that bug that got fixed meanwhile, all you need
to do is run:
$ git clone https://github.com/dimitri/pgloader.git /tmp/pgloader
$ pgloader --self-upgrade /tmp/pgloader <options as usual>
Any Common Lisp developer using the product is already doing that many
times a day; it might prove useful for users to be able to hot-patch
themselves too, after all.
The parser was happily parsing a connection string such as the
following, but the rest of the code didn't really know what to do with
it:
mysql://unix:/var/run/mysqld/mysqld.sock:/main
In passing, fix bugs where the PostgreSQL unix domain socket connection
was still a brick shy of a load, failing to consider the case where the
connection host is actually a cons of the form '(:unix . "path/to/socket").
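A minimal sketch of the case analysis now needed when connecting (the
function name is hypothetical):

    ;; the host is either a hostname string or a cons such as
    ;; '(:unix . "/var/run/mysqld/mysqld.sock")
    (defun parse-connection-host (host)
      (if (and (consp host) (eq (car host) :unix))
          (values :unix (cdr host))   ; unix domain socket path
          (values :inet host)))       ; plain TCP hostname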
This packaging requires all pgloader dependencies to be available as
debian packages within the distribution, which is a work in progress
happening concurrently with this patch.
The current situation already allows building the pgloader package the
proper way; some more work needs to happen before anybody can do that
from a public debian repository.
Some of our internal values now depend on the implementation, and could
be either a symbol on SBCL or an external-format structure on CCL. We
could typecase our way out, I suppose, but it might be that SBCL has a
different version of the external-format type, so we'd rather use #+.
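An illustration of the #+ approach (the CCL accessor name is my best
guess here):

    (defun encoding-name (encoding)
      "Return a symbol naming ENCODING, whatever the host representation."
      #+sbcl encoding    ; SBCL already hands us a symbol
      #+ccl  (ccl:external-format-character-encoding encoding))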
The archive contents seem to have changed, and the regular expression we
were using to match files doesn't match any filename in the archive any
more.
Also, have the command load more data by parsing more files, using the
ALL FILENAME MATCHING clause.
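The clause looks roughly like this in the load command (the file pattern
and paths here are made up):

    LOAD ARCHIVE
         FROM /path/to/archive.zip
         INTO postgresql:///dbname

         LOAD CSV
              FROM ALL FILENAME MATCHING ~/data.*[.]csv/
              ...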
First, despite the documentation mentioning that the function writes
to *terminal-io*, it's in fact doing (format t ...), and thus the result
is written to *standard-output*.
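That's standard Common Lisp behaviour: t as a format destination
designates *standard-output*, not *terminal-io*:

    (format t "hello~%")                 ; writes to *standard-output*
    (format *standard-output* "hello~%") ; exactly equivalent
    (format *terminal-io* "hello~%")     ; what the docs claimed happens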
Second, CCL has encodings with no aliases.
It used to still launch an extra set of threads for monitoring, and that
would confuse CCL, where it's not possible to write into a stream from
more than one thread concurrently.
Try to have deterministic output of it, which apparently is still not
always the case when using SBCL, now that it's been switched to using
the explicit *terminal-io* rather than t.
This change is needed for CCL support, though, where you don't get to
write to the same stream from different threads.
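A sketch of the single-writer arrangement this is heading towards, using
the lparallel queues pgloader already depends on (names hypothetical):
workers push messages, only the monitor thread touches the stream.

    (defvar *events* (lparallel.queue:make-queue))

    (defun monitor (stream)
      "The only thread that ever writes to STREAM."
      (loop for event = (lparallel.queue:pop-queue *events*)
            until (eq event :quit)
            do (format stream "~a~%" event)))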
I could get to the bottom of the problem here, which is that a couple of
indexes were reported to pgloader without any SQL definition for them,
and then pgloader would wait for non-existing tasks.
It seems easier to just skip those indexes; that's what this patch does.
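The skip itself can be as simple as the following, index-sql being a
hypothetical accessor for the index definition:

    ;; only keep indexes that come with a SQL definition attached
    (remove-if-not #'index-sql indexes)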
In particular, allow for a space to be used in the filename. The only
character that is not permitted anymore is the quote itself ('); it
should be easy enough to allow escaping it as in the password field, if
required.
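So a command can now say, for instance (path made up):

    LOAD CSV FROM '/data/My Exports/2013 snapshot.csv' ...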
Should probably fix #54, even though the lack of data currently reported
in that issue makes it a blind guess only.
The new WITH options allow the user to set values for the dynamic
variables *copy-batch-rows*, *copy-batch-size* and *concurrent-batches*.
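In the command language that looks something like the following (the
exact option spellings are those of the documentation; values made up):

    WITH batch rows = 25000,
         batch size = 20 MB,
         batch concurrency = 10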
That's needed in cases like issue #16, even with the batch size
defaulting to what looks like a proper setup.
In the longer term, a serious review of pgloader's memory usage should
be done, the numbers being way higher than the batch sizes we set up
here.