When adding the CONTEXT message parsing I totally forgot that PostgreSQL
provides a nice error message translation capability. The code now copes
better with the situation, using a more advanced regular expression.
We could inline the known translations in the matching, but that would
be tedious to maintain, so we just use loose matching rules here.
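A minimal sketch of the loose matching idea, in Python rather than the project's Common Lisp. It assumes the CONTEXT line has the shape "COPY <table>, line <n>" and that only the word "line" is translated; the pattern, the function name and the sample messages are all illustrative, not the expression used in the patch.

```python
import re

# Assumption: only the word between the comma and the line number is
# translated, so any single word is accepted in that position.
CONTEXT_RE = re.compile(r"COPY\s+(?P<table>[^,]+),\s+\w+\s+(?P<line>\d+)")


def parse_copy_context(message):
    """Return (table, line_number), or None when the message does not match."""
    match = CONTEXT_RE.search(message)
    if match is None:
        return None
    return match.group("table"), int(match.group("line"))


# Illustrative messages, not captured from a real server.
english = parse_copy_context('CONTEXT:  COPY errors, line 3, column b: "x"')
french = parse_copy_context("CONTEXTE : COPY errors, ligne 3")
```

A translation that uses several words, or no separate word at all, for "line" would not match this sketch; that is the price of not inlining the known translations.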
Should help with issue #67 by allowing --client-min-messages to
effectively control entering the debugger in case of unhandled
conditions, etc.
Contrary to the discussion, in this patch --log-min-messages has no
impact on the console output or on interactive behavior.
In case of a PostgreSQL schema preparation error, and when some
materialized views were given with their SQL command, they were left
over by pgloader. The next run would then fail because the view already
exists at CREATE VIEW time.
Fix that by cleaning up materialized views we just created in handling
any condition signaled when preparing the PostgreSQL schema.
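The cleanup pattern can be sketched as follows, in Python for illustration. The three callables are hypothetical stand-ins for the schema preparation steps, not pgloader functions.

```python
def prepare_schema(create_views, create_tables, drop_views):
    """Create views, then tables; drop the views again if anything fails.

    All three arguments are hypothetical callables supplied by the caller.
    """
    created = create_views()  # names of the views we just created
    try:
        create_tables()
    except Exception:
        drop_views(created)   # leave nothing behind for the next run
        raise                 # the original error still propagates
    return created
```

Re-raising after the cleanup keeps the error visible to the caller, while the next run starts without the leftover views.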
When in :data logging mode we log the whole data set as we read then
write it, which is quite a lot of data. Our current logging system works
by filling up a queue that the cl-log lib is then fed from, and sending
lots of data through that queue is far too expensive, so stop doing that.
Hopefully we don't need to revisit the logs more than that: the other
messages should be few enough not to count much when doing a full load.
With this the user now has a say about where the files are going to be
read from and matched against the regular expression. It used not
to be necessary in the archive expansion mode, but is required now that
the feature is exposed in more cases.
When using LOAD CSV it's possible to load from filenames matching a
regular expression, but for that to work the *csv-path-root* needs to be
properly set up at run-time.
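To illustrate why the root directory matters, here is a small Python sketch of matching filenames against a regular expression under a given root. The function name and the choice to match on the path relative to the root are assumptions made for this example.

```python
import re
from pathlib import Path


def matching_files(root, pattern):
    """Return the files under `root` whose relative path matches `pattern`."""
    regex = re.compile(pattern)
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and regex.search(path.relative_to(root).as_posix())
    )
```

Without a well-defined root there is no directory to walk, so the regular expression has nothing to be matched against.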
As seen in #64 it's no longer necessary to use a local clone of
qmynd to be able to compile pgloader: simplify the Makefile accordingly.
In passing, add source level dependencies so that if you edit any source
lisp file the binary will get automatically rebuilt by `make`.
The code completely forgot that MySQL column name references in foreign
key definitions have to follow the identifier case rules; this patch
fixes that.
To be able to do that, we need to parse the GROUP_CONCAT() result that
lists the FK columns, as there are apparently no arrays in MySQL. The
problem here is that just about any character is allowed in column names
when `quoted`, so using a comma here might prove fragile later.
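A rough Python sketch of that parsing step follows. The function name and the two case rules shown (downcase, or keep the case and double-quote) are assumptions for illustration only.

```python
def fk_columns(group_concat_result, downcase=True):
    """Split a comma-separated column list and apply one identifier case rule."""
    names = group_concat_result.split(",")
    if downcase:
        return [name.lower() for name in names]
    # Otherwise keep the case and double-quote, doubling embedded quotes.
    return ['"' + name.replace('"', '""') + '"' for name in names]


columns = fk_columns("CustomerId,OrderId")
quoted = fk_columns("CustomerId", downcase=False)
# The fragile case: a single column literally named "a,b" is split in two.
fragile = fk_columns("a,b")
```

The last call shows the weakness mentioned above: once the names are joined with a comma, a comma inside a quoted column name can no longer be told apart from a separator.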
The truncate command is only sent to PostgreSQL when we didn't just
CREATE TABLE before. Some refactoring would be necessary to fit the
TRUNCATE command within the same transaction as the CREATE TABLE
command, for PostgreSQL performance.
This patch has been tested with MySQL and SQLite sources. The trick is
that to be able to test it, a full import (creating the target tables)
has to be made first, so the tests are not modified yet.
When using SQLite 3, a blob column might return either string or byte
vector values dynamically depending on the data itself, or maybe some
more complex parameters controlled at data insert time.
Hard-code the rule that a blob column returned as a string is in fact
base64 encoded (which looks like common practice) and decode it
automatically when needed, before sending to byte-vector-to-bytea. It
might be a tad slow but at least the data is properly converted.
In future, that decision might come and byte us in the back again, at
which point it'll be necessary to consider full casting options as in
the MySQL CAST rules. It seems like a big enough win for now if we can
avoid that.
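The hard-coded blob rule above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes any string value is valid base64, which is exactly the assumption the patch makes.

```python
import base64


def blob_to_bytes(value):
    """Normalize a blob value to bytes, decoding base64 when it is text."""
    if isinstance(value, str):
        # Assumption carried over from the patch: a string means base64.
        return base64.b64decode(value)
    return bytes(value)


decoded = blob_to_bytes("aGVsbG8=")
unchanged = blob_to_bytes(b"\x00\x01")
```

A string that is not base64 at all would either raise or decode to garbage here, which is where full casting options would become necessary.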
This issue has been re-opened with blob instead of double. Semi-blindly
implement support for the blob type with an image data type.
Disturbingly enough when tested with non-binary data SQLite was
returning strings rather than byte vectors, tripping up the transform
function that sure expects byte vectors.
Turns out that in some cases it's not possible to call format-vector-row
on MySQL result sets, because MySQL has been sending us a vector of bytes
(blob) while the expected data (from the table definition) clearly is text.
Handle the error as an input reading error, skipping the line and being
verbose about it in the logs. This patch fails to update the stats about
what's happening, so it might need later changes.
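The skip-and-log handling can be sketched like this in Python. The logger name, the function names and the exception types are assumptions; like the patch, the sketch does not update any statistics.

```python
import logging

log = logging.getLogger("loader")  # hypothetical logger name


def format_rows(rows, format_row):
    """Yield formatted rows, logging and skipping the ones that fail."""
    for index, row in enumerate(rows):
        try:
            yield format_row(row)
        except (TypeError, ValueError) as error:
            # Treated as an input reading error: report it and move on.
            log.error("skipping row %d: %s", index, error)


def text_only(row):
    """Stand-in formatter that rejects bytes where text is expected."""
    if isinstance(row, bytes):
        raise TypeError("expected text, got bytes")
    return row.upper()


kept = list(format_rows(["a", b"x", "b"], text_only))
```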
MacOSX users will be at home when using the usual packaging installer.
The binary file is installed into /usr/local/bin/pgloader and the man
page is installed too.
That allows using the same SQL files as usual when using pgloader, as it
even supports the \i and \ir psql features (and dollar quoting, etc).
In passing, refactor docs to avoid saying the same things all over the
place, which isn't a very good idea in a man page, at least as far as
editing it is concerned.
We need a different buildapp binary file for SBCL and for CCL, so make
it appear that way in the Makefile, and have both
./build/bin/buildapp.sbcl and ./build/bin/buildapp.ccl.
That avoids really confusing error messages when trying to build pgloader
with CCL and using the SBCL-compiled buildapp binary...
There's no reason not to parse again the command line with the newly
loaded code actually, so be sure to do the self-upgrade dance first
thing and recurse to the pgloader::main function (with a guard).
As of now, to install a new version of pgloader when you have an older
one, say because there's that bug that got fixed meanwhile, all you need
to do is run
    $ git clone https://github.com/dimitri/pgloader.git /tmp/pgloader
    $ pgloader --self-upgrade /tmp/pgloader <options as usual>
Any Common Lisp developer using the product is already doing that many
times a day, so it might prove useful for users to be able to hot-patch
themselves too, after all.
The parser was happily parsing such a connection string as the
following, but the rest of the code didn't really know what to do about
it:
    mysql://unix:/var/run/mysqld/mysqld.sock:/main
In passing, fix bugs where the PostgreSQL unix domain socket connection
was still shy of a brick load, omitting to consider the case where the
connection host is actually a list of '(:unix . "path/to/socket").
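As an illustration of the host shapes involved, here is a Python sketch that parses such a connection string. The pattern and the function name are assumptions for this example; user names, passwords and options are deliberately left out. The ("unix", path) tuple mirrors the '(:unix . "path/to/socket") form mentioned above.

```python
import re

# Assumption: the host part is either "unix:<socket path>:" or a plain
# host name with an optional port.
URI_RE = re.compile(
    r"^(?P<scheme>\w+)://"
    r"(?:unix:(?P<socket>[^:]+):|(?P<host>[^:/]*)(?::(?P<port>\d+))?)"
    r"/(?P<dbname>.+)$"
)


def parse_connection_string(uri):
    """Return a dict with scheme, host, port and dbname."""
    match = URI_RE.match(uri)
    if match is None:
        raise ValueError(f"cannot parse connection string: {uri!r}")
    if match.group("socket"):
        host = ("unix", match.group("socket"))  # socket path, not a name
    else:
        host = match.group("host")
    port = int(match.group("port")) if match.group("port") else None
    return {
        "scheme": match.group("scheme"),
        "host": host,
        "port": port,
        "dbname": match.group("dbname"),
    }


via_socket = parse_connection_string(
    "mysql://unix:/var/run/mysqld/mysqld.sock:/main"
)
via_tcp = parse_connection_string("mysql://localhost:3306/main")
```

Any code that later opens the connection has to check which of the two host shapes it received, which is the case the fix above was missing.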
This packaging requires all pgloader dependencies to be available as a
debian package within the distribution, which is a work in progress
happening concurrently with this patch.
The current situation already allows building the pgloader package the
proper way, but some more needs to happen before anybody can do that
from a public debian repository.
Some of our internal values now depend on the implementation, and could
either be a symbol on SBCL or an external-format structure on CCL. We
could typecase our way out I suppose, but it might be that SBCL has a
different version of the external-format type, so we'd rather use #+.
The archive contents seem to have changed, and the regular expression to
match files that we were using doesn't match any filename in the archive
any more.
Also, have the command load more data by parsing more files, using the
ALL FILENAME MATCHING clause.