That's the big refactoring patch I've been sitting on for too long.
First, refactor connection handling to use a uniform "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.
Given that, we no longer need to mess around with *pgconn* and *myconn-*
and other special variables anywhere in the tree.
Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when new API got into place.
Third, fix any other oddity or missing part found while doing those
first two activities; it was long overdue anyway.
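
The uniform connection concept from the first point can be sketched as
below. pgloader itself is written in Common Lisp with a class and generic
functions; this Python version only illustrates the shape of the idea, and
the names Connection, Copy, open and close are hypothetical, not
pgloader's API.

```python
class Connection:
    """One connection concept shared by every source and target."""

    def __init__(self, uri):
        self.uri = uri
        self.handle = None

    def open(self):
        # A real subclass would call its database driver here.
        self.handle = "handle for %s" % self.uri
        return self

    def close(self):
        self.handle = None

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


class Copy:
    """A copy job carries its own connections instead of reading globals."""

    def __init__(self, source_db, target_db):
        self.source_db = source_db
        self.target_db = target_db
```

Because each copy object owns its two connections, no global or special
variable has to be rebound around the call sites.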
Make it so that the following command line usages are accepted when
using pgloader without a command file:

    ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

    ./build/bin/pgloader --set "search_path='sakila'" \
                         mysql://root@localhost/sakila \
                         postgresql:///sakila

    ./build/bin/pgloader --type csv \
                         --field id --field field \
                         --with truncate \
                         --with "fields terminated by ','" \
                         ./test/data/matching-1.csv \
                         postgres:///pgloader?matching

It's now possible in most cases to just use command-line options, which
should lower the barrier to entry for pgloader considerably.
In particular, protect the sysdb-data-to-lisp routine, which calls into
CFFI and character decoding, with a restart-case allowing the calling
code to just ignore errors on a particular column by skipping it and
using nil instead (or something else if needed).
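
The skip-the-column recovery can be sketched as follows. pgloader does
this with a Common Lisp restart-case around sysdb-data-to-lisp; Python has
no restarts, so the sketch imitates the idea with a callback. The names
decode_column, decode_row and on_error are hypothetical.

```python
def decode_column(raw, encoding="utf-8"):
    """Decode one raw column value; raises UnicodeDecodeError on bad bytes."""
    return raw.decode(encoding)


def decode_row(raw_row, encoding="utf-8", on_error=lambda raw, err: None):
    """Decode every column of a row, replacing undecodable columns.

    on_error plays the role of the restart: it receives the raw bytes and
    the error, and returns the value to use instead (None by default).
    """
    row = []
    for raw in raw_row:
        if raw is None:
            row.append(None)
            continue
        try:
            row.append(decode_column(raw, encoding))
        except UnicodeDecodeError as err:
            # Only this column is affected; the rest of the row survives.
            row.append(on_error(raw, err))
    return row
```

The caller decides what "something else" means by passing its own
on_error, which mirrors how calling code picks a restart.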
Actually, fix pgloader not to depend on MS SQL specifics for bigints in
the protocol. Depending on the protocol version in the driver's setup,
MS SQL would send bigints either as floats on the wire, losing range,
or as something else entirely whose values do not match what is actually
in the database.
Here we just convert the values to NUMERIC by using a CAST expression
directly in the query, so that the protocol only sees NUMERIC and
everyone is happy. Or should be. Let's try that.
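
A minimal sketch of building such a query, assuming the column list is
available as (name, type) pairs. The function build_select, its signature
and the bracket quoting are assumptions for illustration, not pgloader's
actual code.

```python
def build_select(table, columns, target_type="NUMERIC"):
    """Build a SELECT that casts bigint columns to target_type.

    columns is a list of (name, type_name) pairs. Identifiers are
    bracket-quoted the T-SQL way (an assumption of this sketch).
    """
    def quote(identifier):
        return "[" + identifier.replace("]", "]]") + "]"

    exprs = []
    for name, type_name in columns:
        if type_name.lower() == "bigint":
            # The server does the conversion, so the driver never sees a bigint.
            exprs.append("CAST(%s AS %s) AS %s" % (quote(name), target_type, quote(name)))
        else:
            exprs.append(quote(name))
    return "SELECT %s FROM %s" % (", ".join(exprs), quote(table))
```

The target type is a parameter so that a precision can be given when
needed: in SQL Server a bare NUMERIC defaults to precision 18, which is
narrower than the full bigint range.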
In passing, refactor the *pgconn- dynamic bindings in favor of directly
using the connection property list straight from the connection string
parser, processing it when necessary. That makes it simple to add an
internal :use-ssl property.
The current instructions build SBCL then pgloader all from sources in a
Debian VM. It might be a good idea to maintain a Debian+SBCL Docker
image and build pgloader on top of that.
Handling the errors within the thread is useful when debugging pgloader
interactively, but not so much when started from the command line, where
it would hang the program forever with threads waiting for interactive
debugger actions to be taken, with no way for the user to actually take
action.
First, the index names in MS SQL, as in MySQL, are only unique per
table, whereas they need to be globally unique (per database) in
PostgreSQL. So reuse the infrastructure we had for MySQL here.
Second, the way we trick table names in index and fkey structures means
that we already did quote the names and we don't want to quote them
again, so add a new possible *identifier-case* value to handle the case
where nothing is to be done, pretty please.
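
One way to turn per-table index names into database-wide unique names is
to prefix them with something unique to the table. The sketch below
assumes an integer table identifier is available (for instance the
table's OID in PostgreSQL); the function name and the exact naming scheme
are hypothetical.

```python
def uniquify_index_name(table_id, index_name):
    """Prefix an index name with its table's integer identifier.

    Two tables may both have an index called "idx_title"; the numeric
    prefix keeps the resulting names distinct across the database.
    """
    return "idx_%d_%s" % (int(table_id), index_name)
```

A numeric prefix is used rather than the table name because joining two
free-form names with an underscore can collide (table "a_b" with index
"c" versus table "a" with index "b_c").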
Rather than doing ALTER TABLE directly, use CREATE UNIQUE INDEX as part
of the parallel, concurrent index build for each table, and only at the
very end "upgrade" that unique index into a PRIMARY KEY using ALTER TABLE.
The reason why it's a good idea to do that is to avoid an ACCESS
EXCLUSIVE LOCK at ALTER TABLE time, which is killing our index build
concurrency.
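
The two-step approach can be sketched as a pair of generated statements.
The helper name primary_key_statements and its arguments are hypothetical;
the SQL uses PostgreSQL's ADD PRIMARY KEY USING INDEX form, and identifier
quoting is left out for brevity.

```python
def primary_key_statements(table, index_name, columns):
    """Return (create, upgrade) SQL for a primary key built in two steps.

    create  is meant to run alongside the other index builds of the table;
    upgrade is meant to run once all of them are done.
    """
    cols = ", ".join(columns)
    create = "CREATE UNIQUE INDEX %s ON %s (%s);" % (index_name, table, cols)
    upgrade = "ALTER TABLE %s ADD PRIMARY KEY USING INDEX %s;" % (table, index_name)
    return create, upgrade
```

Usage: collect every create statement of a table and run them in
parallel, then run the upgrade statements one after the other.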
We got out of sync again with the qmynd driver (see #124) and the WIP
branch of esrap has been in the works for a long time now. We now
manually fetch the "proper" version of those.
When querying the default values of MySQL tables against MySQL catalogs,
the default value is always returned as a string. When the column having
the default actually is a "binary" column, we want the transformation
functions to receive a proper vector of bytes.
This patch adds some hard-coded rules to be smarter about the situation
for columns of type "binary".
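
A minimal sketch of such a rule, assuming the catalog hands back the
default as a text string together with the column type name. The function
default_to_bytes is hypothetical, and the latin-1 encoding (which maps
code points 0 to 255 onto single bytes) is an assumption of this sketch,
not something the patch specifies.

```python
def default_to_bytes(default, column_type, encoding="latin-1"):
    """Return the default as bytes for "binary" columns, else unchanged."""
    if default is not None and column_type.lower() == "binary":
        # Transformation functions for binary columns expect a byte vector.
        return default.encode(encoding)
    return default
```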