Now that we can have several threads doing COPY, each of them needs to
know about the *pgsql-reserved-keywords* list. Make sure that's the case
and, in passing, fix some call sites of apply-identifier-case.
Also, more disturbingly, fix the code so that TRUNCATE is called from
the main thread before giving control to the COPY threads, rather than
having two concurrent threads each run the TRUNCATE. It's rather
strange that we got no complaint from the field on that part...
The next parallelism improvements will allow pgloader to use more than one
COPY thread to load data, with the impact of changing the order of rows
in the database.
Rather than doing a copy out and `diff` of the data just loaded, load
the reference data and do the diff in SQL:

    select * from loaded.data
    except
    select * from expected.data

If such a query returns any rows, we know we didn't load what was
expected, and the regression test fails.
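As a sketch of how that check might be driven (using Postmodern, which
pgloader builds on; the connection spec is a placeholder and the schema
and table names are just the ones from the query above, not the actual
test harness):

    ;; Sketch only: run the regression diff and fail when any row comes back.
    (defun regression-diff-clean-p ()
      (postmodern:with-connection '("pgloader" "pgloader" "" "localhost")
        ;; any row returned by EXCEPT means loaded and expected data differ
        (null (postmodern:query
               "select * from loaded.data except select * from expected.data"))))
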
This regression testing facility should also allow us to finally add
support for multiple-table regression tests (sqlite, mysql, etc).
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that gets passed log messages so that
the output buffer is not subject to race conditions; extend its use to
also deal with statistics messages.
In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.
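To illustrate the message-passing idea, here's a minimal sketch using
lparallel.queue; the message layout and function names are made up and
this is not pgloader's actual monitor internals:

    ;; Workers push small messages onto a shared queue; a single monitor
    ;; thread folds them into the stats, so the counters themselves never
    ;; need locking.
    (defvar *monitor-queue* (lparallel.queue:make-queue))

    (defun send-stats (label rows)
      "Called from worker threads to report progress."
      (lparallel.queue:push-queue (list :stats label rows) *monitor-queue*))

    (defun monitor-loop (stats)
      "Single thread consuming both log and stats messages."
      (loop for message = (lparallel.queue:pop-queue *monitor-queue*)
            do (case (first message)
                 (:stats (incf (gethash (second message) stats 0)
                               (third message)))
                 (:log   (format t "~a~%" (second message))))))
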
Also, as a nice side effect of the code simplification and refactoring,
this fixes #283 wherein the before/after sections of individual CSV
files within an ARCHIVE command were not counted in the reporting.
The dry run option will currently only check database connections, but
as that happens after having correctly parsed the load file, it also
allows checking that the command file is correct for the parser.
Note that the list load-data API isn't subject to the dry-run method.
In passing, we add some more API entry points to the connection objects,
and we should actually clean up the code base to use the new QUERY
generic all over the place. That's for another patch, though.
The database connection code needed to switch to the "new" connection
facilities, and there was a bug in the processing of template sections
wherein the template user would inherit the template property.
Define a bunch of OS return codes and use them wisely, or at least in a
better way than just doing (uiop:quit) whenever something goes wrong,
without any difference whatsoever to the caller.
Now we return a non-zero exit code when we know something went wrong,
which is more useful.
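A minimal sketch of the idea; the constant names here are made up, only
(uiop:quit code) is the actual mechanism:

    ;; Map known failure modes to distinct exit codes instead of a bare
    ;; (uiop:quit), so callers and shell scripts can tell what went wrong.
    (defconstant +os-code-success+    0)
    (defconstant +os-code-error+      1)
    (defconstant +os-code-bad-source+ 4)

    (defun quit-with (code)
      ;; uiop:quit takes the process exit code as its first argument
      (uiop:quit code))

    ;; e.g. (quit-with +os-code-bad-source+) when the source can't be read
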
The option currently only works within the same build environment where
the image was first built, as noted in #133. This is an attempt at
convincing ASDF not to load the systems that pgloader depends on, in
order to be able to load only the new pgloader definition.
While it looks sound in principle, I failed to have it work in the lab.
Given that prior to this patch nothing works at all, it's not a
regression; let's push it as it makes the code saner.
Also, it looks like asdf::*immutable-systems* is what we want here, but
that's asdf 3.1.x and we're not there yet.
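For the record, here's roughly what that would look like once asdf 3.1
is around (a sketch only; register-immutable-system is the 3.1 companion
to that variable, and the system names are just examples):

    ;; Mark already-loaded dependencies as immutable so that loading the new
    ;; pgloader definition doesn't try to reload them from the source tree.
    (dolist (system '(:postmodern :drakma :lparallel))
      (asdf::register-immutable-system system))
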
It's now possible to have pgloader print out its summary in one of
several formats: human-readable (default), csv, copy or json. The
format is chosen based on the extension of the summary filename given
on the command line with the --summary option.
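Deriving the format from the extension can be as simple as the following
sketch (function and keyword names are made up, not the actual pgloader
code):

    (defun summary-format (pathname)
      "Pick an output format from the summary file's extension."
      (let ((type (pathname-type pathname)))
        (cond ((string-equal type "csv")  :csv)
              ((string-equal type "copy") :copy)
              ((string-equal type "json") :json)
              (t                          :human-readable))))
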
Fix bugs related to parsing the new COPY type, and make it so that we
know how to parse options (and fields, and other type dependent things)
even when --type is missing, in case the source URL has the information.
PostgreSQL COPY format is not really CSV but something way easier to
parse. Funnily enough, parsing it as CSV is not that easy, so we add
here a special simple parser for the COPY format.
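To give an idea of why the format is simpler than CSV, here's a sketch
that ignores backslash escapes entirely; the helper name is made up:

    ;; COPY text format separates columns with a single tab and spells NULL
    ;; as \N, so a line splits without any CSV quoting rules.
    (defun parse-copy-line (line)
      (mapcar (lambda (field)
                (if (string= field "\\N") :null field))
              (split-sequence:split-sequence #\Tab line)))
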
It should be quite useful to try loading reject data files from pgloader
again after manual fixing, too. It's still missing some documentation,
without any good excuse for that; will add soon.
In passing, also allow --field to specify the whole field list; there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
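For instance, something along these lines should now be accepted (a
sketch; exact quoting may vary):

    ./build/bin/pgloader --type csv \
        --field "id, field" \
        ./test/data/matching-1.csv \
        postgres:///pgloader?matching
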
That's the big refactoring patch I've been sitting on for too long.
First, refactor connection handling to use a unified "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.
Given that, we no longer need to mess around with *pgconn* and *myconn-*
and other special variables anywhere in the tree.
Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when new API got into place.
Third, fix any other oddity or missing part found while doing those
first two activities; it was long overdue anyway...
Make it so that the following command line usages are accepted when
using pgloader without a command file:

    ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

    ./build/bin/pgloader --set "search_path='sakila'" \
        mysql://root@localhost/sakila \
        postgresql:///sakila

    ./build/bin/pgloader --type csv \
        --field id --field field \
        --with truncate \
        --with "fields terminated by ','" \
        ./test/data/matching-1.csv \
        postgres:///pgloader?matching

It's now possible in most cases to just use command-line options, which
should make the entry bar to pgloader much lower.
To avoid wasting everybody's time when trying to debug `--load command.load`,
rename the option to be more explicit about what it does.
Also implement some basic guards in the form of testing that the
filename extension is part of a very short whitelist: .lisp, .cl, .lsp
and .asd.
Given that redirecting a tty such as *terminal-io* isn't easy enough,
let's provide a way to copy the summary output to a file. Another way to
solve it would have been to output the summary to the main logs, but
that could have made parsing the logs more difficult than necessary.
Let's see how users like it...
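One way to do the copying, as a sketch (the filename is a placeholder
and this isn't necessarily how the option is wired in):

    ;; A broadcast stream writes the same summary to the terminal and to the
    ;; chosen file in one pass.
    (with-open-file (file "/tmp/pgloader/summary.txt"
                          :direction :output
                          :if-exists :supersede
                          :if-does-not-exist :create)
      (let ((summary-stream (make-broadcast-stream *standard-output* file)))
        (format summary-stream "~&... summary output goes here ...~%")))
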
Should help with issue #67 by allowing --client-min-messages to
effectively control entering the debugger in case of unhandled
conditions, etc.
Contrary to the discussion, in this patch --log-min-messages has no
impact on the console or interactive behavior.
There's actually no reason not to parse the command line again with the
newly loaded code, so be sure to do the self-upgrade dance first thing
and recurse into the pgloader::main function (with a guard).
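The guard can be as simple as a special variable checked before
recursing, along these lines (a sketch with made-up names;
load-self-upgrade is a hypothetical helper):

    (defvar *self-upgraded-already* nil
      "Prevents recursing into main more than once.")

    (defun main (argv)
      (when (and (member "--self-upgrade" argv :test #'string=)
                 (not *self-upgraded-already*))
        (let ((*self-upgraded-already* t))
          (load-self-upgrade argv)            ; hypothetical: load new sources
          (return-from main (main argv))))    ; parse the command line again
      ;; ... regular command line handling ...
      )
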
From now on, to install a new version of pgloader when you have an older
one, say because there's that bug that got fixed meanwhile, all you need
to do is run:

    $ git clone https://github.com/dimitri/pgloader.git /tmp/pgloader
    $ pgloader --self-upgrade /tmp/pgloader <options as usual>

Any Common Lisp developer using the product is already doing that many
times a day; it might prove useful for users to be able to hot-patch
themselves too, after all.
Also ensure the directory we're given actually exists on disk, creating
it if necessary, and bail out early if, for whatever reason, it's not
possible to create the directory.
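ensure-directories-exist covers the creation part; bailing out early
could look like this sketch (the broad error handler and exit code are
only illustrative):

    (handler-case
        ;; creates any missing components of the directory path
        (ensure-directories-exist "/tmp/pgloader/")
      (error (e)
        (format *error-output* "Could not create the root directory: ~a~%" e)
        (uiop:quit 1)))
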
This variable replaces reject-root-path and is used to set the root
working directory. It defaults to /tmp/pgloader/ as before.
Also set the logfile according to root-dir.
TODO: tmpdir is not handled on the command line. Do we really want more
command line parameters?