The bug is related to the processing of empty lines in the middle of
quoted text by cl-csv, whose state machine has gotten quite complex in
order to handle all the crazy different csv variants out there.
Testing shows the bug is fixed in pgloader by just updating cl-csv.
The option currently only works within the same build environment where
the image was first built, as noted in #133. This is an attempt at
convincing ASDF not to load systems that pgloader depends on in order to
be able to load only the new pgloader definition.
While it looks sound in principle, I failed to get it to work in the lab.
Given that nothing worked at all prior to this patch, it's not a
regression, so let's push it: as is, it makes the code saner.
Also, it looks like asdf::*immutable-systems* is what we want here, but
that's asdf 3.1.x and we're not there yet.
Loading external libs at application startup time is not convenient as
it forces users to install freetds everywhere even when they don't need
it. This patch makes it so that freetds is only loaded when pgloader is
asked to load from a MS SQL database source.
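The load-on-demand idea can be sketched as follows. This is an illustration in Python rather than pgloader's actual Common Lisp code; the `LazyLibrary` class, the `connect_source` helper, the `mssql://` prefix check and the `sybdb` library name are all assumptions made for the example.

```python
import ctypes
import ctypes.util


class LazyLibrary:
    """Defer loading a shared library until it is first needed."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader
        self._handle = None

    @property
    def loaded(self):
        return self._handle is not None

    def get(self):
        # Load on first use only, then reuse the cached handle.
        if self._handle is None:
            self._handle = self._loader(self.name)
        return self._handle


def load_shared(name):
    """Locate and open a shared library by its short name."""
    path = ctypes.util.find_library(name)
    if path is None:
        raise OSError(f"shared library {name!r} not found")
    return ctypes.CDLL(path)


# Assumption: the FreeTDS db-lib library is named "sybdb" on this system.
FREETDS = LazyLibrary("sybdb", load_shared)


def connect_source(uri, freetds=FREETDS):
    """Return the source scheme, loading FreeTDS only for MS SQL sources."""
    if uri.startswith("mssql://"):
        freetds.get()
    return uri.split("://", 1)[0]
```

With this shape, a user who only ever loads from MySQL or CSV never triggers the library lookup, so a missing freetds install goes unnoticed until an MS SQL source is actually requested.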
Note that we could have done the same for SSL if it weren't potentially
used to connect to PostgreSQL, which isn't optional in the current
pgloader implementation.
It's now possible to have pgloader print out its summary in one of
several formats: human-readable (default), csv, copy or json. The
choice of format is made depending on the extension of the summary
filename picked on the command line with the option --summary.
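A minimal sketch of extension-based format selection, written in Python for illustration only. The exact extensions (`.csv`, `.copy`, `.json`) and the `summary_format` name are assumptions derived from the format list above, not pgloader's actual code.

```python
from pathlib import Path

# Assumed mapping from filename extension to summary format.
SUMMARY_FORMATS = {".csv": "csv", ".copy": "copy", ".json": "json"}


def summary_format(filename):
    """Return the summary format implied by the filename's extension."""
    extension = Path(filename).suffix.lower()
    # Anything unrecognised falls back to the human-readable default.
    return SUMMARY_FORMATS.get(extension, "human-readable")
```

Under these assumptions, `--summary report.json` would select json, while `--summary report.txt` would keep the human-readable default.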
Fix bugs related to parsing the new COPY type, and make it so that we
know how to parse options (and fields, and other type-dependent things)
even when --type is missing, in case the source URL has the information.
PostgreSQL COPY format is not really CSV but something way easier to
parse. Funnily enough, parsing it as CSV is not that easy, so we add
here a special simple parser for the COPY format.
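To show why a dedicated parser is simple, here is a Python sketch of reading one line of PostgreSQL's COPY text format: fields are separated by a delimiter (tab by default), `\N` marks NULL, and a backslash escapes the following character. It is not pgloader's parser; the function name is hypothetical, and octal and hex escapes are deliberately left out.

```python
# Single-character escapes of the COPY text format
# (octal and hex escapes are not handled in this sketch).
ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
           "t": "\t", "v": "\v", "\\": "\\"}


def parse_copy_line(line, delimiter="\t", null="\\N"):
    """Split one COPY text-format line into a list of str or None."""
    line = line.rstrip("\n")
    fields = []
    raw = []    # field text as written, used to detect the NULL marker
    value = []  # field text after unescaping

    def flush():
        fields.append(None if "".join(raw) == null else "".join(value))
        raw.clear()
        value.clear()

    i = 0
    while i < len(line):
        ch = line[i]
        if ch == delimiter:
            flush()
            i += 1
        elif ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            raw.append(ch + nxt)
            # Unknown escapes stand for the escaped character itself.
            value.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            raw.append(ch)
            value.append(ch)
            i += 1
    flush()
    return fields
```

There is no quoting state to track, which is what makes this format easier to handle than CSV: a single left-to-right scan with one character of lookahead is enough.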
It should also be quite useful for loading pgloader's reject data files
again after fixing them manually. It's still missing some documentation,
without any good excuse for that; will add soon.
Also augment the documentation with examples of bare stdin reading and
of the advantages of unix pipes to stream even remote archived content
down to PostgreSQL.
In passing, also allow --field to specify the whole field list: there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
That's the big refactoring patch I've been sitting on for too long.
First, refactor connection handling to use a uniform "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.
Given that, we don't need any messing around with *pgconn* and *myconn-*
and other special variables anywhere in the tree.
Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when a new API was put into place.
Third, fix any other oddity or missing part found while doing those
first two activities, it was long overdue anyway...
Make it so that the following command line usages are accepted when
using pgloader without a command file:

    ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

    ./build/bin/pgloader --set "search_path='sakila'" \
                         mysql://root@localhost/sakila \
                         postgresql:///sakila

    ./build/bin/pgloader --type csv \
                         --field id --field field \
                         --with truncate \
                         --with "fields terminated by ','" \
                         ./test/data/matching-1.csv \
                         postgres:///pgloader?matching

It's now possible in most cases to just use command-line options, which
should make the barrier to entry for pgloader much lower.
In particular, protect the sysdb-data-to-lisp routine, that calls into
CFFI and character decodings, with a restart-case allowing the calling
code to just ignore errors on a particular column by skipping it and
using nil instead (or something else if needed).
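The skip-the-column behaviour can be approximated as below. Python has no equivalent of Common Lisp's restart-case, so this sketch swaps in a caller-supplied `on_error` callback to decide the replacement value; `decode_row` and its parameters are hypothetical names, not pgloader's API.

```python
def decode_row(raw_row, decode, on_error=lambda index, exc: None):
    """Decode every column of a row, substituting a fallback on failure.

    on_error receives the column index and the exception, and returns
    the value to store instead (None by default, mirroring nil).
    """
    row = []
    for index, raw in enumerate(raw_row):
        try:
            row.append(decode(raw))
        except ValueError as exc:  # UnicodeDecodeError is a ValueError
            row.append(on_error(index, exc))
    return row
```

The point of the design is that the decoding routine does not choose the policy: the calling code decides whether a bad column becomes an empty value, a placeholder, or something else.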
Actually, fix pgloader not to depend on MS SQL specifics for bigints in
the protocol. Depending on the protocol version in the driver's setup,
MS SQL would send bigints either as floats on the wire, losing range,
or as something else entirely, whose values do not match what's actually
in the database.
Here we just convert the values to NUMERIC by using a CAST expression
directly in the query, so that the protocol only sees NUMERIC and
everyone is happy. Or should be. Let's try that.
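As an illustration of the query rewrite, the Python sketch below builds a column list in which bigint columns are wrapped in a CAST. The `select_list` helper and the choice of `numeric(19, 0)` are assumptions: the precision is spelled out because T-SQL's default numeric precision is 18 digits, which is one short of the full bigint range.

```python
def quote_name(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def select_list(columns):
    """Build a SELECT column list from (name, type) pairs.

    bigint columns are cast to numeric so the wire protocol never
    carries a native bigint value.
    """
    expressions = []
    for name, type_name in columns:
        quoted = quote_name(name)
        if type_name.lower() == "bigint":
            expressions.append(f"CAST({quoted} AS numeric(19, 0)) AS {quoted}")
        else:
            expressions.append(quoted)
    return ", ".join(expressions)
```

For a table with a bigint `id` and an nvarchar `name`, this yields `CAST([id] AS numeric(19, 0)) AS [id], [name]`, which can then be placed after SELECT in the source query.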