When loading against a table that already has index definitions, the
load can be quite slow. The previous commit introduced a warning in
such a case. This commit introduces the option "drop indexes", which
is not used by default.
When this option is used, pgloader drops the indexes before loading
the data, then creates the indexes again with the same definitions as
before. All the indexes are created again in parallel to optimize
performance. Only primary key indexes can't be created in parallel,
so those are created in two steps (create a unique index, then alter
the table).
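For a primary key that two-step dance looks like the following
sketch, using PostgreSQL's documented syntax (table and index names
are hypothetical):

    CREATE UNIQUE INDEX foo_pkey ON foo (id);
    ALTER TABLE foo ADD CONSTRAINT foo_pkey PRIMARY KEY
          USING INDEX foo_pkey;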
Pre-existing indexes will reduce data loading performance, and it's
generally better to DROP the indexes prior to the load and CREATE
them again once the load is done. See #251 for an example of that.
In this patch we just add a WARNING about the situation; the next
patch will also add support for a new WITH clause option allowing
pgloader to take care of the DROP/CREATE dance around the data
loading.
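Once that option exists, usage should look something like the
following sketch (assuming the option keeps the name "drop indexes";
file and table names are hypothetical):

    LOAD CSV
         FROM 'data.csv'
         INTO postgresql:///pgloader?tablename
         WITH drop indexes;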
In some cases (such as when using a very old PostgreSQL instance or
an Amazon Redshift service, as in #255), the function
pg_get_keywords() does not exist, but we assume that pgloader might
still be able to complete its job.
We're better off with a static list of keywords than with an
unhandled error here, so let's see what happens next with Redshift.
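The fallback logic amounts to something like this minimal sketch,
where the query helper and the name of the static list are
hypothetical, not pgloader's actual API:

    ;; Sketch: fall back to a static keyword list when the server
    ;; doesn't provide pg_get_keywords().
    (defun list-reserved-keywords (connection)
      (handler-case
          (query connection "select word from pg_get_keywords()")
        (error () *static-keyword-list*)))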
The problem in #249 is that SQLite is happy processing floats in an
integer field, so pgloader needs to be instructed via the CAST
mechanism to cast to float at migration time.
But then the transformation function would choke on integers, because
of the "declare" statement it uses as an optimisation. Of course the
integer representation expected by PostgreSQL is float-compatible, so
just instruct the function that integers are welcome to the party.
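The fix amounts to widening the declared type along these lines, a
sketch only; pgloader's actual transformation function may differ:

    ;; Sketch: accept integers as well as floats, since an integer's
    ;; printed representation is a valid PostgreSQL float.
    (defun float-to-string (value)
      (declare (type (or null integer double-float) value))
      (when value
        (format nil "~f" value)))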
Some CSV files use the CSV escape character internally in their
fields. In that case we hit a parsing bug in cl-csv where
backtracking out of parsing the escape string isn't possible (or at
least isn't implemented).
To handle the case, change the quote parameter from \" to just \ and
let cl-csv use its escape-quote mechanism to decide whether we're
escaping only separators or any data.
See https://github.com/AccelerationNet/cl-csv/issues/17 where the escape
mode feature was introduced for pgloader issue #80 already.
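In a pgloader load command that behaviour maps to the "fields
escaped by" option, along these lines (a sketch, to be checked
against the CSV options documentation):

    WITH fields optionally enclosed by '"',
         fields escaped by backslash-quote,
         fields terminated by ','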
The error handling was good enough to continue parsing the CSV data
after a recoverable parser error, but not good enough to actually report
its misfortunes to the user.
See #250 for a report where this is misleading.
As per the PostgreSQL documentation on connection strings, allow
overriding of the main URI components in the options part, with a
percent-encoded syntax for parameter values. This allows bypassing
the main URI parser limitations as seen in #199 (how do you have a
password that starts with a colon?).
See:
http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
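For instance, a password starting with a colon can now be given as a
percent-encoded option (a hypothetical example; %3A decodes to a
colon):

    postgresql://user@localhost:5432/dbname?password=%3Asecret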
To allow importing JSON one-liners as-is into the database, it can
be interesting to leverage the CSV parser in a compatible setup. That
setup requires being able to use any separator character as the
escape character.
Some CSV files are given with a header line containing the list of
their column names; use that when given the option "csv header".
Note that when both the "skip header" and "csv header" options are
used, pgloader first skips the required number of lines and then uses
the next one as the CSV header.
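For instance, to ignore a first comment line and take the column
names from the line that follows it, a sketch of the WITH clause:

    WITH skip header = 1,
         csv header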
Because of a temporary failure to install the `ronn` documentation
tool, this patch only commits the changes to the source docs and
omits the update to the man page (pgloader.1). A following patch that
fixes that is intended to be pushed.
See #236, which uses shell tricks to retrieve the field list from
the CSV file itself and motivated this patch to finally get written.
The database connection code needed to switch to the "new" connection
facilities, and there was a bug in the processing of template sections
wherein the template user would inherit the template property.
It turns out that SQLite3 data type handling is back to kick us wherever
it hurts, this time by the driver deciding to return blob data (a vector
of unsigned bytes) when we expect properly encoded text data.
In the wikipedia data test case used to reproduce the bug, we're
lucky enough that the byte vectors actually map to properly encoded
strings. Of course doing the proper thing costs some performance.
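Decoding the byte vectors amounts to something like this minimal
sketch, assuming UTF-8 and the babel library (the function name is
hypothetical; pgloader's actual code may differ):

    ;; Sketch: decode a blob into text when text was expected.
    (defun blob-to-text (value)
      (if (typep value '(vector (unsigned-byte 8)))
          (babel:octets-to-string value :encoding :utf-8)
          value))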
I'd like to be able to decide if I should blame the SQLite driver or
the whole product on this one. The per-value data type handling still
is a disaster in my book, though, which means it's crucially
important for pgloader to get it right and allow users to seamlessly
migrate away from using such a system.
pgloader used to have a single database name parsing rule that was
supposed to be compliant with PostgreSQL identifier rules. Of course
it turns out that MySQL naming rules are different, so adjust the
parser so that the following connection string is accepted:
mysql://root@localhost/3scale_system_development
MS SQL default values can be quite... sophisticated, so get around
that by using a more complex expression in the SQL query that
retrieves the default values.
The query and implementation were largely provided by GitHub users
luqelinux and jstans; I finally merged their combined efforts on this
front manually.
When given a file in the COPY format, we should expect that its
content is already properly escaped as expected by PostgreSQL. Rather
than unescape the data and then escape it again, add a new mode of
operation to format-vector-row in which it won't even try to reformat
the data.
In passing, fix an off-by-one bug in dealing with non-ascii characters.
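The new mode boils down to a flag along the lines of this sketch,
where the function and helper names are hypothetical and the real
format-vector-row signature may differ:

    ;; Sketch: when pre-formatted, trust the input and write it
    ;; through as-is rather than escaping it again.
    (defun format-row (stream row &key pre-formatted)
      (loop :for field :in row
            :for first := t :then nil
            :unless first :do (write-char #\Tab stream)
            :do (write-string (if pre-formatted
                                  field
                                  (copy-escape field)) ; hypothetical
                              stream))
      (write-char #\Newline stream))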
We used to parse qualified table names as a simple string, which
then breaks attempts to be smart about how to quote identifiers. Some
sources are known to accept dots in quoted table names and we need to
be able to process that properly without tripping on qualified table
names too late.
Current code might not be the best approach as it's just using either a
cons or a string for table names internally, rather than defining a
proper data structure with a schema and a name slot.
Well, that's for a later cleanup patch, I happen to be lazy tonight.
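For reference, that cleanup could be as simple as the following
sketch (structure and slot names are hypothetical):

    ;; Sketch: a proper data structure rather than a bare string or
    ;; a cons such as ("schema" . "tablename").
    (defstruct qualified-table schema name)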
Define a bunch of OS return codes and use them wisely, or at least
in a better way than just doing (uiop:quit) whenever there's
something wrong, without any difference whatsoever to the caller.
Now we return a non-zero error code when we know something wrong did
happen, which is more useful.
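A minimal sketch of the idea, with hypothetical constant names and
values (EX_USAGE comes from sysexits.h):

    ;; Sketch: symbolic return codes instead of a bare (uiop:quit).
    (defconstant +os-code-success+    0)
    (defconstant +os-code-error+      1)
    (defconstant +os-code-bad-usage+ 64)  ; EX_USAGE

    ;; signal failure to the calling shell
    (uiop:quit +os-code-error+)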