When using a CSV header, we might find fields in a different order than the
target table columns, and maybe not all of the fields are going to be read.
Take account of the header we read rather than expecting the header to look
like the target table definition.
Fix #888.
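As an illustration only (pgloader itself is written in Common Lisp), the idea of following the header rather than the table definition can be sketched in Python. The function name `project_on_header` is hypothetical, and the sketch assumes that header fields missing from the target table are simply not read:

```python
import csv
import io


def project_on_header(csv_text, target_columns):
    """Return (columns, rows) following the CSV header order.

    Hypothetical helper: fields whose name is not a target table
    column are dropped (an assumption, not pgloader's actual rule).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Remember the position of each header field we intend to read.
    wanted = [(i, name) for i, name in enumerate(header)
              if name in target_columns]
    columns = [name for _, name in wanted]
    rows = [[row[i] for i, _ in wanted] for row in reader]
    return columns, rows


# The table is defined as (a, b, c) but the file has b, extra, a.
columns, rows = project_on_header("b,extra,a\n2,x,1\n", ["a", "b", "c"])
```

The resulting column list is in header order (b, then a), so it can be used as-is for the column list of the load.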
In this commit we fail the guess faster, which allows testing a much larger
sample. The sample size is still hard-coded, but this time to 1000 lines.
Also add a test case, see #618.
The PostgreSQL COPY protocol requires an explicit initialization phase
that may fail, and in this case the Postmodern driver transaction is
already dead, so there's no way we can even send ABORT to it.
Review the error handling of our copy-batch function to cope with that
fact, and add some logging of non-retryable errors we may have.
Also improve the thread error reporting when using a binary image, from
which it might be difficult to open an interactive debugger, while still
keeping the full-blown Common Lisp debugging experience for the project
developers.
Add a test case for a missing column as in issue #339.
Fix #339, see #337.
Upcoming parallelism improvements will allow pgloader to use more than one
COPY thread to load data, with the side effect of changing the order of rows
in the database.
Rather than doing a copy out and `diff` of the data just loaded, load
the reference data and do the diff in SQL:
    select * from loaded.data
    except
    select * from expected.data
If such a query returns any row, we know we didn't load what was
expected and the regression test is failing.
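A minimal sketch of this order-independent check, using Python's sqlite3 module as a stand-in for PostgreSQL. The table names `loaded_data` and `expected_data` and the helper `unexpected_rows` are assumptions made for the example:

```python
import sqlite3


def unexpected_rows(conn, loaded, expected):
    """Return rows present in `loaded` but absent from `expected`."""
    # Table names are interpolated directly: acceptable in a sketch
    # with trusted names, not for untrusted input.
    query = f"select * from {loaded} except select * from {expected}"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table loaded_data (id integer, val text)")
conn.execute("create table expected_data (id integer, val text)")

# Same rows, inserted in a different order: the diff is empty.
conn.executemany("insert into loaded_data values (?, ?)",
                 [(2, "b"), (1, "a")])
conn.executemany("insert into expected_data values (?, ?)",
                 [(1, "a"), (2, "b")])
diff_same = unexpected_rows(conn, "loaded_data", "expected_data")

# One extra loaded row: the diff returns it and the test would fail.
conn.execute("insert into loaded_data values (3, 'c')")
diff_extra = unexpected_rows(conn, "loaded_data", "expected_data")
```

Note that a single `except` only reports loaded rows that were not expected; catching rows that are missing from the load needs the same query with the two tables swapped.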
This regression testing facility should also allow us to finally add
support for multiple-table regression tests (sqlite, mysql, etc).
To be able to use "t" (or "nil") as a column name, pgloader needs to be
able to generate lisp code where those symbols are available. It's
simple enough in that a Common Lisp package that doesn't :use :cl
fulfills the condition, so intern user symbols in a specially crafted
package that doesn't :use :cl.
Now, we still need to be able to run transformation code that is using
the :cl package symbols and the pgloader.transforms functions too. In
this commit we introduce a heuristic to pick symbols either as functions
from pgloader.transforms or anything else in pgloader.user-symbols.
And so that user code may use NIL too, we provide an override mechanism
to the intern-symbol heuristic and use it only when parsing user code,
not when producing Common Lisp code from the parsed load command.
The COPY TEXT format accepts non-printable characters as an escape
sequence wherein pgloader can pass in the octal number for the character
in its encoding. When doing that with small numbers like \6, if the
non-printable character is followed by other digits, the sequence
becomes e.g. \646, which might not be part of the target encoding...
To fix, always left pad the character octal number with zeroes, so that
we now send in \00646 which COPY knows how to read: the char at \006
then 4 then 6.
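The padding fix can be sketched as follows in Python (not the actual pgloader code, which is Common Lisp). The function name `escape_non_printable` is hypothetical, and the sketch only handles ASCII control characters, leaving other escaping such as backslashes aside:

```python
def escape_non_printable(value: str) -> str:
    """Replace ASCII control characters with a 3-digit octal escape."""
    out = []
    for ch in value:
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            # Always pad to three octal digits so that any digits
            # following the character cannot be read as part of it.
            out.append("\\%03o" % code)
        else:
            out.append(ch)
    return "".join(out)


# Character 6 followed by the digits "46".
escaped = escape_non_printable("\x0646")
```

Without the padding the same input would come out as \646, which is the ambiguous form described above.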
Also copy the test case over to pgloader and run it in the test suite.
To allow for importing JSON one-liners as-is in the database it can be
interesting to leverage the CSV parser in a compatible setup. That setup
requires being able to use any separator character as the escape
character.
Some CSV files come with a header line containing the list of
their column names; use that when given the option "csv header".
Note that when both "skip header" and "csv header" options are used,
pgloader first skips the requested number of lines and then uses the next
one as the csv header.
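The interaction between the two options can be sketched in Python as below. The function `read_with_header` and its `skip` parameter are hypothetical names for the example, not pgloader's API:

```python
import csv
import io
from itertools import islice


def read_with_header(lines, skip=0):
    """Skip `skip` lines, then treat the next line as the CSV header."""
    remaining = islice(lines, skip, None)
    reader = csv.reader(remaining)
    header = next(reader)
    return header, list(reader)


text = "generated by some tool\n2015-01-01\nid,name\n1,a\n2,b\n"
header, rows = read_with_header(io.StringIO(text), skip=2)
```

Here the first two lines are discarded, the third provides the field names, and only the remaining lines are treated as data.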
Because of a temporary failure to install the `ronn` documentation tool,
this patch only commits the changes to the source docs and does not
update the man page (pgloader.1). A follow-up patch is intended to fix
that.
See #236 which is using shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
When given a file in the COPY format, we should expect that its content
is already properly escaped as expected by PostgreSQL. Rather than
unescape the data then escape it again, add a new mode of operation to
format-vector-row in which it won't even try to reformat the data.
In passing, fix an off-by-one bug in dealing with non-ascii characters.
This option is dangerous and allows skipping ALL triggers when loading
data into PostgreSQL. This includes foreign key constraints, and will
allow loading data out of order.
When using both the options "create no table" and "disable triggers" it
will be possible to load data into a schema prepared by your favorite
external tool, at the cost of not validating FK constraints. Use with
care.
Fix #167.
As seen in #116, it might be better for the users to be able to ask for
field trimming right in the source definition, like we do for processing
nulls.
Those tests currently only work when a single table is the target of the
load, and when this target is explicit in the INTO target clause. More
work needs to be done to cover interesting cases like MySQL and SQLite
where we want to diff a full database rather than a single table.