The cl-db3 lib just got improvements for new dbase file types and field
types, reflect those in pgloader.
Also, cl-db3 now can read the encoding of the file (language driver)
directly in the header, meaning we can rely on that metadata by default, and
only override it when the users tells us to.
See #961.
Before that it was necessary to install a function in the lisp environment
either in the source itself in src/utils/transforms.lisp, or in a lisp file
loaded with --load-lisp-file (or -l for shorts).
While this could be good enough, sometimes a very simple combination of
existing features is required to transform a function and so doing some
level of lisp coding directly in the load command is a nice to have.
Fixes#961.
In this commit we fail the guess faster, allowing to test for a much larger
sample. The sample is still hard-coded, but this time to 1000 lines.
Also add a test case, see #618.
PostgreSQL COPY format is not really CSV but something way easier to
parse. Funnily enough, parsing it as CSV is not that easy, so we add
here a special simple parser for the COPY format.
It should be quite useful too try loading again reject data files from
pgloader after manual fixing, too. It's still missing some documentation
without any good excuse for that, will add soon.
When using LOAD CSV it's possible to load from filename matching a
regular expression, but for that to work the *csv-path-root* needs to be
properly setup at run-time.
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before *copy-batch-rows* limit)
when crossing the byte count threshold.
When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.
This patch is published without any way to tune the values from the command
language yet, that's the next step once its been proven effective.