That patch is not a principled approach to fixing the problem, but it
should avoid messing up fully qualified table names.
A proper way to do it would be to have a pgsql object name structure
composed of the catalog, the schema and the name as separate entries,
with assorted API to print that object properly. That's for another day
though.
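As a rough illustration of the structure described above, here is a
minimal sketch in Python (pgloader itself is written in Common Lisp).
The ObjectName class, its field names and the quote_ident helper are
hypothetical names chosen for this example, not part of pgloader.

```python
from dataclasses import dataclass
from typing import Optional


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


@dataclass(frozen=True)
class ObjectName:
    """Hypothetical object name kept as separate parts (assumption)."""
    name: str
    schema: Optional[str] = None
    catalog: Optional[str] = None

    def qualified(self) -> str:
        # Only print the parts that are actually set, in the
        # catalog.schema.name order, each one quoted on its own.
        parts = [p for p in (self.catalog, self.schema, self.name)
                 if p is not None]
        return ".".join(quote_ident(p) for p in parts)
```

Keeping the parts separate means the dot is never parsed out of a
string, so a name that itself contains a dot cannot be mistaken for a
schema-qualified one.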
The previous patch didn't take into account the need to retain the case
of the PostgreSQL column names when using double-quotes in the load
command, which is now properly forwarded down in the COPY command.
Only the MySQL command is addressed in this patch, because I'm not
completely satisfied with the code level approach. It might be easier
to just bite the bullet and review the return values of all the
optional clauses rather than add a layer as this patch does.
The feature is still available for MySQL with this patch, so let's push
it, get feedback, then see how to make the approach scale and revise
all the other commands.
In SQLite it's possible to define columns using type names such as
"smallint unsigned" or "short integer", without any changes to the way
those data types are handled, given its "dynamic typing" features.
Improve the pgloader casting machinery for SQLite to handle those cases.
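For reference, SQLite's documented type affinity rules explain why such
type names are accepted: the declared type is only searched for a few
substrings. The sketch below restates those rules in Python; it is not
pgloader's casting code, and the function name is made up for this
example.

```python
def sqlite_affinity(declared_type: str) -> str:
    """Return the SQLite column affinity for a declared type name.

    Follows the substring rules from the SQLite documentation, applied
    in order; this is an illustration, not pgloader's implementation.
    """
    t = declared_type.upper()
    if "INT" in t:
        return "INTEGER"
    if any(k in t for k in ("CHAR", "CLOB", "TEXT")):
        return "TEXT"
    if "BLOB" in t or t.strip() == "":
        return "BLOB"
    if any(k in t for k in ("REAL", "FLOA", "DOUB")):
        return "REAL"
    return "NUMERIC"
```

Both "smallint unsigned" and "short integer" contain "INT", so SQLite
treats them as INTEGER columns whatever the rest of the name says.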
It turns out that Debian has mawk by default, which already behaves
differently in our very simple use case. In passing, add gawk as a
build dependency of the Debian package, because the packaging is meant
to exercise the test cases.
Those tests currently only work when a single table is the target of the
load, and when this target is explicit in the INTO target clause. More
work needs to be done to cover interesting cases like MySQL and SQLite
where we want to diff a full database rather than a single table.
With this, the user can now control where the files are going to be
read from and matched against the regular expression. It used not to be
necessary in the archive expansion mode, but is required now that the
feature is exposed in more cases.
When using LOAD CSV it's possible to load from files whose names match
a regular expression, but for that to work the *csv-path-root* needs to
be properly set up at run-time.
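The idea of matching file names against a regular expression under a
root directory can be sketched as follows. This is Python rather than
pgloader's Common Lisp, the matching_files function is hypothetical,
and matching against the path relative to the root is an assumption of
this example, not a description of what pgloader does.

```python
import os
import re
from typing import List


def matching_files(root: str, pattern: str) -> List[str]:
    """List files under root whose root-relative path matches pattern."""
    rx = re.compile(pattern)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Assumption: the regex is applied to the relative path.
            if rx.search(os.path.relpath(path, root)):
                found.append(path)
    return sorted(found)
```

Without a well defined root, the same regular expression could match a
different set of files depending on where the command is run from.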
When using SQLite 3, a blob column might return either string or byte
vector values dynamically depending on the data itself, or maybe some
more complex parameters controlled at data insert time.
Hard-code the rule that a blob column returned as a string is in fact
base64 encoded (which looks like common practice) and decode it
automatically when needed, before sending to byte-vector-to-bytea. It
might be a tad slow but at least the data is properly converted.
In the future, that decision might come back and bite us again, at
which point it'll be necessary to consider full casting options as in
the MySQL CAST rules. It seems like a big enough win for now if we can
avoid that.
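The hard-coded rule can be pictured with the following Python sketch.
The function names are invented for this example and only mirror the
idea (a string is taken to be base64, a byte vector is passed through);
they are not pgloader's byte-vector-to-bytea.

```python
import base64
from typing import Optional, Union


def blob_to_bytes(value: Union[str, bytes, None]) -> Optional[bytes]:
    """Normalize a blob value that may arrive as a string or as bytes."""
    if value is None:
        return None
    if isinstance(value, str):
        # Assumption carried over from the rule above: a blob that
        # comes back as a string is base64 encoded.
        return base64.b64decode(value)
    return bytes(value)


def to_bytea_hex(data: bytes) -> str:
    """Render bytes in the PostgreSQL bytea hex input format."""
    return "\\x" + data.hex()
```

A string that is not actually base64 would either fail to decode or
decode to garbage, which is the risk the paragraph above acknowledges.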
This issue has been re-opened with blob instead of double. Semi-blindly
implement support for the blob type with an image data type.
Disturbingly enough, when tested with non-binary data, SQLite was
returning strings rather than byte vectors, tripping up the transform
function, which expects byte vectors.
That allows using the same SQL files as usual when using pgloader, as it
even supports the \i and \ir psql features (and dollar quoting, etc).
In passing, refactor docs to avoid saying the same things all over the
place, which isn't a very good idea in a man page, at least as far as
editing it is concerned.
The parser was happily parsing such a connection string as the
following, but the rest of the code didn't really know what to do about
it:
mysql://unix:/var/run/mysqld/mysqld.sock:/main
In passing, fix bugs where the PostgreSQL unix domain socket connection
was still shy of a brick load, omitting to consider the case where the
connection host is actually a list of '(:unix . "path/to/socket").
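To make the shape of that connection string concrete, here is a small
Python sketch that splits it into a (unix, path) host and a database
name, loosely mirroring the '(:unix . "path/to/socket") form mentioned
above. The regular expression and the parse_unix_uri name are
assumptions for illustration; this is not pgloader's parser and it
ignores user, password and options.

```python
import re
from typing import Optional

# scheme://unix:<socket path>:/<dbname>
UNIX_URI = re.compile(
    r"^(?P<scheme>\w+)://unix:(?P<socket>[^:]+):/(?P<dbname>.+)$"
)


def parse_unix_uri(uri: str) -> Optional[dict]:
    """Parse a unix domain socket connection string, or return None."""
    m = UNIX_URI.match(uri)
    if m is None:
        return None
    return {
        "scheme": m.group("scheme"),
        # The host is a pair rather than a plain host name string.
        "host": ("unix", m.group("socket")),
        "dbname": m.group("dbname"),
    }
```

Code that consumes the host then has to handle both cases, a plain
string and a ("unix", path) pair, which is the part that was missing.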
The archive contents seem to have changed, and the regular expression to
match files that we were using doesn't match any filename in the archive
any more.
Also, have the command load more data by parsing more files, using the
ALL FILENAME MATCHING clause.
The new WITH options allow the user to set values for the dynamic
variables *copy-batch-rows*, *copy-batch-size* and *concurrent-batches*.
That's needed in cases like issue #16, even with the batch size
defaulting to what looks like a proper setup.
In the longer term, a serious review of the pgloader memory usage
should be done, the numbers being way higher than the batch sizes we
set up here.
This rule has overridden the default rule for `tinyint(1)` and instead of placing `boolean`, it kept the typemod and placed `boolean(1)` into the resulting query.
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before *copy-batch-rows* limit)
when crossing the byte count threshold.
When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.
This patch is published without any way to tune the values from the command
language yet; that's the next step once it's been proven effective.
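The batching rule described above (close a batch at the row limit, or
earlier when a byte threshold is crossed) can be sketched like this in
Python. The batches generator and its max_rows and max_bytes parameters
are stand-ins for *copy-batch-rows* and *copy-batch-size*; the actual
pgloader code is Common Lisp and may differ in details such as whether
the threshold is checked before or after adding a row.

```python
from typing import Iterable, Iterator, List, Optional


def batches(rows: Iterable[bytes],
            max_rows: int,
            max_bytes: Optional[int] = None) -> Iterator[List[bytes]]:
    """Group rows into batches bounded by row count and byte count."""
    batch: List[bytes] = []
    size = 0
    for row in rows:
        batch.append(row)
        size += len(row)
        # Close on the row limit, or early once the byte threshold is
        # reached; max_bytes=None disables the byte check entirely.
        if len(batch) >= max_rows or (
                max_bytes is not None and size >= max_bytes):
            yield batch
            batch, size = [], 0
    if batch:
        yield batch
```

With max_bytes left at None the only extra work per row is one
addition and one comparison.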
This message has the line number where the erroneous data was found on the
server, and given the pre-processing we have already done at that point,
it's easy to convert that number into an index into the current batch, an
array.
To do so, we need Postmodern to expose the CONTEXT error message and we need
to parse it. The following pull request takes care of the Postmodern side of
things:
https://github.com/marijnh/Postmodern/pull/46
The parsing is done as simply as possible, only assuming that the error
message is using comma separators and having the line number in second
position. The parsing as done here should still work with localized message
strings.
CONTEXT: COPY errors, line 3, column b: "2006-13-11"
This change should significantly reduce the cost of error processing.
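The parsing described above (comma separators, line number in second
position) can be sketched in Python as follows. The function names are
made up for this example, and the conversion to a batch index assumes
that COPY line numbers start at 1 for the first row of the batch.

```python
import re
from typing import Optional


def copy_error_line(context: str) -> Optional[int]:
    """Extract the line number from a COPY CONTEXT message."""
    fields = context.split(",")
    if len(fields) < 2:
        return None
    # Only the digits are used, so the word before the number may be
    # localized without breaking the parsing.
    m = re.search(r"\d+", fields[1])
    return int(m.group()) if m else None


def batch_index(context: str) -> Optional[int]:
    """Map the reported line to a 0-based index (assumes 1-based lines)."""
    line = copy_error_line(context)
    return None if line is None else line - 1
```

For the CONTEXT message shown above, the second comma-separated field
is " line 3", so the faulty row sits at index 2 under that assumption.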
Including some Makefile hacks where test doesn't depend on the main pgloader
binary anymore because I couldn't stop the binary from being built again
even if it's been done already...