When parsing table names in the target URI, we are careful to split
the schema and table names apart and store them in a cons in that
case. Not all source methods got the memo; clean that up.
See #182 and #186, a pull request that I am now going to be able to
accept. Also see #287, which should be helped by being able to apply
#186.
Apparently I just forgot to apply any smartness whatsoever to SQLite
identifiers and copied them as they are to PostgreSQL. Change that by
calling apply-identifier-case.
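As a rough sketch (not the actual patch), apply-identifier-case maps a
source identifier to its PostgreSQL spelling according to the
configured identifier case, along these lines:

  (apply-identifier-case "TableName") ; e.g. "tablename" when downcasing,
                                      ; "\"TableName\"" when quoting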
The COPY TEXT format accepts non-printable characters via an escape
sequence wherein pgloader can pass in the octal number of the
character in its encoding. When doing that with a small number like \6,
if the non-printable character is then followed by digits, the result
becomes e.g. \646, which might not be part of the target encoding...
To fix, always left-pad the character's octal number with zeroes, so
that we now send in \00646, which COPY knows how to read: the
character at \006, then 4, then 6.
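A minimal sketch of the padding in Common Lisp, using the FORMAT octal
directive with a zero pad character (the idea, not necessarily the
exact pgloader code):

  (format nil "\\~o" 6)     ; => "\6"    ambiguous when digits follow
  (format nil "\\~3,'0o" 6) ; => "\006"  always three octal digits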
Also copy the test case over to pgloader and run it in the test suite.
The previous coding added 2000 to the year as an integer whenever it
was below 2000, which parses 1999 as 3999. Oops. Trigger the
correction only when the year is given with exactly 2 digits, parsing
04 as 2004; years given with 3 digits are kept as-is.
Playing with the *century* special parameter makes it possible to
cancel this behavior, which maybe should be made entirely optional.
It's just too common to find current years written with only 2 digits,
sadly.
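A sketch of the corrected rule (not the actual pgloader code),
assuming a *century* special variable bound to 2000:

  (defvar *century* 2000)  ; assumed default for this sketch

  (defun expand-year (digits)
    "Add *century* only to years written with exactly 2 digits."
    (let ((year (parse-integer digits)))
      (if (= 2 (length digits))
          (+ *century* year)  ; "04"   => 2004
          year)))             ; "1999" => 1999, "999" => 999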
It's now possible to use several files in a BEFORE LOAD EXECUTE
section, and to mix DO and EXECUTE parts, bringing lots of flexibility
to the commands. It also actually simplifies the parser.
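For instance, something along these lines now parses (file names made
up for the example):

  BEFORE LOAD
       DO $$ create schema if not exists staging; $$
       EXECUTE 'create-tables.sql', 'create-functions.sql'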
The new option 'drop indexes' reuses the existing code to build all
the indexes in parallel but failed to properly account for that fact
in the summary report with timings.
While fixing this, also fix the SQL used to re-establish the indexes
and associated constraints to allow for parallel execution: the ALTER
TABLE statements would otherwise block, each waiting on an ACCESS
EXCLUSIVE lock, and make our efforts vain.
The option doesn't seem relevant to the db3 source type, which
contains a table definition: pgloader will create the table from
scratch, so no indexes are going to be found.
When loading against a table that already has index definitions, the
load can be quite slow. The previous commit introduced a warning in
such a case. This commit introduces the option "drop indexes", which
is not used by default.
When this option is used, pgloader drops the indexes before loading
the data, then creates the indexes again with the same definitions as
before. All the indexes are again created in parallel to optimize
performance. Only primary key indexes can't be created in parallel, so
those are created in two steps (create a unique index, then alter
table), as sketched below.
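The two-step dance for a primary key looks like the following, with
hypothetical table and index names:

  CREATE UNIQUE INDEX foo_pkey ON foo (id);
  ALTER TABLE foo ADD PRIMARY KEY USING INDEX foo_pkey;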
To allow importing JSON one-liners as-is into the database, it can be
interesting to leverage the CSV parser in a compatible setup. That
setup requires being able to use any separator character as the escape
character.
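A hypothetical setup along these lines picks quoting and escape
characters that never appear in the JSON, so that each line comes
through untouched:

  WITH fields terminated by '\t',
       fields enclosed by '|',
       fields escaped by '|'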
Some CSV files are given with a header line containing the list of
their column names; use that when given the option "csv header".
Note that when both the "skip header" and "csv header" options are
used, pgloader first skips as many lines as required and then uses the
next one as the CSV header.
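For example, to skip one garbage line and then take the column names
from the following one:

  WITH skip header = 1,
       csv header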
Because of a temporary failure to install the `ronn` documentation
tool, this patch only commits the changes to the source docs and omits
updating the man page (pgloader.1). A following patch that fixes that
is intended to be pushed.
See #236, which uses shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
When given a file in the COPY format, we should expect its content to
be already properly escaped as expected by PostgreSQL. Rather than
unescaping the data and then escaping it again, add a new mode of
operation to format-vector-row in which it won't even try to reformat
the data.
In passing, fix an off-by-one bug in dealing with non-ASCII characters.
See test/parse/hans.goeuro.load for an example usage of the new option.
In passing, any error when creating indexes is now properly reported and
logged, which was missing previously. Oops.
This option is dangerous and makes it possible to skip ALL triggers
when loading data against PostgreSQL. This includes foreign key
constraint definitions and will allow loading data out of order.
When using both the "create no table" and "disable triggers" options,
it will be possible to load data into a schema prepared by your
favorite external tool, at the cost of not validating FK constraints.
Use with care.
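A sketch of the combination, spelling the options as in this patch:

  WITH create no table,
       disable triggers

Under the hood, disabling the triggers relies on issuing ALTER TABLE
... DISABLE TRIGGER ALL against the target tables.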
Fix #167.
The PostgreSQL COPY format is not really CSV but something way easier
to parse. Funnily enough, parsing it as CSV is not that easy, so we
add here a special simple parser for the COPY format.
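For reference, COPY text data separates columns with tabs, spells NULL
as \N, and uses backslash escapes, as in this sample line:

  1	hello world	\N	2015-01-01 10:00:00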
It should be quite useful to try loading again the reject data files
produced by pgloader after manual fixing, too. It's still missing some
documentation without any good excuse for that; will add it soon.
Now it's possible to parse a command to load data from MS SQL. Until
now the parser was parsing all database URIs within the same common
rule, and that isn't possible anymore if we want to distinguish
between source databases right from the parser, which we actually want
to do.
This patch also implements in-passing fixes all over the place,
including to the transformation function float-to-string, which only
happened to work on double-float data.
This allows users to benefit from the same flexible machinery when
using SQLite as when using MySQL, and also allows adding some more
default cast rules.
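Hypothetical examples of the kind of default cast rules this enables
(made-up rules, just to show the shape):

  CAST type datetime to timestamptz drop default,
       type integer to bigint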
It's not possible to use a comma as the separator when using more than
one source field option at the same time, so for better readability
the options are to be found enclosed in square brackets.
Also, it's now possible to spell out the "from" and "for" keywords in
the source definitions, making it easier to read and maintain the load
file, as in this full example:
(
a from 0 for 10,
b from 10 for 8,
c from 18 for 8,
d from 26 for 17 [null if blanks, trim right whitespace]
)
As seen in #116, it might be better for users to be able to ask for
field trimming right in the source definition, like we do for
processing nulls.
The default values for the MySQL WITH clause options weren't really
tested and broke on simple cases; the new set of defaults is known to
work in many (most?) cases.
Other combinations of options will need some review work, and we might
need to consider preventing some of them; that's for another patch
though.
When default values are used in SQLite, they are of course using their
SQLite representation, which might not be compatible with the
PostgreSQL target data type we're casting to. Make it so that the
default values are transformed too, as we already do in the MySQL
case.
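For instance (a made-up illustration), a SQLite column declared as in
the first line below and cast to a PostgreSQL boolean needs its
default rewritten through the same transformation as the data,
yielding the second line rather than an invalid "default 0":

  flag integer default 0       -- SQLite declaration
  flag boolean default false   -- transformed PostgreSQL declaration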
See #100.