Commit Graph

1568 Commits

Author SHA1 Message Date
Victor Kryukov
54997be2dd Fix 182: properly quote tables with . in their names 2015-09-04 01:15:29 +02:00
Dimitri Fontaine
eabfbb9cc8 Fix schema qualified table names usage (more).
When parsing table names in the target URI, we are careful to split the
table and schema names and store them into a cons in that case. Not
all source methods got the memo, so clean that up.

See #182 and #186, a pull request I am now going to be able to accept.
Also see #287 that should be helped by being able to apply #186.
2015-09-04 01:06:15 +02:00
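The two fixes above are splitting a target name into schema and table, and quoting names that contain a dot. This is a minimal Python sketch of both. It is not pgloader's Lisp code. A tuple stands in for the (schema . table) cons, and the helpers parse_target, quote_ident and qualified_name are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def parse_target(spec: str):
    """Split 'schema.table' on unquoted dots; a quoted "a.b" stays one name.

    Returns a bare string for an unqualified name, or a (schema, table)
    tuple standing in for pgloader's cons (hypothetical representation).
    """
    parts, buf, in_quotes, i = [], [], False, 0
    while i < len(spec):
        c = spec[i]
        if c == '"':
            if in_quotes and spec[i + 1:i + 2] == '"':
                buf.append('"')  # "" inside quotes is a literal quote
                i += 2
                continue
            in_quotes = not in_quotes
        elif c == "." and not in_quotes:
            parts.append("".join(buf))  # unquoted dot separates schema and table
            buf = []
        else:
            buf.append(c)
        i += 1
    parts.append("".join(buf))
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError(f"too many dots in table name: {spec!r}")


def qualified_name(target) -> str:
    """Render a bare name or a (schema, table) pair as quoted SQL."""
    if isinstance(target, tuple):
        schema, table = target
        return quote_ident(schema) + "." + quote_ident(table)
    return quote_ident(target)
```

For example, `qualified_name(parse_target('public."my.table"'))` yields `"public"."my.table"`. The dot inside the quotes is kept as part of the table name.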
Dimitri Fontaine
92d27f4f98 Allow quote/downcase identifiers option for MS SQL.
As seen in #287 the previous decision to force quoting to :none is
wrong, because index names in MS SQL source database might contain
spaces, and then need to be quoted.

Let's see what happens if we do it the usual way for MS SQL too, and
allow users to control the quoting behaviour of pgloader here.
2015-09-03 23:34:25 +02:00
Dimitri Fontaine
b78bb6dd31 Allow quoted field names to contain spaces, fix #285.
Given a fully quoted field name, there should be no restriction about
using spaces in between the quotes, but the parser used to choke on that
case.
2015-09-01 14:32:50 +02:00
Dimitri Fontaine
62f0b7fc56 Support String Constants with Escapes in SQL files.
pgloader has to parse external SQL files because the driver we use,
Postmodern, only knows how to send one query at a time. So we parse the
SQL, split it into separate queries, and send them one after the other
to the server.

PostgreSQL allows String Constants with C-style Escapes to be used in
some situations, and the SQL parsing done in pgloader failed to support
that.

  http://www.postgresql.org/docs/9.4/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE

This fixes #284.
2015-08-31 20:34:22 +02:00
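Escape strings matter for splitting because a backslash inside E'...' escapes the next character. As a result, E'it\'s' does not end at the second quote. The following is a minimal Python sketch of a statement splitter that honours this. It assumes only plain '...' strings (with '' doubling) and E'...' strings need handling. It ignores comments, dollar quoting and quoted identifiers, and it is not the parser pgloader actually uses.

```python
def split_sql(text: str) -> list:
    """Split SQL text on semicolons that are not inside string constants."""
    statements, buf = [], []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c == "'":
            # An E'...' constant starts with E/e that is not the tail of a longer word.
            escape = (
                bool(buf)
                and buf[-1] in "Ee"
                and (len(buf) == 1 or not (buf[-2].isalnum() or buf[-2] == "_"))
            )
            buf.append(c)
            i += 1
            while i < n:
                c = text[i]
                buf.append(c)
                i += 1
                if escape and c == "\\" and i < n:
                    buf.append(text[i])  # backslash escapes the next char, even a quote
                    i += 1
                elif c == "'":
                    if i < n and text[i] == "'":
                        buf.append("'")  # '' is an embedded quote, string continues
                        i += 1
                    else:
                        break  # end of the string constant
            continue
        if c == ";":
            statement = "".join(buf).strip()
            if statement:
                statements.append(statement)
            buf = []
        else:
            buf.append(c)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements
```

In a plain '...' string, the sketch treats the backslash as an ordinary character. This matches PostgreSQL with standard_conforming_strings on.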
Dimitri Fontaine
72fdf112ff Simplify how to compute total load time, see #283.
In some cases pgloader's total time computation is quite off: in the
archive case because it fails to take into account per-file before and
after sections, and in the general case when work is done in parallel.

This patch helps by storing the start time explicitly and using it at
the moment when the summary is displayed: no guessing any more. This is
only used in the archive case for now because I want some feedback.

On my machine with the usual test cases I have, the difference with and
without this patch is line-noise, something more serious has to be done:
let's push testing to the user by committing this early version of the
work.
2015-08-29 23:08:22 +02:00
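This small Python sketch shows why summing per-section timings misleads once work runs in parallel. It uses made-up intervals: three tasks whose durations add up to 22 seconds finish 12 seconds after the recorded start. The LoadTimer class is a hypothetical stand-in for storing the start time explicitly, as the commit describes. The clock is injectable so that the example is deterministic.

```python
import time


def summed_task_time(intervals):
    """Add up per-task durations: overcounts whenever tasks overlap."""
    return sum(end - start for start, end in intervals)


class LoadTimer:
    """Record the start time once, then measure elapsed wall-clock time."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start = clock()  # taken before any work starts

    def elapsed(self):
        return self.clock() - self.start


# Two tasks running side by side from t=0 to t=10, then a 2 second tail.
intervals = [(0, 10), (0, 10), (10, 12)]
ticks = iter([0.0, 12.0])  # fake clock: value at start, value at summary time
timer = LoadTimer(clock=lambda: next(ticks))
print(summed_task_time(intervals))  # 22
print(timer.elapsed())              # 12.0
```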
Gavin Wahl
d71d39c59f Fix mssql int conversion
The case statement got messed up in commit db0f21b5a5. Fixes #277.
2015-08-26 10:17:10 +02:00
Dimitri Fontaine
75727df72f Quote table names when migrating from SQLite, fix #281.
Apparently I just forgot to apply any smartness whatsoever to SQLite
identifiers and just copied them as they are to PostgreSQL. Change that
by calling apply-identifier-case.
2015-08-25 01:13:19 +02:00
Dimitri Fontaine
573a63cd3a Add a local test, per #271. 2015-08-24 16:42:39 +02:00
Dimitri Fontaine
04aa743eb7 Cleanup file based "connections".
When the notion of a connection class with a generic set of methods was
invented, the very flexible specification formats available for the file
based sources were not integrated into the new connection system.

This patch provides a new connection class md-connection with a specific
sub-protocol (after opening a connection, the caller is supposed to loop
around open-next-stream) so that it's possible both to properly fit into
the connection concept and to better share the code between our three
implementations (csv, copy, fixed).
2015-08-24 16:33:00 +02:00
Dimitri Fontaine
ea35eb575d Implement --dry-run option, fix #264.
The dry run option currently only checks database connections, but as
that happens after having correctly parsed the load file, it also checks
that the command file is correct for the parser.

Note that the list load-data API isn't subject to the dry-run method.

In passing, we add some more API entry points to the connection objects,
and we should actually clean up the code base to use the new QUERY
generic all over the place. That's for another patch, though.
2015-08-22 16:23:47 +02:00
Dimitri Fontaine
04ddf940d9 Left pad COPY octal chars with 0, fix #275.
The COPY TEXT format accepts non-printable characters as an escape
sequence wherein pgloader can pass in the octal number for the character
in its encoding. When doing that with small numbers like \6 and the
non-printable character is then followed by other digits, it becomes
e.g. \646, which might not be part of the target encoding...

To fix, always left pad the character octal number with zeroes, so that
we now send in \00646 which COPY knows how to read: the char at \006
then 4 then 6.

Also copy the test case over to pgloader and run it in the test suite.
2015-08-20 18:17:18 +02:00
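The ambiguity is easy to reproduce. COPY TEXT reads a backslash followed by one to three octal digits as a single byte, so \6 followed by the digits 46 is read as \646. This is a minimal Python sketch of the fix, which pads every octal escape to three digits. copy_escape is a hypothetical helper that only escapes control characters and backslashes. It is not pgloader's implementation.

```python
def copy_escape(text: str) -> str:
    """Escape a value for PostgreSQL COPY TEXT format.

    Control characters become a backslash plus exactly three octal digits,
    so a following digit can never be read as part of the escape.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            out.append("\\" + format(code, "03o"))  # \006, never \6
        elif ch == "\\":
            out.append("\\\\")  # a literal backslash must be doubled
        else:
            out.append(ch)
    return "".join(out)


# Character 6 followed by the digits "46".
print(copy_escape("\x0646"))  # \00646
```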
Dimitri Fontaine
3a6120b931 Improve logging again.
The user experience is greatly enhanced by this little change, where you
know from the logs that pgloader could actually connect rather than
thinking it might be still trying...
2015-08-20 12:38:19 +02:00
Dimitri Fontaine
ba44e9786d More MySQL debugging information. 2015-08-20 12:34:12 +02:00
Dimitri Fontaine
8f0db173de Add some debug information when connecting to MySQL.
Makes debugging connection strings easier...
2015-08-20 12:19:03 +02:00
Dimitri Fontaine
178210b6f8 Fix 273 by renaming the new create-schemas option properly. 2015-08-18 17:38:04 +02:00
Dimitri Fontaine
56a89e9b53 Cleanup schema data structure building.
As reported by the clisp maintainer (thanks jackdaniel!) when trying to
load pgloader, we had redundant labels function names in places. Get rid
of those by pushing the new columns found directly at the end of the
list, avoiding the bulky code that then reversed the complex anonymous
data structure.

The Real Fix™ would be to define proper structures to hold all those
database catalog representations, but that's an invasive patch and now
isn't a good time to write it.

At least pgloader should load and run with clisp now.
2015-08-15 23:54:45 +02:00
Dimitri Fontaine
6fc40c4844 Implement MS SQL option to skip creating schemas, fix #263.
Allow the user to control whether pgloader should create the same set of
schemas as found in the MS SQL database.
2015-08-15 16:10:15 +02:00
Dimitri Fontaine
3e3ebf2333 Fix numeric casting support for MS SQL.
It's possible to get a numeric column with nil precision and scale, and
the code wasn't ready for this situation. Bug found while seeing about
2015-08-15 16:02:21 +02:00
Dimitri Fontaine
5e7e5391ef Fix the drop indexes option again, fix #251.
The index and constraint names given by PostgreSQL catalogs should not
be second guessed; we need to just quote them. Downcasing identifiers is
useful when we get identifiers from other systems for a migration, but
is wrong when dropping existing indexes in PostgreSQL.

Also, the :null special value from Postmodern was routing the code
badly, so just transform it to nil manually when fetching the index
list.
2015-07-26 15:38:15 +02:00
Dimitri Fontaine
833b41c23b Fix the regression test expected values, see #266. 2015-07-26 14:45:43 +02:00
Dimitri Fontaine
b4b36caa84 Fix parsing dates with less-than 4 digits, fix #266.
The previous code added 2000 to the year as an integer if it was below
2000, which parses 1999 as 3999. Oops. Trigger the correction only when
the year is given on 2 digits, parsing 04 as 2004.

Dates given on 3 digits are kept as-is.

Playing with the *century* special parameter allows cancelling this
behavior, which maybe should be made entirely optional. It's just too
common to find current years given on 2 digits only, sadly.
2015-07-26 14:41:44 +02:00
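The corrected rule can be shown as a minimal Python sketch. Only a two-digit year gets the century added. Passing century=None stands in for cancelling the behavior through the *century* special variable. The function name and signature are hypothetical.

```python
from typing import Optional


def parse_year(year_text: str, century: Optional[int] = 2000) -> int:
    """Interpret the year field of a date string."""
    year = int(year_text)
    if century is not None and len(year_text) == 2:
        return century + year  # "04" -> 2004
    return year  # 3 or 4 digits are kept as-is: "1999" stays 1999


print(parse_year("04"), parse_year("999"), parse_year("1999"))  # 2004 999 1999
```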
Dimitri Fontaine
d2a1a5643e Improve SQL blocks support, fix #265.
It's now possible to use several files in a BEFORE LOAD EXECUTE section,
and to mix DO and EXECUTE parts, bringing lots of flexibility in the
commands. Also it actually simplifies the parser.
2015-07-24 17:41:35 +02:00
Dimitri Fontaine
db0f21b5a5 Process MS SQL smallint datatypes as unsigned, fix #262.
The freetds protocol apparently sends unsigned versions of the values on
the wire, so that we have to convert them to signed numbers upon
reception.
2015-07-22 10:32:13 +02:00
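Converting an unsigned 16-bit value received on the wire back to a signed smallint is a two's complement reinterpretation. This is a minimal Python sketch with a hypothetical to_signed helper. pgloader does this in Lisp, and a later commit in this log (d71d39c59f) had to repair the case statement involved.

```python
def to_signed(value: int, bits: int = 16) -> int:
    """Reinterpret an unsigned integer of the given width as two's complement."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    if value >= 1 << (bits - 1):  # top bit set: the value is negative
        return value - (1 << bits)
    return value


print(to_signed(65535), to_signed(32768), to_signed(32767))  # -1 -32768 32767
```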
Dimitri Fontaine
3af99051d2 Fix the preserve index names option.
MySQL names its primary keys "PRIMARY" and we need to always uniquify
this name even when the user asked pgloader to preserve index names.

Also, the create-indexes-again function now needs to ask for index names
to be preserved specifically.
2015-07-18 23:39:32 +02:00
Dimitri Fontaine
54e29773d7 Fix index creation reporting, see #251.
The new option 'drop indexes' reuses the existing code to build all the
indexes in parallel but failed to properly account for that fact in the
summary report with timings.

While fixing this, also fix the SQL used to re-establish the indexes and
associated constraints to allow for parallel execution: otherwise the
ALTER TABLE statements would block in ACCESS EXCLUSIVE mode and render
our efforts vain.
2015-07-18 23:06:15 +02:00
Dimitri Fontaine
8511294ac7 Generalize index support to handle constraints, fix #251.
PostgreSQL rightfully forbids DROP INDEX when the index is used to
enforce a constraint, the proper SQL to issue is then ALTER TABLE DROP
CONSTRAINT. Also, in such a case pg_dump issues a single ALTER TABLE ADD
CONSTRAINT statement to restore the situation.

Have pgloader do the same with indexes that are used to back a constraint.
2015-07-17 17:06:09 +02:00
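The choice between the two statements can be sketched in Python as follows. IndexInfo and drop_sql are hypothetical names. The sketch assumes that the catalog query already tells us which constraint, if any, an index backs. It quotes names as found in the catalogs rather than changing their case.

```python
from dataclasses import dataclass
from typing import Optional


def quote_ident(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'


@dataclass
class IndexInfo:
    table: str                        # already quoted, possibly schema-qualified
    name: str                         # index name from the catalogs
    constraint: Optional[str] = None  # constraint backed by the index, if any


def drop_sql(ix: IndexInfo) -> str:
    if ix.constraint is not None:
        # DROP INDEX is refused for an index backing a constraint.
        return f"ALTER TABLE {ix.table} DROP CONSTRAINT {quote_ident(ix.constraint)};"
    # Assumes the index's schema is on search_path, so no qualification here.
    return f"DROP INDEX {quote_ident(ix.name)};"
```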
Dimitri Fontaine
c3986b0997 Cast MySQL bigint(20) into numeric, fix #253.
In MySQL it's possible to have a bigint of 20 digits when using the
"unsigned" variant of the data type, whereas in PostgreSQL there's no
such variant and bigints are "limited" to the range -9223372036854775808
to +9223372036854775807 (19 digits numbers).

Fix the default casting rule to switch to PostgreSQL numeric in such cases.
2015-07-17 12:05:28 +02:00
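This small Python sketch shows the range arithmetic behind the new default rule. It models the decision only. The function names are hypothetical, and the actual rule is written in pgloader's CAST syntax.

```python
PG_BIGINT_MIN, PG_BIGINT_MAX = -2**63, 2**63 - 1


def mysql_bigint_range(unsigned: bool) -> tuple:
    """Value range of a MySQL BIGINT, signed or unsigned."""
    return (0, 2**64 - 1) if unsigned else (-2**63, 2**63 - 1)


def pg_target_type(unsigned: bool) -> str:
    """Keep bigint when the whole MySQL range fits, else fall back to numeric."""
    lo, hi = mysql_bigint_range(unsigned)
    return "bigint" if PG_BIGINT_MIN <= lo and hi <= PG_BIGINT_MAX else "numeric"


print(len(str(2**64 - 1)), pg_target_type(True), pg_target_type(False))  # 20 numeric bigint
```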
Dimitri Fontaine
a308dd9bda Desultory code review and indentation. 2015-07-17 12:04:57 +02:00
Dimitri Fontaine
a98788b670 Implement drop indexes option for copy and fixed.
The option doesn't seem relevant to the db3 source type which contains a
table definition: pgloader will create the table from scratch and no
indexes are going to be found.
2015-07-16 21:39:06 +02:00
Dimitri Fontaine
4f2385fa4c Refactor code from previous commit.
The goal is to make it easy to add support for the 'drop indexes' option
in other source types (fixed, ixf, db3, file-based sources).
2015-07-16 19:35:34 +02:00
Dimitri Fontaine
49bf7e56f2 Implement a "drop indexes" option in CSV mode, fix #251.
When loading against a table that already has index definitions, the
load can be quite slow. Previous commit introduced a warning in such a
case. This commit introduces the option "drop indexes" that is not used
by default.

When this option is used, pgloader drops the indexes before loading the
data, then creates the indexes again with the same definitions as before.
All the indexes are created again in parallel to optimize performance.
Only primary key indexes can't be created in parallel, so those are
created in two steps (create unique index then alter table).
2015-07-16 12:22:58 +02:00
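This Python sketch shows one way to split the statements into a parallel phase and a short sequential one. The helper names and the dictionary layout are hypothetical. The SQL uses PostgreSQL's ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY USING INDEX form. That form attaches an already built unique index as the primary key, so no new index is built while the ALTER TABLE lock is held.

```python
def quote_ident(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'


def recreate_plan(table: str, indexes: list):
    """Return (parallel, sequential) lists of SQL statements.

    Each index is a dict with keys: name, columns, and optionally
    unique / primary (booleans).  CREATE INDEX statements can run
    concurrently on separate connections; the primary key ALTER TABLE
    steps run afterwards.
    """
    parallel, sequential = [], []
    for ix in indexes:
        name = quote_ident(ix["name"])
        cols = ", ".join(quote_ident(c) for c in ix["columns"])
        unique = ix.get("unique") or ix.get("primary")
        kind = "UNIQUE INDEX" if unique else "INDEX"
        parallel.append(f"CREATE {kind} {name} ON {table} ({cols});")
        if ix.get("primary"):
            # Second step: promote the unique index to the primary key.
            sequential.append(
                f"ALTER TABLE {table} ADD CONSTRAINT {name} PRIMARY KEY USING INDEX {name};"
            )
    return parallel, sequential
```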
Dimitri Fontaine
7c834db6e3 Warn against pre-existing indexes.
Pre-existing indexes will reduce data loading performance and it's
generally better to DROP the indexes prior to the load and CREATE them
again once the load is done. See #251 for an example of that.

In this patch we just add a WARNING about the situation; the next
patch will also add support for a new WITH clause option allowing
pgloader to take care of the DROP/CREATE dance around the data
loading.
2015-07-16 12:22:58 +02:00
Dimitri Fontaine
81ad98b323 Merge pull request #257 from Jamim/master
Default cast rules for MySQL's datetime types fixed
2015-07-10 19:21:07 +02:00
Aliaksei Urbanski
6a660fcf76 Default cast rules for MySQL's datetime types fixed [issue #252] 2015-07-10 15:37:25 +00:00
Dimitri Fontaine
88c801997e Default to a static list of PostgreSQL keywords.
In some cases (such as when using a very old PostgreSQL instance or an
Amazon Redshift service, as in #255), the function pg_get_keywords()
does not exist, but we assume that pgloader might still be able to
complete its job.

We're better off with a static list of keywords than with an unhandled
error here, so let's see what happens next with Redshift.
2015-07-04 20:16:50 +02:00
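The fallback can be sketched in Python as below. The fetch callable is an assumption. It stands in for running SELECT word FROM pg_get_keywords() WHERE catcode = 'R' on the target server. The static set is only an illustrative subset of PostgreSQL's reserved keywords, not pgloader's list.

```python
# Illustrative subset only; a real fallback would carry the full list.
STATIC_RESERVED_KEYWORDS = frozenset({"all", "and", "order", "select", "table", "user"})


def reserved_keywords(fetch) -> frozenset:
    """Ask the server for its reserved keywords, falling back to a static set.

    `fetch` is a hypothetical callable returning an iterable of keyword
    strings; it raises if pg_get_keywords() is missing (e.g. on Redshift).
    """
    try:
        return frozenset(word.lower() for word in fetch())
    except Exception:
        return STATIC_RESERVED_KEYWORDS
```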
Dimitri Fontaine
1f7382cd0d Fix error counts when transformation functions fail.
Related to #249, stop reporting 0 errors on sources where we failed to
handle some data transformation.
2015-06-27 19:32:08 +02:00
Dimitri Fontaine
5f85bf542a Fix float-to-string to accept integers, fix #249.
The problem in #249 is that SQLite is happy processing floats in an
integer field, so pgloader needs to be instructed via the CAST
mechanism to cast to float at migration time.

But then the transformation function would choke on integers, because of
its optimisation "declare" statement. Of course the integer
representation expected by PostgreSQL is float-compatible, so just
instruct the function that integers are welcome to the party.
2015-06-27 19:30:34 +02:00
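This hypothetical float_to_string is a Python analogue of the relaxed type check. Integers are accepted alongside floats because their text form is valid PostgreSQL float input. Booleans are rejected explicitly because Python treats them as integers.

```python
def float_to_string(value):
    """Render a float or an integer as text for a PostgreSQL float column."""
    if value is None:
        return None  # NULL stays NULL
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {value!r}")
    return str(value)  # str(3) == "3", str(1.5) == "1.5"
```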
Dimitri Fontaine
1c7de22096 Add test coverage for #80. 2015-06-25 14:16:12 +02:00
Dimitri Fontaine
d75c100399 Expose cl-csv escape mode option, fix #80.
Some CSV files are using the CSV escape character internally in their
fields. In that case we enter a parsing bug in cl-csv where backtracking
from parsing the escape string isn't possible (or at least
unimplemented).

To handle the case, change the quote parameter from \" to just \ and let
cl-csv use its escape-quote mechanism to decide if we're escaping only
separators or just any data.

See https://github.com/AccelerationNet/cl-csv/issues/17 where the escape
mode feature was introduced for pgloader issue #80 already.
2015-06-25 14:10:36 +02:00
Dimitri Fontaine
250ed1c791 Fix CSV parsing to log errors when trying to continue.
The error handling was good enough to continue parsing the CSV data
after a recoverable parser error, but not good enough to actually report
its misfortunes to the user.

See #250 for a report where this is misleading.
2015-06-25 10:31:14 +02:00
Dimitri Fontaine
b55ded11e0 Fix read counters when reading data from SQLite. 2015-06-16 23:14:10 +02:00
Dimitri Fontaine
7e508374c4 Add some SQLite test cases for real type, see #249. 2015-06-16 23:13:52 +02:00
Dimitri Fontaine
322f7dd8b5 Improve logging when loading extra code, see #245. 2015-06-11 13:02:29 +02:00
Dimitri Fontaine
ff5b596219 Merge pull request #241 from alexbaretta/master
Fix DROP TABLE statements on tables with foreign keys
2015-05-28 23:53:02 +02:00
Alex Baretta
49dcae8068 Fix DROP TABLE statements on tables with foreign keys 2015-05-28 14:24:28 -07:00
Dimitri Fontaine
ceaad92f4c Merge pull request #240 from alexbaretta/master
Two bugfixes
2015-05-27 01:14:03 +02:00
Alex Baretta
817fc9a258 Fix string delimiter syntax in COMMENT statements 2015-05-26 16:10:39 -07:00
Alex Baretta
0626d56303 Fix identifier case in FOREIGN KEY constraints 2015-05-25 18:08:37 -07:00
Dimitri Fontaine
c3b5d60542 Fix type declaration to include null values, fix #238.
In passing, add a test case for NIL datetime values in our SQLite sample
database.
2015-05-22 23:49:03 +02:00