Commit Graph

245 Commits

Author SHA1 Message Date
Dimitri Fontaine
dbadab9e9e Implement a new “snake_case” quoting rule.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.

Fix #631.
2017-09-13 22:55:10 +02:00
Dimitri Fontaine
d2d4be2ed0 Fix test/csv-guess.load for old PostgreSQL.
In travis environment we still test with PostgreSQL 9.1 and 9.6, and there's
no reason for this test to use a modern spelling of create schema, after
all.

It works because the test/csv-before-after.load creates the schema and is
ran before test/csv-guess.load. That's good enough for now.
2017-09-09 00:59:39 +02:00
Dimitri Fontaine
38712d98e0 Fix regression testing.
Previous patch made regression failures obvious that were hidden by strange
bugs with CCL.

One such regression was introduced in commit
ab7e77c2d0 where we played with the complex
code generation for field projection, where the following two cases weren't
cleanly processed anymore:

  column text using "constant"
  column text using "field-name"

In the first case we want to load a user-defined constant in the column, in
the second case we want to load the value of the field "field-name" in the
column --- we just have different source and target names.

Another regression was introduced in the recent commit
01e5c23763 where the create-table function was
called too early, before we have fetched *pgsql-reserved-keywords*. As a
consequence table names weren't always properly quoted as shown in the
test/csv-header.load file which targets a table named "group".

Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
2017-09-09 00:51:07 +02:00
Dimitri Fontaine
01e5c23763 Add support for explicit TARGET TABLE clause in load commands.
It used to be that you would give the target table name as an option to the
PostgreSQL connection string, which is untasteful:

   load ... into pgsql://user@host/dbname?tablename=foo.bar ...

Or even, for backwards compatibility:

   load ... into pgsql://user@host/dbname?foo.bar ...

The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:

   load ... into pgsql://user@host/dbname target table foo.bar ...

Which is much better, in particular when used together with the target
columns clause.

Implementing this seemingly quite small feature had impact on many parsing
related features of pgloader, such as the regression testing facility. So
much so that some extra refactoring got into its way here, around the
lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.

While at it, this patch simplifies a lot the `load-data' function by making
a good use of &allow-other-keys and :allow-other-keys t.

Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, tho. Just in another file for clarity.
2017-08-25 01:57:54 +02:00
Dimitri Fontaine
9263baeb49 Implement sslmode for MySQL connections.
This allows to bypass SSL when you don't need it, like over localhost for
instance. Takes the same syntax as the PostgreSQL sslmode connection string
parameter.
2017-08-24 14:56:59 +02:00
Dimitri Fontaine
b685c8801d Improve guessing of CSV parameters.
In this commit we fail the guess faster, allowing to test for a much larger
sample. The sample is still hard-coded, but this time to 1000 lines.

Also add a test case, see #618.
2017-08-24 13:30:14 +02:00
Dimitri Fontaine
f719d2976d Implement a template system for pgloader commands.
This feature has been asked several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.

To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.

Fixes #555, Fixes #609.
See #500, #477, #278.
2017-08-16 01:33:11 +02:00
Dimitri Fontaine
049a1199c2 Implement support for SQLite current_date default value.
The spelling in SQLite for the default value is "current_date", instruct
pgloader about that. This commit also adds a test case in our sqlite.db
unit tests database.

Fixes #607.
2017-08-08 21:55:15 +02:00
Dimitri Fontaine
ae0c6ed119 Add support for preserving index names in SQLite.
See #187.
2017-07-17 11:04:12 +02:00
Dimitri Fontaine
0549e74f6d Implement multiple reader per table for MySQL.
Experiment with the idea of splitting the read work in several concurrent
threads, where each reader is reading portions of the target table, using a
WHERE id <= x and id > y clause in its SELECT query.

For this to kick-in a number of conditions needs to be met, as described in
the documentation. The main interest might not be faster queries to overall
fetch the same data set, but better concurrency with as many readers as
writters and each couple its own dedicated queue.
2017-06-28 16:23:18 +02:00
Dimitri Fontaine
352f4adc8d Implement support for MySQL SET parameters.
pgloader had support for PostgreSQL SET parameters (gucs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters, and allow pgloader users to control them.

See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
2017-06-27 10:00:47 +02:00
Dimitri Fontaine
b3cb7b256d Travis: let's actually use the new EXTRA_OPTS. 2017-06-14 21:32:41 +02:00
Dimitri Fontaine
1469789ede Try to get more information from CCL in testing.
The “magic” options --batch and --heap-reserve will be processed by CCL
itself before pgloader gets to see them, so try that in the testing
environment.
2017-06-14 21:12:54 +02:00
Dimitri Fontaine
355aedfd72 Fix "drop default" casting rule.
The previous coding would discard any work done at the apply-casting-rules
step when adding source specific smarts about handling default, because of
what looks like negligence and bad tests. A test case scenario exists but
was not exercized :(

Fix that by defaulting the default value to the one given back at the
apply-casting-rules stage, where we apply the "drop default" clause.
2017-06-08 21:39:06 +02:00
Dimitri Fontaine
25e5ea9ac3 Refactor error handling in complete-pgsql-database.
Given new SQLite test case from issue #563 we see that pgloader doesn't
handle errors gracefully in post-copy stage. That's because the API were not
properly defined, we should use pgsql-execute-with-timing rather than other
construct here, because it allows the "on error resume next" behavior we
want with after load DDL statements.

See #563.
2017-06-08 12:09:11 +02:00
Dimitri Fontaine
c6b634caad Provide "on error stop" as a WITH option.
As seen in #546 it would be easier to be able to specify the option in the
load command directly rather than only at the command line. Here we go!
2017-06-01 16:43:09 +02:00
Dimitri Fontaine
3b4af49e22 Implement ALTER SCHEMA and ALTER TABLE for SQLite.
It turns out we forgot to add support for internal catalog munging clauses
to SQLite support. The catalogs being normalized means there's no extra work
here other than allowing the parser to accept those clauses and then pass
them over to our generic `copy-database' method implementation.

It is to be noted that SQLite has no support for schemas as per the standard
and PostgreSQL, so that when we inspect the database schema we create a nil
entry here. It's then not possible to ALTER SCHEMA nil RENAME TO 'target';
unfortunately, but it's easy enough to SET search_path to 'target' anyway,
as shown in the modified test case.

Fix #552.
2017-05-14 20:47:01 +02:00
Dimitri Fontaine
ab7e77c2d0 Fix double transformation call in CSV projections.
In advanced projections it could be that we call the transformation function
for some input fields twice. This is a bug that manifest in particular when
the output of the transformation can't be used/parsed again by the same
function as shown in the bug reported.

Fix #523.
2017-03-04 15:55:08 +01:00
Dimitri Fontaine
9e2b95d9b7 Implement support for PostgreSQL storage parameters.
In PostgreSQL it is possible at CREATE TABLE time to set some extra storage
parameters, the most useful of them in the context of pgloader being the
FILLFACTOR. For the setting to be useful, it needs to be positionned at
CREATE TABLE time, before we load the data.

The BEFORE LOAD clause of the pgloader command allows to run SQL scripts
that will be executed before the load, and even before the creation of the
target schema when pgloader does that, which is nice for other use case.

Here we implement a new `ALTER TABLE` rule that one can set in the pgloader
command in order to change storage parameters at CREATE TABLE time:

  ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40')

Fix #516.
2017-02-25 21:49:06 +01:00
Dimitri Fontaine
dbf7d6e48f Don't double-quote identifiers in catalog queries.
Avoid double quoting the schema names when used in PostgreSQL catalog
queries, where the identifiers are used as literal values and need to be
single-quoted.

Fix #476, again.
2017-01-10 21:12:34 +01:00
Dimitri Fontaine
4931604361 Allow ALTER SCHEMA command for MySQL.
This pgloader command allows to migrate tables while changing the schema
they are found into in between their MySQL source database and their
PostgreSQL target database.

This changes the default behavior of pgloader with MySQL from always
targetting the 'public' schema to targetting by default a schema named
the same as the MySQL database. You can revert to the old behavior by
adding a rule:

   ALTER SCHEMA 'dbname' RENAME TO 'public

We might want to add a patch to re-install the default behavior later.

Also see #489 where it used not to be possible to rename the schema at
migration time, causing strange errors (you need to spot NIL as the
schema name in the "failed to find target table" messages.
2016-12-18 19:31:21 +01:00
Dimitri Fontaine
2dc733c4d6 Fix corner case in creating indexes again.
When the option "drop indexes" is in use in loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.

The CREATE is done in parallel, and we create an lparallel kernel for
that. The kernel must have a worker-count of at least 1, and we where
not considering the case of 0 indexes on the target table.

Fix #484.
2016-11-20 17:17:15 +01:00
Dimitri Fontaine
526fafb4b7 Allow quoting identifiers in db uri tablename.
As shown in #476, it is sometimes needed to be able to quote the
identifier names even when loading from a file, that is when specifying
the target table name in the database uri.

To that ends, allow the option "identifier case" to be used in the file
based cases too. Fixes #476.
2016-11-13 22:14:48 +01:00
Dimitri Fontaine
0b06bc6ad6 Update an old archive test case. 2016-08-28 20:29:30 +02:00
Dimitri Fontaine
a86a606d55 Improve existing PostgreSQL database handling.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of the indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.

We used to use the CASCADE option when dropping the indexes, which hides
a bug: if we exclude from the load tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.

To prevent that from happening, pgloader now query the PostgreSQL
pg_depend system catalog to list the “missing” foreign keys and add them
to our internal catalog representation, from which we know to DROP then
CREATE the SQL object at the proper times.

See #400 as this was an oversight in fixing this issue.
2016-08-10 22:02:06 +02:00
Dimitri Fontaine
c2c98b8b42 Allow any character in a quoted CSV field name.
We used to force overly strict rules for a quoted field name in a CSV
load file, now accept any character but a quote to be part of the field
name.

Fixes #416.
2016-08-07 20:35:37 +02:00
Dimitri Fontaine
70572a2ea7 Implement support for existing target databases.
Also known as the ORM case, it happens that other tools are used to
create the target schema. In that case pgloader job is to fill in the
exiting target tables with the data from the source tables.

We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.

This behavior is activated when using the “create no tables” option as
in the following test-case setup:

  with create no tables, include drop, truncate

Fixes #400, for which I got a test-case to play with!
2016-08-06 20:19:15 +02:00
Dimitri Fontaine
42e9e521e0 Add option "max parallel create index".
By default, pgloader will start as many parallel CREATE INDEX commands
as the maximum number of indexes you have on any single table that takes
part in the load.

As this number might be so great as to exhaust the target PostgreSQL
server (e.g. maintenance_work_mem), we add an option to limit that to
something reasonnable when the source schema isn't.

Fix #386 in which 150 indexes are found on a single source table.
2016-04-11 17:40:52 +02:00
Dimitri Fontaine
31f8b5c5f0 Set application_name to 'pgloader' by default.
It's always been possible to set application_name to anything really,
making it easier to follow the PostgreSQL queries made by pgloader.
Force that setting to 'pgloader' by default.

Fix #387.
2016-04-11 17:14:38 +02:00
Dimitri Fontaine
b1d4e94f2a Fix integer parsing support for SQLite.
The function needs to return a string to be added to the COPY stream, we
still need to make sure whatever given here looks like an integer. Given
the very dynamic nature of data types in SQLite, the integer-to-string
function was already a default now, but failed to be published before in
its fixed version, somehow.
2016-03-27 20:42:40 +02:00
Dimitri Fontaine
fe3601b04c Fix SQLite index support, add foreign keys support.
It turns out recent changes broke tne SQLite index support (from adding
support for MS SQL partial/filtered indexes), so fix it by using the
pgsql-index structure rather than the specific sqlite-idx one.

In passing, improve detection of PRIMARY KEY indexes, which was still
lacking. This work showed that the introspection done by pgloader was
wrong, it's way more crazy that we though, so adjust the code to loop
over PRAGMA calls for each object we inspect.

While adding PRAGMA calls, add support for foreign keys too, we have the
code infrastructure that makes it easy now.
2016-03-27 20:39:13 +02:00
Dimitri Fontaine
8476c1a359 Allow setting search_path with multiple schemas.
The PostgreSQL search_path allows multiple schemas and might even need
it to be able to reference types and other tables. Allow setting more
than one schema by using the fact that PostgreSQL schema names don't
need to be individually quoted, and passing down the exact content of
the SET search_path value down to PostgreSQL.

Fix #359.
2016-03-20 20:54:08 +01:00
Dimitri Fontaine
c724018840 Implement ALTER TABLE clause for MySQL migrations.
The new ALTER TABLE facility allows to act on tables found in the MySQL
database before the migration happens. In this patch the only provided
actions are RENAME TO and SET SCHEMA, which fixes #224.

In order to be able to provide the same option for MS SQL users, we will
have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO
...) and modify the internal schema-struct so that the schema slot of
our table instances are a schema instance rather than its name.

Lacking MS SQL test database and instance, the facility is not yet
provided for that source type.
2016-03-06 21:51:33 +01:00
Dimitri Fontaine
68aa205db5 Also commit SQLite test case changes.
See #351 for context, this adds a proper test case.
2016-03-03 14:59:57 +01:00
Dimitri Fontaine
40c1581794 Review transaction and error handling in COPY.
The PostgreSQL COPY protocol requires an explicit initialization phase
that may fail, and in this case the Postmodern driver transaction is
already dead, so there's no way we can even send ABORT to it.

Review the error handling of our copy-batch function to cope with that
fact, and add some logging of non-retryable errors we may have.

Also improve the thread error reporting when using a binary image from
where it might be difficult to open an interactive debugger, while still
having the full blown Common Lisp debugging experience for the project
developers.

Add a test case for a missing column as in issue #339.

Fix #339, see #337.
2016-02-21 15:56:06 +01:00
Dimitri Fontaine
c108b85290 Allow package prefix in CAST ... USING clause.
Also, in passing, ass a new transformation function for MySQL allowing
to transform from varbinary to text.
2016-02-04 16:09:22 +01:00
Dimitri Fontaine
d9d9e06c0f Another attempt at fixing #323.
Rather than trying hard to have PostgreSQL fully qualify the index name
with tricks around search_path setting at the time ::regclass is
executed, simply join on pg_namespace to retrieve that schema in a new
slot in our pgsql-index structure so that we can then reuse it when
needed.

Also add a test case for the scenario, including both a UNIQUE
constraint and a classic index, because the DROP and CREATE/ALTER
instructions differ.
2016-01-17 01:54:36 +01:00
Dimitri Fontaine
7dd69a11e1 Implement concurrency and workers for files sources.
More than the syntax and API tweaks, this patch also make it so that a
multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be
loaded with several files in the group in parallel.

To that effect, tweak again the md-connection and md-copy
implementations.
2016-01-16 22:53:55 +01:00
Dimitri Fontaine
eb45bf0338 Expose concurrency settings to the end users.
Add the workers and concurrency settings to the LOAD commands for
database sources so that users can tweak them now, and add mentions of
them in the documentation too.

From the documentation string of the copy-from method as found in
src/sources/common/methods.lisp:

   We allow WORKER-COUNT simultaneous workers to be active at the same time
   in the context of this COPY object. A single unit of work consist of
   several kinds of workers:

     - a reader getting raw data from the COPY source with `map-rows',
     - N transformers preparing raw data for PostgreSQL COPY protocol,
     - N writers sending the data down to PostgreSQL.

   The N here is setup to the CONCURRENCY parameter: with a CONCURRENCY of
   2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we
   start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be
   active simultaneously.

Those options should find their way in the remaining sources, that's for
a follow-up patch tho.
2016-01-15 23:22:32 +01:00
Dimitri Fontaine
44a2bd14d4 Fix custom CAST rules with expressions, fix #322.
In a previous commit the typemod matching code had been broken, and we
failed to notice that until now. Thanks to bug report #322 we just got
the memo...

Add a test case in the local-only MySQL database.

The regression testing facilities should be improved to be able to test
a full database, and then to dynamically create said database from code
or something to ease test coverage of those cases.
2016-01-12 14:55:17 +01:00
Dimitri Fontaine
d84ec3f808 Add SQLite test case for before/after load commands.
See bug #321, this change should have been part of previous commit.
2015-12-23 21:58:56 +01:00
Dimitri Fontaine
b4bfa18877 Fix more table name quoting, fix #163 again.
Now that we can have several threads doing COPY, each of them need to
know about the *pgsql-reserved-keywords* list. Make sure that's the case
and in passing fix some call sites to apply-identifier-case.

Also, more disturbingly, fix the code so that TRUNCATE is called from
the main thread before giving control to the COPY threads, rather than
having two concurrent threads doing the TRUNCATE twice. It's rather
strange that we got no complaint from the field on that part...
2015-12-08 11:52:43 +01:00
Dimitri Fontaine
93b6be43d4 Travis: adapt to PostgreSQL 9.1, again.
We didn't have CREATE SCHEMA IF EXISTS at the time...
2015-11-23 22:09:08 +01:00
Dimitri Fontaine
973339abc8 Add a SQLite test case from #310. 2015-11-22 22:16:05 +01:00
Dimitri Fontaine
e23de0ce9f Improve SQLite table names filtering.
Filter the list of tables we migrate directly from the SQLite query,
avoiding to return useless data. To do that, use the LIKE pattern
matching supported by SQLite, where the REGEX operator is only available
when extra features are loaded apparently.

See #310 where filtering out the view still caused errors in the
loading.
2015-11-22 22:10:26 +01:00
Dimitri Fontaine
150d288d7a Improve our regression testing facility.
Next parallelism improvements will allow pgloader to use more than one
COPY thread to load data, with the impact of changing the order of rows
in the database.

Rather than doing a copy out and `diff` of the data just loaded, load
the reference data and do the diff in SQL:

          select * from loaded.data
  except
          select * from expected.data

If such a query returns any row, we know we didn't load what was
expected and the regression test is failing.

This regression testing facility should also allow us to finally add
support for multiple-table regression tests (sqlite, mysql, etc).
2015-11-17 17:03:08 +01:00
Dimitri Fontaine
933d1c8d6b Add test case for #302. 2015-10-22 22:35:32 +02:00
Dimitri Fontaine
6bf26c52ec Implement a TimeZone option for IXF loading.
The local-time:encode-timestamp function takes a default timezone and it
is necessary to have control over it when loading from pgloader. Hence,
add a timezone option to the IXF option list, that is now explicit and
local to the IXF parser rather than shared with the DBF option list.
2015-10-05 16:46:15 +02:00
Dimitri Fontaine
96a33de084 Review the stats and reporting code organisation.
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that get passed log messages so that the
output buffer is not subject to race conditions, extend its use to also
deal with statistics messages.

In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.

Also, as a nice side effect of the code simplification and refactoring
this fixes #283 wherein the before/after sections of individual CSV
files within an ARCHIVE command where not counted in the reporting.
2015-10-05 01:46:29 +02:00
Dimitri Fontaine
598c860cf5 Improve user code parsing, fix #297.
To be able to use "t" (or "nil") as a column name, pgloader needs to be
able to generate lisp code where those symbols are available. It's
simple enough in that a Common Lisp package that doesn't :use :cl
fullfills the condition, so intern user symbols in a specially crafted
package that doesn't :use :cl.

Now, we still need to be able to run transformation code that is using
the :cl package symbols and the pgloader.transforms functions too. In
this commit we introduce a heuristic to pick symbols either as functions
from pgloader.transforms or anything else in pgloader.user-symbols.

And so that user code may use NIL too, we provide an override mechanism
to the intern-symbol heuristic and use it only when parsing user code,
not when producing Common Lisp code from the parsed load command.
2015-09-21 13:23:21 +02:00
Dimitri Fontaine
78c6bf097a Fix the build again.
Once more did I change a test file data and forgot to commit the changes
to the expected file of the regression test.
2015-09-12 00:40:15 +02:00
Dimitri Fontaine
98f18c4877 Improve CSV date format, fix #293.
The date format wouldn't allow using colon (:) in the noise parts of it,
and would also insist that milliseconds should be on 4 digits and micro
seconds on 6 digits. Allow for "ragged" input and take however many
digits we actually find in the input.
2015-09-12 00:35:14 +02:00
Dimitri Fontaine
a0dc59624c Fix schema qualified table names usage again.
When the list of columns of the PostgreSQL target table isn't given in
the load command, pgloader will happily query the system catalogs to get
that information. The list-columns query didn't get the memo about the
qualified table name format and the with-schema macro... fix #288.
2015-09-11 11:53:28 +02:00
Dimitri Fontaine
e054eb3838 Travis: set PGTZ in regress.sh
The TimeZone parameter should be set both for input and for output in
order to match our expected result file. Let's try to set PGTZ in the
shell environment...
2015-09-07 20:24:00 +02:00
Dimitri Fontaine
60d58a96b8 Travis: let's try to force client timezone.
The cvs-parse-date test is failing on Travis because the server up there
in the Cloud isn't using the same timezone as my local machine. Let's
just force the timezone in the SET clause...
2015-09-07 20:00:05 +02:00
Dimitri Fontaine
3f539b7384 Travis: update expected output file.
Forgot to update the expected output file in the previous commit, which
Travis is rightfully complaining about...
2015-09-07 17:47:03 +02:00
Dimitri Fontaine
04b2779239 Allow date format parsing to support time.
A useful use case for date parsing at tine input level is to parse
time (hour, minutes, seconds) rather than a full date (timestamp).
Improve the code so that it's possible to use the date format facility
even when the data field lacks the year/month/day information.

Fix #288.
2015-09-07 17:05:10 +02:00
Dimitri Fontaine
eabfbb9cc8 Fix schema qualified table names usage (more).
When parsing table names in the target URI, we are careful of splitting
the table and schema name and store them into a cons in that case. Not
all sources methods got the memo, clean that up.

See #182 and #186, a pull request I am now going to be able to accept.
Also see #287 that should be helped by being able to apply #186.
2015-09-04 01:06:15 +02:00
Dimitri Fontaine
b78bb6dd31 Allow quoted field names to contain spaces, fix #285.
Given a fully quoted field name, there should be no restriction about
using spaces in between the quotes, but the parser used to choke on that
case.
2015-09-01 14:32:50 +02:00
Dimitri Fontaine
75727df72f Quote table names when migrating from SQLite, fix #281.
Apparently I just forgot to apply any smartness whatsoever to SQLite
identifiers and just copied them as they are to PostgreSQL. Change that
by calling apply-identifier-case.
2015-08-25 01:13:19 +02:00
Dimitri Fontaine
573a63cd3a Add a local test, per #271. 2015-08-24 16:42:39 +02:00
Dimitri Fontaine
04ddf940d9 Left pad COPY octal chars with 0, fix #275.
The COPY TEXT format accepts non printable characters with an escaped
sequence wherin pgloader can pass in the octal number for the character
in its encoding. When doing that with small numbers like \6 and the
non-printable character is then followed by other numbers, then it
becomes e.g. \646 which might not be part of the target encoding...

To fix, always left pad the character octal number with zeroes, so that
we now send in \00646 which COPY knows how to read: the char at \006
then 4 then 6.

Also copy the test case over to pgloader and run it in the test suite.
2015-08-20 18:17:18 +02:00
Dimitri Fontaine
833b41c23b Fix the regression test expected values, see #266. 2015-07-26 14:45:43 +02:00
Dimitri Fontaine
b4b36caa84 Fix parsing dates with less-than 4 digits, fix #266.
The previous coding decided to add 2000 to the year as an integer if it
was below 2000, which parses 1999 as 3999. Oops. Trigger the correction
only when the date is given on 2 digits only, parsing 04 as 2004.

Dates given on 3 digits are kept as-is.

Playing with the *century* special parameter allows to cancel this
behavior, that maybe should be made entirely optional. It's just too
common to find current years on 2 digits only, sadly.
2015-07-26 14:41:44 +02:00
Dimitri Fontaine
d2a1a5643e Improve SQL blocks support, fix #265.
It's now possible to use several files in a BEFORE LOAD EXECUTE section,
and to mix DO and EXECUTE parts, bringing lots of flexibility in the
commands. Also it actually simplifies the parser.
2015-07-24 17:41:35 +02:00
Dimitri Fontaine
54e29773d7 Fix index creation reporting, see #251.
The new option 'drop indexes' reuses the existing code to build all the
indexes in parallel but failed to properly account for that fact in the
summary report with timings.

While fixing this, also fix the SQL used to re-establish the indexes and
associated constraints to allow for parallel execution, the ALTER TABLE
statements would block in ACCESS EXCLUSIVE MODE otherwise and make our
efforts vain.
2015-07-18 23:06:15 +02:00
Dimitri Fontaine
a98788b670 Implement drop indexes option for copy and fixed.
The option doesn't seem relevant to the db3 source type which contains a
table definition: pgloader will create the table from scratch and no
indexes are going to be found.
2015-07-16 21:39:06 +02:00
Dimitri Fontaine
49bf7e56f2 Implement a "drop indexes" option in CSV mode, fix #251.
When loading against a table that already has index definitions, the
load can be quite slow. Previous commit introduced a warning in such a
case. This commit introduces the option "drop indexes" that is not used
by default.

When this option is used, pgloader drops the indexes before loading the
data then create the indexes again with the same definitions as before.
All the indexes are created again in parallel to optimize performances.
Only primary key indexes can't be created in parallel, so those are
created in two steps (create unique index then alter table).
2015-07-16 12:22:58 +02:00
Dimitri Fontaine
1c7de22096 Add test coverage for #80. 2015-06-25 14:16:12 +02:00
Dimitri Fontaine
7e508374c4 Add some SQLite test cases for real type, see #249. 2015-06-16 23:13:52 +02:00
Dimitri Fontaine
c3b5d60542 Fix type declaration to include null values, fix #238.
In passing, add a test case for NIL datetime values in our SQLite sample
database.
2015-05-22 23:49:03 +02:00
Dimitri Fontaine
ba7b27b60a Travis: actually push the right version of the expected file. 2015-05-22 12:41:29 +02:00
Dimitri Fontaine
bffec4cc63 Allow for more options in the CSV escape character, fix #38.
To allow for importing JSON one-liners as-is in the database it can be
interesting to leverage the CSV parser in a compatible setup. That setup
requires being able to use any separator character as the escape
character.
2015-05-22 12:31:06 +02:00
Dimitri Fontaine
dc86b5e600 Travis: Fix the text case connection string. 2015-05-21 16:29:14 +02:00
Dimitri Fontaine
abbc105c41 Implement CSV headers support.
Some CSV files are given with an header line containing the list of
their column names, use that when given the option "csv header".

Note that when both "skip header" and "csv header" options are used,
pgloader first skip as many required lines and then uses the next one as
the csv header.

Because of temporary failure to install the `ronn` documentation tool,
this patch only commits the changes to the source docs and omits to
update the man page (pgloader.1). A following patch is intended to be
pushed that fixed that.

See #236 which is using shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
2015-05-21 12:55:23 +02:00
Mark Lee
dc04b40836 Accept periods in CSV field names
Periods are allowed in PG column names as well.
2015-05-15 07:22:07 -07:00
Dimitri Fontaine
95a5eb3184 Implement more COPY options, fix #218.
The COPY format now supports user defined delimiter and null options,
and we don't require the column names anymore as it's useless in that
context.
2015-04-30 14:30:16 +02:00
Dimitri Fontaine
53dcdfd8ef Fix handling of COPY data, fix #222.
When given a file in the COPY format, we should expect that its content
is already properly escaped as expected by PostgreSQL. Rather than
unescape the data then escape it again, add a new more of operation to
format-vector-row in which it won't even try to reformat the data.

In passing, fix an off-by-one bug in dealing with non-ascii characters.
2015-04-30 13:17:02 +02:00
Dimitri Fontaine
11976d2c23 Fix census-places URL location of the source file. 2015-04-17 21:58:19 +02:00
Dimitri Fontaine
7d2d09ce68 Add the option to preserve MySQL index names, fix #187.
See test/parse/hans.goeuro.load for an example usage of the new option.

In passing, any error when creating indexes is now properly reported and
logged, which was missing previously. Oops.
2015-03-07 20:19:47 +01:00
Dimitri Fontaine
48f451bdbc Implement the option to disable triggers when loading data.
This option is dangerous and allows to skip ALL triggers when loading
data against PostgreSQL. This includes foreign key constraints
definitions and will allow loading data out of order.

When using both the options "create no table" and "disable triggers" it
will be possible to load data into a schema prepared by your favorite
external tool, at the cost of not validating FK constraints. Use with
care.

Fix #167.
2015-02-19 15:05:10 +01:00
Dimitri Fontaine
ce5a61face Catch PostgreSQL internal errors too, fixes #155. 2015-01-21 13:01:28 +01:00
Dimitri Fontaine
a3b68bb80a Random test load file cleanup. 2015-01-15 17:49:19 +01:00
Dimitri Fontaine
e1bc6425e2 Implement support for PostgreSQL COPY format, fix #145.
PostgreSQL COPY format is not really CSV but something way easier to
parse. Funnily enough, parsing it as CSV is not that easy, so we add
here a special simple parser for the COPY format.

It should be quite useful too try loading again reject data files from
pgloader after manual fixing, too. It's still missing some documentation
without any good excuse for that, will add soon.
2015-01-02 18:49:17 +01:00
Dimitri Fontaine
1996256f8f Add support for postgres:// URI prefixes, fix #135. 2014-12-15 20:32:49 +01:00
Dimitri Fontaine
87e157bee2 Add a new database source type in the parser.
Now it's possible to parse a command to load data from MS SQL. The
parser was until now parsing all database URI within the same common
rule and that isn't possible anymore if we want to distinguish in
between source database right from the parser, which we actually want to
do.

This patch also implement in-passing fixes all over the place, including
the transformation function float-to-string that only happened to work
on double-float data.
2014-11-17 00:23:06 +01:00
Dimitri Fontaine
68416a79b3 In passing docs review.
Also modernize the test/census-place.load to use the newer fixed file
format options, and show that in the docs.
2014-10-18 22:42:31 +02:00
Dimitri Fontaine
22f4317a30 Add support for the CAST rule to SQLite sources.
This allows users to benefit from the same flexible machinery when using
SQLite as when using MySQL, and also allows to add some more default
cast rules too.
2014-10-13 00:52:55 +02:00
Dimitri Fontaine
00b002124b Travis: switch test case to timestamp, dropping the TZ.
The timezone is different in between my own machine and the test system,
just get rid of that discrepancy so that the test stop failing.
2014-10-02 01:32:19 +02:00
Dimitri Fontaine
08fc7fd092 Square brackets around source field options are now mandatory. 2014-10-02 01:17:24 +02:00
Dimitri Fontaine
2f49c9614c Fix the new test case's out file. 2014-10-02 01:16:58 +02:00
Dimitri Fontaine
7cf7e714fc Implement the source date format option. 2014-10-02 01:03:24 +02:00
Dimitri Fontaine
ac55d71401 Rework CSV and Fixed files source fields options, see #116.
It's not possible to use a comma separator when using more than one
source field option at the same time, and for better readability the
options are to be found enclosed in squared brackets.

Also, it's now possible to spell out "from" and "for" keywords on the
source definitions, making it easier to read and maintain the load file,
as in this full example:

          (
           a from  0 for 10,
           b from 10 for  8,
           c from 18 for  8,
           d from 26 for 17 [null if blanks, trim right whitespace]
          )
2014-10-01 18:32:40 +02:00
Dimitri Fontaine
ea97fc4659 Implement a new source level filter: trim.
As seen in #116, it might be better for the users to be able to ask for
field trimming right in the source definition, like we do for processing
nulls.
2014-09-29 15:16:04 +02:00
Dimitri Fontaine
8a0c91fa40 Fix conjunctions for the INCLUDING clause in MySQL.
We want all table matching any of the given constraints (regexp or
equality search).
2014-09-23 16:44:38 +02:00
Dimitri Fontaine
52e3371be8 Review default SQLite options. 2014-09-22 14:34:57 +02:00
Dimitri Fontaine
d4b58a1f78 Review MySQL default options.
The default values for MySQL WITH clause options wasn't really tested
and broke on simple cases, the new set of defaults is known to work in
many cases (most?).

Other combinations of options will need some review work, and we might
need to consider preventing some of them, that's for another patch tho.
2014-09-21 12:19:20 -05:00
Dimitri Fontaine
ca52ddacb1 SQLite: transform "0" timestamps to NULL, see #100. 2014-07-30 18:42:49 +02:00
Dimitri Fontaine
ed8022ce64 SQLite: transform default values to their PostgreSQL representation.
When default values are used in SQLite they are of course using their
SQLite representation, which might not be compatible with the PostgreSQL
target data type we're casting to. Make it so that the default values
are transformed too, as we already do in the MySQL case.

See #100.
2014-07-30 16:32:35 +02:00
Dimitri Fontaine
4aa8b0946f Get rid of unused sample file. 2014-07-18 11:07:25 +02:00