Commit Graph

20 Commits

Author SHA1 Message Date
Dimitri Fontaine
7b9b8a32e7 Move sexp parsing into its own file.
After all, it's shared between the CSV command parsing and the Cast
Rules parsing. src/parsers/command-csv.lisp still contains lots of
facilities shared between the file based sources, will need another
series of splits.
2015-10-05 11:39:44 +02:00
Dimitri Fontaine
c880f86bb6 Fix user defined casting rules.
Commit 598c860cf5 broke user defined
casting rules by interning "precision" and "scale" in the
pgloader.user-symbols package: those symbols need to be found in the
pgloader.transforms package instead.

Luckily enough the infrastructure to do that was already in place for
cl:nil.
2015-10-05 02:11:23 +02:00
Dimitri Fontaine
96a33de084 Review the stats and reporting code organisation.
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that get passed log messages so that the
output buffer is not subject to race conditions, extend its use to also
deal with statistics messages.

In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.

Also, as a nice side effect of the code simplification and refactoring
this fixes #283 wherein the before/after sections of individual CSV
files within an ARCHIVE command where not counted in the reporting.
2015-10-05 01:46:29 +02:00
Dimitri Fontaine
598c860cf5 Improve user code parsing, fix #297.
To be able to use "t" (or "nil") as a column name, pgloader needs to be
able to generate lisp code where those symbols are available. It's
simple enough in that a Common Lisp package that doesn't :use :cl
fullfills the condition, so intern user symbols in a specially crafted
package that doesn't :use :cl.

Now, we still need to be able to run transformation code that is using
the :cl package symbols and the pgloader.transforms functions too. In
this commit we introduce a heuristic to pick symbols either as functions
from pgloader.transforms or anything else in pgloader.user-symbols.

And so that user code may use NIL too, we provide an override mechanism
to the intern-symbol heuristic and use it only when parsing user code,
not when producing Common Lisp code from the parsed load command.
2015-09-21 13:23:21 +02:00
Dimitri Fontaine
b78bb6dd31 Allow quoted field names to contain spaces, fix #285.
Given a fully quoted field name, there should be no restriction about
using spaces in between the quotes, but the parser used to choke on that
case.
2015-09-01 14:32:50 +02:00
Dimitri Fontaine
04aa743eb7 Cleanup file based "connections".
When the notion of a connection class with a generic set of method was
invented, the very flexible specification formats available for the file
based sources where not integrated into the new connection system.

This patch provides a new connection class md-connection with a specific
sub-protocol (after opening a connection, the caller is supposed to loop
around open-next-stream) so that it's possible to both properly fit into
the connection concept and to better share the code in between our three
implementation (csv, copy, fixed).
2015-08-24 16:33:00 +02:00
Dimitri Fontaine
ea35eb575d Implement --dry-run option, fix #264.
The dry run option will currently only check database connections, but
as that happens after having correctly parsed the load file, it allows
to also check that the command file is correct for the parser.

Note that the list load-data API isn't subject to the dry-run method.

In passing, we add some more API entry points to the connection objects
and we should actually clean the code base to use the new QUERY generic
all over the place. It's for another patch tho.
2015-08-22 16:23:47 +02:00
Dimitri Fontaine
54e29773d7 Fix index creation reporting, see #251.
The new option 'drop indexes' reuses the existing code to build all the
indexes in parallel but failed to properly account for that fact in the
summary report with timings.

While fixing this, also fix the SQL used to re-establish the indexes and
associated constraints to allow for parallel execution, the ALTER TABLE
statements would block in ACCESS EXCLUSIVE MODE otherwise and make our
efforts vain.
2015-07-18 23:06:15 +02:00
Dimitri Fontaine
49bf7e56f2 Implement a "drop indexes" option in CSV mode, fix #251.
When loading against a table that already has index definitions, the
load can be quite slow. Previous commit introduced a warning in such a
case. This commit introduces the option "drop indexes" that is not used
by default.

When this option is used, pgloader drops the indexes before loading the
data then create the indexes again with the same definitions as before.
All the indexes are created again in parallel to optimize performances.
Only primary key indexes can't be created in parallel, so those are
created in two steps (create unique index then alter table).
2015-07-16 12:22:58 +02:00
Dimitri Fontaine
d75c100399 Expose cl-csv escape mode option, fix #80.
Some CSV files are using the CSV escape character internally in their
fields. In that case we enter a parsing bug in cl-csv where backtracking
from parsing the escape string isn't possible (or at least
unimplemented).

To handle the case, change the quote parameter from \" to just \ and let
cl-csv use its escape-quote mechanism to decide if we're escaping only
separators or just any data.

See https://github.com/AccelerationNet/cl-csv/issues/17 where the escape
mode feature was introduced for pgloader issue #80 already.
2015-06-25 14:10:36 +02:00
Dimitri Fontaine
bffec4cc63 Allow for more options in the CSV escape character, fix #38.
To allow for importing JSON one-liners as-is in the database it can be
interesting to leverage the CSV parser in a compatible setup. That setup
requires being able to use any separator character as the escape
character.
2015-05-22 12:31:06 +02:00
Dimitri Fontaine
abbc105c41 Implement CSV headers support.
Some CSV files are given with an header line containing the list of
their column names, use that when given the option "csv header".

Note that when both "skip header" and "csv header" options are used,
pgloader first skip as many required lines and then uses the next one as
the csv header.

Because of temporary failure to install the `ronn` documentation tool,
this patch only commits the changes to the source docs and omits to
update the man page (pgloader.1). A following patch is intended to be
pushed that fixed that.

See #236 which is using shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
2015-05-21 12:55:23 +02:00
Dimitri Fontaine
dfb4cc2049 Allow dollars in CSV fields names, fix #236. 2015-05-21 12:51:39 +02:00
Mark Lee
dc04b40836 Accept periods in CSV field names
Periods are allowed in PG column names as well.
2015-05-15 07:22:07 -07:00
Dimitri Fontaine
0068a45e1c Fix parsing of qualified target table names, see #186.
We used to parse qualified table names as a simple string, which then
breaks attempts to be smart about how to quote idenfifiers. Some sources
are known to accept dots in quoted table names and we need to be able to
process that properly without tripping on qualified table names too
late.

Current code might not be the best approach as it's just using either a
cons or a string for table names internally, rather than defining a
proper data structure with a schema and a name slot.

Well, that's for a later cleanup patch, I happen to be lazy tonight.
2015-04-17 23:22:30 +02:00
Dimitri Fontaine
48f451bdbc Implement the option to disable triggers when loading data.
This option is dangerous and allows to skip ALL triggers when loading
data against PostgreSQL. This includes foreign key constraints
definitions and will allow loading data out of order.

When using both the options "create no table" and "disable triggers" it
will be possible to load data into a schema prepared by your favorite
external tool, at the cost of not validating FK constraints. Use with
care.

Fix #167.
2015-02-19 15:05:10 +01:00
Dimitri Fontaine
40f3c4f769 Improve HTTP handling of CSV and Fixed data sources.
In passing also allow --field to specify the whole field list, there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
2014-12-27 17:02:19 +01:00
Dimitri Fontaine
302a7d402b Refactor connection handling, and clean-up many things.
That's the big refactoring patch I've been sitting on for too long.

First, refactor connection handling to use a uniformed "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.

Given that, we don't need no messing around with *pgconn* and *myconn-*
and other special variables at all anywhere in the tree.

Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when new API got into place.

Third, fix any other oddity or missing part found while doing those
first two activities, it was long overdue anyway...
2014-12-26 21:50:29 +01:00
Dimitri Fontaine
65c2043694 Improve pgloader usage from the command line.
Make it so that the following command line usages are accepted when
using pgloader without a command file:

 ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

 ./build/bin/pgloader --set "search_path='sakila'"  \
                      mysql://root@localhost/sakila \
                 postgresql:///sakila

 ./build/bin/pgloader --type csv                             \
                      --field id --field field               \
                      --with truncate                        \
                      --with "fields terminated by ','"      \
                      ./test/data/matching-1.csv             \
                      postgres:///pgloader?matching

It's now possible in most cases to just use command-line options, which
should make the entry bar to pgloader much lower.
2014-12-23 02:40:13 +01:00
Dimitri Fontaine
fff756f95f Refactor the command parser.
Split its content into separate files, so that each is easier to
maintain, and to make it easier also to add support for new sources.
2014-11-16 22:22:04 +01:00