Commit Graph

55 Commits

Author SHA1 Message Date
Dimitri Fontaine
cd46b6cbed Clean up common code for sources.
Only move code around, creating a src/sources/common directory with
several files in there so as to split the too big src/sources.lisp.
2015-01-08 23:17:40 +01:00
Dimitri Fontaine
2caefb0836 Fix and improve new summary reporting. 2015-01-06 12:36:14 +01:00
Dimitri Fontaine
ad8fb0b2a4 Implement machine readable summary files, fixes #144.
It's now possible to have pgloader print out its summary in one of
several formats: human-readable (default), csv, copy or json. The
choice of format is made depending on the extension of the summary
filename picked on the command line with the option --summary.
2015-01-06 01:22:01 +01:00
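As an illustration of the extension-based choice described in ad8fb0b2a4, here is a minimal Python sketch. It is not pgloader's code (pgloader is written in Common Lisp); the function name `summary_format` and the exact extension strings are assumptions, and only the four format names come from the commit message.

```python
from pathlib import Path

# Hypothetical extension-to-format table; the commit message only names
# the formats (human-readable, csv, copy, json), not the extensions.
SUMMARY_FORMATS = {".csv": "csv", ".copy": "copy", ".json": "json"}


def summary_format(filename):
    """Return the summary format implied by the filename extension."""
    ext = Path(filename).suffix.lower()
    # Anything unrecognised falls back to the default format.
    return SUMMARY_FORMATS.get(ext, "human-readable")
```

With this sketch, `summary_format("load.json")` selects the json format, while a name such as `summary.txt` falls back to the human-readable default.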
Dimitri Fontaine
d9f5bff5e0 Cleanup some code location. 2015-01-06 01:18:54 +01:00
Dimitri Fontaine
a86369a03d Improve the CLI situation a bit.
Fix bugs related to parsing the new COPY type, and make it so that we
know how to parse options (and fields, and other type-dependent things)
even when --type is missing, in case the source URL has the information.
2015-01-06 00:07:31 +01:00
Dimitri Fontaine
e1bc6425e2 Implement support for PostgreSQL COPY format, fix #145.
PostgreSQL COPY format is not really CSV but something way easier to
parse. Funnily enough, parsing it as CSV is not that easy, so we add
here a special simple parser for the COPY format.

It should also be quite useful for trying to load reject data files from
pgloader again after manual fixing. It's still missing some documentation
without any good excuse for that, will add soon.
2015-01-02 18:49:17 +01:00
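To show why the COPY text format mentioned in e1bc6425e2 is simple to parse, here is a small Python sketch of a line parser. It is an illustration under stated assumptions, not pgloader's parser: the names `unescape` and `parse_copy_line` are made up, octal (`\ooo`) and hex (`\xhh`) escapes are left out, and it assumes that literal delimiters inside data are written as `\t`, as PostgreSQL itself writes them on output.

```python
# Single-character backslash escapes of the COPY text format.
ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def unescape(field):
    """Decode one COPY text field; the bare marker \\N means NULL (None)."""
    if field == r"\N":
        return None
    out, i = [], 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # Any other escaped character (such as a backslash) is itself.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_copy_line(line, delimiter="\t"):
    """Split one line of COPY text data into a list of decoded fields."""
    return [unescape(f) for f in line.rstrip("\n").split(delimiter)]
```

Because tabs and newlines inside values are escaped, a plain split on the delimiter is enough here, with no quoting rules to track as there would be in CSV.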
Dimitri Fontaine
40f3c4f769 Improve HTTP handling of CSV and Fixed data sources.
In passing also allow --field to specify the whole field list, there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
2014-12-27 17:02:19 +01:00
Dimitri Fontaine
13992121b3 Export more utilities in the pgloader package.
That makes life way easier for other CL users wanting to play with pgloader.
2014-12-26 22:00:22 +01:00
Dimitri Fontaine
302a7d402b Refactor connection handling, and clean-up many things.
That's the big refactoring patch I've been sitting on for too long.

First, refactor connection handling to use a uniform "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.

Given that, we don't need no messing around with *pgconn* and *myconn-*
and other special variables at all anywhere in the tree.

Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when new API got into place.

Third, fix any other oddity or missing part found while doing those
first two activities, it was long overdue anyway...
2014-12-26 21:50:29 +01:00
Dimitri Fontaine
6eac0d9dd8 Implement --before and --after options on the command line.
That allows using SQL scripts to run before and after the main data
processing and loading done by pgloader when used only from the command
line.
2014-12-23 12:21:44 +01:00
Dimitri Fontaine
65c2043694 Improve pgloader usage from the command line.
Make it so that the following command line usages are accepted when
using pgloader without a command file:

 ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

 ./build/bin/pgloader --set "search_path='sakila'"  \
                      mysql://root@localhost/sakila \
                      postgresql:///sakila

 ./build/bin/pgloader --type csv                             \
                      --field id --field field               \
                      --with truncate                        \
                      --with "fields terminated by ','"      \
                      ./test/data/matching-1.csv             \
                      postgres:///pgloader?matching

It's now possible in most cases to just use command-line options, which
should make the entry bar to pgloader much lower.
2014-12-23 02:40:13 +01:00
Dimitri Fontaine
f20d7cb452 Some more cleanup after the *pgconn* refactoring. 2014-12-19 14:41:49 +01:00
Dimitri Fontaine
5b726e47a0 Improve error reporting on connection error. 2014-12-19 14:24:35 +01:00
Dimitri Fontaine
d4cab3a81e Fix MSSQL index and foreign key names.
First, the index names in MS SQL, as in MySQL, are only unique per
table, whereas they need to be globally unique (per database) in
PostgreSQL. So reuse the infrastructure we had for MySQL here.

Second, the way we trick table names in index and fkey structures means
that we already did quote the names and we don't want to quote them
again, so add a new possible *identifier-case* value to handle the case
where nothing is to be done, pretty please.
2014-11-25 00:42:37 +01:00
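The naming problem described in d4cab3a81e can be pictured with a short Python sketch. The commit does not spell out the renaming scheme, so the `idx_<table oid>_<name>` pattern and the function name `unique_index_name` below are assumptions made for illustration only.

```python
def unique_index_name(table_oid, index_name):
    """Build a database-wide unique index name from a per-table one.

    Hypothetical scheme: prefix the source index name with the OID of
    the target table, so equal names on different tables cannot clash.
    """
    return f"idx_{table_oid}_{index_name}"
```

Two source tables that both carry an index called `by_name` would then map to distinct PostgreSQL names, for example `idx_16385_by_name` and `idx_16390_by_name`.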
Dimitri Fontaine
f263d1b2a4 Implement Foreign Key support for MSSQL.
Piggyback as much as possible on the work already done for MySQL.
2014-11-24 23:42:19 +01:00
Dimitri Fontaine
87e157bee2 Add a new database source type in the parser.
Now it's possible to parse a command to load data from MS SQL. The
parser was until now parsing all database URIs within the same common
rule and that isn't possible anymore if we want to distinguish between
source databases right from the parser, which we actually want to
do.

This patch also implements in-passing fixes all over the place, including
the transformation function float-to-string that only happened to work
on double-float data.
2014-11-17 00:23:06 +01:00
Dimitri Fontaine
6473a892d4 First steps toward MS SQL compatibility. 2014-11-09 00:09:42 +01:00
Dimitri Fontaine
22f4317a30 Add support for the CAST rule to SQLite sources.
This allows users to benefit from the same flexible machinery when using
SQLite as when using MySQL, and also allows adding some more default
cast rules.
2014-10-13 00:52:55 +02:00
Dimitri Fontaine
3c334dcdc4 Refactor the main parser to use the bind macro.
The metabang-bind lib offers a nice bind macro that solves the problem
of ignoring bindings in destructuring-bind, and allows a let* approach
to nested destructuring (even when mixed with let declarations).

Using that lib (that we already indirectly depend on anyway) simplifies
the parser code substantially.
2014-10-02 17:05:35 +02:00
Dimitri Fontaine
aee7eeba8d Fix refactoring missed pieces. 2014-10-02 01:16:04 +02:00
Dimitri Fontaine
7cf7e714fc Implement the source date format option. 2014-10-02 01:03:24 +02:00
Dimitri Fontaine
dfb1e9355a Get rid of our own implementation of alexandria:read-file-into-string. 2014-10-01 23:23:46 +02:00
Dimitri Fontaine
2369a142a7 Refactor source code organisation.
In passing, fix a bug in the previous commit where left-over code would
cancel the whole new parsing code for advanced source fields options.
2014-10-01 23:20:24 +02:00
Dimitri Fontaine
9ddf117a90 In-passing desultory cleanup. 2014-07-14 21:54:50 +02:00
Dimitri Fontaine
3e0526c957 Implement early support for IXF files. 2014-07-14 21:53:50 +02:00
Dimitri Fontaine
807f5cefcd Fix omitted file dependency (reading queries from file). 2014-06-16 14:24:05 +02:00
Dimitri Fontaine
e710cacad1 Truncate all tables in a single command, fix #61.
The truncate command is only sent to PostgreSQL when we didn't just
CREATE TABLE before. Some refactoring would be necessary to fit the
TRUNCATE command within the same transaction as the CREATE TABLE
command, for PostgreSQL performance.

This patch has been tested with MySQL and SQLite sources, the trick is
that to be able to test it, it's needed to first make a full
import (creating the target tables), so the tests are not modified yet.
2014-05-19 18:07:35 +02:00
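The single-command truncate from e710cacad1 relies on PostgreSQL's TRUNCATE accepting a comma-separated list of tables. A minimal Python sketch of building such a statement follows; the function name `truncate_statement` and the quoting choice are assumptions, not taken from pgloader.

```python
def truncate_statement(table_names):
    """Build one TRUNCATE statement covering every given table."""
    if not table_names:
        raise ValueError("at least one table name is required")
    # Double-quote each identifier, doubling any embedded double quote.
    quoted = ", ".join('"' + name.replace('"', '""') + '"' for name in table_names)
    return f"TRUNCATE {quoted};"
```

For the tables `actor` and `film` this yields `TRUNCATE "actor", "film";`, one round trip instead of one per table.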
Dimitri Fontaine
c38798a4dd Implement BEFORE/AFTER LOAD EXECUTE 'filename'.
That allows using the same SQL files as usual when using pgloader, as it
even supports the \i and \ir psql features (and dollar quoting, etc).

In passing, refactor docs to avoid saying the same things all over the
place, which isn't a very good idea in a man page, at least as far as
editing it is concerned.
2014-05-04 23:04:45 +02:00
Dimitri Fontaine
429232c3de Fix loading data from stdin: fix #53.
The stdin support really was one brick shy of a load, and in particular
with-open-file was used against a stream when using that option.
2014-04-27 23:38:02 +02:00
Dimitri Fontaine
35ca4927e9 Get rid of some lib dependencies.
The charset business isn't worth depending on an AGPL licenced lib which
is part of a huge Quicklisp system.
2014-04-25 17:21:11 +02:00
Dimitri Fontaine
a8b0f91f37 Allow optional control of batch memory footprint, see #16 and #22.
With the new internal setting *copy-batch-size* it's now possible to
instruct pgloader to close batches early (before *copy-batch-rows* limit)
when crossing the byte count threshold.

When set to 20 MB it allows the new test case (exhausted) to pass under SBCL
and CCL, and there's no measurable cost when *copy-batch-size* is set to
nil (its default value) in the testing done.

This patch is published without any way to tune the values from the command
language yet, that's the next step once it's been proven effective.
2014-01-26 23:22:18 +01:00
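The early batch closing described in a8b0f91f37 can be sketched in Python as follows. This is an illustration only: `make_batches` is a hypothetical name, rows are assumed to be already-formatted strings, and the default of 25000 rows is a placeholder rather than a value taken from the commit. The two parameters play the roles of *copy-batch-rows* and *copy-batch-size*.

```python
def make_batches(rows, batch_rows=25000, batch_size=None):
    """Yield lists of rows, closing a batch on a row or byte limit.

    batch_size=None disables the byte-count check, mirroring the nil
    default mentioned in the commit message.
    """
    batch, nbytes = [], 0
    for row in rows:
        batch.append(row)
        nbytes += len(row.encode("utf-8"))
        too_many_rows = len(batch) >= batch_rows
        too_many_bytes = batch_size is not None and nbytes >= batch_size
        if too_many_rows or too_many_bytes:
            yield batch
            batch, nbytes = [], 0
    if batch:
        yield batch  # flush the final, partially filled batch
```

With `batch_size` set, a few very wide rows close the batch long before the row limit is reached, which is what bounds the memory footprint.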
Dimitri Fontaine
db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes a
batch at a time in the communication channel: a lparallel.queue object.

Before that, the raw vectors were pushed directly in the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but impeding the ability of the Garbage Collector: data still
in the queue was not collected even if not needed anymore.

The new model also uses less memory, and allows a better control over what
amount of data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.

The intent is to offer a way for the users to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.

Preliminary tests show no noticeable impact on performance from this patch,
and even some improvements in some cases.
2014-01-25 23:54:49 +01:00
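The reader/writer model from db947e1467 can be approximated in Python with a bounded queue. This is a sketch under assumptions, not a port of the Lisp code: `run_pipeline` is a hypothetical name, Python's `queue.Queue` stands in for the lparallel.queue object, and appending to a list stands in for sending a batch to PostgreSQL. The `concurrent_batches` parameter plays the role of *concurrent-batches*.

```python
import queue
import threading


def run_pipeline(rows, batch_rows=2, concurrent_batches=1):
    """Reader formats rows into batches; writer drains a bounded queue."""
    channel = queue.Queue(maxsize=concurrent_batches)
    written = []

    def reader():
        batch = []
        for row in rows:
            # Formatting is done on the reader side, before queueing.
            batch.append("\t".join(str(col) for col in row) + "\n")
            if len(batch) >= batch_rows:
                channel.put("".join(batch))  # blocks while the queue is full
                batch = []
        if batch:
            channel.put("".join(batch))
        channel.put(None)  # sentinel: no more batches

    def writer():
        while True:
            batch = channel.get()
            if batch is None:
                break
            written.append(batch)  # stand-in for a COPY of one batch

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

Because the queue is bounded, at most `concurrent_batches` formatted batches wait in memory at any time; setting it to 1 is the tuning the commit message suggests for sources with very large rows.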
Dimitri Fontaine
a51a712b6a Fix asd dependencies, cleanup useless and misplaced compilation options. 2014-01-21 14:37:26 +01:00
Dimitri Fontaine
b2c9e0d2dc Refactor the whole logging infrastructure not to depend on threads sharing streams. 2013-12-24 19:08:55 +01:00
Dimitri Fontaine
fe302af221 Refactor the dbname API to feed from the connection string directly. 2013-12-20 17:24:02 +01:00
Dimitri Fontaine
633b6ae64f Export with-mysql-connection facility for REPL testing. 2013-12-18 12:05:56 +01:00
Dimitri Fontaine
2019b918f0 Implement support for matching several files in a single archive clause. 2013-11-26 16:47:37 +01:00
Cédric Villemain
8984a0a810 Rename reject-root-path to root-dir
Prepare the variable to be a parameter
2013-11-21 11:00:01 +01:00
Dimitri Fontaine
d99b859c3f Improve and fix COPY error handling, transactions, connections, and GUCs. 2013-11-13 23:54:41 +01:00
Dimitri Fontaine
8c9af591e3 Cleanup where to add the table oids for MySQL indexes migration. 2013-11-08 21:09:50 +01:00
Dimitri Fontaine
e9b734dc41 Reverting "Shorten column names in the application to bypass a postmodern bug (or something)."
Revert "Shorten column names in the application to bypass a postmodern bug (or something)."

This reverts commit 240574a1a5.
2013-11-08 17:27:45 +01:00
Dimitri Fontaine
240574a1a5 Shorten column names in the application to bypass a postmodern bug (or something). 2013-11-08 00:10:49 +01:00
Dimitri Fontaine
2e3edde3ad Implement a generic API to deal with indexes, use it in MySQL and SQLite sources. 2013-11-07 22:07:00 +01:00
Dimitri Fontaine
a227943012 Fix the logging system, we now have a proper logfile. 2013-11-07 20:46:47 +01:00
Dimitri Fontaine
9d5dad7e3e Implement support for FIXED COLS input files, reaching release candidate status. 2013-11-07 15:39:28 +01:00
Dimitri Fontaine
a9dd0aafa3 Fix ALTER TABLE ... DROP CONSTRAINT not to happen against non existing tables or constraints. 2013-11-05 21:24:32 +01:00
Dimitri Fontaine
e53e613a82 Implement support for MySQL Foreign Keys. 2013-11-05 18:48:54 +01:00
Dimitri Fontaine
2477b68e15 Implement filtering against the list of tables to migrate, in MySQL. 2013-11-05 14:48:05 +01:00
Dimitri Fontaine
6a75187b7d Refactor MySQL to use the new API. 2013-11-04 19:16:08 +01:00
Dimitri Fontaine
8abbaeea37 Refactor SQLite to use the new API. 2013-11-04 16:22:17 +01:00