68 Commits

Author SHA1 Message Date
Dimitri Fontaine
5e18cfd7d4 Implement support for partial indexes.
MS SQL has a notion of a "filtered index" that matches the notion of a
PostgreSQL partial index: the index only applies to the rows matching
the index WHERE clause, or filter.

The WHERE clause in both case are limited to simple expressions over a
base table's row at a time, so we implement a limited WHERE clause
parser for MS SQL filters and a transformation routine to rewrite the
clause in PostgreSQL slang.

In passing, we transform the filter constants using the same
transformation functions as in the CAST rules, so that e.g. a MS SQL
bit(1) value that got transformed into a PostgreSQL boolean is properly
translated, as in the following example:

  MS SQL:     "([deleted]=(0))"  (that's from the catalogs)
  PostgreSQL: deleted = 'f'

Of course the parser is still very badly tested, let's see what happens
in the wild now.

(Should) Fix #365.
2016-03-21 23:39:45 +01:00
Dimitri Fontaine
c724018840 Implement ALTER TABLE clause for MySQL migrations.
The new ALTER TABLE facility allows to act on tables found in the MySQL
database before the migration happens. In this patch the only provided
actions are RENAME TO and SET SCHEMA, which fixes #224.

In order to be able to provide the same option for MS SQL users, we will
have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO
...) and modify the internal schema-struct so that the schema slot of
our table instances are a schema instance rather than its name.

Lacking MS SQL test database and instance, the facility is not yet
provided for that source type.
2016-03-06 21:51:33 +01:00
Dimitri Fontaine
782561fd4e Handle default value transforms errors, fix #333.
It turns out that MySQL catalog always store default value as strings
even when the column itself is of type bytea. In some cases, it's then
impossible to transform the expected bytea from a string.

In passing, move some code around to fix dependencies and make it
possible to issue log warnings from the default value printing code.
2016-02-03 12:27:58 +01:00
Dimitri Fontaine
aa8b756315 Fix when to create indexes.
In the recent refactoring and improvements of parallelism the indexes
creation would kick in before we know that the data is done being copied
over to the target table.

Fix that by maintaining a writers-count hashtable and only starting to
create indexes when that count reaches zero, meaning all the concurrent
tasks started to handle the COPY of the data are now done.
2016-01-16 19:50:21 +01:00
Dimitri Fontaine
8a596ca933 Move connection into utils.
There's no reason why this file should be in the src/ top-level.
2016-01-07 16:42:43 +01:00
Dimitri Fontaine
9e4938cea4 Implement PostgreSQL catalogs data structure.
In order to share more code in between the different source types,
finally have a go at the quite horrible mess of anonymous data
structures floating around.

Having a catalog and schema instances not only allows for code cleanup,
but will also allow to implement some bug fixes and wishlist items such
as mapping tables from a schema to another one.

Also, supporting database sources having a notion of "schema" (in
between "catalog" and "table") should get easier, including getting
on-par with MySQL in the MS SQL support (materialized views has been
asked for already).

See #320, #316, #224 for references and a notion of progress being made.

In passing, also clean up the copy-databases methods for database source
types, so that they all use a fetch-metadata generic function and a
prepare-pgsql-database and a complete-pgsql-database generic function.
Actually, a single method does the job here.

The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn will call the specialized versions of
list-all-columns and friends implementations. Once the catalog has been
fetched, an explicit CAST call is then needed before we can continue.

Finally, the fields/columns/transforms slots in the copy objects are
still being used by the operative code, so the internal catalog
representation is only used up to starting the data copy step, where the
copy class instances are then all that's used.

This might be refactored again in a follow-up patch.
2015-12-30 21:53:01 +01:00
Dimitri Fontaine
cca44c800f Simplify batch and transformation handling.
Make batches of raw data straight from the reader output (map-rows) and
have the transformation worker focus on changing the batch content from
raw rows to copy strings.

Also review the organisation of responsabilities in the code, allowing
to move queue.lisp into utils/batch.lisp, renaming it as its scope has
been reduced to only care about preparing batches.

This came out of trying to have multiple workers concurrently processing
the batches from the reader and feeding the hardcoded 2 COPY workers,
but it failed for multiple reasons. All is left as of now is this
cleanup, which seems to be on the faster side of things, which is always
good.
2015-11-29 17:35:25 +01:00
Dimitri Fontaine
e23de0ce9f Improve SQLite table names filtering.
Filter the list of tables we migrate directly from the SQLite query,
avoiding to return useless data. To do that, use the LIKE pattern
matching supported by SQLite, where the REGEX operator is only available
when extra features are loaded apparently.

See #310 where filtering out the view still caused errors in the
loading.
2015-11-22 22:10:26 +01:00
Dimitri Fontaine
41e9eebd54 Rationalize common generic API implementation.
When devising the common API, the first step has been to implement
specific methods for each generic function of the protocol. It now
appears that in some cases we don't need the extra level of flexibility:
each change of the API has been systematically reported to all the
specific methods, so just use a single generic definition where possible.

In particular, introduce new intermediate class for COPY subclasses
allowing to share more common code in the methods implementation, rather
than having to copy/paste and maintain several versions of the same
code.

It would be good to be able to centralize more code for the database
sources and how they are organized around metadata/import-data/complete
schema, but it doesn't look obvious how to do it just now.
2015-10-05 21:25:21 +02:00
Dimitri Fontaine
7b9b8a32e7 Move sexp parsing into its own file.
After all, it's shared between the CSV command parsing and the Cast
Rules parsing. src/parsers/command-csv.lisp still contains lots of
facilities shared between the file based sources, will need another
series of splits.
2015-10-05 11:39:44 +02:00
Dimitri Fontaine
96a33de084 Review the stats and reporting code organisation.
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that get passed log messages so that the
output buffer is not subject to race conditions, extend its use to also
deal with statistics messages.

In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.

Also, as a nice side effect of the code simplification and refactoring
this fixes #283 wherein the before/after sections of individual CSV
files within an ARCHIVE command where not counted in the reporting.
2015-10-05 01:46:29 +02:00
Dimitri Fontaine
eabfbb9cc8 Fix schema qualified table names usage (more).
When parsing table names in the target URI, we are careful of splitting
the table and schema name and store them into a cons in that case. Not
all sources methods got the memo, clean that up.

See #182 and #186, a pull request I am now going to be able to accept.
Also see #287 that should be helped by being able to apply #186.
2015-09-04 01:06:15 +02:00
Dimitri Fontaine
56a89e9b53 Cleanup schema data structure building.
As reported by clisp maintainer (thanks jackdaniel!) when trying to load
pgloader, we had redoundant labels function names in places. Get rid of
those by pushing the new columns found directly at the end of the list,
avoiding the bulky code to then reverse the complex anonymous data
structure.

The Real Fix™ would be to define proper structures where to hold all
those database catalogs representation, but that's an invasive patch and
now isn't a good time to write it.

At least pgloader should load and run with clisp now.
2015-08-15 23:54:45 +02:00
Dimitri Fontaine
d1fce3728a Allow more PostgreSQL URI options, fix #199.
As per PostgreSQL documentation on connection strings, allow overriding
of main URI components in the options parts, with a percent-encoded
syntax for parameters. It allows to bypass the main URI parser
limitations as seen in #199 (how to have a password start with a
colon?).

See:
 http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
2015-05-22 23:39:04 +02:00
Dimitri Fontaine
55584406fa Add encoding support for db3 sources, fix #176.
It appears that db3 files are not limited to the ASCII character
encoding that they were designed with, so let's clue pgloader about
that.

This commit build
770cbe3526
and the pgloader Makefile has been updated to momentarily fetch cl-db3
from github rather than Quicklisp so that it's possible to enjoy the new
feature immediately.
2015-02-18 22:40:03 +01:00
Dimitri Fontaine
cd46b6cbed Clean up common code for sources.
Only move code around, creating a src/sources/common directory with
several files in there so as to split the too big src/sources.lisp.
2015-01-08 23:17:40 +01:00
Dimitri Fontaine
e1bc6425e2 Implement support for PostgreSQL COPY format, fix #145.
PostgreSQL COPY format is not really CSV but something way easier to
parse. Funnily enough, parsing it as CSV is not that easy, so we add
here a special simple parser for the COPY format.

It should be quite useful too try loading again reject data files from
pgloader after manual fixing, too. It's still missing some documentation
without any good excuse for that, will add soon.
2015-01-02 18:49:17 +01:00
Dimitri Fontaine
302a7d402b Refactor connection handling, and clean-up many things.
That's the big refactoring patch I've been sitting on for too long.

First, refactor connection handling to use a uniformed "connection"
concept (class and generic functions API) everywhere, so that the COPY
derived objects just use that in their :source-db and :target-db slots.

Given that, we don't need no messing around with *pgconn* and *myconn-*
and other special variables at all anywhere in the tree.

Second, clean up some oddities accumulated over time, where some parts
of the code didn't get the memo when new API got into place.

Third, fix any other oddity or missing part found while doing those
first two activities, it was long overdue anyway...
2014-12-26 21:50:29 +01:00
Dimitri Fontaine
87e157bee2 Add a new database source type in the parser.
Now it's possible to parse a command to load data from MS SQL. The
parser was until now parsing all database URI within the same common
rule and that isn't possible anymore if we want to distinguish in
between source database right from the parser, which we actually want to
do.

This patch also implement in-passing fixes all over the place, including
the transformation function float-to-string that only happened to work
on double-float data.
2014-11-17 00:23:06 +01:00
Dimitri Fontaine
fff756f95f Refactor the command parser.
Split its content into separate files, so that each is easier to
maintain, and to make it easier also to add support for new sources.
2014-11-16 22:22:04 +01:00
Dimitri Fontaine
03bba5f486 Some more SQL Server support (schema conversion).
Converting the table definitions (with type casting) seems to work. Also
did experiment a little with actuallt fetching some data... and had to
edit the cl-mssql driver, which is temporarily monkey patched.
2014-11-10 01:16:10 +01:00
Dimitri Fontaine
ca325ba799 Refactor the SQLite source files. 2014-11-09 22:59:30 +01:00
Dimitri Fontaine
6473a892d4 First steps toward MS SQL compatibility. 2014-11-09 00:09:42 +01:00
Dimitri Fontaine
ed853a7bea Allow pgloader to work on windows. 2014-11-06 22:12:20 +01:00
Dimitri Fontaine
3c334dcdc4 Refactor the main parser to use the bind macro.
The metabang-bind lib offers a nice bind macro that solves the problem
of ignoring bindings in destructuring-bind, and allows a let* approach
to nested destructuring (wven when mixed with let declarations).

Using that lib (that we already indirectly depend on anyway) simplifies
the parser code substantially.
2014-10-02 17:05:35 +02:00
Dimitri Fontaine
7cf7e714fc Implement the source date format option. 2014-10-02 01:03:24 +02:00
Dimitri Fontaine
2369a142a7 Refactor source code organisation.
In passing, fix a bug in the previous commit where left-over code would
cancel the whole new parsing code for advanced source fields options.
2014-10-01 23:20:24 +02:00
Dimitri Fontaine
422f87e912 We don't use the zip system anymore. 2014-09-10 22:19:59 +02:00
Dimitri Fontaine
3e0526c957 Implement early support for IXF files. 2014-07-14 21:53:50 +02:00
Dimitri Fontaine
807f5cefcd Fix omitted file dependency (reading queries from file). 2014-06-16 14:24:05 +02:00
Dimitri Fontaine
c3742a9410 Typo fix cl-base64 system's name, fix the fix for #60. 2014-05-16 23:36:45 +02:00
Dimitri Fontaine
9e12035ca1 Review SQLite blob types in light of "manifest typing", fix #60.
When using SQLite 3, a blob column might return either string of byte
vector values dynamically depending on the data itself, or maybe some
more complex parameters controlled at data insert time.

Hard-code the rule that a blob column returned as a string is in fact
base64 encoded (which looks like common practice) and decode it
automatically when needed, before sending to byte-vector-to-bytea. It
might be a tad slow but at least the data is properly converted.

In future, that decision might come and byte us in the back again, at
which point it'll be necessary to consider full casting options as in
the MySQL CAST rules. It seems like a big enough win for now if we can
avoid that.
2014-05-16 23:13:57 +02:00
Dimitri Fontaine
35ca4927e9 Get rid of some lib dependencies.
The charset business isn't worth depending on an AGPL licenced lib which
is part of a huge Quicklisp system.
2014-04-25 17:21:11 +02:00
Dimitri Fontaine
4d6def8105 Move some MySQL old import/export functions apart... 2014-03-04 13:52:48 +01:00
Dimitri Fontaine
db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formating into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes a
batch at a time in the communication channel: a lparallel.queue object.

Before that, the raw vectors where pushed directly in the queue, offering
more flexibility to adjust to the reader and writer IO rates and
capabilities, but impeding the ability of the Garbage Collector: data still
in the queue was not collected even if not needed anymore.

The new model also uses less memory, and allows a better control over what
amount of data stays in memory. The new *concurrent-batches* parameter
should be key to being able to process huge rows.

The intent is to offering a way for the users to tune *concurrent-batches*
down to 1 for sources with massive per-row memory footprint. Even better
would be to find a way to automatically adjust the setting without spending
too much time counting the bytes we're batching.

Preliminary tests show no sensible impact on performances from this patch,
even some improvements in cases.
2014-01-25 23:54:49 +01:00
Dimitri Fontaine
a51a712b6a Fix asd dependencies, cleanup useless and misplaced compilation options. 2014-01-21 14:37:26 +01:00
Dimitri Fontaine
2080d91e40 Fix dependency declarations in between files, should help with #19. 2014-01-02 23:48:57 +01:00
Dimitri Fontaine
17b366ca82 Create a website to present the software. 2014-01-02 23:25:23 +01:00
Dimitri Fontaine
b2c9e0d2dc Refactor the whole logging infrastructure not to depend on threads sharing streams. 2013-12-24 19:08:55 +01:00
Dimitri Fontaine
f02eb641b4 Switch from cl-mysql to qmynd, an all-lisp driver for MySQL. 2013-12-03 22:05:39 +01:00
Dimitri Fontaine
3486cc688f Looks like I forgot to add fixed.lisp in the asd system definitions. 2013-11-08 21:50:40 +01:00
Dimitri Fontaine
5ce5d53d7d Use trivial-backtrace to display more useful information in case of unexpected events, hopefully. 2013-11-07 20:14:06 +01:00
Dimitri Fontaine
6a75187b7d Refactor MySQL to use the new API. 2013-11-04 19:16:08 +01:00
Dimitri Fontaine
0a38195853 Refactoring the API with a real definition of it, and reorg the source tree. 2013-11-04 13:21:45 +01:00
Dimitri Fontaine
50114a0d3a Hack-in some support for SQLite data source, including some refactoring preps. 2013-10-24 00:21:46 +02:00
Dimitri Fontaine
ffebcf3bc7 Clean out the code by splitting away a bunch of PostgreSQL related facilities. 2013-10-21 22:35:22 +02:00
Dimitri Fontaine
fb818ee0e3 Move sources into their own subdirectory, assorted cleaning. 2013-10-20 19:09:09 +02:00
Dimitri Fontaine
6d27d28287 Implement a converter from old .INI syntax to current commands. 2013-10-12 23:59:28 +02:00
Dimitri Fontaine
2bf7c4df12 Assorted clean up to prepare a binary image. 2013-10-03 17:42:09 +02:00
Dimitri Fontaine
2ff0d11332 Fix a typo in the com.informatimago.clext ASD dependency declaration. 2013-09-30 17:31:28 +02:00