pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2026-02-14 10:51:03 +01:00

Author	SHA1	Message	Date
Dimitri Fontaine	5e18cfd7d4	Implement support for partial indexes. MS SQL has a notion of a "filtered index" that matches the notion of a PostgreSQL partial index: the index only applies to the rows matching the index WHERE clause, or filter. The WHERE clauses in both cases are limited to simple expressions over a base table's row at a time, so we implement a limited WHERE clause parser for MS SQL filters and a transformation routine to rewrite the clause in PostgreSQL slang. In passing, we transform the filter constants using the same transformation functions as in the CAST rules, so that e.g. a MS SQL bit(1) value that got transformed into a PostgreSQL boolean is properly translated, as in the following example: MS SQL: "([deleted]=(0))" (that's from the catalogs) PostgreSQL: deleted = 'f' Of course the parser is still very badly tested, let's see what happens in the wild now. (Should) Fix #365.	2016-03-21 23:39:45 +01:00
Dimitri Fontaine	c724018840	Implement ALTER TABLE clause for MySQL migrations. The new ALTER TABLE facility makes it possible to act on tables found in the MySQL database before the migration happens. In this patch the only provided actions are RENAME TO and SET SCHEMA, which fixes #224. In order to be able to provide the same option for MS SQL users, we will have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and modify the internal schema-struct so that the schema slot of our table instances is a schema instance rather than its name. Lacking an MS SQL test database and instance, the facility is not yet provided for that source type.	2016-03-06 21:51:33 +01:00
Dimitri Fontaine	782561fd4e	Handle default value transforms errors, fix #333 . It turns out that the MySQL catalog always stores default values as strings even when the column itself is of type bytea. In some cases, it's then impossible to transform the expected bytea from a string. In passing, move some code around to fix dependencies and make it possible to issue log warnings from the default value printing code.	2016-02-03 12:27:58 +01:00
Dimitri Fontaine	aa8b756315	Fix when to create indexes. In the recent refactoring and improvements of parallelism the indexes creation would kick in before we know that the data is done being copied over to the target table. Fix that by maintaining a writers-count hashtable and only starting to create indexes when that count reaches zero, meaning all the concurrent tasks started to handle the COPY of the data are now done.	2016-01-16 19:50:21 +01:00
Dimitri Fontaine	8a596ca933	Move connection into utils. There's no reason why this file should be in the src/ top-level.	2016-01-07 16:42:43 +01:00
Dimitri Fontaine	9e4938cea4	Implement PostgreSQL catalogs data structure. In order to share more code in between the different source types, finally have a go at the quite horrible mess of anonymous data structures floating around. Having a catalog and schema instances not only allows for code cleanup, but will also allow to implement some bug fixes and wishlist items such as mapping tables from a schema to another one. Also, supporting database sources having a notion of "schema" (in between "catalog" and "table") should get easier, including getting on-par with MySQL in the MS SQL support (materialized views have been asked for already). See #320, #316, #224 for references and a notion of progress being made. In passing, also clean up the copy-databases methods for database source types, so that they all use a fetch-metadata generic function and a prepare-pgsql-database and a complete-pgsql-database generic function. Actually, a single method does the job here. The responsibility of introspecting the source to populate the internal catalog/schema representation is now held by the fetch-metadata generic function, which in turn will call the specialized versions of list-all-columns and friends implementations. Once the catalog has been fetched, an explicit CAST call is then needed before we can continue. Finally, the fields/columns/transforms slots in the copy objects are still being used by the operative code, so the internal catalog representation is only used up to starting the data copy step, where the copy class instances are then all that's used. This might be refactored again in a follow-up patch.	2015-12-30 21:53:01 +01:00
Dimitri Fontaine	cca44c800f	Simplify batch and transformation handling. Make batches of raw data straight from the reader output (map-rows) and have the transformation worker focus on changing the batch content from raw rows to copy strings. Also review the organisation of responsibilities in the code, allowing to move queue.lisp into utils/batch.lisp, renaming it as its scope has been reduced to only care about preparing batches. This came out of trying to have multiple workers concurrently processing the batches from the reader and feeding the hardcoded 2 COPY workers, but it failed for multiple reasons. All that is left as of now is this cleanup, which seems to be on the faster side of things, which is always good.	2015-11-29 17:35:25 +01:00
Dimitri Fontaine	e23de0ce9f	Improve SQLite table names filtering. Filter the list of tables we migrate directly from the SQLite query, avoiding to return useless data. To do that, use the LIKE pattern matching supported by SQLite, where the REGEX operator is only available when extra features are loaded apparently. See #310 where filtering out the view still caused errors in the loading.	2015-11-22 22:10:26 +01:00
Dimitri Fontaine	41e9eebd54	Rationalize common generic API implementation. When devising the common API, the first step has been to implement specific methods for each generic function of the protocol. It now appears that in some cases we don't need the extra level of flexibility: each change of the API has been systematically reported to all the specific methods, so just use a single generic definition where possible. In particular, introduce new intermediate class for COPY subclasses allowing to share more common code in the methods implementation, rather than having to copy/paste and maintain several versions of the same code. It would be good to be able to centralize more code for the database sources and how they are organized around metadata/import-data/complete schema, but it doesn't look obvious how to do it just now.	2015-10-05 21:25:21 +02:00
Dimitri Fontaine	7b9b8a32e7	Move sexp parsing into its own file. After all, it's shared between the CSV command parsing and the Cast Rules parsing. src/parsers/command-csv.lisp still contains lots of facilities shared between the file based sources, will need another series of splits.	2015-10-05 11:39:44 +02:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that gets passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command were not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	eabfbb9cc8	Fix schema qualified table names usage (more). When parsing table names in the target URI, we are careful of splitting the table and schema name and store them into a cons in that case. Not all sources methods got the memo, clean that up. See #182 and #186, a pull request I am now going to be able to accept. Also see #287 that should be helped by being able to apply #186.	2015-09-04 01:06:15 +02:00
Dimitri Fontaine	56a89e9b53	Cleanup schema data structure building. As reported by clisp maintainer (thanks jackdaniel!) when trying to load pgloader, we had redundant labels function names in places. Get rid of those by pushing the new columns found directly at the end of the list, avoiding the bulky code to then reverse the complex anonymous data structure. The Real Fix™ would be to define proper structures where to hold all those database catalogs representation, but that's an invasive patch and now isn't a good time to write it. At least pgloader should load and run with clisp now.	2015-08-15 23:54:45 +02:00
Dimitri Fontaine	d1fce3728a	Allow more PostgreSQL URI options, fix #199 . As per PostgreSQL documentation on connection strings, allow overriding of main URI components in the options parts, with a percent-encoded syntax for parameters. It allows to bypass the main URI parser limitations as seen in #199 (how to have a password start with a colon?). See: http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING	2015-05-22 23:39:04 +02:00
Dimitri Fontaine	55584406fa	Add encoding support for db3 sources, fix #176 . It appears that db3 files are not limited to the ASCII character encoding that they were designed with, so let's clue pgloader about that. This commit build `770cbe3526` and the pgloader Makefile has been updated to momentarily fetch cl-db3 from github rather than Quicklisp so that it's possible to enjoy the new feature immediately.	2015-02-18 22:40:03 +01:00
Dimitri Fontaine	cd46b6cbed	Clean up common code for sources. Only move code around, creating a src/sources/common directory with several files in there so as to split the too big src/sources.lisp.	2015-01-08 23:17:40 +01:00
Dimitri Fontaine	e1bc6425e2	Implement support for PostgreSQL COPY format, fix #145 . PostgreSQL COPY format is not really CSV but something way easier to parse. Funnily enough, parsing it as CSV is not that easy, so we add here a special simple parser for the COPY format. It should be quite useful to try loading again reject data files from pgloader after manual fixing, too. It's still missing some documentation without any good excuse for that, will add soon.	2015-01-02 18:49:17 +01:00
Dimitri Fontaine	302a7d402b	Refactor connection handling, and clean-up many things. That's the big refactoring patch I've been sitting on for too long. First, refactor connection handling to use a uniformed "connection" concept (class and generic functions API) everywhere, so that the COPY derived objects just use that in their :source-db and :target-db slots. Given that, we don't need no messing around with pgconn and myconn- and other special variables at all anywhere in the tree. Second, clean up some oddities accumulated over time, where some parts of the code didn't get the memo when new API got into place. Third, fix any other oddity or missing part found while doing those first two activities, it was long overdue anyway...	2014-12-26 21:50:29 +01:00
Dimitri Fontaine	87e157bee2	Add a new database source type in the parser. Now it's possible to parse a command to load data from MS SQL. The parser was until now parsing all database URIs within the same common rule and that isn't possible anymore if we want to distinguish in between source databases right from the parser, which we actually want to do. This patch also implements in-passing fixes all over the place, including the transformation function float-to-string that only happened to work on double-float data.	2014-11-17 00:23:06 +01:00
Dimitri Fontaine	fff756f95f	Refactor the command parser. Split its content into separate files, so that each is easier to maintain, and to make it easier also to add support for new sources.	2014-11-16 22:22:04 +01:00
Dimitri Fontaine	03bba5f486	Some more SQL Server support (schema conversion). Converting the table definitions (with type casting) seems to work. Also did experiment a little with actually fetching some data... and had to edit the cl-mssql driver, which is temporarily monkey patched.	2014-11-10 01:16:10 +01:00
Dimitri Fontaine	ca325ba799	Refactor the SQLite source files.	2014-11-09 22:59:30 +01:00
Dimitri Fontaine	6473a892d4	First steps toward MS SQL compatibility.	2014-11-09 00:09:42 +01:00
Dimitri Fontaine	ed853a7bea	Allow pgloader to work on windows.	2014-11-06 22:12:20 +01:00
Dimitri Fontaine	3c334dcdc4	Refactor the main parser to use the `bind` macro. The metabang-bind lib offers a nice bind macro that solves the problem of ignoring bindings in destructuring-bind, and allows a let* approach to nested destructuring (even when mixed with let declarations). Using that lib (that we already indirectly depend on anyway) simplifies the parser code substantially.	2014-10-02 17:05:35 +02:00
Dimitri Fontaine	7cf7e714fc	Implement the source date format option.	2014-10-02 01:03:24 +02:00
Dimitri Fontaine	2369a142a7	Refactor source code organisation. In passing, fix a bug in the previous commit where left-over code would cancel the whole new parsing code for advanced source fields options.	2014-10-01 23:20:24 +02:00
Dimitri Fontaine	422f87e912	We don't use the zip system anymore.	2014-09-10 22:19:59 +02:00
Dimitri Fontaine	3e0526c957	Implement early support for IXF files.	2014-07-14 21:53:50 +02:00
Dimitri Fontaine	807f5cefcd	Fix omitted file dependency (reading queries from file).	2014-06-16 14:24:05 +02:00
Dimitri Fontaine	c3742a9410	Typo fix cl-base64 system's name, fix the fix for #60 .	2014-05-16 23:36:45 +02:00
Dimitri Fontaine	9e12035ca1	Review SQLite blob types in light of "manifest typing", fix #60 . When using SQLite 3, a blob column might return either string or byte vector values dynamically depending on the data itself, or maybe some more complex parameters controlled at data insert time. Hard-code the rule that a blob column returned as a string is in fact base64 encoded (which looks like common practice) and decode it automatically when needed, before sending to byte-vector-to-bytea. It might be a tad slow but at least the data is properly converted. In future, that decision might come and byte us in the back again, at which point it'll be necessary to consider full casting options as in the MySQL CAST rules. It seems like a big enough win for now if we can avoid that.	2014-05-16 23:13:57 +02:00
Dimitri Fontaine	35ca4927e9	Get rid of some lib dependencies. The charset business isn't worth depending on an AGPL licenced lib which is part of a huge Quicklisp system.	2014-04-25 17:21:11 +02:00
Dimitri Fontaine	4d6def8105	Move some MySQL old import/export functions apart...	2014-03-04 13:52:48 +01:00
Dimitri Fontaine	db947e1467	Rework reader and writer data exchange. With this patch, the whole data massaging and final formatting into the PostgreSQL COPY TEXT format is done by the reader thread, which publishes a batch at a time in the communication channel: a lparallel.queue object. Before that, the raw vectors were pushed directly in the queue, offering more flexibility to adjust to the reader and writer IO rates and capabilities, but impeding the ability of the Garbage Collector: data still in the queue was not collected even if not needed anymore. The new model also uses less memory, and allows a better control over what amount of data stays in memory. The new concurrent-batches parameter should be key to being able to process huge rows. The intent is to offer a way for the users to tune concurrent-batches down to 1 for sources with massive per-row memory footprint. Even better would be to find a way to automatically adjust the setting without spending too much time counting the bytes we're batching. Preliminary tests show no noticeable impact on performance from this patch, even some improvements in cases.	2014-01-25 23:54:49 +01:00
Dimitri Fontaine	a51a712b6a	Fix asd dependencies, cleanup useless and misplaced compilation options.	2014-01-21 14:37:26 +01:00
Dimitri Fontaine	2080d91e40	Fix dependency declarations in between files, should help with #19 .	2014-01-02 23:48:57 +01:00
Dimitri Fontaine	17b366ca82	Create a website to present the software.	2014-01-02 23:25:23 +01:00
Dimitri Fontaine	b2c9e0d2dc	Refactor the whole logging infrastructure not to depend on threads sharing streams.	2013-12-24 19:08:55 +01:00
Dimitri Fontaine	f02eb641b4	Switch from cl-mysql to qmynd, an all-lisp driver for MySQL.	2013-12-03 22:05:39 +01:00
Dimitri Fontaine	3486cc688f	Looks like I forgot to add fixed.lisp in the asd system definitions.	2013-11-08 21:50:40 +01:00
Dimitri Fontaine	5ce5d53d7d	Use trivial-backtrace to display more useful information in case of unexpected events, hopefully.	2013-11-07 20:14:06 +01:00
Dimitri Fontaine	6a75187b7d	Refactor MySQL to use the new API.	2013-11-04 19:16:08 +01:00
Dimitri Fontaine	0a38195853	Refactoring the API with a real definition of it, and reorg the source tree.	2013-11-04 13:21:45 +01:00
Dimitri Fontaine	50114a0d3a	Hack-in some support for SQLite data source, including some refactoring preps.	2013-10-24 00:21:46 +02:00
Dimitri Fontaine	ffebcf3bc7	Clean out the code by splitting away a bunch of PostgreSQL related facilities.	2013-10-21 22:35:22 +02:00
Dimitri Fontaine	fb818ee0e3	Move sources into their own subdirectory, assorted cleaning.	2013-10-20 19:09:09 +02:00
Dimitri Fontaine	6d27d28287	Implement a converter from old .INI syntax to current commands.	2013-10-12 23:59:28 +02:00
Dimitri Fontaine	2bf7c4df12	Assorted clean up to prepare a binary image.	2013-10-03 17:42:09 +02:00
Dimitri Fontaine	2ff0d11332	Fix a typo in the com.informatimago.clext ASD dependency declaration.	2013-09-30 17:31:28 +02:00
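
Commit aa8b756315 above describes keeping a writers-count hashtable and creating indexes only once that count reaches zero. The following is a minimal Python sketch of that idea, not pgloader's Common Lisp implementation: the `WritersCount` class, the `copy_table` function, and the `create_indexes` callback are all hypothetical names introduced for illustration.

```python
import threading
from collections import defaultdict


class WritersCount:
    """Per-table count of COPY writers still running (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def register(self, table):
        # Called once per writer, before any writer starts.
        with self._lock:
            self._counts[table] += 1

    def finish(self, table):
        """Decrement the count; return True if this was the last writer."""
        with self._lock:
            self._counts[table] -= 1
            return self._counts[table] == 0


def copy_table(table, n_writers, create_indexes):
    """Run n_writers concurrent writers, then create indexes exactly once."""
    counts = WritersCount()
    # Register every writer up front so the count cannot hit zero early.
    for _ in range(n_writers):
        counts.register(table)

    def writer():
        # ... the COPY of this writer's share of the data would go here ...
        if counts.finish(table):
            # Only the last writer to finish triggers index creation.
            create_indexes(table)

    threads = [threading.Thread(target=writer) for _ in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Registering all writers before starting any of them is the design point here: if writers registered themselves on startup, a fast writer could bring the count to zero while slower ones had not yet begun.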
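
Commit 5e18cfd7d4 above gives one example of rewriting an MS SQL index filter into a PostgreSQL partial index WHERE clause: `([deleted]=(0))` becomes `deleted = 'f'`. The sketch below reproduces only that single-comparison case in Python, using a regular expression rather than the limited parser the commit mentions. The names `FILTER_RE`, `bit_to_boolean`, and `translate_filter` are assumptions made for this illustration and are not pgloader functions.

```python
import re

# One comparison as stored in the MS SQL catalogs, e.g. "([deleted]=(0))":
# a bracketed column name, an operator, and a parenthesized constant.
FILTER_RE = re.compile(
    r"^\(\[(?P<col>[^\]]+)\]\s*(?P<op><>|<=|>=|=|<|>)\s*\((?P<val>[^()]*)\)\)$"
)


def bit_to_boolean(value):
    """Hypothetical cast transform: MS SQL bit constant to PostgreSQL boolean."""
    return {"0": "'f'", "1": "'t'"}[value.strip()]


def translate_filter(mssql_filter, transforms):
    """Rewrite a single-comparison MS SQL filter as a PostgreSQL WHERE clause.

    `transforms` maps column names to the function used to convert the
    constant, mirroring the idea of reusing the CAST rule transformations.
    """
    m = FILTER_RE.match(mssql_filter.strip())
    if m is None:
        raise ValueError(f"unsupported filter: {mssql_filter!r}")
    col, op, val = m.group("col", "op", "val")
    transform = transforms.get(col, lambda v: v)
    return f"{col} {op} {transform(val)}"


print(translate_filter("([deleted]=(0))", {"deleted": bit_to_boolean}))
# prints: deleted = 'f'
```

Anything beyond one comparison (AND/OR combinations, IS NULL, nested parentheses) is rejected with a `ValueError` here; handling those would need a real expression parser.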


68 Commits