pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2025-08-08 15:27:00 +02:00

Author	SHA1	Message	Date
Dimitri Fontaine	6ca376ef9b	Simplify the main function (refactor). Move some code away in its own function for easier review and modifications of the main entry point.	2015-11-16 16:01:25 +01:00
Dimitri Fontaine	da05782002	Allow date formats to miss time parts. In case the seconds field are not provided just use "00" rather than NIL as currently...	2015-11-14 21:14:20 +01:00
Dimitri Fontaine	6cbec206af	Turns out SSL key/crt file paths should be strings. Our PostgreSQL driver uses CFFI to load the SSL support from open ssl and as a result the certificate and key file names should be strings rather than pathnames. Should fix #308 again...	2015-11-11 23:10:29 +01:00
Dimitri Fontaine	f8ae9f22b9	Implement support for SSL client certificates. This fixes #308 by automatically using the PostgreSQL Client Side SSL files as documented in the following reference: http://www.postgresql.org/docs/current/static/libpq-ssl.html#LIBPQ-SSL-FILE-USAGE This uses the Postmodern special support for it. Unfortunately couldn't test it locally other than it doesn't break non-ssl connections. Pushing to have user feedback.	2015-11-09 11:32:17 +01:00
Dimitri Fontaine	e3cc76b2d4	Export copy-column-list from pgloader.sources. That allows the copy-column-list specific method for MySQL to be a method of the common pgloader.sources::copy-column-list generic function, and then to be called again when needed. This fixes an oversight in #41e9eeb and fixes #132 again.	2015-11-08 18:48:54 +01:00
Dimitri Fontaine	042045c0a6	Clozure already provides a getenv function. Trying to provide a new one fails with an error, that I missed because I must have forgotten to `make clean` when adding the previous #+ccl variant here... This alone doesn't allow to fix building for CCL but already improves the situation as reported at #303. Next failure is something I fail to understand tonight: Fatal SIMPLE-ERROR: Compilation failed: In MAKE-DOUBLE-FLOAT: Type declarations violated in (THE FIXNUM 4294967295) in /Users/dim/dev/pgloader/build/quicklisp/local-projects/qmynd/src/common/utilities.lisp	2015-11-08 18:26:58 +01:00
Dimitri Fontaine	3673c5c341	Add a Travis CI Build badge.	2015-10-31 17:48:30 +01:00
Dimitri Fontaine	20ce095384	Merge pull request #305 from gitter-badger/gitter-badge Add a Gitter chat badge to README.md	2015-10-31 17:46:20 +01:00
The Gitter Badger	56a60db146	Add Gitter badge	2015-10-31 16:40:52 +00:00
Dimitri Fontaine	478d24f865	Fix root-dir initialization for ccl, see #303 . When using Clozure Common Lisp apparently a :absolute directory component for make-pathname is supposed to contain a single path component, fix by using parse-native-namestring instead. In case it's needed, the following spelling seems portable enough: CL-USER> (uiop:merge-pathnames* (uiop:make-pathname* :directory '(:relative "pgloader")) (uiop:make-pathname* :directory '(:absolute "tmp"))) #P"/tmp/pgloader/"	2015-10-24 22:22:24 +02:00
Dimitri Fontaine	4df3167da1	Introduce another worker thread: transformers. We used to have a reader and a writer cooperating concurrently into loading the data from the source to PostgreSQL. The tranformation of the data was then the responsibility of the reader thread. Measurements showed that the PostgreSQL processes were mostly idle, waiting for the reader to produce data fast enough. In this patch we introduce a third worker thread that is responsible for processing the raw data into pre-formatted batches, allowing the reader to focus on extracting the data only. We now have two lparallel queues involved in the processing, the raw queue contains the vectors of raw data directly, and the processed-queue contains batches of properly encoded strings for the COPY text protocol. On the test laptop the performance gain isn't noticeable yet, it might be that we need much larger data sets to see a gain here. At least the setup isn't detrimental to performances on smaller data sets. Next improvements are going to allow more features: specialized batch retry thread and parallel table copy scheduling for database sources. Let's also continue caring about performances and play with having several worker and writer threads for each reader. In later patches. And some day, too, we will need to make the number of workers a user defined variable rather than something hard coded as today. It's on the todo list, meanwhile, dear user, consider changing the (make-kernel 6) into (make-kernel 12) or something else in src/sources/mysql/mysql.lisp, and consider enlighting me with whatever it is you find by doing so!	2015-10-23 00:17:58 +02:00
Dimitri Fontaine	4f3b3472a2	Cleanup and timing display improvements. Have each thread publish its own start-time so that the main thread may compute time spent in source and target processing, in order to fix the crude hack of taking (max read-time write-time) in the total time column of the summary. We still have some strange artefacts here: we consider that the full processing time is bound to the writer thread (:target), because it needs to have the reader done already to be able to COPY the last batch... but in testing I've seen some :source timings higher than the :target ones... Let's solve problems one at a time tho, I guess multi-threading and accurate wall clock times aren't to be expected to mix and match that easily anyway (multi cores, single RTC and all that).	2015-10-22 22:36:40 +02:00
Dimitri Fontaine	933d1c8d6b	Add test case for #302 .	2015-10-22 22:35:32 +02:00
Dimitri Fontaine	88bb4e0b95	Register "auto_increment" as a SQLite noise word. As seen in #302 it's possible to define a SQLite column of type "integer auto_increment". In my testing tho, it doesn't mean a thing. Worse than that, apparently when an integer column is created that is also used as the primary key of the table, the notation "integer auto_increment primary key" disables the rowid behavior that is certainly expected. Let's not yet mark the bug as fixed as I suppose we will have to do something about this rowid mess. Thanks again SQLite.	2015-10-22 21:55:34 +02:00
Dimitri Fontaine	f654f10d0d	Fix CSV summary format string. This got broken when adding read/write separate stats in the reporting.	2015-10-20 23:56:20 +02:00
Dimitri Fontaine	2ed14d595d	Trick the reporting to show non-zero timings. When calling lparallel:receive-results from the main threads we loose the ability to measure proper per-table processing times, because it all happens in parallel and the main threads seems to receive events in the same millisecond as when the worker is started, meaning it's all 0.0s. So when we don't have "secs" stats, pick the greatest of read or write time, which we do have from the worker threads themselves. The number are still wrong, but less so than the "0.0s" displayed before this patch.	2015-10-20 23:39:15 +02:00
Dimitri Fontaine	69b8b0305d	Fix reporting in case of missing values.	2015-10-20 23:24:03 +02:00
Dimitri Fontaine	1fb69b2039	Retry connecting to PostgreSQL in some cases. Now that we can setup many concurrent threads working against the PostgreSQL database, and before we open the number of workers to our users, install an heuristic to manage the PostgreSQL error classes “too many connection” and “configuration limit exceeded” so that pgloader waits for some time (retry-connect-delay) then tries connecting again. It's quite simplistic but should cover lots of border-line cases way more nicely than just throwing the interactive debugger at the end user.	2015-10-20 23:15:05 +02:00
Dimitri Fontaine	633067a0fd	Allow more parallelism in database migrations. The newly added statistics are showing that read+write times are not enough to explain how long we wait for the data copying, so it must be the workers setup rather than the workers themselves. From there, let lparallel work its magic in scheduling the work we do in parallel in pgloader: rather than doing blocking receive-result calls for each table, only receive-result at the end of the whole copy-database processing. On test data here on the laptop we go from 6s to 3s to migrate the sakila database from MySQL to PostgreSQL: that's because we have lots of very small tables, so the cost of waiting after each COPY added up quite quickly. In passing, stop sharing the same connection object in between parallel workers that used to be controlled active in-sequence, see the new API clone-connection (which takes over new-pgsql-connection).	2015-10-20 22:15:55 +02:00
Dimitri Fontaine	187565b181	Add read/write separate stats. Add metrics to devise where the time is spent in current pgloader code so that it's possible to then optimize away the batch processing as we do it today. Given the following extract of the measures, it seems that doing the data transformations in the reader thread isn't so bright an idea. More to come. table name total time read write ----------------- -------------- --------- --------- extract 2.014s before load 0.050s fetch 0.000s ----------------- -------------- --------- --------- geolite.location 16.090s 15.933s 5.732s geolite.blocks 28.896s 28.795s 5.312s ----------------- -------------- --------- --------- after load 37.772s ----------------- -------------- --------- --------- Total import time 1m25.082s 44.728s 11.044s	2015-10-11 21:35:19 +02:00
Dimitri Fontaine	c3726ce07a	Refrain from starting the logger twice in load-data.	2015-10-05 21:27:48 +02:00
Dimitri Fontaine	41e9eebd54	Rationalize common generic API implementation. When devising the common API, the first step has been to implement specific methods for each generic function of the protocol. It now appears that in some cases we don't need the extra level of flexibility: each change of the API has been systematically reported to all the specific methods, so just use a single generic definition where possible. In particular, introduce new intermediate class for COPY subclasses allowing to share more common code in the methods implementation, rather than having to copy/paste and maintain several versions of the same code. It would be good to be able to centralize more code for the database sources and how they are organized around metadata/import-data/complete schema, but it doesn't look obvious how to do it just now.	2015-10-05 21:25:21 +02:00
Dimitri Fontaine	0d9c2119b1	Send one update-stats message per batch. Update the stats used to be a quite simple incf and doing it once per read row was good enough, but now that it involves sending a message to the monitor thread let's only send a message per batch, reducing the communication load here.	2015-10-05 18:04:08 +02:00
Dimitri Fontaine	6bf26c52ec	Implement a TimeZone option for IXF loading. The local-time:encode-timestamp function takes a default timezone and it is necessary to have control over it when loading from pgloader. Hence, add a timezone option to the IXF option list, that is now explicit and local to the IXF parser rather than shared with the DBF option list.	2015-10-05 16:46:15 +02:00
Dimitri Fontaine	7b9b8a32e7	Move sexp parsing into its own file. After all, it's shared between the CSV command parsing and the Cast Rules parsing. src/parsers/command-csv.lisp still contains lots of facilities shared between the file based sources, will need another series of splits.	2015-10-05 11:39:44 +02:00
Dimitri Fontaine	f1df6ee89a	Forgot a new file, thanks Travis. Someday I will learn not to code that late at night.	2015-10-05 02:21:59 +02:00
Dimitri Fontaine	c880f86bb6	Fix user defined casting rules. Commit `598c860cf5` broke user defined casting rules by interning "precision" and "scale" in the pgloader.user-symbols package: those symbols need to be found in the pgloader.transforms package instead. Luckily enough the infrastructure to do that was already in place for cl:nil.	2015-10-05 02:11:23 +02:00
Dimitri Fontaine	96a33de084	Review the stats and reporting code organisation. In order to later be able to have more worker threads sharing the load (multiple readers and/or writers, maybe more specialized threads too), have all the stats be managed centrally by a single thread. We already have a "monitor" thread that get passed log messages so that the output buffer is not subject to race conditions, extend its use to also deal with statistics messages. In the current code, we send a message each time we read a row. In some future commits we should probably reduce the messaging here to something like one message per batch in the common case. Also, as a nice side effect of the code simplification and refactoring this fixes #283 wherein the before/after sections of individual CSV files within an ARCHIVE command where not counted in the reporting.	2015-10-05 01:46:29 +02:00
Dimitri Fontaine	bc9d2d8962	Monitor events are now structures. This allows to use typecase to dispatch events in the main loop and avoid using destructuring-bind, as we now have properly type events.	2015-10-04 18:55:10 +02:00
Dimitri Fontaine	38a725fe74	Add support for IXF blobs to bytea. A quick test shows it should work, so push that too.	2015-09-24 17:47:57 +02:00
Dimitri Fontaine	bd44e6423b	Add support for IXF CLOB data type. In passing, fix the condition message to read the unknown IXF data types as decimal numbers (rather than hexadecimal) as they are documented that way in the IBM reference documentation.	2015-09-23 23:15:48 +02:00
Dimitri Fontaine	598c860cf5	Improve user code parsing, fix #297 . To be able to use "t" (or "nil") as a column name, pgloader needs to be able to generate lisp code where those symbols are available. It's simple enough in that a Common Lisp package that doesn't :use :cl fullfills the condition, so intern user symbols in a specially crafted package that doesn't :use :cl. Now, we still need to be able to run transformation code that is using the :cl package symbols and the pgloader.transforms functions too. In this commit we introduce a heuristic to pick symbols either as functions from pgloader.transforms or anything else in pgloader.user-symbols. And so that user code may use NIL too, we provide an override mechanism to the intern-symbol heuristic and use it only when parsing user code, not when producing Common Lisp code from the parsed load command.	2015-09-21 13:23:21 +02:00
Dimitri Fontaine	fe812061c4	Update README file for build instructions, fix #296 . When building from source you should really build from current's HEAD in git master branch... In passing, comment out the --self-update paragraph as it's know to be broken unless you still have all the source dependencies at the right place for ASDF to find them... making the feature developer only.	2015-09-17 20:27:55 +02:00
Dimitri Fontaine	f6aa8210b9	Load libsybdb even in --dry-run mode, fix #295 .	2015-09-16 20:39:42 +02:00
Dimitri Fontaine	78c6bf097a	Fix the build again. Once more did I change a test file data and forgot to commit the changes to the expected file of the regression test.	2015-09-12 00:40:15 +02:00
Dimitri Fontaine	98f18c4877	Improve CSV date format, fix #293 . The date format wouldn't allow using colon (:) in the noise parts of it, and would also insist that milliseconds should be on 4 digits and micro seconds on 6 digits. Allow for "ragged" input and take however many digits we actually find in the input.	2015-09-12 00:35:14 +02:00
Dimitri Fontaine	a195ac6dd4	Adapt to the new cl-ixf API. This allows fixing bugs in processing the IXF files, which pgloader directly benefits from.	2015-09-12 00:19:02 +02:00
Dimitri Fontaine	a0dc59624c	Fix schema qualified table names usage again. When the list of columns of the PostgreSQL target table isn't given in the load command, pgloader will happily query the system catalogs to get that information. The list-columns query didn't get the memo about the qualified table name format and the with-schema macro... fix #288.	2015-09-11 11:53:28 +02:00
Dimitri Fontaine	e054eb3838	Travis: set PGTZ in regress.sh The TimeZone parameter should be set both for input and for output in order to match our expected result file. Let's try to set PGTZ in the shell environment...	2015-09-07 20:24:00 +02:00
Dimitri Fontaine	60d58a96b8	Travis: let's try to force client timezone. The cvs-parse-date test is failing on Travis because the server up there in the Cloud isn't using the same timezone as my local machine. Let's just force the timezone in the SET clause...	2015-09-07 20:00:05 +02:00
Dimitri Fontaine	3f539b7384	Travis: update expected output file. Forgot to update the expected output file in the previous commit, which Travis is rightfully complaining about...	2015-09-07 17:47:03 +02:00
Dimitri Fontaine	04b2779239	Allow date format parsing to support time. A useful use case for date parsing at tine input level is to parse time (hour, minutes, seconds) rather than a full date (timestamp). Improve the code so that it's possible to use the date format facility even when the data field lacks the year/month/day information. Fix #288.	2015-09-07 17:05:10 +02:00
Dimitri Fontaine	bd50ba45ea	Make it easier to contact me from the moral license.	2015-09-06 22:00:02 +02:00
Dimitri Fontaine	769a4c579f	Preparing next pgloader release (3.2.2).	2015-09-06 21:14:24 +02:00
Dimitri Fontaine	87698cf114	Fix MS SQL int conversion, fix #282 . This is a blind fix that I hope is the one needed, wherein -1 get sent to unsigned-to-sign which fails to handle it as expected. The protocol and driver are then sending unsigned ints and that's what we now handle in the CFFI layer. This allows to push the attempt to ease testing of it, and allows me also to cut a release without modified files handling around.	2015-09-06 21:11:14 +02:00
Dimitri Fontaine	fce44a3f49	Update website, download, sponsors, license. The download section now explains why binaries are not to be found here anymore. Also we add Redpill Linpro as a sponsor, and we add a pgloader Moral License page in the spirit of the Varnish License. Let's see what happens with that.	2015-09-06 21:05:00 +02:00
Dimitri Fontaine	930b540aff	Default to source table name in IXF loading. The cleanup introduced by `eabfbb9cc8` did break the case when we lack a target table name for the IXF load. Just default to using the source name in that case.	2015-09-04 14:15:48 +02:00
Dimitri Fontaine	33fc3add25	Handle default values particularities for IXF. It turns out that IXF format might embed particulars of the systems the data comes from (Informix or DB2) and we need to process some strings so that they are compatible with PostgreSQL ("CURRENT TIMESTAMP" here). See #272 that should be fixed here, but being in the blind, don't close the issue just yet.	2015-09-04 12:25:11 +02:00
Dimitri Fontaine	e76a4c8f5a	Error early when failing to match IXF type. Our IXF type matching appears to be incomplete, and pgloader would then happily try to create a table column of type NIL in PostgreSQL: prevent that from happening by raising an error condition early. Fix #289.	2015-09-04 12:17:57 +02:00
Victor Kryukov	792db2fcf4	Better regexp for PG SQL identifiers	2015-09-04 01:15:45 +02:00

... 11 12 13 14 15 ...

1568 Commits