As we know how many columns to expect from the input file, it's possible to
read a sample (10 lines as of this patch) and try many different CSV reader
parameter combinations until we find one that works, i.e. one that returns
the expected number of fields.
It is of course still possible to specify parameters on the command line or
in a load file if necessary, but this makes the simple case even simpler. As
simple as:
pgloader file.csv pgsql:///pgloader?tablename=target
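Very roughly, the idea looks like the sketch below; pgloader's real code
reads the sample through its CSV reader and also varies quoting and escaping
parameters, the naive field counting here is only an illustration.
(defun count-fields (line separator)
  "Naive field count for LINE, ignoring quoting rules."
  (1+ (count separator line)))

(defun guess-separator (sample-lines expected-fields)
  "Return the first candidate separator giving EXPECTED-FIELDS fields on
every line of SAMPLE-LINES, or NIL when no candidate matches."
  (find-if (lambda (separator)
             (every (lambda (line)
                      (= expected-fields (count-fields line separator)))
                    sample-lines))
           '(#\, #\; #\Tab #\|)))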
From a load file, as soon as pgloader can retrieve the schema of the target
table, the source field list defaults to the target column list. Let's apply
the same rules to the command line.
It was only offered for SQLite, without any good reason really, and tests
show that it works just as well with MySQL. Offer the option there too.
See 3eab88b144 for details.
In b301aa9394 the "create schema" default
changed to true, which is a good idea. As a consequence, pgloader should
consider this operation only when "create tables" is set: we don't want to
start by creating target schemas in a target database that is said to be
ready to host the data.
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format strings: refrain from using ~s.
We can certainly trust PostgreSQL to use names it knows how to handle.
Still, it will happily store names containing upper-case characters in its
catalogs, and in that case we must quote them.
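A sketch of the rule at play, not pgloader's exact code (which also has to
consider reserved words and the *identifier-case* setting):
(defun maybe-quote-identifier (name)
  "Only add double-quotes when PostgreSQL would otherwise down-case NAME."
  (if (some #'upper-case-p name)
      (format nil "\"~a\"" name)
      name))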
In the md-methods copy-database function, don't pretend we are able to
handle any condition when preparing the PostgreSQL schema: database-error is
all we really deal with there.
This change was long overdue. Ideally we would use something like the YeSQL
library for Clojure, but it seems the cl-yesql equivalent is not ready yet,
and it depends on an experimental build system...
So this patch introduces a URL abstraction built on top of a hash table.
You can then reference src/pgsql/sql/list-all-columns.sql as
(sql "pgsql/list-all-columns.sql")
in the source code directly.
So for now the templating system is CL's format language. It is still an
improvement over embedded strings. Again, one step at a time.
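The mechanism is roughly the following sketch; the real code registers the
.sql files at build time so the queries ship within the binary image, and
register-sql-file here is only illustrative.
(defvar *sql-queries* (make-hash-table :test 'equal)
  "Maps \"pgsql/list-all-columns.sql\" and friends to their SQL text.")

(defun register-sql-file (key pathname)
  "Read PATHNAME and store its contents under KEY."
  (setf (gethash key *sql-queries*)
        (with-open-file (in pathname)
          (let* ((buffer (make-string (file-length in)))
                 (end    (read-sequence buffer in)))
            (subseq buffer 0 end)))))

(defun sql (key &rest format-arguments)
  "Return the query registered under KEY, instantiated with CL:FORMAT."
  (apply #'format nil (gethash key *sql-queries*) format-arguments))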
It might be that a column-type-name is actually an sqltype instance, and
then #'string= won't be happy. Prevent that now by discarding any smarts
when the type name does not satisfy stringp.
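The guard amounts to something like this sketch (the helper name is made up
for illustration):
(defun type-name-matches-p (column-type-name expected)
  "Only compare type names when we actually have a string to compare."
  (and (stringp column-type-name)
       (string= column-type-name expected)))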
When pgloader fetches the index list from a source database, it doesn't
fetch information about access methods for the indexes: I don't even know if
the overlap between index access methods from one RDBMS to another covers
more than just btree...
It could happen that MySQL indexes a "geometry" column though. This datatype
is converted automatically to "point" by pgloader, which is good. But the
index creation would fail with the following error message:
Database error 42704: data type point has no default operator class for access method "btree"
In this patch when setting up the target schema we issue a PostgreSQL
catalog query to dynamically list those datatypes without btree support and
fetch their opclasses, with a hard-coded preference for GiST, then GIN, so
as to be able to automatically use the proper access method when btree isn't
available. And now pgloader transparently issues the proper statement:
CREATE INDEX idx_168468_idx_location ON pagila.address USING gist(location);
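The catalog lookup is along these lines; this is a sketch of the kind of
query involved, not necessarily the exact one shipped in pgloader, and the
parameter name is illustrative.
(defparameter *types-without-btree-support*
  "  select t.typname,
            (array_agg(c.opcname
                       order by am.amname <> 'gist', am.amname <> 'gin'))[1]
       from pg_type t
            join pg_opclass c on c.opcintype = t.oid
            join pg_am am on am.oid = c.opcmethod
      where am.amname <> 'btree'
        and not exists (select 1
                          from pg_opclass b
                               join pg_am bam on bam.oid = b.opcmethod
                         where b.opcintype = t.oid
                           and bam.amname = 'btree')
   group by t.typname"
  "Data types lacking a btree opclass, with a GiST-then-GIN preferred
alternative opclass to use in CREATE INDEX ... USING.")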
Currently this exploration is limited to indexes with a single column. To
implement the general case we would need a more complex lookup: we would
have to find the intersection of all the supported access methods for all
involved columns.
Of course we might need to do that someday. One step at a time is plenty
good enough, though.
In the complete PostgreSQL schema step, an error would be logged as expected
but poorly handled: it would have the whole transaction rolled back, meaning
that a single Primary Key definition failure would cancel all the others,
plus the foreign keys, and also the triggers and comments.
It happens that other systems allow a primary key column to contain NULL
values, which is forbidden by the standard and enforced by PostgreSQL, so
that's not a theoretical concern here.
In cases where pgloader needs to build a new identifier from existing
ones (mainly for renaming indexes, because they are unique per-table in the
source database and unique per-schema in PostgreSQL), and the new name is
composed from already quoted strings, pgloader was doing the wrong thing.
Fix that by adding a build-identifier function that may unquote the parts
and then properly re-quote (if needed) the new identifier.
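A simplified sketch of the idea; the signature is illustrative, and the
result would then be run through the usual apply-identifier-case machinery
so it gets quoted again when needed.
(defun unquote-identifier (name)
  "Strip surrounding double-quotes from NAME, when present."
  (if (and (> (length name) 1)
           (char= #\" (char name 0))
           (char= #\" (char name (1- (length name)))))
      (subseq name 1 (1- (length name)))
      name))

(defun build-identifier (separator &rest parts)
  "Join PARTS with SEPARATOR, unquoting each part first; re-quoting of the
whole identifier is left to the identifier-case rules."
  (with-output-to-string (s)
    (loop for (part . more) on (mapcar #'unquote-identifier parts)
          do (write-string part s)
          when more do (write-string separator s))))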
The code was too complex and the transaction / connection handling wasn't
good enough: too many reconnections when a ROLLBACK; is all we need to be
able to continue our processing.
Also fix some stats counters about errors handled, and improve the error
message by explicitly naming PostgreSQL and the table where the error comes
from.
This allows using a combination of "data only, drop indexes" so that when
the target database already exists, pgloader will use the existing schema
and still DROP INDEX before loading the data, then do the CREATE INDEX dance
in parallel at the end of it.
Also, I could reproduce neither #539 (which is good, it's supposed to be
fixed now) nor #550 (which was open due to a regression): fixes #550.
Blind-code a fix for an error when parsing empty date strings in a DBF file.
The small amount of information in the report is surprising: I can't quite
figure out which input string could produce " - - " with the previous coding
of db3-date-to-pgsql-date.
Anyway, it seems easy enough to add some checks to a very optimistic
function and return nil when those checks aren't met.
Fixes #589, hopefully.
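The defensive version amounts to something like this, a sketch under the
assumption that DBF dates arrive as 8-character YYYYMMDD strings:
(defun db3-date-to-pgsql-date (datestring)
  "Convert a DBF YYYYMMDD string to YYYY-MM-DD, or return NIL when the
input is empty or doesn't look like a date at all."
  (when (and (stringp datestring)
             (= 8 (length datestring))
             (every #'digit-char-p datestring))
    (format nil "~a-~a-~a"
            (subseq datestring 0 4)
            (subseq datestring 4 6)
            (subseq datestring 6 8))))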
It may happen that PostgreSQL is restarted while pgloader is running, or
that for some other reason we lose the connection to the server, and in most
cases we know how to gracefully reconnect and retry, so just do so.
Fixes #546 (initial report).
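Schematically the retry loop looks like the sketch below; the condition name
follows cl-postgres' database-connection-error, and the reconnect function
and retry count are illustrative, not pgloader's actual API.
(defun call-with-reconnect (thunk reconnect &key (attempts 3))
  "Call THUNK; when the server connection is lost, reconnect and retry,
letting the error through after ATTEMPTS tries."
  (loop for try from 1 to attempts
        do (handler-case (return (funcall thunk))
             (cl-postgres:database-connection-error (e)
               (when (= try attempts) (error e))
               (funcall reconnect)))))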
The (reduce #'max ...) call requires an initial value to be provided,
because the max function wants at least 1 argument, as we can see here:
CL-USER> (handler-case (reduce #'max nil) (condition (e) (format t "~a" e)))
Too few arguments in call to #<Compiled-function MAX #x300000113C2F>:
0 arguments provided, at least 1 required.
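The fix is simply to provide an initial value; 0 here is only for
illustration, the actual baseline depends on what is being maximised:
CL-USER> (reduce #'max nil :initial-value 0)
0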
The "main" function only gets used at the command line, and errors where not
cleanly reported to the users. Mainly because I almost never get to play
with pgloader that way, prefering a load command file and the REPL
environment, but that's not even acceptable as an excuse.
Now the binary program should be able to exit cleanly in all situations. In
testing, it may happens on unexpected erroneous situations that we quit
before printing all the messages in the monitoring queue, but at least now
we quit cleanly and with a non-zero exit status.
Fix#583.
Experiment with the idea of splitting the read work into several concurrent
threads, where each reader reads portions of the target table, using a
WHERE id <= x and id > y clause in its SELECT query.
For this to kick in, a number of conditions need to be met, as described in
the documentation. The main interest might not be fetching the same overall
data set with faster queries, but better concurrency, with as many readers
as writers and each couple getting its own dedicated queue.
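Schematically, the id range gets sliced into as many intervals as there are
readers, each reader then issuing its own SELECT restricted to one slice.
The helper below only illustrates the arithmetic, it is not pgloader's
actual code.
(defun split-range (min-id max-id readers)
  "Return a list of (low . high) pairs such that the conditions
id > low AND id <= high, taken together, cover (MIN-ID, MAX-ID]."
  (let ((step (ceiling (- max-id min-id) readers)))
    (loop for low = min-id then high
          for high = (min max-id (+ low step))
          collect (cons low high)
          until (= high max-id))))

;; each reader then runs something like:
;;   SELECT ... FROM source_table WHERE id > low AND id <= high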
The previous patch made format-vector-row allocate its memory in one go
rather than byte after byte with vector-push-extend. In this patch we review
our usage of batches and parallelism.
Now the reader pushes each row directly to the lparallel queue and writers
concurrently consume from it, cook batches in COPY format, and then send
that chunk of data down to PostgreSQL. Looking at runtime profiles, the time
spent writing to PostgreSQL is a fraction of the time spent reading from
MySQL, so we consider that the writing thread has enough time to do the data
munging without slowing us down.
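The shape of the new pipeline, very schematically: map-rows and
copy-batch-to-postgresql are hypothetical stand-ins for the source reader
and the COPY writer, the lparallel.queue API is the real one, and a single
writer is shown for brevity.
(defun reader-loop (source queue)
  "Push every row of SOURCE onto QUEUE, then a sentinel to stop the writer."
  (map-rows source (lambda (row) (lparallel.queue:push-queue row queue)))
  (lparallel.queue:push-queue :end-of-data queue))

(defun writer-loop (queue batch-size)
  "Pop rows from QUEUE, cook them into batches of BATCH-SIZE rows, and send
each batch to PostgreSQL in COPY format."
  (let ((batch '()))
    (loop for row = (lparallel.queue:pop-queue queue)
          until (eq row :end-of-data)
          do (push row batch)
          when (= batch-size (length batch))
            do (copy-batch-to-postgresql (nreverse batch))
               (setf batch '())
          finally (when batch
                    (copy-batch-to-postgresql (nreverse batch))))))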
The most interesting factor here is the memory behavior of pgloader, which
seems more stable than before and easier for SBCL's GC to cope with.
Note that batch concurrency is no more, replaced by prefetch rows: the
reader thread no longer builds batches, and the count of items in the reader
queue is now a number of rows, not of batches of them.
Anyway, with this patch in, I can't reproduce the following issues:
Fixes #337, fixes #420.
This function is used on every bit of data we send down to PostgreSQL, so I
have good hopes that reducing its memory allocation will have an impact on
loading times, in particular for sizeable data sets.
Still in the abnormal termination case: pgloader might get stuck, and if the
user knows it is only waiting for threads to complete, they might be less
worried about the situation and better placed to decide whether to kill
pgloader...
In case of an exceptional condition leading to termination of the pgloader
program, we tried to use log-message after the monitor should have been
closed. Also, the 0.3s delay to let the latest messages out looks like poor
design.
This patch attempts to remedy both situations: refrain from using a
closed-down monitoring thread, and properly wait until it's done before
returning to the shell.
See #583.
pgloader has had support for PostgreSQL SET parameters (GUCs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters and to allow pgloader users to control them.
See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
In the SQLite and MySQL cases, expand on the simple case before detailing
the command language. With our solid defaults, most of the time a single
command line with the source and target connection strings is going to be
all you need.
Upon GitHub's suggestion, add a LICENSE file to make it clear we are using
The PostgreSQL License. Assign the copyright to The PostgreSQL Global
Development Group as it's done for PostgreSQL, as it seems to be the right
thing to do.
To properly handle the on-error-stop case, make it a specific pgloader
condition with a specific handling behavior. In passing, add some more log
messages for surprising conditions.
Fix #546.
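In outline, the dedicated condition looks something like this sketch; the
slot and accessor names are illustrative, not the exact ones in pgloader.
(define-condition on-error-stop (error)
  ((root-cause :initarg :root-cause :reader on-error-stop-root-cause))
  (:report (lambda (condition stream)
             (format stream "Stopping because of a previous error: ~a"
                     (on-error-stop-root-cause condition))))
  (:documentation "Signaled to stop the load when on-error-stop is set."))

;; handlers can then single it out, e.g.:
;;   (handler-case (copy-from ...)
;;     (on-error-stop (c) (log-message :fatal "~a" c)))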
In the prepare-pgsql-database method we were logging too many details, such
as DDL warnings on if-not-exists for successful queries. Those logs are to
be found in the PostgreSQL server logs anyway.
Also fix trying to create or drop a "nil" schema.
In this patch we hard-code some cases where we know the log message won't be
displayed anywhere, so as to avoid sending it to the monitor thread. It
certainly is a modularity violation, but given the performance impact...
The code used to take the content-length HTTP header into account and load
that number of bytes in memory from the remote server. Not only is it better
to use a fixed-size, allocated-once buffer for that (now 4k), but doing so
also allows downloading content whose content-length you don't know.
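The download loop then boils down to something like this sketch; stream
setup and error handling are simplified, and the function name is
illustrative.
(defun copy-stream-by-chunks (input output &key (buffer-size 4096))
  "Copy binary INPUT to OUTPUT using a fixed, allocated-once buffer, so the
total content length never needs to be known in advance."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8))))
    (loop for end = (read-sequence buffer input)
          while (plusp end)
          do (write-sequence buffer output :end end))))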
In passing tell the HTTP-URI parser rule that we also accept https:// as a
prefix, not just http://.
This allows running pgloader in such cases:
$ pgloader https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite pgsql:///chinook
And it just works!
Get back in line with what the documentation says, and also fix the case for
default MySQL migrations now that we target a PostgreSQL schema with the
same name as the MySQL database name.
Still an open question: should we also register the new schema on the
search_path by default?
ALTER DATABASE ... SET search_path TO public, newschema, ...;
Is it more of a POLA violation to alter the search_path or to not do it?
Fix #582.
The “magic” options --batch and --heap-reserve will be processed by CCL
itself before pgloader gets to see them, so try that in the testing
environment.