This allows bypassing SSL when you don't need it, for instance over
localhost. It takes the same syntax as the PostgreSQL sslmode connection
string parameter.
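For instance, assuming the option is given as a parameter of the target
connection string, with the usual PostgreSQL values (disable, allow, prefer,
require, verify-ca, verify-full):

    pgsql://user@localhost/dbname?sslmode=disable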
In this commit we fail the guess faster, which allows testing against a much
larger sample. The sample is still hard-coded, but this time to 1000 lines.
Also add a test case, see #618.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes that represents. We now add that
information to the output.
The number of bytes presented is computed from the Unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
Use a generic function protocol to implement the human readable, verbose,
csv, copy and json reporting output formats. This is much cleaner and more
extensible than the previous way.
Use that new power to implement a real JSON output from the internal state
object.
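A minimal sketch of such a protocol; the function and format names here are
illustrative, not the actual pgloader API:

    (defgeneric report-summary (format state stream)
      (:documentation "Print a summary of STATE to STREAM in FORMAT."))

    (defmethod report-summary ((format (eql :human-readable)) state stream)
      (format stream "~a~%" state))

    (defmethod report-summary ((format (eql :json)) state stream)
      ;; assuming STATE is something yason knows how to encode
      (yason:encode state stream))

Adding a new output format is then just a matter of adding a method.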
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375 kB (up to
about 3 MB per datum), forcing the GC allows much larger batch size and
prefetch rows settings without entering ldb.
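The workaround boils down to forcing a full collection from time to time,
along these lines (a sketch with a hypothetical threshold and byte counter):

    #+sbcl
    (when (< +batch-bytes-gc-threshold+ bytes-in-batch)
      (sb-ext:gc :full t))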
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per second).
Add a message every 20 batches so that the user knows it's still going on.
Also, in passing, fix some messages: the present tense is not precise enough
to tell whether the log refers to an event that has just been done or one
that is starting next.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes to be built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE lines, both for primary keys and foreign keys, again
because the user might have to wait for quite some time.
This feature has been requested several times, and I can't see any way to
fix the GETENV parsing mess that we have. In this patch the GETENV support
is retired and replaced with a templating system using the Mustache syntax.
To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.
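A hypothetical example of a load file using the new syntax, with {{DBNAME}}
fetched from the environment:

    load database
         from mysql://root@localhost/{{DBNAME}}
         into pgsql://localhost/{{DBNAME}};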
Fixes #555, Fixes #609.
See #500, #477, #278.
It is sometimes needed to tweak MS SQL server parameters, such as the
textsize parameter, which allows fetching the whole content of a text or
binary column (not kidding).
Now it's possible to add such a line in the load file:
set mssql parameters textsize to '104857600'
Fixes #603.
This code path is exercised from the command line only, which means I don't
get to run it that often. And it's a pain to debug. So make it easier to run
`process-source-and-target` from the REPL.
Startup log messages could be lost because the monitor would be started but
not yet ready to process messages. Fix that by “warming up” the monitoring
thread: have it execute a small computation and, more importantly, wait for
the result to be received back, blocking.
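A minimal sketch of the warm-up idea, using lparallel and assuming a kernel
is already set up (names are illustrative, not the actual pgloader code):

    (let ((channel (lparallel:make-channel)))
      ;; have the thread evaluate something trivial...
      (lparallel:submit-task channel (lambda () 42))
      ;; ...and block until the result comes back: the thread is now ready
      (lparallel:receive-result channel))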
See #599 where parsing errors from a wrong URL were missed in the command
line output, quite disturbingly.
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
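With the fix, both statements now target the same schema, along these lines
(type and schema names borrowed from the pagila example):

    DROP TYPE IF EXISTS pagila.mpaa_rating;
    CREATE TYPE pagila.mpaa_rating AS ENUM ('G', 'PG', 'PG-13', 'R', 'NC-17');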
Prior to this patch, pgloader wouldn't care about which schema it creates
extra types in. Extra types mainly come from the MySQL ENUM and SET support.
Now, pgloader creates those extra PostgreSQL ENUM types in the same schema
as the table using them, which is a more sound default.
MySQL ENUMs are cast to PostgreSQL ENUM types just fine, but sometimes
that's not what the user wants. When we have a CAST rule for an ENUM column,
recognize the fact and respect the user's choice.
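For instance, a load file can now override the default with a rule along
these lines (hypothetical table and column names):

    cast column film.rating to text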
Fixes #608.
The spelling in SQLite for the default value is "current_date"; instruct
pgloader about that. This commit also adds a test case to our sqlite.db
unit tests database.
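That is, a column definition such as the following (a hypothetical example)
is now recognized:

    CREATE TABLE events (day date DEFAULT current_date);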
Fixes #607.
Now when building a bundle file for source distribution of pgloader, always
test it by building a binary image from the bundle tarball in a test
directory. Also make it easy to target "latest" Quicklisp distribution with
the following spelling:
make BUNDLEDIST=latest bundle
When distributing a pgloader bundle we use the ql-dist facility. In a recent
commit we hand-picked the last known working Quicklisp distribution for
pgloader. Make it easy to target the "latest" known distribution or to
hard-code one from the Makefile or the bundle/ql.lisp file.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single quote (') and uses
two of them ('') to escape a single quote found in the ENUM value itself.
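For instance, an ENUM definition containing a quote, as MySQL prints it
(a hypothetical column):

    size ENUM('small', 'medium', 'x''large')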
Fixes #597.
When the table is empty we get nil for min and max values of the id column.
In that case we don't compute a set of ranges and “cancel” concurrency
support for the empty table.
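A minimal sketch of the guard, with hypothetical names:

    ;; min-id and max-id are both nil when the table is empty
    (if (and min-id max-id)
        (compute-ranges min-id max-id)  ; split the work as usual
        nil)                            ; no ranges: single reader, no concurrency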
Fixes #596.
The with option “include drop” used to also apply to schemas, which is not
that useful, and problematic when trying to DROP SCHEMA public, because you
might not be connected as the owner of that schema.
Even if we don't target the public schema by default, users can choose to do
so thanks to our ALTER SCHEMA ... RENAME TO ... command.
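For instance (a hypothetical load file line):

    alter schema 'public' rename to 'pgloader'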
Fixes #594.
As we know how many columns we expect from the input file, it's possible to
read a sample (10 lines as of this patch) and try many different CSV reader
parameter combinations until we find one that works, i.e. one that returns
the right number of fields.
It is still possible of course to specify parameters on the command line or
in a load file if necessary, but it makes the simple case even simpler. As
simple as:
pgloader file.csv pgsql:///pgloader?tablename=target
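The guessing loop itself amounts to something like this sketch, where the
candidate parameter list and the parsing helper are hypothetical:

    (defun guess-csv-params (sample-lines expected-cols)
      "Return the first (separator quote escape) triple that parses every
    sample line into EXPECTED-COLS fields, or NIL when none matches."
      (loop for (sep quote escape) in *candidate-csv-params*
            when (every (lambda (line)
                          (= expected-cols
                             (length (parse-csv-line line sep quote escape))))
                        sample-lines)
              return (list sep quote escape)))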
From a load file, as soon as pgloader can retrieve the schema of the target
table the source field list defaults to the target column list. Let's apply
the same rules to the command line.
It was only offered for SQLite without good reason really, and tests show
that it works just as well with MySQL, of course. Offer the option there too.
See 3eab88b144 for details.
In b301aa9394 the "create schema" default
changed to true, which is a good idea. As a consequence, pgloader should
consider this operation only when "create tables" is set: we don't want to
start by creating target schemas in a target database that is said to be
ready to host the data.
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format strings: refrain from using ~s.
We sure can trust PostgreSQL to use names it knows how to handle. Still, it
will happily store names containing upper-case letters in its catalogs, and
in that case we must quote them.
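In other words (a toy illustration at the REPL, not the actual pgloader code):

    ;; ~s prints readably and adds quotes of its own:
    (format nil "CREATE TABLE ~s" "\"MyTable\"") ; => CREATE TABLE "\"MyTable\""
    ;; ~a leaves the already-quoted identifier alone:
    (format nil "CREATE TABLE ~a" "\"MyTable\"") ; => CREATE TABLE "MyTable"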
In the md-methods copy-database function, don't pretend we are able to
handle any condition when preparing the PostgreSQL schema; database-error is
really all we are dealing with there.
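That is, narrow the handler from the catch-all down to the specific
condition, along these lines (a sketch; the protected call and the logging
helper are hypothetical, the condition is cl-postgres's):

    (handler-case
        (prepare-pgsql-database pgconn catalog)
      ;; previously: (condition (c) ...), which is too broad
      (cl-postgres:database-error (e)
        (log-message :error "~a" e)))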
This change was long overdue. Ideally we would use something like the YeSQL
library for Clojure, but it seems like the cl-yesql equivalent is not ready
yet, and it depends on an experimental build system...
So this patch introduces a URL abstraction built on top of a hash table.
You can then reference src/pgsql/sql/list-all-columns.sql as
(sql "pgsql/list-all-columns.sql")
in the source code directly.
So for now the templating system is CL's format language. It is still an
improvement over embedded strings. Again, one step at a time.
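A minimal sketch of the mechanism, with hypothetical names:

    (defvar *sql-queries* (make-hash-table :test 'equal)
      "Maps relative .sql file names to their contents, read at build time.")

    (defun sql (name &rest args)
      "Look up the query string for NAME and fill it in with CL's FORMAT."
      (apply #'format nil (gethash name *sql-queries*) args))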
It might be that a column-type-name is actually an sqltype instance, and
then #'string= won't be happy. Prevent that now by discarding any smarts
when the type name does not satisfy stringp.
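The guard amounts to something like this (hypothetical comparison):

    (and (stringp column-type-name)
         (string= column-type-name "enum"))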
When pgloader fetches the index list from a source database, it doesn't
fetch information about access methods for the indexes: I don't even know if
the overlap between index access methods from one RDBMS to another covers
more than just btree...
It could happen that MySQL indexes a "geometry" column tho. This datatype is
converted automatically to "point" by pgloader, which is good. But the index
creation would fail with the following error message:
Database error 42704: data type point has no default operator class for access method "btree"
In this patch, when setting up the target schema we issue a PostgreSQL
catalog query to dynamically list those datatypes without btree support and
fetch their opclasses, with a hard-coded preference for GiST, then GIN, so
as to be able to automatically use the proper access method when btree isn't
available. And now pgloader transparently issues the proper statement:
CREATE INDEX idx_168468_idx_location ON pagila.address USING gist(location);
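The catalog query is along these lines (a sketch, not necessarily the exact
query pgloader uses):

    SELECT t.typname, am.amname, oc.opcname
      FROM pg_type t
      JOIN pg_opclass oc ON oc.opcintype = t.oid AND oc.opcdefault
      JOIN pg_am am ON am.oid = oc.opcmethod
     WHERE am.amname <> 'btree'
       AND NOT EXISTS (SELECT 1
                         FROM pg_opclass o2
                         JOIN pg_am a2 ON a2.oid = o2.opcmethod
                        WHERE o2.opcintype = t.oid
                          AND o2.opcdefault
                          AND a2.amname = 'btree');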
Currently this exploration is limited to indexes with a single column. To
implement the general case we would need a more complex lookup: we would
have to find the intersection of all the supported access methods for all
involved columns.
Of course we might need to do that someday. One step at a time is plenty
good enough tho.
In the complete PostgreSQL schema step, an error would be logged as you
would expect, but poorly handled: it would have the whole transaction rolled
back, meaning that a single primary key definition failure would cancel all
the others, plus the foreign keys, and also the triggers and comments.
It happens that other systems allow a primary key column to contain NULL
values, which is forbidden by the standard and enforced by PostgreSQL, so
that's not a theoretical concern here.
In cases when pgloader needs to build a new identifier from existing
ones (mainly for renaming indexes, because they are unique per-table in the
source database and unique per-schema in PostgreSQL), and we compose the new
name from already quoted strings, pgloader was doing the wrong thing.
Fix that by adding a build-identifier function that may unquote the parts
and then re-quote the new identifier properly (if needed).
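A minimal sketch of the idea; names and details are illustrative:

    (defun build-identifier (&rest parts)
      "Strip double-quotes from PARTS, join them, then re-quote as needed."
      (apply-identifier-case
       (format nil "~{~a~^_~}"
               (mapcar (lambda (part) (string-trim "\"" part)) parts))))

    ;; (build-identifier "\"idx\"" "\"MyTable\"")
    ;; => "idx_MyTable" quoted or downcased, depending on *identifier-case*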