pgloader

mirror of https://github.com/dimitri/pgloader.git synced 2026-01-27 18:11:02 +01:00

Author	SHA1	Message	Date
Dimitri Fontaine	d69b72053a	Implement default unsigned casting rules for MySQL. The following casting rules are now the default for MySQL: - type tinyint when unsigned to smallint drop typemod - type smallint when unsigned to integer drop typemod - type mediumint when unsigned to integer drop typemod - type integer when unsigned to bigint drop typemod Fixes #678.	2017-11-22 10:29:11 -08:00
Dimitri Fontaine	5c60f8c35c	Implement a new type casting guard: unsigned. MySQL allows using unsigned data types and pgloader should then target a signed type of a larger capacity so that values can fit. For example, the data definition “smallint(5) unsigned” should be cast to “integer”. This patch allows user defined cast rules to be written against “unsigned” data types as per their MySQL catalog representation. See #678.	2017-11-22 10:26:03 -08:00
Dimitri Fontaine	a9e8bfd4d7	Support for colon characters in PostgreSQL socket path. Google Cloud SQL instances are now using the following format for the name of their socket <PROJECT-ID>:<REGION>:<INSTANCE_NAME>. We do that by allowing to escape a colon in the socket directory name by doubling it, as in the username field. It also allows to accept any character in the socket directory name, which is a good cleanup. Fix #621.	2017-08-30 15:22:42 +02:00
Dimitri Fontaine	d5072d11e5	Implement support for a pgpass file. The implementation follows PostgreSQL specifications as closely as possible, with the escaping rules and the matching rules. The default paths where the .pgpass file (or pgpass.conf on Windows) is looked up are as documented in PostgreSQL too. The only missing part is the file permissions check. Fix #460.	2017-08-29 03:16:35 +02:00
Dimitri Fontaine	01e5c23763	Add support for explicit TARGET TABLE clause in load commands. It used to be that you would give the target table name as an option to the PostgreSQL connection string, which is untasteful: load ... into pgsql://user@host/dbname?tablename=foo.bar ... Or even, for backwards compatibility: load ... into pgsql://user@host/dbname?foo.bar ... The new syntax makes provision for a separate clause for the target table name, possibly schema-qualified: load ... into pgsql://user@host/dbname target table foo.bar ... Which is much better, in particular when used together with the target columns clause. Implementing this seemingly quite small feature had impact on many parsing related features of pgloader, such as the regression testing facility. So much so that some extra refactoring got into its way here, around the lisp-code-for-loading-from-<source> functions and their usage in `load-data'. While at it, this patch simplifies a lot the `load-data' function by making a good use of &allow-other-keys and :allow-other-keys t. Finally, this patch splits main.lisp into main.lisp and api.lisp, with the latter intended to contain functions for Common Lisp programs wanting to use pgloader as a library. The API itself is still the same as before this patch, tho. Just in another file for clarity.	2017-08-25 01:57:54 +02:00
Dimitri Fontaine	9263baeb49	Implement sslmode for MySQL connections. This allows to bypass SSL when you don't need it, like over localhost for instance. Takes the same syntax as the PostgreSQL sslmode connection string parameter.	2017-08-24 14:56:59 +02:00
Dimitri Fontaine	f719d2976d	Implement a template system for pgloader commands. This feature has been asked several times, and I can't see any way to fix the GETENV parsing mess that we have. In this patch the GETENV support is retired and replaced with a templating system, using the Mustache syntax. To get back the GETENV feature, our implementation of the Mustache template system adds support for fetching the template variable values from the OS environment. Fixes #555, Fixes #609. See #500, #477, #278.	2017-08-16 01:33:11 +02:00
Dimitri Fontaine	b1fa3aec3c	Implement a separate switch to drop the schemas. The with option “include drop” used to also apply to schemas, which is not that useful and problematic when trying to DROP SCHEMA public, because you might not connect as the owner of that schema. Even if we don't target the public schema by default, users can choose to do so thanks to our ALTER SCHEMA ... RENAME TO ... command. Fixes #594.	2017-07-18 13:13:36 +02:00
Dimitri Fontaine	471f2b6d88	Implement automagic guessing of CSV parameters. As we know how many columns we expect from the input file, it's possible to read a sample (10 lines as of this patch) and try many different CSV reader parameter combinations until we find one that works: it returns the right number of fields. It is still possible of course to specify parameters on the command line or in a load file if necessary, but it makes the simple case even simpler. As simple as: pgloader file.csv pgsql:///pgloader?tablename=target	2017-07-07 02:16:53 +02:00
Dimitri Fontaine	3eab88b144	Add a new "drop indexes" option for databases. This allows to use a combination of "data only, drop indexes" so that when the target database already exists, pgloader will use the existing schema and still DROP INDEX before loading the data and do the CREATE INDEX dance in parallel and all at the end of it. Also, as I could reproduce neither #539 (which is good, it's supposed to be fixed now) nor #550 (that was open due to a regression): fixes #550.	2017-07-04 00:15:58 +02:00
Dimitri Fontaine	0549e74f6d	Implement multiple readers per table for MySQL. Experiment with the idea of splitting the read work in several concurrent threads, where each reader is reading portions of the target table, using a WHERE id <= x and id > y clause in its SELECT query. For this to kick-in a number of conditions need to be met, as described in the documentation. The main interest might not be faster queries to overall fetch the same data set, but better concurrency with as many readers as writers and each couple its own dedicated queue.	2017-06-28 16:23:18 +02:00
Dimitri Fontaine	6d66280fa5	Review parallelism and memory behavior. The previous patch made format-vector-row allocate its memory in one go rather than byte after byte with vector-push-extend. In this patch we review our usage of batches and parallelism. Now the reader pushes each row directly to the lparallel queue and writers concurrently consume from it, cook batches in COPY format, and then send that chunk of data down to PostgreSQL. When looking at runtime profiles, the time spent writing in PostgreSQL is a fraction of the time spent reading from MySQL, so we consider that the writing thread has enough time to do the data munging without slowing us down. The most interesting factor here is the memory behavior of pgloader, which seems more stable than before, and easier to cope with for SBCL's GC. Note that batch concurrency is no more, replaced by prefetch rows: the reader thread no longer builds batches and the count of items in the reader queue is now a number of rows, not of batches of them. Anyway, with this patch in I can't reproduce the following issues: Fixes #337, Fixes #420.	2017-06-27 23:10:33 +02:00
Dimitri Fontaine	352f4adc8d	Implement support for MySQL SET parameters. pgloader had support for PostgreSQL SET parameters (gucs) from the beginning, and in the same vein it might be necessary to tweak MySQL connection parameters, and allow pgloader users to control them. See #337 and #420 where net_read_timeout and net_write_timeout might need to be set in order to be able to complete the migration, due to high volumes of data being processed.	2017-06-27 10:00:47 +02:00
Dimitri Fontaine	c6b634caad	Provide "on error stop" as a WITH option. As seen in #546 it would be easier to be able to specify the option in the load command directly rather than only at the command line. Here we go!	2017-06-01 16:43:09 +02:00
Dimitri Fontaine	9e2b95d9b7	Implement support for PostgreSQL storage parameters. In PostgreSQL it is possible at CREATE TABLE time to set some extra storage parameters, the most useful of them in the context of pgloader being the FILLFACTOR. For the setting to be useful, it needs to be set at CREATE TABLE time, before we load the data. The BEFORE LOAD clause of the pgloader command allows to run SQL scripts that will be executed before the load, and even before the creation of the target schema when pgloader does that, which is nice for other use cases. Here we implement a new `ALTER TABLE` rule that one can set in the pgloader command in order to change storage parameters at CREATE TABLE time: ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40') Fix #516.	2017-02-25 21:49:06 +01:00
Dimitri Fontaine	ed217b7b28	Add some docs about FreeTDS and encoding. It turns out that it's possible and not too complex, when using the FreeTDS driver, to enforce the client encoding for MS SQL to be utf-8. Document how to tweak ~/.freetds.conf to that end.	2017-01-27 22:16:59 +01:00
Dimitri Fontaine	effa916b31	Improve parallelism setup documentation. The code comment displayed in the release notes for 3.3.1 is reported to be better at explaining the concurrency control than what we had in the main documentation, so add it there. Fix #496.	2017-01-03 23:13:01 +01:00
Dimitri Fontaine	db9fa2f001	Improve docs for connection strings. Some parts of the connection strings might be provided from the environment, such as in the MySQL case. Fix #485.	2016-12-03 15:51:39 +01:00
Dimitri Fontaine	6eef0c6c00	Improve docs with default parallelism settings. Fix #442 by adding the default values of concurrency and workers.	2016-12-03 15:30:34 +01:00
Dimitri Fontaine	210664fff5	Fix typo: Performance, singular. Fixed #432.	2016-08-07 21:40:28 +02:00
Dimitri Fontaine	70572a2ea7	Implement support for existing target databases. Also known as the ORM case, it happens that other tools are used to create the target schema. In that case pgloader's job is to fill in the existing target tables with the data from the source tables. We still focus on load speed and pgloader will now DROP the constraints (Primary Key, Unique, Foreign Keys) and indexes before running the COPY statements, and re-install the schema it found in the target database once the data load is done. This behavior is activated when using the “create no tables” option as in the following test-case setup: with create no tables, include drop, truncate Fixes #400, for which I got a test-case to play with!	2016-08-06 20:19:15 +02:00
Dimitri Fontaine	7344e1d81e	Improve docs for FILENAMES MATCHING support. This format of source file specifications is available for CSV, COPY and FIXED formats but was only documented for the CSV one. The paragraph is copy/pasted around in the hope to produce per-format man pages and web documentation in a fully automated way sometime. Fix #397.	2016-05-18 11:07:28 +02:00
Dimitri Fontaine	42e9e521e0	Add option "max parallel create index". By default, pgloader will start as many parallel CREATE INDEX commands as the maximum number of indexes you have on any single table that takes part in the load. As this number might be so great as to exhaust the target PostgreSQL server (e.g. maintenance_work_mem), we add an option to limit that to something reasonable when the source schema isn't. Fix #386 in which 150 indexes are found on a single source table.	2016-04-11 17:40:52 +02:00
Dimitri Fontaine	35155654df	Allow to ALTER TABLE ... IN SCHEMA. That brings the ALTER TABLE feature to MS SQL source.	2016-03-26 20:50:05 +01:00
Dimitri Fontaine	8fc9a474d9	Document --dry-run and --on-error-stop options.	2016-03-21 21:24:39 +01:00
Dimitri Fontaine	c724018840	Implement ALTER TABLE clause for MySQL migrations. The new ALTER TABLE facility allows to act on tables found in the MySQL database before the migration happens. In this patch the only provided actions are RENAME TO and SET SCHEMA, which fixes #224. In order to be able to provide the same option for MS SQL users, we will have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and modify the internal schema-struct so that the schema slot of our table instances are a schema instance rather than its name. Lacking MS SQL test database and instance, the facility is not yet provided for that source type.	2016-03-06 21:51:33 +01:00
Dimitri Fontaine	486be8c068	SQLite integer default values might be quoted. Fix #351 by having a new transformation function to process SQLite integers, that may be quoted...	2016-03-03 14:59:27 +01:00
Dimitri Fontaine	c108b85290	Allow package prefix in CAST ... USING clause. Also, in passing, add a new transformation function for MySQL allowing to transform from varbinary to text.	2016-02-04 16:09:22 +01:00
Dimitri Fontaine	7dd69a11e1	Implement concurrency and workers for files sources. More than the syntax and API tweaks, this patch also makes it so that a multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be loaded with several files in the group in parallel. To that effect, tweak again the md-connection and md-copy implementations.	2016-01-16 22:53:55 +01:00
Dimitri Fontaine	eb45bf0338	Expose concurrency settings to the end users. Add the workers and concurrency settings to the LOAD commands for database sources so that users can tweak them now, and add mentions of them in the documentation too. From the documentation string of the copy-from method as found in src/sources/common/methods.lisp: We allow WORKER-COUNT simultaneous workers to be active at the same time in the context of this COPY object. A single unit of work consists of several kinds of workers: - a reader getting raw data from the COPY source with `map-rows', - N transformers preparing raw data for PostgreSQL COPY protocol, - N writers sending the data down to PostgreSQL. The N here is set to the CONCURRENCY parameter: with a CONCURRENCY of 2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be active simultaneously. Those options should find their way in the remaining sources, that's for a follow-up patch tho.	2016-01-15 23:22:32 +01:00
Dimitri Fontaine	735cdc8fdc	Document the remove-null-characters transform. Both as a new transformation function available, and as the default for Text conversions when coming from MySQL. See #258, Fixes #219.	2015-12-08 21:04:47 +01:00
Dimitri Fontaine	e23de0ce9f	Improve SQLite table names filtering. Filter the list of tables we migrate directly from the SQLite query, avoiding to return useless data. To do that, use the LIKE pattern matching supported by SQLite, where the REGEX operator is only available when extra features are loaded apparently. See #310 where filtering out the view still caused errors in the loading.	2015-11-22 22:10:26 +01:00
Dimitri Fontaine	6fc40c4844	Implement MS SQL option to skip creating schemas, fix #263 . Allow the user to control whether pgloader should create the same set of schema as found on the MS SQL database.	2015-08-15 16:10:15 +02:00
Dimitri Fontaine	d2a1a5643e	Improve SQL blocks support, fix #265 . It's now possible to use several files in a BEFORE LOAD EXECUTE section, and to mix DO and EXECUTE parts, bringing lots of flexibility in the commands. Also it actually simplifies the parser.	2015-07-24 17:41:35 +02:00
Dimitri Fontaine	3af99051d2	Fix the preserve index names option. MySQL names its primary keys "PRIMARY" and we need to always uniquify this name even when the user asked pgloader to preserve index names. Also, the create-indexes-again function now needs to ask for index names to be preserved specifically.	2015-07-18 23:39:32 +02:00
Dimitri Fontaine	49bf7e56f2	Implement a "drop indexes" option in CSV mode, fix #251 . When loading against a table that already has index definitions, the load can be quite slow. The previous commit introduced a warning in such a case. This commit introduces the option "drop indexes" that is not used by default. When this option is used, pgloader drops the indexes before loading the data then creates the indexes again with the same definitions as before. All the indexes are created again in parallel to optimize performance. Only primary key indexes can't be created in parallel, so those are created in two steps (create unique index then alter table).	2015-07-16 12:22:58 +02:00
Dimitri Fontaine	3848ad6ae5	SQLite integers can host bigints, fix #227 .	2015-04-30 18:17:13 +02:00
Dimitri Fontaine	95a5eb3184	Implement more COPY options, fix #218 . The COPY format now supports user defined delimiter and null options, and we don't require the column names anymore as it's useless in that context.	2015-04-30 14:30:16 +02:00
Dimitri Fontaine	7d2d09ce68	Add the option to preserve MySQL index names, fix #187 . See test/parse/hans.goeuro.load for an example usage of the new option. In passing, any error when creating indexes is now properly reported and logged, which was missing previously. Oops.	2015-03-07 20:19:47 +01:00
Dimitri Fontaine	48f451bdbc	Implement the option to disable triggers when loading data. This option is dangerous and allows to skip ALL triggers when loading data against PostgreSQL. This includes foreign key constraints definitions and will allow loading data out of order. When using both the options "create no table" and "disable triggers" it will be possible to load data into a schema prepared by your favorite external tool, at the cost of not validating FK constraints. Use with care. Fix #167.	2015-02-19 15:05:10 +01:00
Dimitri Fontaine	a8e728a740	Document with extra auto_increment casting rules guard, fix #173 .	2015-02-17 22:14:54 +01:00
Dimitri Fontaine	087d4d28cb	Review website material, introduce pgloader cli operations.	2015-01-15 16:52:10 +01:00
Dimitri Fontaine	560c838d34	Improve documentation, readying for next release. The docs now fully cover all supported source types, including COPY and MSSQL, and expand some more on the command-line only operations for pgloader.	2015-01-15 00:29:41 +01:00
Dimitri Fontaine	559e1c3348	Forgot to push the changes to the manpage.	2015-01-06 12:35:35 +01:00
Dimitri Fontaine	6d76bc57e3	Allow - from the command line to process from standard input. Also augment the documentation with examples of bare stdin reading and of advantages of the unix pipes to stream even remote archived content down to PostgreSQL.	2014-12-27 21:20:40 +01:00
Dimitri Fontaine	f2bf5c4a62	Adjusting manpage text to fit github's markdown...	2014-12-27 17:09:37 +01:00
Dimitri Fontaine	44504542c9	Small fixes in the pgloader.1.md documentation.	2014-12-27 17:08:17 +01:00
Dimitri Fontaine	e45ab7f1e2	Add an EXAMPLES section to the man page.	2014-12-23 22:37:26 +01:00
Dimitri Fontaine	6eac0d9dd8	Implement --before and --after options on the command line. That allows using SQL scripts to run before and after the main data processing and loading done by pgloader when used only from the command line.	2014-12-23 12:21:44 +01:00
Dimitri Fontaine	65c2043694	Improve pgloader usage from the command line. Make it so that the following command line usages are accepted when using pgloader without a command file: ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader ./build/bin/pgloader --set "search_path='sakila'" \ mysql://root@localhost/sakila \ postgresql:///sakila ./build/bin/pgloader --type csv \ --field id --field field \ --with truncate \ --with "fields terminated by ','" \ ./test/data/matching-1.csv \ postgres:///pgloader?matching It's now possible in most cases to just use command-line options, which should make the entry bar to pgloader much lower.	2014-12-23 02:40:13 +01:00
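
The default unsigned casting rules listed in commit d69b72053a can be illustrated with a minimal Python sketch. This is not pgloader's Common Lisp implementation: the dictionary and function names below are hypothetical, and the integer ranges are the documented MySQL unsigned and PostgreSQL signed limits. It only shows why each rule targets a signed type of larger capacity.

```python
# Default MySQL unsigned casting rules as listed in commit d69b72053a
# (source MySQL type when unsigned -> target PostgreSQL type, typemod dropped).
UNSIGNED_CASTS = {
    "tinyint": "smallint",
    "smallint": "integer",
    "mediumint": "integer",
    "integer": "bigint",
}

# Largest value each MySQL unsigned type can hold.
MYSQL_UNSIGNED_MAX = {
    "tinyint": 2**8 - 1,
    "smallint": 2**16 - 1,
    "mediumint": 2**24 - 1,
    "integer": 2**32 - 1,
}

# Largest value each PostgreSQL signed integer type can hold.
PG_SIGNED_MAX = {
    "smallint": 2**15 - 1,
    "integer": 2**31 - 1,
    "bigint": 2**63 - 1,
}


def cast_unsigned(mysql_type: str) -> str:
    """Return the PostgreSQL target type for an unsigned MySQL type (hypothetical helper)."""
    return UNSIGNED_CASTS[mysql_type]


def fits(mysql_type: str) -> bool:
    """Check that every unsigned source value fits in the signed target type."""
    return MYSQL_UNSIGNED_MAX[mysql_type] <= PG_SIGNED_MAX[cast_unsigned(mysql_type)]


if __name__ == "__main__":
    for source in UNSIGNED_CASTS:
        print(f"{source} unsigned -> {cast_unsigned(source)} (fits: {fits(source)})")
```

Keeping the same-width signed type would not work: an unsigned smallint can hold 65535, which exceeds the PostgreSQL smallint maximum of 32767, hence the move to integer.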


78 Commits