Also known as the ORM case: it happens that other tools are used to
create the target schema. In that case pgloader's job is to fill in the
existing target tables with the data from the source tables.
We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.
This behavior is activated when using the “create no tables” option as
in the following test-case setup:
with create no tables, include drop, truncate
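For illustration, here is a minimal load command sketch built around that
clause; the connection strings are placeholders and only meant to show
where the option goes:

load database
     from mysql://root@localhost/sakila
     into postgresql:///sakila
 with create no tables, include drop, truncate;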
Fixes #400, for which I got a test-case to play with!
Use case: Django dissuades setting NULL “on string-based fields […]
because empty string values will always be stored as empty strings, not
as NULL. If a string-based field has null=True, that means it has two
possible values for »no data«: NULL, and the empty string. In most
cases, it’s redundant to have two possible values for »no data«; the
Django convention is to use the empty string, not NULL.”
pgloader already supports custom transformations which can be used to
replace NULL values in string-based columns with empty strings. Setting
a NOT NULL constraint on those columns could be achieved by running a
database query to extract their names and then generating the relevant
ALTER TABLE statements, but a cast option in pgloader is a more
convenient way.
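As a hedged sketch of what such a cast rule could look like, assuming a
"set not null" cast option in the spirit of the existing "drop not null"
one (connection strings and types are placeholders):

load database
     from mysql://root@localhost/djangodb
     into postgresql:///djangodb
 cast type varchar to text drop typemod set not null;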
This format of source file specifications is available for the CSV, COPY
and FIXED formats but was only documented for the CSV one. The paragraph
is copy/pasted around in the hope of producing per-format man pages and
web documentation in a fully automated way sometime.
Fix #397.
By default, pgloader will start as many parallel CREATE INDEX commands
as the maximum number of indexes you have on any single table that takes
part in the load.
As this number might be large enough to exhaust the target PostgreSQL
server's resources (e.g. maintenance_work_mem), we add an option to
limit it to something reasonable when the source schema isn't.
Fix #386, in which 150 indexes are found on a single source table.
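A hedged sketch of capping that parallelism, assuming the option is
spelled "max parallel create index" (the value is arbitrary):

load database
     from mysql://root@localhost/bigdb
     into postgresql:///bigdb
 with max parallel create index = 4;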
The new ALTER TABLE facility makes it possible to act on tables found in
the MySQL database before the migration happens. In this patch the only
provided actions are RENAME TO and SET SCHEMA, which fixes #224.
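As an illustration, such clauses read along these lines; the matching
rules, target schema and new table name are made up for the example, and
the exact separators are as implemented by the parser:

load database
     from mysql://root@localhost/sakila
     into postgresql:///sakila

alter table names matching ~/film/, 'actor' set schema 'mv'
alter table names matching 'store' rename to 'stores';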
In order to be able to provide the same option for MS SQL users, we will
have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO
...) and modify the internal schema-struct so that the schema slot of
our table instances is a schema instance rather than its name.
Lacking an MS SQL test database and instance, the facility is not yet
provided for that source type.
More than the syntax and API tweaks, this patch also makes it so that a
multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be
loaded with several files in the group processed in parallel.
To that effect, tweak again the md-connection and md-copy
implementations.
Add the workers and concurrency settings to the LOAD commands for
database sources so that users can tweak them now, and add mentions of
them in the documentation too.
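For instance, a sketch of those settings in a load command (the values
are arbitrary and the connection strings are placeholders):

load database
     from mysql://root@localhost/mydb
     into postgresql:///mydb
 with workers = 8, concurrency = 2;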
From the documentation string of the copy-from method as found in
src/sources/common/methods.lisp:
We allow WORKER-COUNT simultaneous workers to be active at the same time
in the context of this COPY object. A single unit of work consists of
several kinds of workers:
- a reader getting raw data from the COPY source with `map-rows',
- N transformers preparing raw data for PostgreSQL COPY protocol,
- N writers sending the data down to PostgreSQL.
The N here is set to the CONCURRENCY parameter: with a CONCURRENCY of
2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we
start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be
active simultaneously.
Those options should find their way into the remaining sources; that's
for a follow-up patch, though.
Thanks to the Common Lisp character data type, it's easy for pgloader to
enforce always speaking to PostgreSQL in utf-8, and that's what has been
done from the beginning actually.
Now, without good reason, the first example of a SET clause added to the
docs was about how to set client_encoding, which should NOT be done.
Fix that at the user level by removing the bad example from the docs and
adding a WARNING whenever client_encoding is set to a known bad value.
It's a WARNING because we then simply force 'utf-8' anyway.
Also, review completely the format-vector-row function to avoid doing
double work with the Postmodern facilities we piggyback on. This was
done halfway through and the utf-8 conversion was actually done twice.
Filter the list of tables we migrate directly in the SQLite query,
avoiding returning useless data. To do that, use the LIKE pattern
matching supported by SQLite; the REGEX operator apparently is only
available when extra features are loaded.
See #310, where filtering out the view still caused errors in the
loading.
The local-time:encode-timestamp function takes a default timezone, and
it is necessary to have control over it when loading with pgloader.
Hence, add a timezone option to the IXF option list, which is now
explicit and local to the IXF parser rather than shared with the DBF
option list.
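A hedged sketch of the new option, assuming it takes a timezone name in
the IXF option list (the file path and zone name are placeholders):

load ixf
     from ./data/sample.ixf
     into postgresql:///pgloader
 with timezone 'Europe/Paris';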
It's now possible to use several files in a BEFORE LOAD EXECUTE section,
and to mix DO and EXECUTE parts, bringing lots of flexibility to the
commands. Also, it actually simplifies the parser.
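A hedged sketch of what such a section can now look like; the SQL, file
names and the comma-separated file list are illustrative only:

 before load
    do $$ create schema if not exists staging; $$
    execute 'setup-tables.sql', 'seed-data.sql'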
MySQL names its primary keys "PRIMARY", and we need to always uniquify
this name even when the user asked pgloader to preserve index names.
Also, the create-indexes-again function now needs to ask for index names
to be preserved specifically.
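For reference, keeping the source index names is requested with a WITH
option; a minimal sketch (connection strings are placeholders):

load database
     from mysql://root@localhost/sakila
     into postgresql:///sakila
 with preserve index names;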
When loading against a table that already has index definitions, the
load can be quite slow. The previous commit introduced a warning in such
a case. This commit introduces the option "drop indexes", which is not
used by default.
When this option is used, pgloader drops the indexes before loading the
data, then creates the indexes again with the same definitions as before.
All the indexes are created again in parallel to optimize performance.
Only primary key indexes can't be created in parallel, so those are
created in two steps (create unique index then alter table).
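A hedged sketch of a CSV load against a pre-existing table using the new
option (the file path, field list and target are placeholders):

load csv
     from ./data/items.csv (id, name)
     into postgresql:///mydb?items (id, name)
 with truncate, drop indexes,
      fields terminated by ',';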
Some CSV files use the CSV escape character internally in their fields.
In that case we hit a parsing bug in cl-csv where backtracking from
parsing the escape string isn't possible (or at least unimplemented).
To handle the case, change the quote parameter from \" to just \ and let
cl-csv use its escape-quote mechanism to decide whether we're escaping
only separators or just any data.
See https://github.com/AccelerationNet/cl-csv/issues/17 where the escape
mode feature was introduced for pgloader issue #80 already.
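To make that concrete, the user-visible knob is the "fields escaped by"
CSV option; a heavily hedged sketch, as the exact spelling of the
backslash value depends on the parser:

 with fields terminated by ',',
      fields optionally enclosed by '"',
      fields escaped by '\'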
As per the PostgreSQL documentation on connection strings, allow
overriding the main URI components in the options part, with a
percent-encoded syntax for parameters. This allows bypassing the main
URI parser limitations seen in #199 (how to have a password start with a
colon?).
See:
http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
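For example, a sketch of such a connection string, where the
percent-encoded %3A stands for the leading colon of a placeholder
password and the parameter name follows the libpq keywords:

postgresql://user@localhost:5432/dbname?password=%3Asecret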
To allow importing JSON one-liners as-is into the database, it can be
interesting to leverage the CSV parser in a compatible setup. That setup
requires being able to use any separator character as the escape
character.
Some CSV files come with a header line containing the list of their
column names; use that when given the option "csv header".
Note that when both the "skip header" and "csv header" options are used,
pgloader first skips as many lines as required and then uses the next
one as the csv header.
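A sketch combining both options (the skipped line count is arbitrary):

 with skip header = 1,
      csv header,
      fields terminated by ','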
Because of a temporary failure to install the `ronn` documentation tool,
this patch only commits the changes to the source docs and omits
updating the man page (pgloader.1). A following patch is intended to be
pushed that fixes that.
See #236, which uses shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
See test/parse/hans.goeuro.load for an example usage of the new option.
In passing, any error when creating indexes is now properly reported and
logged, which was missing previously. Oops.
This option is dangerous and allows skipping ALL triggers when loading
data against PostgreSQL. This includes foreign key constraint
definitions and will allow loading data out of order.
When using both the options "create no tables" and "disable triggers",
it will be possible to load data into a schema prepared by your favorite
external tool, at the cost of not validating FK constraints. Use with
care.
Fix #167.
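A hedged sketch of combining both options in a load command (connection
strings are placeholders):

load database
     from mysql://root@localhost/appdb
     into postgresql:///appdb
 with create no tables, disable triggers;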
It's now possible to have pgloader print out its summary in one of
several formats: human-readable (default), csv, copy or json. The
choice of format is made depending on the extension of the summary
filename picked on the command line with the option --summary.
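For example, a command-line sketch where the .json extension selects the
JSON format (both paths are placeholders):

pgloader --summary /tmp/summary.json ./my-commands.load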
Also augment the documentation with examples of bare stdin reading and
of the advantages of unix pipes to stream even remote archived content
down to PostgreSQL.
In passing also allow --field to specify the whole field list, there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
Make it so that the following command line usages are accepted when
using pgloader without a command file:
./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader
./build/bin/pgloader --set "search_path='sakila'" \
mysql://root@localhost/sakila \
postgresql:///sakila
./build/bin/pgloader --type csv \
--field id --field field \
--with truncate \
--with "fields terminated by ','" \
./test/data/matching-1.csv \
postgres:///pgloader?matching
It's now possible in most cases to just use command-line options, which
should lower the barrier to entry to pgloader considerably.