At the moment it's a very manual process, and it might get automated
someday. Meanwhile it's still useful to have.
See #569 for an issue that got a test case added.
PostgreSQL btree indexes are limited in the size of the values they can
index: values must fit in an index page (8kB). So when porting a MySQL full
text index over full documents, we might run into an error like the
following:
index row size 2872 exceeds maximum 2712 for index "idx_5199509_search"
To fix this, query MySQL for the index type, which is FULLTEXT rather than
BTREE in those cases, and port it over to a PostgreSQL Full Text index with
a hard-coded 'simple' configuration, as in the following test case:
CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload));
Of course users might want to use a better configuration, including a proper
dictionary for the documents. In PostgreSQL each document may have its own
configuration attached and yet they can all get indexed into the same index,
so that's a task for the application developers, not for pgloader.
In passing, fix the list-typenames-without-btree-support.sql query to return
separate entries for each index type rather than an {array,representation}
of the result, as Postmodern won't turn the PostgreSQL array into a Common
Lisp array by default. I'm left wondering how it worked before.
Fix #569.
Adjust the default value formatting to check whether the default value is
already single-quoted, and only add new 'single quotes' when that's not the
case. Apparently ENUM default values in MariaDB 10 are now properly
single-quoted.
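The gist of the check, sketched with a hypothetical function name (not the
actual pgloader code):

;; Only wrap the default value in single quotes when MariaDB didn't
;; already send it quoted.
(defun ensure-single-quoted (default-value)
  (if (and (<= 2 (length default-value))
           (char= #\' (char default-value 0))
           (char= #\' (char default-value (1- (length default-value)))))
      default-value
      (format nil "'~a'" default-value)))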
When we already have the information in the pgloader internal catalogs,
don't issue another MySQL query. In this case, the extra query was used to
fetch the list of columns and their data types so that we can choose to send
either `colname` or astext(`colname`) as `colname` for some geographic types.
That's one less MySQL query per table.
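Something along these lines, with hypothetical names, is what the catalog
information enables without the extra round-trip:

;; Pick the MySQL SELECT expression for a column from the data type we
;; already have in the catalogs, wrapping geographic types in astext().
(defun column-select-expression (column-name data-type)
  (if (member data-type '("geometry" "point" "linestring" "polygon")
              :test #'string-equal)
      (format nil "astext(`~a`) as `~a`" column-name column-name)
      (format nil "`~a`" column-name)))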
The individual CAST decisions are visible in the CREATE TABLE statements
that are logged a moment later. Also, calling `format-create-sql' on a
column definition whose casting isn't finished yet processes default values
before they get normalized, and issues a WARNING to the poor user.
Not helpful. Bye.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.
Fix #631.
In the Travis environment we still test with PostgreSQL 9.1 and 9.6, and there's
no reason for this test to use a modern spelling of create schema, after
all.
It works because the test/csv-before-after.load creates the schema and is
run before test/csv-guess.load. That's good enough for now.
The previous patch made obvious some regression failures that had been
hidden by strange bugs with CCL.
One such regression was introduced in commit
ab7e77c2d0 where we played with the complex
code generation for field projection; the following two cases weren't
cleanly processed anymore:
column text using "constant"
column text using "field-name"
In the first case we want to load a user-defined constant into the column,
and in the second case we want to load the value of the field "field-name"
into the column: we just have different source and target names.
Another regression was introduced in the recent commit
01e5c23763 where the create-table function was
called too early, before we had fetched *pgsql-reserved-keywords*. As a
consequence, table names weren't always properly quoted, as shown in the
test/csv-header.load file, which targets a table named "group".
Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
Due to errors in regression testing when using CCL, review this part of
pgloader. It turns out that cl-log:stop-messenger on a text-stream-messenger
closes the stream, which isn't a good idea when given *standard-output*.
At least it makes CCL choke when it then wants to output something of its
own, such as when running in --batch mode (which is nice because it outputs
more diagnostic information).
To solve that problem, initialize the text-stream-messenger with a broadcast
stream made from *standard-output*, which we may now close at will.
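The idea in a nutshell: closing a broadcast stream does not close its
constituent streams, so the messenger can close this one safely.

;; *standard-output* itself stays open after CLOSE is called here.
(let ((log-stream (make-broadcast-stream *standard-output*)))
  ;; ... hand LOG-STREAM to the text-stream-messenger ...
  (close log-stream))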
The parser files don't depend on the sources; it's the other way around
nowadays. Also, the responsibility to decipher the *sections* context should
be restricted to the monitor.lisp file, which is now the case.
And this time, fix #628 for real.
It seems that when compiling with CCL in “batch” mode, that is, using
buildapp, the local symbol exporting facility didn't work at all. It needs
to be run at load time so that the compiler sees the symbols.
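A minimal sketch of the fix, with a hypothetical symbol name rather than the
actual pgloader code: wrapping the export in eval-when makes it run at
compile and load time, so the compiler sees the exported symbol even in a
buildapp/CCL batch build.

(eval-when (:compile-toplevel :load-toplevel :execute)
  (export 'my-local-helper))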
Fix #628.
Google Cloud SQL instances now use the following format for the name of
their socket: <PROJECT-ID>:<REGION>:<INSTANCE_NAME>. We handle that by
allowing a colon in the socket directory name to be escaped by doubling it,
as in the username field. This also allows accepting any character in the
socket directory name, which is a good cleanup.
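The un-escaping side of the idea, sketched with hypothetical names (not the
actual pgloader parser, which works on the connection string grammar):

;; Once the socket directory has been parsed, turn the doubled colons
;; back into literal ones.
(defun unescape-doubled-colons (string)
  (with-output-to-string (out)
    (loop with skip = nil
          for (char next) on (coerce string 'list)
          do (cond (skip (setf skip nil))
                   ((and (char= char #\:) (eql next #\:))
                    (write-char #\: out)
                    (setf skip t))
                   (t (write-char char out))))))

;; (unescape-doubled-colons "my-project::europe-west1::pgsql")
;;   => "my-project:europe-west1:pgsql"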
Fix #621.
The implementation follows the PostgreSQL specifications as closely as
possible, with the escaping rules and the matching rules. The default paths
where to look for .pgpass (or pgpass.conf on Windows) are as documented in
PostgreSQL too. The only missing piece is the file permissions check.
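The escaping and matching rules amount to something like the following
sketch, with hypothetical names: fields are separated by unescaped colons,
a backslash escapes ':' and '\', and a '*' field matches any value.

(defun split-pgpass-line (line)
  "Split LINE on unescaped colons, removing the backslash escapes."
  (loop with fields = '() and field = '() and escaped = nil
        for char across line
        do (cond (escaped          (push char field) (setf escaped nil))
                 ((char= char #\\) (setf escaped t))
                 ((char= char #\:) (push (coerce (nreverse field) 'string) fields)
                                   (setf field '()))
                 (t                (push char field)))
        finally (push (coerce (nreverse field) 'string) fields)
                (return (nreverse fields))))

(defun pgpass-field-matches-p (field value)
  (or (string= field "*") (string= field value)))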
Fix #460.
Some code was pasted twice in src/api.lisp, and a defstruct with no slots
isn't spelled the way I did it in previous patches. We use a defstruct with
no slots to define a hierarchy on which to dispatch our pretty-print
function.
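The pattern in question, with hypothetical names: slot-less structures give
us a cheap type hierarchy to dispatch a generic function on.

(defstruct summary-format)
(defstruct (human-readable (:include summary-format)))
(defstruct (json-output    (:include summary-format)))

(defgeneric pretty-print-summary (format data)
  (:method ((format human-readable) data) (format t "~a~%" data))
  (:method ((format json-output)    data) (format t "~s~%" data)))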
It used to be that you would give the target table name as an option in the
PostgreSQL connection string, which is distasteful:
load ... into pgsql://user@host/dbname?tablename=foo.bar ...
Or even, for backwards compatibility:
load ... into pgsql://user@host/dbname?foo.bar ...
The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:
load ... into pgsql://user@host/dbname target table foo.bar ...
Which is much better, in particular when used together with the target
columns clause.
Implementing this seemingly quite small feature had an impact on many
parsing related features of pgloader, such as the regression testing
facility. So much so that some extra refactoring found its way in here,
around the lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.
While at it, this patch simplifies the `load-data' function a lot by making
good use of &allow-other-keys and :allow-other-keys t.
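A minimal sketch of that pattern, with hypothetical names: the entry point
accepts any keyword arguments and forwards them all, and each specific
loader picks out only the keys it knows about.

(defun run-load (&rest args &key type &allow-other-keys)
  (declare (ignore type))
  (apply #'load-from-csv :allow-other-keys t args))

(defun load-from-csv (&key source target (encoding :utf-8) &allow-other-keys)
  (format t "loading ~a into ~a (~a)~%" source target encoding))

;; (run-load :type :csv :source "/tmp/file.csv" :target "public.foo")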
Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, though. It just lives in another file for clarity.
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as long as you always
double-quote the schema name, which pgloader does.
Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that quoted schema name to the raw schema name from
PostgreSQL, they won't match, so pgloader tries to create the schema again:
ERROR Database error 42P06: schema "my sql" already exists
Fix the comparison to use the unquoted schema name; fix #614 again, since
the previous fix would only work the first time.
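The comparison boils down to something like this sketch, hypothetical name
included:

;; Strip the surrounding double quotes we keep in the catalogs before
;; comparing with the raw schema name reported by PostgreSQL.
(defun unquoted-name (name)
  (string-trim "\"" name))

;; (string= (unquoted-name "\"my sql\"") "my sql") => T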
This allows bypassing SSL when you don't need it, for instance over
localhost. It takes the same syntax as the PostgreSQL sslmode connection
string parameter.
In this commit we fail the guess faster, allowing us to test a much larger
sample. The sample size is still hard-coded, but this time to 1000 lines.
Also add a test case, see #618.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes that represents. We now add that
information to the output.
The number of bytes presented is computed from the unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
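One way to compute such a figure, as a sketch of the idea only (the babel
call is an assumption, not a quote of the pgloader code):

;; Count the octets of the encoded COPY text we are about to send for a row.
(defun row-bytes (copy-text)
  (length (babel:string-to-octets copy-text :encoding :utf-8)))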
Use a generic function protocol in order to implement the human readable,
verbose, csv, copy and json reporting output formats. This is much cleaner
and more extensible than the previous way.
Use that new power to implement a real JSON output from the internal state
object.
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375kB (up to
about 3 MB per datum), giving SBCL that help allows much larger batch size
and prefetch rows settings without entering lldb.
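The kind of help meant here, sketched with a hypothetical counter and
threshold rather than the exact pgloader change: trigger a full GC ourselves
once enough large data has gone through, instead of waiting for the runtime
heuristics.

(defparameter *bytes-since-last-gc* 0)   ; hypothetical bookkeeping

#+sbcl
(when (> *bytes-since-last-gc* (* 64 1024 1024))
  (sb-ext:gc :full t)
  (setf *bytes-since-last-gc* 0))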
The pgstate infrastructure already had lots of details about what's going
on. Add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening, to
display how many rows we have sent so far for this (supposedly) huge table,
along with bytes and speed (bytes per second).
Add a message every 20 batches so that the user knows it's still going on.
Also, in passing, fix some messages: the present tense is not precise enough
to tell whether the log refers to an event that has just been done or is
about to start.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
some time for indexes to be built, keep the CREATE INDEX lines. Also keep
the ALTER TABLE lines both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
This feature has been asked for several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.
To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.
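The environment fallback can plug into the template rendering along these
lines, with hypothetical names: an explicit binding wins, otherwise we ask
the OS environment for the variable.

(defun template-variable-value (name bindings)
  (or (cdr (assoc name bindings :test #'string=))
      (uiop:getenv name)))

;; (template-variable-value "DBHOST" '(("DBHOST" . "db.local")))  => "db.local"
;; (template-variable-value "HOME" '())                           => value of $HOME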
Fixes #555, Fixes #609.
See #500, #477, #278.
It is sometimes necessary to tweak MS SQL server parameters, such as the
textsize parameter, which allows fetching the whole set of bytes of a text
or binary column (not kidding).
Now it's possible to add such a line in the load file:
set mssql parameters textsize to '104857600'
Fixes #603.
This code path is exercised from the command line only, which means I don't
get to run it that often. And it's a pain to debug. So make it easier to run
`process-source-and-target` from the REPL.
Startup log messages could be lost because the monitor would be started but
not ready to process messages. Fix that by “warming up” the monitoring
thread, having it execute a small computation and, more importantly, wait
for the result to be received back, blocking.
See #599 where parsing errors from a wrong URL were missed in the command
line output, quite disturbingly.
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
Prior to this patch, pgloader didn't care which schema it created extra
types in. Extra types are mainly ENUM and SET support from MySQL. Now,
pgloader creates those extra PostgreSQL ENUM types in the same schema as the
table using them, which is a more sound default.
MySQL enums are cast to PostgreSQL enum types just fine, but sometimes
that's not what the user wants. When we have a CAST rule for an ENUM column,
recognize the fact and respect the user's choice.
Fixes #608.
The spelling in SQLite for the default value is "current_date"; instruct
pgloader about that. This commit also adds a test case to our sqlite.db
unit test database.
Fixes #607.
Now when building a bundle file for source distribution of pgloader, always
test it by building a binary image from the bundle tarball in a test
directory. Also make it easy to target "latest" Quicklisp distribution with
the following spelling:
make BUNDLEDIST=latest bundle
When distributing a pgloader bundle we're using the ql-dist facility. In a
recent commit we hand-picked the last known working distribution of
Quicklisp for pgloader. Make it easy to target the "latest" known
distribution, or to hard-code one from the Makefile or the bundle/ql.lisp
file.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single-quote (') and uses
two of them ('') for escaping single-quotes when found in the ENUM value
itself.
Fixes #597.
When the table is empty we get nil for the min and max values of the id
column. In that case we don't compute a set of ranges and we “cancel”
concurrency support for the empty table.
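A minimal sketch of that behaviour, with hypothetical names: split
[min, max] into ranges for the concurrent readers, and return NIL (no
concurrency) when MIN/MAX came back as NULL because the table is empty.

(defun compute-ranges (min-id max-id range-size)
  (when (and min-id max-id)
    (loop for start from min-id to max-id by range-size
          collect (cons start (min (+ start range-size -1) max-id)))))

;; (compute-ranges 1 10 4)    => ((1 . 4) (5 . 8) (9 . 10))
;; (compute-ranges nil nil 4) => NIL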
Fixes #596.
The WITH option “include drop” used to also apply to schemas, which is not
that useful and is problematic when trying to DROP SCHEMA public, because
you might not be connecting as the owner of that schema.
Even if we don't target the public schema by default, users can choose to do
so thanks to our ALTER SCHEMA ... RENAME TO ... command.
Fixes #594.