1488 Commits

Author SHA1 Message Date
Dimitri Fontaine
b7347a567c Add test cases for MySQL.
At the moment it's a very manual process, and it might get automated
someday. Meanwhile it's still useful to have.

See #569 for an issue that got a test case added.
2017-09-14 15:59:10 +02:00
Dimitri Fontaine
a498313074 Implement support for MySQL FULLTEXT indexes.
PostgreSQL btree indexes are limited in the size of the values they can
index: values must fit in an index page (8kB). So when porting a MySQL full
text index over full documents, we might get into an error like the
following:

  index row size 2872 exceeds maximum 2712 for index "idx_5199509_search"

To fix, query MySQL for the index type which is FULLTEXT rather than BTREE
in those cases, and port it over to a PostgreSQL Full Text index with a
hard-coded 'simple' configuration, such as the following test case:

  CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload));

Of course users might want to use a better configuration, including a proper
dictionary for the documents. When using PostgreSQL each document may have
its own configuration attached and yet they can all get indexed into the
same index, so that's a task for the application developers, not for
pgloader.

In passing, fix the list-typenames-without-btree-support.sql query to return
separate entries for each index type rather than an {array,representation}
of the result, as Postmodern won't turn the PostgreSQL array into a Common
Lisp array by default. I keep wondering how it worked before.

Fix #569.
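
As an illustration of the index type dispatch described above, here is a
minimal sketch in Python. It is not pgloader's code (pgloader is written in
Common Lisp); the function name and its parameters are hypothetical, and
identifier quoting is left out for brevity.

```python
def create_index_sql(index_name: str, table: str, column: str, index_type: str) -> str:
    """Build a CREATE INDEX statement from the MySQL index type (illustrative only)."""
    if index_type.upper() == "FULLTEXT":
        # FULLTEXT indexes become a GIN index over a tsvector, using the
        # hard-coded 'simple' text search configuration.
        return (
            f"CREATE INDEX {index_name} ON {table} "
            f"USING gin(to_tsvector('simple', {column}));"
        )
    # Any other index type is assumed here to map to a default (btree) index.
    return f"CREATE INDEX {index_name} ON {table} ({column});"


print(create_index_sql("idx_75421_search", "mysql.fcm_batches", "raw_payload", "FULLTEXT"))
```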
2017-09-14 15:40:34 +02:00
Dimitri Fontaine
987c0703ad Some default values come properly quoted from MariaDB now.
Adjust the default value formatting to check if the default value is already
single-quoted and only add new 'single quotes' when it's not the case.

Apparently ENUM default values in MariaDB 10 are now properly single quoted.
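
The check can be sketched as follows, in Python rather than pgloader's
Common Lisp. The function name is hypothetical, and doubling embedded single
quotes is an assumption added here to keep the literal valid; the commit
message only describes the "already quoted" test.

```python
def quote_default(value: str) -> str:
    """Return value as a single-quoted SQL literal, leaving quoted input alone."""
    already_quoted = len(value) >= 2 and value.startswith("'") and value.endswith("'")
    if already_quoted:
        # MariaDB 10 style: the default arrives quoted, so keep it as-is.
        return value
    # Assumption: escape embedded single quotes by doubling them.
    return "'" + value.replace("'", "''") + "'"


print(quote_default("small"))    # unquoted input gets quotes added
print(quote_default("'small'"))  # quoted input is returned unchanged
```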
2017-09-14 15:39:04 +02:00
Dimitri Fontaine
dfac729daa Refrain from querying the catalogs again.
When we already have the information in the pgloader internal catalogs,
don't issue another MySQL query. In this case, it's been used to fetch the
list of columns and their data types so that we can choose to send either
`colname` or maybe astext(`colname`) as `colname` for some geographic types.

That's one less MySQL query per table.
2017-09-14 15:35:45 +02:00
Dimitri Fontaine
181f344159 Add support for current_timestamp() default spelling.
That's new in MariaDB 10 apparently.
2017-09-14 15:33:18 +02:00
Dimitri Fontaine
f921658866 Remove useless noise in the logs.
The individual CAST decisions are visible in the CREATE TABLE statements
that are logged a moment later. Also, calling `format-create-sql' on a
column definition that has not finished being cast will process default
values before they get normalized, and issue a WARNING to the poor user.

Not helpful. Bye.
2017-09-14 15:30:29 +02:00
Dimitri Fontaine
dbadab9e9e Implement a new “snake_case” quoting rule.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.

Fix #631.
2017-09-13 22:55:10 +02:00
Dimitri Fontaine
d2d4be2ed0 Fix test/csv-guess.load for old PostgreSQL.
In the Travis environment we still test with PostgreSQL 9.1 and 9.6, and there's
no reason for this test to use a modern spelling of create schema, after
all.

It works because the test/csv-before-after.load creates the schema and is
run before test/csv-guess.load. That's good enough for now.
2017-09-09 00:59:39 +02:00
Dimitri Fontaine
38712d98e0 Fix regression testing.
The previous patch made obvious some regression failures that were hidden by
strange bugs with CCL.

One such regression was introduced in commit
ab7e77c2d0 where we played with the complex
code generation for field projection, where the following two cases weren't
cleanly processed anymore:

  column text using "constant"
  column text using "field-name"

In the first case we want to load a user-defined constant in the column, in
the second case we want to load the value of the field "field-name" in the
column --- we just have different source and target names.

Another regression was introduced in the recent commit
01e5c23763 where the create-table function was
called too early, before we have fetched *pgsql-reserved-keywords*. As a
consequence table names weren't always properly quoted as shown in the
test/csv-header.load file which targets a table named "group".

Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
2017-09-09 00:51:07 +02:00
Dimitri Fontaine
ebf9f7a6a9 Review and cleanup the logging monitor thread.
Due to errors in regression testing when using CCL, review this part of
pgloader. It turns out that cl-log:stop-messenger on a text-stream-messenger
closes the stream, which isn't a good idea when given *standard-output*.

At least it makes CCL choke when it then wants to output something of its
own, such as when running in --batch mode (which is nice because it outputs
more diagnostic information).

To solve that problem, initialize the text-stream-messenger with a broadcast
stream made from *standard-output*, which we now may close at will.
2017-09-08 23:03:41 +02:00
Dimitri Fontaine
e7f6505d7d Review compile time dependencies.
The parser files don't depend on the sources, it's the other way round
nowadays. Also, the responsibility to decipher the *sections* context should
be restricted to the monitor.lisp file, which is now the case.

And this time, fix #628 for real.
2017-09-08 15:38:32 +02:00
Dimitri Fontaine
9be130cdbe Fix symbol export hacks to execute at load time.
It seems that when compiling with CCL in “batch” mode, that is using
buildapp, the local symbol exporting facility didn't work at all. It needs
to be run at load time so that the compiler sees the symbols.

Fix #628.
2017-09-08 12:33:24 +02:00
Dimitri Fontaine
a9e8bfd4d7 Support for colon characters in PostgreSQL socket path.
Google Cloud SQL instances are now using the following format for the name
of their socket: <PROJECT-ID>:<REGION>:<INSTANCE_NAME>. We support that by
allowing a colon in the socket directory name to be escaped by doubling it,
as in the username field. This change also accepts any character in the
socket directory name, which is a good cleanup.

Fix #621.
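
The doubling rule can be illustrated with a small Python sketch. This is not
pgloader's URI grammar, which is not shown here; the function name and the
sample path are hypothetical. It splits a string on single colons and turns
each doubled colon into a literal one.

```python
from typing import List


def split_on_single_colons(text: str) -> List[str]:
    """Split on ':' while treating '::' as an escaped, literal colon."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(text):
        if text.startswith("::", i):
            current.append(":")  # doubled colon: keep one literal colon
            i += 2
        elif text[i] == ":":
            parts.append("".join(current))  # single colon: field separator
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


print(split_on_single_colons("/cloudsql/proj::region::inst:5432"))
```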
2017-08-30 15:22:42 +02:00
Dimitri Fontaine
d5072d11e5 Implement support for a pgpass file.
The implementation follows PostgreSQL specifications as closely as possible,
with the escaping rules and the matching rules. The default paths where to
find the .pgpass file (or pgpass.conf on Windows) are as documented in
PostgreSQL too. Only the file permissions check is missing.

Fix #460.
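
For reference, the escaping and matching rules documented by PostgreSQL for
the password file can be sketched in Python as below. This is an
illustration, not pgloader's implementation: the function names are
hypothetical, and file lookup and permission checks are left out.

```python
from typing import Iterable, List, Optional


def split_pgpass_line(line: str) -> List[str]:
    """Split hostname:port:database:username:password, honoring \\: and \\\\."""
    fields: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # escaped character is taken literally
            i += 2
        elif ch == ":":
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


def find_password(lines: Iterable[str], host: str, port: int,
                  dbname: str, user: str) -> Optional[str]:
    """Return the password of the first matching line; '*' matches anything."""
    wanted = (host, str(port), dbname, user)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = split_pgpass_line(line)
        if len(fields) != 5:
            continue  # ignore malformed lines
        if all(f == "*" or f == w for f, w in zip(fields[:4], wanted)):
            return fields[4]
    return None
```

The first matching line wins, so more specific entries belong before
wildcard entries in the file.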
2017-08-29 03:16:35 +02:00
Dimitri Fontaine
bcc934d7aa Cleanup.
Some code was pasted twice in src/api.lisp, and a defstruct with no slots
isn't spelled the way I did in previous patches. We use a defstruct with no
slots for defining a hierarchy on which to dispatch our pretty-print
function.
2017-08-26 20:31:24 +02:00
Dimitri Fontaine
33ab9bcdd5 Typo Fix. oops. 2017-08-25 22:21:34 +02:00
Dimitri Fontaine
01e5c23763 Add support for explicit TARGET TABLE clause in load commands.
It used to be that you would give the target table name as an option to the
PostgreSQL connection string, which is distasteful:

   load ... into pgsql://user@host/dbname?tablename=foo.bar ...

Or even, for backwards compatibility:

   load ... into pgsql://user@host/dbname?foo.bar ...

The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:

   load ... into pgsql://user@host/dbname target table foo.bar ...

Which is much better, in particular when used together with the target
columns clause.

Implementing this seemingly quite small feature had an impact on many
parsing-related features of pgloader, such as the regression testing
facility. So much so that some extra refactoring found its way in here,
around the lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.

While at it, this patch greatly simplifies the `load-data' function by making
good use of &allow-other-keys and :allow-other-keys t.

Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, though. It's just in another file for clarity.
2017-08-25 01:57:54 +02:00
Dimitri Fontaine
72c58306ba Fix the previous fix.
See #614. Again. Should be ok now.
2017-08-25 01:56:34 +02:00
Dimitri Fontaine
f20a5a0667 Fix schema name comparing with quoted schema names.
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as soon as you always
double-quote the schema name, which pgloader does.

Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that quoted schema name to the raw schema name from
PostgreSQL, they won't match, and pgloader tries to create the schema again:

  ERROR Database error 42P06: schema "my sql" already exists

Fix the comparison to use unquoted schema names, fix #614 again: the
previous fix would only work the first time.
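
A minimal Python sketch of comparing on the unquoted form follows. The
helper names are hypothetical and this is not pgloader's code; undoubling
embedded double quotes follows the standard SQL identifier quoting rule.

```python
def unquote_identifier(name: str) -> str:
    """Strip surrounding double quotes from an SQL identifier, if present."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        # Inside a quoted identifier, "" stands for one literal double quote.
        return name[1:-1].replace('""', '"')
    return name


def same_schema(catalog_name: str, pg_name: str) -> bool:
    """Compare two schema names regardless of how each one is quoted."""
    return unquote_identifier(catalog_name) == unquote_identifier(pg_name)


print(same_schema('"my sql"', "my sql"))
```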
2017-08-25 01:47:49 +02:00
Dimitri Fontaine
9d4743f598 Allow database names to contain spaces.
Then they must be quoted (single or double quotes accepted), of course.

Fix #614.
2017-08-24 23:05:26 +02:00
Dimitri Fontaine
9263baeb49 Implement sslmode for MySQL connections.
This allows bypassing SSL when you don't need it, like over localhost for
instance. Takes the same syntax as the PostgreSQL sslmode connection string
parameter.
2017-08-24 14:56:59 +02:00
Dimitri Fontaine
b685c8801d Improve guessing of CSV parameters.
In this commit we fail the guess faster, which allows testing a much larger
sample. The sample size is still hard-coded, but this time to 1000 lines.

Also add a test case, see #618.
2017-08-24 13:30:14 +02:00
Dimitri Fontaine
8004a9dd59 Improve report output with bytes information.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes that represents. We add that information
now in the output.

The number of bytes presented is computed from the unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
2017-08-24 12:45:51 +02:00
Dimitri Fontaine
3b93ffa37a Rewrite the reporting support entirely.
Use a generic function protocol in order to implement the human readable,
verbose, csv, copy and json reporting output formats. This is much cleaner
and more extensible than the previous way.

Use that new power to implement a real JSON output from the internal state
object.
2017-08-24 12:33:51 +02:00
Dimitri Fontaine
4fcb24f448 Reintroduce manual Garbage Collect in SBCL.
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375kB (up to
about 3 MB per datum), it allows much larger batch size and prefetch rows
settings without entering lldb.
2017-08-23 16:27:14 +02:00
Dimitri Fontaine
4f9eb8c06b Track bytes sent to PostgreSQL.
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per second).
2017-08-23 11:55:49 +02:00
Dimitri Fontaine
1f242cd29e Fix comment support to schema qualify target tables. 2017-08-23 11:26:08 +02:00
Dimitri Fontaine
a849f893a6 Implement a base64-decode transformation function. 2017-08-21 17:06:06 +02:00
Dimitri Fontaine
c62f4279c0 Be more verbose with long-running loads.
Add a message every 20 batches so that the user knows it's still going on.
Also, in passing, fix some messages: the present tense is not precise enough
to tell whether the log refers to an event that is in progress or about to
start.
2017-08-21 16:50:16 +02:00
Dimitri Fontaine
28db6b9f13 Desultory cleanup of a useless declaim. 2017-08-21 16:46:32 +02:00
Dimitri Fontaine
03a8d57a50 Review --verbose log message.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes to be built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
2017-08-21 15:27:13 +02:00
Dimitri Fontaine
f719d2976d Implement a template system for pgloader commands.
This feature has been asked for several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.

To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.

Fixes #555, Fixes #609.
See #500, #477, #278.
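
The idea of resolving template variables from the OS environment can be
sketched in Python as below. This handles only plain {{NAME}} tags, not the
full Mustache syntax, and it is not pgloader's implementation. The function
name is hypothetical, and giving explicit variables precedence over the
environment is an assumption.

```python
import os
import re
from typing import Mapping, Optional

TAG = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str,
           variables: Optional[Mapping[str, str]] = None,
           environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace {{NAME}} tags, falling back to environment variables."""
    variables = {} if variables is None else variables
    environ = os.environ if environ is None else environ

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in variables:  # assumption: explicit values win
            return str(variables[name])
        if name in environ:    # fallback: the OS environment
            return environ[name]
        raise KeyError(f"no value for template variable {name!r}")

    return TAG.sub(lookup, template)


print(render("load database into pgsql:///{{DBNAME}}", environ={"DBNAME": "target"}))
```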
2017-08-16 01:33:11 +02:00
Dimitri Fontaine
e21ce09ad7 Implement support for MySQL linestring data type.
This data type is now converted automatically to a PostgreSQL path data
type, using the open path notation with square brackets:

  https://www.postgresql.org/docs/current/static/datatype-geometric.html#AEN7103

Fix #445.
2017-08-15 15:26:06 +02:00
Dimitri Fontaine
20a85055f4 Implement support for MS SQL set parameters.
It is sometimes needed to tweak MS SQL server parameters, such as the
textsize parameter, which allows fetching the whole set of bytes of a text
or binary column (not kidding).

Now it's possible to add such a line in the load file:

  set mssql parameters textsize to '104857600'

Fixes #603.
2017-08-12 23:43:22 +02:00
Dimitri Fontaine
30f359735c Make it easier to test “main” code.
This code path is exercised from the command line only, which means I don't
get to run it that often. And it's a pain to debug. So make it easier to run
`process-source-and-target` from the REPL.
2017-08-10 21:58:53 +02:00
Dimitri Fontaine
773dcaeca3 Fix a race condition in the monitor thread.
Startup log messages could be lost because the monitor would be started but
not ready to process messages. Fix that by “warming up” the monitoring
thread, having it execute a small computation and more importantly wait for
the result to be received back, blocking.

See #599 where parsing errors from a wrong URL were missed in the command
line output, quite disturbingly.
2017-08-10 21:51:55 +02:00
Dimitri Fontaine
370038a74e Fix the PostgreSQL URL in the MySQL howto.
See #599 again, wherein I missed that the URL error was not a copy-paste'o
but rather an error in the documentation itself…
2017-08-10 21:49:51 +02:00
Dimitri Fontaine
952e7da191 Bug fix CREATE TYPE in schema (previous patch).
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
2017-08-10 21:19:25 +02:00
Dimitri Fontaine
073a5c1e37 Fix Ergast link in MySQL howto.
See #599.
2017-08-10 20:58:24 +02:00
Dimitri Fontaine
5a65da2147 Create new types in the proper schema.
Prior to this patch, pgloader wouldn't care about which schema it
creates extra types in. Extra types are mainly ENUM and SET support from
MySQL. Now, pgloader creates those extra PostgreSQL ENUM types in the same
schema as the table using them, which is a sounder default.
2017-08-10 18:57:09 +02:00
Dimitri Fontaine
981b801ce7 Fix user defined rules to cast ENUM to Text.
MySQL enums are cast to PostgreSQL enum types just fine, but sometimes
that's not what the user wants. When we have a CAST rule for an ENUM
column, recognize the fact and respect the user's choice.

Fixes #608.
2017-08-10 18:01:17 +02:00
Dimitri Fontaine
049a1199c2 Implement support for SQLite current_date default value.
The spelling in SQLite for the default value is "current_date", instruct
pgloader about that. This commit also adds a test case in our sqlite.db
unit tests database.

Fixes #607.
2017-08-08 21:55:15 +02:00
Luke Snape
ecd6a8e25c Ignore nulls in varbinary-to-string transform (#606) 2017-08-07 21:37:37 +02:00
Dimitri Fontaine
38a6b4968d Improve bundle building.
Now when building a bundle file for source distribution of pgloader, always
test it by building a binary image from the bundle tarball in a test
directory. Also make it easy to target "latest" Quicklisp distribution with
the following spelling:

    make BUNDLEDIST=latest bundle
2017-08-01 19:20:15 +02:00
Dimitri Fontaine
72431d4708 Improve the Quicklisp dist support for bundles.
When distributing a pgloader bundle we're using the ql-dist facility. In a
recent commit we hand-picked the last known working distribution of
Quicklisp for pgloader. Make it easy to target the "latest" known distribution
or hard-code one from the Makefile or the bundle/ql.lisp file.
2017-08-01 18:48:20 +02:00
Dimitri Fontaine
5c1c4bf3ff Fix MySQL Enum parsing.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single-quote (') and uses
two of them ('') for escaping single-quotes when found in the ENUM value
itself.

Fixes #597.
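
The quoting rule can be reproduced with Python's standard csv module, as a
rough equivalent of the setup described above. pgloader itself uses a Common
Lisp CSV parser; the function name and the sample column type are
hypothetical.

```python
import csv
from typing import List


def parse_enum_values(column_type: str) -> List[str]:
    """Extract the labels from a MySQL column type such as enum('a','b')."""
    # Keep only the text between the outer parentheses.
    inner = column_type[column_type.index("(") + 1:column_type.rindex(")")]
    # Single quote is the quote character, and a doubled quote escapes it.
    reader = csv.reader([inner], delimiter=",", quotechar="'", doublequote=True)
    return next(reader)


print(parse_enum_values("enum('small','it''s big','a,b')"))
```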
2017-08-01 18:40:27 +02:00
Dimitri Fontaine
3103b0dc72 Escape SQL identifiers in SQLite catalog queries.
SQLite supports the backtick escaping for SQL identifiers and we'd rather
use it. Fixes #600.
2017-07-31 23:11:29 +02:00
Dimitri Fontaine
d37ad27754 Handle empty tables in concurrency support for MySQL.
When the table is empty we get nil for min and max values of the id column.
In that case we don't compute a set of ranges and “cancel” concurrency
support for the empty table.

Fixes #596.
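
A sketch of the guard in Python follows. How pgloader actually splits the id
space into ranges is not described here, so the even split below is an
assumption, as are the function name and its parameters.

```python
from typing import List, Optional, Tuple


def id_ranges(min_id: Optional[int], max_id: Optional[int],
              workers: int) -> List[Tuple[int, int]]:
    """Split [min_id, max_id] into at most `workers` inclusive ranges."""
    if min_id is None or max_id is None:
        # Empty table: min(id) and max(id) are NULL, so no ranges are
        # computed and the caller reads the table without concurrency.
        return []
    total = max_id - min_id + 1
    step = -(-total // workers)  # ceiling division
    return [(lo, min(lo + step - 1, max_id))
            for lo in range(min_id, max_id + 1, step)]


print(id_ranges(None, None, 4))
print(id_ranges(1, 10, 4))
```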
2017-07-18 13:35:01 +02:00
Dimitri Fontaine
b1fa3aec3c Implement a separate switch to drop the schemas.
The with option “include drop” used to also apply to schemas, which is not
that useful, and is problematic when trying to DROP SCHEMA public, because
you might not connect as the owner of that schema.

Even if we don't target the public schema by default, users can choose to do
so thanks to our ALTER SCHEMA ... RENAME TO ... command.

Fixes #594.
2017-07-18 13:13:36 +02:00
Dimitri Fontaine
ae0c6ed119 Add support for preserving index names in SQLite.
See #187.
2017-07-17 11:04:12 +02:00