Commit Graph

1568 Commits

Author SHA1 Message Date
Dimitri Fontaine
9263baeb49 Implement sslmode for MySQL connections.
This allows bypassing SSL when you don't need it, like over localhost for
instance. Takes the same syntax as the PostgreSQL sslmode connection string
parameter.
2017-08-24 14:56:59 +02:00
Dimitri Fontaine
b685c8801d Improve guessing of CSV parameters.
In this commit we fail the guess faster, which allows testing a much larger
sample. The sample is still hard-coded, but this time to 1000 lines.

Also add a test case, see #618.
2017-08-24 13:30:14 +02:00
Dimitri Fontaine
8004a9dd59 Improve report output with bytes information.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes that represents. We add that information
now in the output.

The number of bytes presented is computed from the unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
2017-08-24 12:45:51 +02:00
Dimitri Fontaine
3b93ffa37a Rewrite the reporting support entirely.
Use a generic function protocol in order to implement the human readable,
verbose, csv, copy and json reporting output formats. This is much cleaner
and more extensible than the previous way.

Use that new power to implement a real JSON output from the internal state
object.
2017-08-24 12:33:51 +02:00
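The generic function protocol mentioned in 3b93ffa37a dispatches on the report format. Below is a minimal Python sketch of that idea using `functools.singledispatch`; it is not pgloader's Common Lisp code, and the names `TableStats`, `HumanReadable`, `Csv`, `Json` and `format_table` are hypothetical.

```python
import json
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class TableStats:
    """Hypothetical per-table summary: name, rows copied, bytes sent."""
    name: str
    rows: int
    bytes: int


# One marker class per output format; dispatch happens on the instance type.
class HumanReadable: ...
class Csv: ...
class Json: ...


@singledispatch
def format_table(fmt, stats):
    # Fallback when no method is registered for this format.
    raise NotImplementedError(f"no report format for {type(fmt).__name__}")


@format_table.register
def _(fmt: HumanReadable, stats):
    return f"{stats.name:<20} {stats.rows:>10} rows {stats.bytes:>12} bytes"


@format_table.register
def _(fmt: Csv, stats):
    return f"{stats.name},{stats.rows},{stats.bytes}"


@format_table.register
def _(fmt: Json, stats):
    return json.dumps({"table": stats.name, "rows": stats.rows, "bytes": stats.bytes})
```

Adding a new output format then only requires a new marker class and one more registered method; the callers stay unchanged.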
Dimitri Fontaine
4fcb24f448 Reintroduce manual Garbage Collect in SBCL.
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375kB (up to
about 3 MB per datum), it allows much larger batch size and prefetch rows
settings without entering lldb.
2017-08-23 16:27:14 +02:00
Dimitri Fontaine
4f9eb8c06b Track bytes sent to PostgreSQL.
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per second).
2017-08-23 11:55:49 +02:00
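The per-batch accounting described in 4f9eb8c06b could look roughly like the Python sketch below. It assumes each row is already a text string ready to be sent and counts its UTF-8 encoded length; the `TableProgress` class and its fields are hypothetical and do not mirror the actual pgstate structures.

```python
from dataclasses import dataclass


@dataclass
class TableProgress:
    """Hypothetical running totals for one table being loaded."""
    rows: int = 0
    bytes: int = 0
    seconds: float = 0.0

    def add_batch(self, rows, seconds):
        # Count rows and the encoded size of every prepared row in the batch.
        self.rows += len(rows)
        self.bytes += sum(len(row.encode("utf-8")) for row in rows)
        self.seconds += seconds

    def bytes_per_second(self):
        # Avoid dividing by zero before any time has been recorded.
        return self.bytes / self.seconds if self.seconds > 0 else 0.0
```

A monitor can call `add_batch` after each batch and print `rows`, `bytes` and `bytes_per_second()` whenever a load runs long.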
Dimitri Fontaine
1f242cd29e Fix comment support to schema qualify target tables. 2017-08-23 11:26:08 +02:00
Dimitri Fontaine
a849f893a6 Implement a base64-decode transformation function. 2017-08-21 17:06:06 +02:00
Dimitri Fontaine
c62f4279c0 Be more verbose with long-running loads.
Add a message every 20 batches so that the user knows it's still going on.
Also, in passing, fix some messages: present is not precise enough to decide
if the log refers to an event that is being done or starting next.
2017-08-21 16:50:16 +02:00
Dimitri Fontaine
28db6b9f13 Desultory cleanup of a useless declaim. 2017-08-21 16:46:32 +02:00
Dimitri Fontaine
03a8d57a50 Review --verbose log message.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes being built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
2017-08-21 15:27:13 +02:00
Dimitri Fontaine
f719d2976d Implement a template system for pgloader commands.
This feature has been asked for several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.

To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.

Fixes #555, Fixes #609.
See #500, #477, #278.
2017-08-16 01:33:11 +02:00
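The environment fallback described in f719d2976d can be illustrated with a small Python sketch. It handles only plain `{{name}}` variable tags, not the full Mustache syntax. The `render` function, its parameters, and the lookup order (explicit variables first, then the OS environment) are assumptions made for illustration.

```python
import os
import re

# Matches a plain Mustache variable tag such as {{DBPATH}} or {{ DBPATH }}.
_TAG = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template, variables=None, environ=None):
    """Replace {{name}} tags, looking in `variables` first and then in
    the environment (assumed order). Raises KeyError for unknown names."""
    variables = variables or {}
    environ = os.environ if environ is None else environ

    def lookup(match):
        key = match.group(1)
        if key in variables:
            return str(variables[key])
        if key in environ:
            return environ[key]
        raise KeyError(key)

    return _TAG.sub(lookup, template)
```

With `DBPATH` set in the environment, `render("load database from '{{DBPATH}}'")` substitutes its value into the command text.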
Dimitri Fontaine
e21ce09ad7 Implement support for MySQL linestring data type.
This data type is now converted automatically to a PostgreSQL path data
type, using the open path notation with square brackets:

  https://www.postgresql.org/docs/current/static/datatype-geometric.html#AEN7103

Fix #445.
2017-08-15 15:26:06 +02:00
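To make the target notation in e21ce09ad7 concrete, here is a Python sketch that turns a linestring into PostgreSQL's open path syntax. It assumes the value arrives as WKT text such as `LINESTRING(0 0,1 1)`. The commit does not say which representation pgloader reads, so both the input format and the function name `linestring_to_path` are hypothetical.

```python
def linestring_to_path(wkt):
    """Convert 'LINESTRING(x1 y1,x2 y2,...)' into '[(x1,y1),(x2,y2),...]'.
    Square brackets denote an open path in PostgreSQL."""
    body = wkt.strip()
    prefix = "LINESTRING"
    if not body.upper().startswith(prefix):
        raise ValueError(f"not a linestring: {wkt!r}")
    inner = body[len(prefix):].strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError(f"malformed linestring: {wkt!r}")
    points = []
    for pair in inner[1:-1].split(","):
        x, y = pair.split()  # coordinates are separated by whitespace
        points.append(f"({x},{y})")
    return "[" + ",".join(points) + "]"
```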
Dimitri Fontaine
20a85055f4 Implement support for MS SQL set parameters.
It is sometimes needed to tweak MS SQL server parameters, such as the
textsize parameter, which allows fetching the whole set of bytes of a text
or binary column (not kidding).

Now it's possible to add such a line in the load file:

  set mssql parameters textsize to '104857600'

Fixes #603.
2017-08-12 23:43:22 +02:00
Dimitri Fontaine
30f359735c Make it easier to test “main” code.
This code path is exercised from the command line only, which means I don't
get to run it that often. And it's a pain to debug. So make it easier to run
`process-source-and-target` from the REPL.
2017-08-10 21:58:53 +02:00
Dimitri Fontaine
773dcaeca3 Fix a race condition in the monitor thread.
Startup log messages could be lost because the monitor would be started but
not ready to process messages. Fix that by “warming up” the monitoring
thread, having it execute a small computation and more importantly wait for
the result to be received back, blocking.

See #599 where parsing errors from a wrong URL were missed in the command
line output, quite disturbingly.
2017-08-10 21:51:55 +02:00
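The warm-up handshake from 773dcaeca3 can be sketched in Python as follows: after starting the monitor thread, the caller sends a trivial request and blocks until the reply comes back, so later log messages reach a monitor that is known to be processing. The `Monitor` class is hypothetical. Python's `queue.Queue` would not drop early messages anyway, so the sketch shows only the handshake and does not reproduce the original race.

```python
import queue
import threading


class Monitor:
    """Hypothetical monitor: a thread that consumes messages from an inbox."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.messages = []
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            kind, payload = self.inbox.get()
            if kind == "stop":
                break
            if kind == "ping":
                payload.put("pong")  # answer the warm-up request
            else:
                self.messages.append(payload)

    def start(self):
        self.thread.start()
        # Warm-up: send a small request and block until the answer arrives.
        reply = queue.Queue(maxsize=1)
        self.inbox.put(("ping", reply))
        reply.get(timeout=5)

    def log(self, text):
        self.inbox.put(("log", text))

    def stop(self):
        self.inbox.put(("stop", None))
        self.thread.join()
```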
Dimitri Fontaine
370038a74e Fix the PostgreSQL URL in the MySQL howto.
See #599 again, wherein I missed that the URL error was not a copy-paste'o
but rather an error in the documentation itself…
2017-08-10 21:49:51 +02:00
Dimitri Fontaine
952e7da191 Bug fix CREATE TYPE in schema (previous patch).
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
2017-08-10 21:19:25 +02:00
Dimitri Fontaine
073a5c1e37 Fix Ergast link in MySQL howto.
See #599.
2017-08-10 20:58:24 +02:00
Dimitri Fontaine
5a65da2147 Create new types in the proper schema.
Prior to this patch, pgloader wouldn't care about which schema it
creates extra types in. Extra types are mainly ENUM and SET support from
MySQL. Now, pgloader creates those extra PostgreSQL ENUM types in the same
schema as the table using them, which is a more sound default.
2017-08-10 18:57:09 +02:00
Dimitri Fontaine
981b801ce7 Fix user defined rules to cast ENUM to Text.
MySQL enums are cast to PostgreSQL enum types just fine, but sometimes
that's not what the user wants. When we have a CAST rule for an ENUM
column, recognize the fact and respect the user's choice.

Fixes #608.
2017-08-10 18:01:17 +02:00
Dimitri Fontaine
049a1199c2 Implement support for SQLite current_date default value.
The spelling in SQLite for the default value is "current_date", instruct
pgloader about that. This commit also adds a test case in our sqlite.db
unit tests database.

Fixes #607.
2017-08-08 21:55:15 +02:00
Luke Snape
ecd6a8e25c Ignore nulls in varbinary-to-string transform (#606) 2017-08-07 21:37:37 +02:00
Dimitri Fontaine
38a6b4968d Improve bundle building.
Now when building a bundle file for source distribution of pgloader, always
test it by building a binary image from the bundle tarball in a test
directory. Also make it easy to target "latest" Quicklisp distribution with
the following spelling:

    make BUNDLEDIST=latest bundle
2017-08-01 19:20:15 +02:00
Dimitri Fontaine
72431d4708 Improve the Quicklisp dist support for bundles.
When distributing a pgloader bundle we're using the ql-dist facility. In
a recent commit we hand-picked the last known working distribution of
Quicklisp for pgloader. Make it easy to target "latest" known distribution
or hard-code one from the Makefile or the bundle/ql.lisp file.
2017-08-01 18:48:20 +02:00
Dimitri Fontaine
5c1c4bf3ff Fix MySQL Enum parsing.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly setup: MySQL quotes ENUM values with a single-quote (') and uses
two of them ('') for escaping single-quotes when found in the ENUM value
itself.

Fixes #597.
2017-08-01 18:40:27 +02:00
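The quoting rule fixed in 5c1c4bf3ff (single-quote delimiters, with a doubled single quote as the escape) maps directly onto a CSV dialect. Below is a Python sketch using the standard `csv` module, assuming the enum definition arrives as a column type string like `enum('a','b')`. The function name `parse_mysql_enum` is hypothetical.

```python
import csv


def parse_mysql_enum(column_type):
    """Extract the list of values from a string such as enum('a','it''s')."""
    # Keep only the text between the outermost parentheses.
    inner = column_type[column_type.index("(") + 1 : column_type.rindex(")")]
    # quotechar="'" with doublequote=True reads '' inside a value as one quote.
    reader = csv.reader([inner], delimiter=",", quotechar="'", doublequote=True)
    return next(reader)
```

With this dialect, values containing commas or escaped quotes stay in a single field.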
Dimitri Fontaine
3103b0dc72 Escape SQL identifiers in SQLite catalog queries.
SQLite supports the backtick escaping for SQL identifiers and we'd rather
use it. Fixes #600.
2017-07-31 23:11:29 +02:00
Dimitri Fontaine
d37ad27754 Handle empty tables in concurrency support for MySQL.
When the table is empty we get nil for min and max values of the id column.
In that case we don't compute a set of ranges and “cancel” concurrency
support for the empty table.

Fixes #596.
2017-07-18 13:35:01 +02:00
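The empty-table guard from d37ad27754 can be sketched like this in Python. The function name `id_ranges` and the `range_size` parameter are hypothetical; the sketch only shows that missing min and max values yield no ranges at all.

```python
def id_ranges(min_id, max_id, range_size):
    """Split [min_id, max_id] into consecutive inclusive ranges.
    Returns None for an empty table, where min and max are both missing,
    signalling that the table should be loaded without concurrency."""
    if min_id is None or max_id is None:
        return None
    return [
        (low, min(low + range_size - 1, max_id))
        for low in range(min_id, max_id + 1, range_size)
    ]
```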
Dimitri Fontaine
b1fa3aec3c Implement a separate switch to drop the schemas.
The with option “include drop” used to also apply to schemas, which is not
that useful and problematic when trying to DROP SCHEMA public, because you
might not connect as the owner of that schema.

Even if we don't target the public schema by default, users can choose to do
so thanks to our ALTER SCHEMA ... RENAME TO ... command.

Fixes #594.
2017-07-18 13:13:36 +02:00
Dimitri Fontaine
ae0c6ed119 Add support for preserving index names in SQLite.
See #187.
2017-07-17 11:04:12 +02:00
Dimitri Fontaine
cf6182fafa Add a notice message with guessed parameters.
We might have to help users debug our decision, and I expect we will have to
improve our guess “engine” here.
2017-07-07 02:34:23 +02:00
Dimitri Fontaine
471f2b6d88 Implement automagic guessing of CSV parameters.
As we know how many columns we expect from the input file, it's possible to
read a sample (10 lines as of this patch) and try many different CSV reader
parameters combinations until we find one that works: it returns the right
number of fields.

It is still possible of course to specify parameters on the command line or
in a load file if necessary, but it makes the simple case even simpler. As
simple as:

  pgloader file.csv pgsql:///pgloader?tablename=target
2017-07-07 02:16:53 +02:00
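The guessing loop described in 471f2b6d88 can be sketched in Python with the standard `csv` module: try each combination of reader parameters on a sample and keep the first one that yields the expected number of fields on every line. The candidate delimiters and quote characters below are assumptions, as is the function name `guess_csv_params`; pgloader's actual search space is not listed in the commit message.

```python
import csv
import io
from itertools import product


def guess_csv_params(sample_text, expected_fields,
                     delimiters=(",", ";", "\t", "|"),
                     quotechars=('"', "'")):
    """Return the first (delimiter, quotechar) pair for which every sampled
    line parses into exactly `expected_fields` fields, or None."""
    for delimiter, quotechar in product(delimiters, quotechars):
        reader = csv.reader(io.StringIO(sample_text),
                            delimiter=delimiter, quotechar=quotechar)
        try:
            rows = [row for row in reader if row]  # skip blank lines
        except csv.Error:
            continue  # this combination cannot even parse the sample
        if rows and all(len(row) == expected_fields for row in rows):
            return delimiter, quotechar
    return None
```

Explicit parameters from the command line or a load file would simply bypass this function.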
Dimitri Fontaine
14e1830b77 Fix CLI insistence on --field.
From a load file, as soon as pgloader can retrieve the schema of the target
table the source field list defaults to the target column list. Let's apply
the same rules to the command line.
2017-07-07 01:00:55 +02:00
Dimitri Fontaine
154c74f85e Update online docs with new release.
The docs/ directory goes to http://pgloader.io.
2017-07-06 17:07:55 +02:00
Dimitri Fontaine
64959595fc Back to development release in the master branch. 2017-07-06 16:55:56 +02:00
Dimitri Fontaine
d71da6ba66 Release pgloader 3.4.1 2017-07-06 16:53:29 +02:00
Adrian Vondendriesch
058f9d5451 Debian (#578)
* debian: Bump compat version to 9.

* debian: Bump Standards-Version to 3.9.8
2017-07-06 15:38:14 +02:00
Dimitri Fontaine
7a371529be Implement "drop indexes" option for MySQL and MSSQL too.
It was only offered for SQLite without good reason really, and tests show
that it works as well with MySQL of course. Offer the option there too.

See 3eab88b144 for details.
2017-07-06 10:06:03 +02:00
Dimitri Fontaine
2363d8845f Fix create schema handling in data only scenarios.
In b301aa9394 the "create schema" default
changed to true, which is a good idea. As a consequence pgloader should
consider this operation only when "create tables" is set: we don't want to
start with creating target schemas in a target database that is said to be
ready to host the data.
2017-07-06 09:48:03 +02:00
Dimitri Fontaine
dfe5c38185 Fix quoting policy in PostgreSQL DDL formatting.
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format string: refrain from using ~s.
2017-07-06 09:47:48 +02:00
Dimitri Fontaine
9da012ca51 Fix identifiers quoting when reading PostgreSQL catalogs.
We sure can trust PostgreSQL to use names it knows how to handle. Still, it
will be happy to store in its catalogs names containing upper case, and in
that case we must quote them.
2017-07-06 03:16:06 +02:00
Dimitri Fontaine
e87477ed31 Restrict condition handling to relevant conditions.
In md-methods copy-database function, don't pretend we are able to handle
any condition when preparing the PostgreSQL schema, database-error is all we
are dealing with there really.
2017-07-06 03:16:05 +02:00
Dimitri Fontaine
d3d40cd47d Have git ignore local desktop files. 2017-07-06 03:16:05 +02:00
Dimitri Fontaine
e37cb3a9e7 Split SQL queries into their own files.
This change was long overdue. Ideally we would use something like the YeSQL
library for Clojure, but it seems like the cl-yesql equivalent is not ready
yet, and it depends on an experimental build system...

So this patch introduces a URL abstraction built on top of a hash table.
You can then reference src/pgsql/sql/list-all-columns.sql as

  (sql "pgsql/list-all-columns.sql")

in the source code directly.

So for now the templating system is CL's format language. It is still an
improvement over embedded strings. Again, one step at a time.
2017-07-06 03:16:05 +02:00
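The lookup described in e37cb3a9e7 (query files referenced by a short path and stored in a hash table) could be approximated in Python as below. The functions `load_queries` and `sql` are hypothetical. The keys here are plain relative paths, which differs from the shortened `pgsql/...` names the commit uses. Python's `str.format` stands in for CL's format language.

```python
from pathlib import Path


def load_queries(root):
    """Read every .sql file under `root` into a dict keyed by its
    POSIX-style path relative to `root`."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.sql"))
    }


def sql(queries, name, **params):
    """Fetch a query by name and fill in its placeholders."""
    return queries[name].format(**params)
```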
Dimitri Fontaine
d50ed64635 Defensive programming, as an afterthought.
It might be that a column-type-name is actually an sqltype instance, and
then #'string= won't be happy. Prevent that now by discarding any smarts
when the type name does not satisfy stringp.
2017-07-06 00:59:36 +02:00
Dimitri Fontaine
26d372bca3 Implement support for non-btree indexes (e.g. MySQL spatial keys).
When pgloader fetches the index list from a source database, it doesn't
fetch information about access methods for the indexes: I don't even know if
the overlap between index access methods from one RDBMS to another covers
more than just btree...

It could happen that MySQL indexes a "geometry" column tho. This datatype is
converted automatically to "point" by pgloader, which is good. But the index
creation would fail with the following error message:

  Database error 42704: data type point has no default operator class for access method "btree"

In this patch when setting up the target schema we issue a PostgreSQL
catalog query to dynamically list those datatypes without btree support and
fetch their opclasses, with a hard-coded preference for GiST, then GIN, so
as to be able to automatically use the proper access method when btree isn't
available. And now pgloader transparently issues the proper statement:

  CREATE INDEX idx_168468_idx_location ON pagila.address USING gist(location);

Currently this exploration is limited to indexes with a single column. To
implement the general case we would need a more complex lookup: we would
have to find the intersection of all the supported access methods for all
involved columns.

Of course we might need to do that someday. One step at a time is plenty
good enough tho.
2017-07-06 00:42:43 +02:00
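The access method selection from 26d372bca3 can be sketched in Python for the single-column case. The mapping `non_btree_types` stands in for the result of the catalog query (data types without btree support, each with the access methods that have an operator class for it). Its shape and the function names are assumptions.

```python
# Preference order stated in the commit message: GiST first, then GIN.
ACCESS_METHOD_PREFERENCE = ("gist", "gin")


def pick_access_method(column_type, non_btree_types):
    """Return 'btree' unless the type is listed as lacking btree support,
    in which case return the preferred available method, or None."""
    methods = non_btree_types.get(column_type)
    if methods is None:
        return "btree"
    for method in ACCESS_METHOD_PREFERENCE:
        if method in methods:
            return method
    return None


def create_index_sql(index_name, table_name, column_name,
                     column_type, non_btree_types):
    method = pick_access_method(column_type, non_btree_types)
    if method is None:
        raise ValueError(f"no usable access method for type {column_type}")
    if method == "btree":
        return f"CREATE INDEX {index_name} ON {table_name} ({column_name});"
    return (f"CREATE INDEX {index_name} ON {table_name} "
            f"USING {method}({column_name});")
```

A multi-column version would need the intersection of the methods available for every column involved, as the commit message notes.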
Dimitri Fontaine
8405c331a9 Error handling improvements for PostgreSQL schema.
In the complete PostgreSQL schema step, an error would be logged as you
expect but poorly handled: it would have the whole transaction rolled back,
meaning that a single Primary Key definition failure would cancel all the
others, plus the foreign keys, and also the triggers and comments.

It happens that other systems allow a primary column to contain NULL values,
which is forbidden in the standard and enforced by PostgreSQL, so that's not
a theoretical concern here.
2017-07-05 17:53:33 +02:00
Dimitri Fontaine
bae40d40c3 Fix identifier quoting corner cases.
In cases when pgloader needs to build a new identifier from existing
ones (mainly for renaming indexes, because they are unique per-table in the
source database and unique per-schema in PostgreSQL), and we compose the new
name from already quoted strings, pgloader was doing the wrong thing.

Fix that by having a build-identifier function that may unquote parts then
re-quote properly (if needed) the new identifier.
2017-07-05 15:37:21 +02:00
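The unquote-then-requote approach from bae40d40c3 can be sketched in Python as follows. The quoting test is deliberately simplified: it quotes anything that is not a lowercase identifier and ignores reserved words and pgloader's *identifier-case* setting. The helper names are hypothetical.

```python
import re


def unquote(name):
    """Strip surrounding double quotes and undo doubled inner quotes."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        return name[1:-1].replace('""', '"')
    return name


def needs_quoting(name):
    # Simplified rule: only plain lowercase identifiers are safe unquoted.
    return re.fullmatch(r"[a-z_][a-z0-9_]*", name) is None


def build_identifier(separator, *parts):
    """Join possibly quoted parts into one identifier, quoting the result
    once at the end instead of concatenating already quoted strings."""
    raw = separator.join(unquote(part) for part in parts)
    if needs_quoting(raw):
        return '"' + raw.replace('"', '""') + '"'
    return raw
```

Naive concatenation of `"MyTable"` and `idx` would give `"MyTable"_idx`, which is not a valid identifier; the sketch yields `"MyTable_idx"` instead.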
Dimitri Fontaine
f6cb428c6d Check empty strings in DB3 numeric fields.
Another blind attempt at fixing pgloader from a bug report on gitter, see
2017-07-04 23:15:47 +02:00
Dimitri Fontaine
652e435843 Only catch thread errors in pgloader-image.
In the REPL we're going to have all errors pop in the interactive debugger,
and that should be what we want...
2017-07-04 01:55:27 +02:00