The new ALTER TABLE facility allows to act on tables found in the MySQL
database before the migration happens. In this patch the only provided
actions are RENAME TO and SET SCHEMA, which fixes#224.
In order to be able to provide the same option for MS SQL users, we will
have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO
...) and modify the internal schema-struct so that the schema slot of
our table instances are a schema instance rather than its name.
Lacking MS SQL test database and instance, the facility is not yet
provided for that source type.
More than the syntax and API tweaks, this patch also make it so that a
multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be
loaded with several files in the group in parallel.
To that effect, tweak again the md-connection and md-copy
implementations.
Add the workers and concurrency settings to the LOAD commands for
database sources so that users can tweak them now, and add mentions of
them in the documentation too.
From the documentation string of the copy-from method as found in
src/sources/common/methods.lisp:
We allow WORKER-COUNT simultaneous workers to be active at the same time
in the context of this COPY object. A single unit of work consist of
several kinds of workers:
- a reader getting raw data from the COPY source with `map-rows',
- N transformers preparing raw data for PostgreSQL COPY protocol,
- N writers sending the data down to PostgreSQL.
The N here is setup to the CONCURRENCY parameter: with a CONCURRENCY of
2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we
start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be
active simultaneously.
Those options should find their way in the remaining sources, that's for
a follow-up patch tho.
Thanks to Common Lisp character data type, it's easy for pgloader to
enforce always speaking to PostgreSQL in utf-8, and that's what has been
done from the beginning actually.
Now, without good reason for that, the first example of a SET clause
that has been added to the docs where about how to set client_encoding,
which should NOT be done.
Fix that at the use level by removing the bad example from the docs and
adding a WARNING whenever the client_encoding is set to a known bad
value. It's a WARNING because we then simply force 'utf-8' anyway.
Also, review completely the format-vector-row function to avoid doing
double work with the Postmodern facilities we piggyback on. This was
done halfway through and the utf-8 conversion was actually done twice.
Filter the list of tables we migrate directly from the SQLite query,
avoiding to return useless data. To do that, use the LIKE pattern
matching supported by SQLite, where the REGEX operator is only available
when extra features are loaded apparently.
See #310 where filtering out the view still caused errors in the
loading.
The local-time:encode-timestamp function takes a default timezone and it
is necessary to have control over it when loading from pgloader. Hence,
add a timezone option to the IXF option list, that is now explicit and
local to the IXF parser rather than shared with the DBF option list.
It's now possible to use several files in a BEFORE LOAD EXECUTE section,
and to mix DO and EXECUTE parts, bringing lots of flexibility in the
commands. Also it actually simplifies the parser.
MySQL names its primary keys "PRIMARY" and we need to always uniquify
this name even when the used asked pgloader to preserve index names.
Also, the create-indexes-again function now needs to ask for index names
to be preserved specifically.
When loading against a table that already has index definitions, the
load can be quite slow. Previous commit introduced a warning in such a
case. This commit introduces the option "drop indexes" that is not used
by default.
When this option is used, pgloader drops the indexes before loading the
data then create the indexes again with the same definitions as before.
All the indexes are created again in parallel to optimize performances.
Only primary key indexes can't be created in parallel, so those are
created in two steps (create unique index then alter table).
Some CSV files are using the CSV escape character internally in their
fields. In that case we enter a parsing bug in cl-csv where backtracking
from parsing the escape string isn't possible (or at least
unimplemented).
To handle the case, change the quote parameter from \" to just \ and let
cl-csv use its escape-quote mechanism to decide if we're escaping only
separators or just any data.
See https://github.com/AccelerationNet/cl-csv/issues/17 where the escape
mode feature was introduced for pgloader issue #80 already.
As per PostgreSQL documentation on connection strings, allow overriding
of main URI components in the options parts, with a percent-encoded
syntax for parameters. It allows to bypass the main URI parser
limitations as seen in #199 (how to have a password start with a
colon?).
See:
http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
To allow for importing JSON one-liners as-is in the database it can be
interesting to leverage the CSV parser in a compatible setup. That setup
requires being able to use any separator character as the escape
character.
Some CSV files are given with an header line containing the list of
their column names, use that when given the option "csv header".
Note that when both "skip header" and "csv header" options are used,
pgloader first skip as many required lines and then uses the next one as
the csv header.
Because of temporary failure to install the `ronn` documentation tool,
this patch only commits the changes to the source docs and omits to
update the man page (pgloader.1). A following patch is intended to be
pushed that fixed that.
See #236 which is using shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
See test/parse/hans.goeuro.load for an example usage of the new option.
In passing, any error when creating indexes is now properly reported and
logged, which was missing previously. Oops.
This option is dangerous and allows to skip ALL triggers when loading
data against PostgreSQL. This includes foreign key constraints
definitions and will allow loading data out of order.
When using both the options "create no table" and "disable triggers" it
will be possible to load data into a schema prepared by your favorite
external tool, at the cost of not validating FK constraints. Use with
care.
Fix#167.
It's now possible to have pgloader print out its summary in one of
several formats: human-readable (default), csv, copy or json. The
choice of format is made depending on the extension of the summary
filename picked on the command line with the option --summary.
Also augment the documentation with examples of bare stdin reading and
of advantages of the unix pipes to stream even remove archived content
down to PostgreSQL.
In passing also allow --field to specify the whole field list, there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
Make it so that the following command line usages are accepted when
using pgloader without a command file:
./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader
./build/bin/pgloader --set "search_path='sakila'" \
mysql://root@localhost/sakila \
postgresql:///sakila
./build/bin/pgloader --type csv \
--field id --field field \
--with truncate \
--with "fields terminated by ','" \
./test/data/matching-1.csv \
postgres:///pgloader?matching
It's now possible in most cases to just use command-line options, which
should make the entry bar to pgloader much lower.
In passing, refactor the *pgconn- dynamic bindings in favor of directly
using the connection property list straight from the connection string
parser, processing it when necessary. That allows to make it simple to
add an internal :use-ssl property.
This allows users to benefit from the same flexible machinery when using
SQLite as when using MySQL, and also allows to add some more default
cast rules too.
It's not possible to use a comma separator when using more than one
source field option at the same time, and for better readability the
options are to be found enclosed in squared brackets.
Also, it's now possible to spell out "from" and "for" keywords on the
source definitions, making it easier to read and maintain the load file,
as in this full example:
(
a from 0 for 10,
b from 10 for 8,
c from 18 for 8,
d from 26 for 17 [null if blanks, trim right whitespace]
)