Filter the list of tables we migrate directly from the SQLite query,
to avoid returning useless data. To do that, use the LIKE pattern
matching supported by SQLite, since the REGEXP operator is apparently
only available when extra features are loaded.
See #310, where filtering out the view still caused errors during
loading.
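As a sketch of the idea (the exact query pgloader runs may differ),
SQLite's catalog can be filtered with LIKE alone, keeping only real
tables and skipping the internal ones:

    SELECT tbl_name
      FROM sqlite_master
     WHERE type = 'table'
       AND tbl_name NOT LIKE 'sqlite_%';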
The local-time:encode-timestamp function takes a default timezone, and
it is necessary to have control over it when loading from pgloader.
Hence, add a timezone option to the IXF option list, which is now
explicit and local to the IXF parser rather than shared with the DBF
option list.
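As a usage sketch, the new option would appear in an IXF load command
along these lines (the file and target names here are hypothetical):

    LOAD IXF
         FROM data/nsitra.test1.ixf
         INTO postgresql:///pgloader?nsitra.test1
         WITH truncate, create table, timezone 'Europe/Paris';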
It's now possible to use several files in a BEFORE LOAD EXECUTE
section, and to mix DO and EXECUTE parts, bringing lots of flexibility
to the commands. It also actually simplifies the parser.
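A sketch of what mixing the two parts could look like (the SQL and
file names are hypothetical):

    BEFORE LOAD
         DO $$ create schema if not exists app; $$
         EXECUTE 'schema/tables.sql',
                 'schema/functions.sql'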
MySQL names its primary keys "PRIMARY", and we need to always uniquify
this name, even when the user asked pgloader to preserve index names.
Also, the create-indexes-again function now needs to explicitly ask for
index names to be preserved.
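For reference, the option involved is the one below; even with it, an
index named "PRIMARY" still gets a uniquified name, because PostgreSQL
index names live in a single per-schema namespace (the connection
strings here are only illustrative):

    LOAD DATABASE
         FROM mysql://root@localhost/sakila
         INTO postgresql:///sakila
         WITH preserve index names;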
When loading against a table that already has index definitions, the
load can be quite slow. The previous commit introduced a warning in
such a case. This commit introduces the option "drop indexes", which is
not used by default.
When this option is used, pgloader drops the indexes before loading the
data, then creates the indexes again with the same definitions as
before. All the indexes are created again in parallel to optimize
performance. Only primary key indexes can't be created in parallel, so
those are created in two steps (create unique index then alter table).
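A minimal usage sketch, assuming an already indexed target table (file
and table names hypothetical):

    LOAD CSV
         FROM 'data.csv'
         INTO postgresql:///pgloader?existing_table
         WITH drop indexes,
              fields terminated by ',';

For primary keys, the two steps map to a CREATE UNIQUE INDEX statement
followed by an ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY USING
INDEX.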
Some CSV files use the CSV escape character internally in their fields.
In that case we hit a parsing bug in cl-csv where backtracking from
parsing the escape string isn't possible (or at least isn't
implemented).
To handle the case, change the quote parameter from \" to just \ and
let cl-csv use its escape-quote mechanism to decide whether we're
escaping only separators or just any data.
See https://github.com/AccelerationNet/cl-csv/issues/17 where the
escape mode feature was already introduced for pgloader issue #80.
As per the PostgreSQL documentation on connection strings, allow
overriding of main URI components in the options part, with a
percent-encoded syntax for parameter values. This makes it possible to
bypass the main URI parser's limitations, as seen in #199 (how do you
have a password start with a colon?).
See:
http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
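As a sketch of the #199 case (user, host and database names are
hypothetical), a password starting with a colon can be moved out of the
main URI and into the options part, percent-encoded:

    postgresql://user@localhost:5432/dbname?password=%3Asecret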
To import JSON one-liners as-is into the database, it can be
interesting to leverage the CSV parser in a compatible setup. That
setup requires being able to use any separator character as the escape
character.
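One hypothetical way to set this up, loading each JSON line into a
single column by making the escape character the same as the separator:

    LOAD CSV
         FROM 'events.json' (data)
         INTO postgresql:///pgloader?events (data)
         WITH fields terminated by '|',
              fields escaped by '|';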
Some CSV files come with a header line containing the list of their
column names; use that when given the option "csv header".
Note that when both the "skip header" and "csv header" options are
used, pgloader first skips as many lines as required and then uses the
next one as the csv header.
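A usage sketch combining both options (file and target names
hypothetical; see test/parse/hans.goeuro.load below for a real
example): here line 1 of the file is skipped and line 2 provides the
column names.

    LOAD CSV
         FROM 'data.csv'
         INTO postgresql:///pgloader?target
         WITH skip header = 1,
              csv header,
              fields terminated by ',';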
Because of a temporary failure to install the `ronn` documentation
tool, this patch only commits the changes to the source docs and omits
updating the man page (pgloader.1). A following patch is intended to be
pushed that fixes that.
See #236, which uses shell tricks to retrieve the field list from the
CSV file itself and motivated this patch to finally get written.
See test/parse/hans.goeuro.load for an example usage of the new option.
In passing, any error when creating indexes is now properly reported and
logged, which was missing previously. Oops.
This option is dangerous: it allows skipping ALL triggers when loading
data against PostgreSQL. This includes foreign key constraint
definitions and will allow loading data out of order.
When using both the "create no table" and "disable triggers" options,
it will be possible to load data into a schema prepared by your
favorite external tool, at the cost of not validating FK constraints.
Use with care.
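A sketch of that combination, using the option names as spelled in this
patch (the connection strings are hypothetical):

    LOAD DATABASE
         FROM mysql://root@localhost/sakila
         INTO postgresql:///sakila
         WITH create no table, disable triggers;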
Fix #167.
It's now possible to have pgloader print out its summary in one of
several formats: human-readable (the default), csv, copy or json. The
choice of format is made depending on the extension of the summary
filename picked on the command line with the --summary option.
Also augment the documentation with examples of bare stdin reading and
of the advantages of Unix pipes to stream even remote archived content
down to PostgreSQL.
In passing, also allow --field to specify the whole field list: there's
no point in forcing the user to have as many --field switches on the
command line as they have columns in their data source file.
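A hypothetical invocation combining both additions, reusing the
matching-1.csv example below; the .json extension selects the json
summary format:

    pgloader --type csv                          \
             --field "id, field"                 \
             --with "fields terminated by ','"   \
             --summary ./summary.json            \
             ./test/data/matching-1.csv          \
             postgres:///pgloader?matching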
Make it so that the following command line usages are accepted when
using pgloader without a command file:
    ./build/bin/pgloader ./test/sqlite/sqlite.db postgresql:///pgloader

    ./build/bin/pgloader --set "search_path='sakila'" \
         mysql://root@localhost/sakila                \
         postgresql:///sakila

    ./build/bin/pgloader --type csv                   \
         --field id --field field                     \
         --with truncate                              \
         --with "fields terminated by ','"            \
         ./test/data/matching-1.csv                   \
         postgres:///pgloader?matching
It's now possible in most cases to just use command-line options, which
should make the barrier to entry for pgloader much lower.
In passing, refactor the *pgconn- dynamic bindings in favor of directly
using the connection property list straight from the connection string
parser, processing it when necessary. That makes it simple to add an
internal :use-ssl property.
This allows users to benefit from the same flexible casting machinery
when using SQLite as when using MySQL, and also allows adding some more
default cast rules.
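A sketch of a user-level cast rule against a SQLite source (the file
path and the rule are chosen only for illustration):

    LOAD DATABASE
         FROM sqlite:///path/to/app.db
         INTO postgresql:///pgloader
         CAST type varchar to text drop typemod;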
Because commas already separate the field definitions themselves, it's
not possible to use a plain comma separator when using more than one
source field option at the same time; so, for better readability too,
the options are now to be found enclosed in square brackets.
Also, it's now possible to spell out the "from" and "for" keywords in
the source definitions, making the load file easier to read and
maintain, as in this full example:
    (
       a from  0 for 10,
       b from 10 for  8,
       c from 18 for  8,
       d from 26 for 17 [null if blanks, trim right whitespace]
    )
As seen in #116, it might be better for users to be able to ask for
field trimming right in the source definition, like we do for
processing nulls.
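Wrapped into a complete command, the column list above would read as
follows (file and target names hypothetical):

    LOAD FIXED
         FROM 'data.fixed'
              (
                 a from  0 for 10,
                 b from 10 for  8,
                 c from 18 for  8,
                 d from 26 for 17 [null if blanks, trim right whitespace]
              )
         INTO postgresql:///pgloader?fixed
         WITH truncate;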
To avoid wasting everybody's time when trying to debug `--load
command.load`, rename the option to be more explicit about what it
does. Also implement some basic guards, in the form of testing that the
filename extension is part of a very short whitelist: .lisp, .cl, .lsp
and .asd.
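Assuming the new spelling is the --load-lisp-file option found in
pgloader today (the file names here are hypothetical):

    pgloader --load-lisp-file extras/transforms.lisp my-command.load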
It might be important to be able to use the exact same pgloader command
file but adapt its source and target to the environment where the
command is to be run (production, development, staging, etc).
Introduce the new sub-clause GETENV 'variable-name' to that effect.
The regression test facility that we have now isn't nearly sophisticated
enough to support this, so the feature isn't yet covered.
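One plausible reading of the new sub-clause, with hypothetical variable
names; the exact placement within the command is an assumption based on
the description above:

    LOAD DATABASE
         FROM GETENV 'PGLOADER_SOURCE_URI'
         INTO GETENV 'PGLOADER_TARGET_URI';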