Compare commits


757 Commits

Author SHA1 Message Date
Svante von Erichsen
d9ca38eacf
Merge pull request #1674 from fossum/feature/fix-dockerfile-casing
Eliminate docker build casing warning
2025-06-04 09:52:57 +02:00
Eric Fossum
b76be4450c Eliminate docker build casing warning 2025-06-03 08:41:13 -07:00
Svante von Erichsen
096992acbc fix IPv6 hostname parsing (#1004) 2025-05-25 01:26:22 +02:00
Arunprasad Rajkumar
70f3557670
Use latest cosign action to fix signing issue on docker publish (#1613)
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
2024-09-18 14:26:47 +02:00
Arunprasad Rajkumar
edc1a4fde9
Install CA certificates on docker image (#1612)
Currently, we install the ca-certificates package only in the builder stage,
not in the docker image which is distributed to the user. Without CA
certificates, we see errors like the ones below:
```
2024-09-18T05:30:46.112001Z ERROR Connecting to PostgreSQL <host name>: SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
DB-CONNECTION-ERROR: Failed to connect to pgsql at "<host name>" (port 30025) as user "tsdbadmin": SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
An unhandled error condition has been signalled:
   Failed to connect to pgsql at "<host name>" (port 30025) as user "tsdbadmin": SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
```

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
2024-09-18 11:37:57 +02:00
Christoph Berg
29afa9de05 Merge branch 'debian' 2024-06-11 17:01:37 +00:00
Christoph Berg
44f04aff78 Bump pgloader minor version and add checks
The version number in `pgloader --version` was missed in the last few
releases.
2024-06-11 16:59:21 +00:00
Christoph Berg
f0409e549d Limit architectures to those that have sbcl available and working thread support (notably, this excludes armel and armhf). 2024-03-22 15:01:36 +01:00
Svante von Erichsen
2079646c81
Merge pull request #1500 from jarinom/fix-dockerfile-debian-bookworm
Fix Dockerfile build after new Debian release
2023-11-08 16:35:32 +01:00
Christoph Berg
af8c3c1472 pgloader 3.6.10 2023-11-02 16:49:50 +00:00
Christoph Berg
c722964096 Debian: Bump ip4r dependencies to 16.
(Closes: #1052837)
2023-11-02 17:44:59 +01:00
Jari Ylimäinen
e58809aff3 Pin Debian version to Bookworm and fix dependencies 2023-06-13 15:50:56 +03:00
kobaaa
999791d013
allow db names with dot (#1059) 2022-11-03 14:42:35 +01:00
Christoph Berg
b0f0f8313b Debian: Check if build really built pgloader
Cannot save core with multiple threads running.
2022-10-24 13:52:05 +02:00
Christoph Berg
3e06d1b9e1 New version 3.6.9 2022-10-24 13:12:05 +02:00
Christoph Berg
99090836a2 Use pgapt repository for testing
Required to make postgresql-15-ip4r available.
2022-10-24 13:07:18 +02:00
Christoph Berg
84ed9c6c48 Bump ip4r dependencies to 15
Closes: #1022296
2022-10-24 12:59:19 +02:00
Dimitri Fontaine
6d61c8c770
Add a setup for readthedocs. 2022-10-18 19:51:15 +02:00
Dimitri Fontaine
9011fcde13
Clean-up docs/conf.py (remove old cruft). 2022-10-18 19:41:57 +02:00
Dimitri Fontaine
626f437963
Improve docs formatting of command line options. 2022-10-18 19:20:52 +02:00
Dimitri Fontaine
925996000b
Improve pgloader docs (Table of Contents, titles, organisation). (#1440)
Make it easier to navigate our docs, which are dense enough to warrant
proper organisation and a guided Table of Contents.
2022-10-18 17:28:34 +02:00
Justin Falk
6d73667685
Add default support for MSSQL auto-incrementing bigint and smallint (#1435)
* Add default support for MSSQL auto-incrementing bigint and smallint

* Update list-all-columns.sql

Add support for sysdatetimeoffset defaults

* Update mssql-cast-rules.lisp

Correct bigint cast
2022-10-18 16:54:08 +02:00
Christoph Berg
759777ae08 New upstream version. 2022-09-26 14:25:44 +02:00
Christoph Berg
59d2c5c7fd Depend on libsqlite3-0.
pgloader  --regress allcols.load

debugger invoked on a SIMPLE-ERROR in thread
#<THREAD "main thread" RUNNING {10070C8003}>:
  Error opening shared object "libsqlite3.so.0":
  libsqlite3.so.0: cannot open shared object file: No such file or directory.
2022-09-26 11:41:33 +02:00
Christoph Berg
350bcc09d1 Debian: run wrap-and-sort -ast 2022-09-26 11:08:11 +02:00
padinko
90dea2ad4e
Fix mapping mysql signed int with auto_increment to postgresql serial (#1248) (#1437) 2022-09-22 13:11:34 +02:00
willyrk1
f25f1b659c
(Re-)Fix documentation link (#1434) 2022-09-21 00:40:51 +02:00
noctarius aka Christoph Engelbert
644f2617e7
Added support for sequences with minvalue defined (#1429)
When a sequence is defined with a minimum value and the sequence to migrate is empty, setval fails due to 1 being the default (which may be lower than the defined minimum value).

Signed-off-by: Christoph Engelbert (noctarius) <me@noctarius.com>
2022-09-12 16:17:49 +02:00
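The guard described in the commit above can be sketched as a tiny helper. This is a hypothetical Python illustration, not pgloader's actual Lisp code; `setval_target` is an invented name:

```python
def setval_target(max_source_id, minvalue=1):
    """Pick a setval() value that respects the sequence's MINVALUE.

    When the migrated table is empty, max_source_id is None and the
    naive default of 1 may fall below the sequence's declared minimum,
    which makes setval() fail.
    """
    return max(max_source_id or 1, minvalue)

print(setval_target(None, minvalue=100))  # empty table: use the minimum, 100
print(setval_target(250, minvalue=100))   # populated table: keep 250
```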
Eugen Mayer
8ff0c574b3
Fix documentation link (#1426) 2022-08-29 15:53:50 +02:00
Dimitri Fontaine
75c00b5ff4
Use the unix-namestring as the hash key for SQL queries. (#1420)
In the way we manage and then fetch the SQL queries embedded in the pgloader
binary, we should really take the unix-namestring rather than the
native-namestring. Of course, this only matters when the host OS is NOT
unix, which is why this bug existed for so long.
2022-08-18 14:16:58 +02:00
Dimitri Fontaine
696617d930
Upgrade Clozure-CL in the Dockerfile.ccl. 2022-08-18 14:01:03 +02:00
Christoph Berg
755b55d2b3 New upstream version 2022-08-13 10:35:39 +02:00
Christoph Berg
b24eba972d Set SBCL dynamic space size to 16 GB on 64 bit architectures. 2022-08-12 22:57:51 +02:00
Dimitri Fontaine
28ef36a6dc
README and install docs improvements. 2022-08-12 17:32:55 +02:00
Dimitri Fontaine
5f5734cf01
Update current version number, latest release being 3.6.6. 2022-08-12 17:23:14 +02:00
Dimitri Fontaine
ff33ec5e2e
Improve README and add proper INSTALL docs. (#1417) 2022-08-12 17:10:34 +02:00
Dimitri Fontaine
fac03a68d4
Install a github action to publish docker images. (#1416)
* Install a github action to publish docker images.

* Our main branch name is still "master".
2022-08-12 16:39:05 +02:00
Dimitri Fontaine
8d97a313fa
Improve documentation with command lines and defaults. (#1415)
* Improve documentation with command lines and defaults.

The advanced documentation coverage skipped the simple approach and didn't
explain fully what to do with the sample commands.

Fixes #1413.

* Fix docs indentation in the MySQL connection string options.

* Desultory docs and docs config fixes.
2022-08-12 15:27:40 +02:00
Dimitri Fontaine
eeefcaa98e
SBCL compiler notes should not be fatal to pgloader. (#1411)
* SBCL compiler notes should not be fatal to pgloader.

The compile function returns warnings-p and failure-p values, use that to
decide if the code could be compiled, and only signal a condition when it
has been fatal to compiling the code at run-time.

The SBCL compiler is getting smarter at removing unreachable code, and it
looks like pgloader is producing some unreachable code from parsing the user
provided commands.

* Let's make the code look like actual lisp code now.

* Another fix.

* Improve condition handling and simplify processing of compile values.

We don't need to react to any condition signaled from inside pgloader, only
to errors and serious-conditions (an error is a serious-condition). With
that, we can just ignore the compiler warnings and style notes.

* Fix the handler-bind to only consider serious-conditions too.

* Capture compiler output and log it as a debug-level message.

* Fix previous attempt.

* Improve capturing of the compiler output (include summary).

* Actually call the new compile function in all places.

Co-authored-by: Dimitri Fontaine <dimitri@citusdata.com>
2022-08-11 17:06:27 +02:00
Christoph Berg
2c52da12cb Sync version numbers 2022-06-27 11:03:22 +02:00
Christoph Berg
b890b32cc1 releasing package pgloader version 3.6.5-1 2022-06-27 10:25:28 +02:00
Christoph Berg
10ee9d931a Remove regression output on clean 2022-06-26 23:32:55 +02:00
Christoph Berg
7c0d478064 Debian: Run tests at build-time as well. 2022-06-26 23:31:54 +02:00
Christoph Berg
05282173d4 bootstrap-centos.sh: Bump sbcl version to 2.2.5
Close #956
2022-06-24 16:28:51 +02:00
Christoph Berg
12d4885f3d Remove bundle files on clean 2022-06-24 16:22:00 +02:00
Christoph Berg
a56f5a4b25 Build bundle file
Close #1347
2022-06-24 16:17:29 +02:00
Christoph Berg
f667fcc666 debian/watch: Look at tags instead of releases 2022-06-24 15:12:05 +02:00
Christoph Berg
3f1ca18229 New upstream version 3.6.4 2022-06-24 14:39:50 +02:00
Vincent GRAILLOT
55d76af6c9 Fix documentation typo
CSV Source Specification, FROM clause: one of the special values is
`FILENAME MATCHING`, not `FILENAMES MATCHING`.
2022-06-24 14:13:20 +02:00
Christoph Berg
ebad5e2e57 debian/tests/testsuite: Use trust authentication 2022-06-24 13:21:29 +02:00
Christoph Berg
e19329be99 debian/tests/testsuite: Use installed pgloader binary 2022-06-24 10:47:29 +02:00
Christoph Berg
4eb618d45f .github/workflows/debian-ci.yml: We are not on focal, drop special rule 2022-06-24 10:45:49 +02:00
Christoph Berg
9c904b67d1 debian/tests/testsuite: Run "regress" instead of "all" 2022-06-24 10:41:14 +02:00
Christoph Berg
e234ff188e test: Create csv schema in partial.load 2022-06-24 10:39:15 +02:00
Dimitri Fontaine
e2418891a4
Fix looping over sbcl *external-formats*.
The internal representation of the SBCL *external-formats* has changed to a
new structure which is not a hash-table anymore.
2022-06-23 17:15:47 +02:00
Christoph Berg
3853c8996f test: Depend on postgresql-14-ip4r 2022-06-23 16:25:55 +02:00
Christoph Berg
087ddce749 Run testsuite 2022-06-23 16:20:14 +02:00
Christoph Berg
b54ed07175 Ignore some Debian build artifacts 2022-06-23 16:08:36 +02:00
Christoph Berg
8451ca5158 test: Don't run sakila test by default; set variables for tests
Sakila needs a MySQL database running. Set DBPATH and SOURCEFILE for the
sqlite-env.load and csv-districts-env.out tests.
2022-06-23 15:59:11 +02:00
Brendan Ball
4114daf190 parameterize DYNSIZE in dockerfiles 2022-06-23 14:20:38 +02:00
Christoph Berg
85f2d3e55b Add Debian autopkgtest 2022-06-23 14:11:37 +02:00
Christoph Berg
71c922f0dd Remove docker-ci workflow
It was never properly hooked up.
2022-06-23 13:51:04 +02:00
Darryl T. Agostinelli
c9616e2675 specify v8 of freetds to deal with #1068 'MSSQL to Postgres: is not of type VECTOR' 2022-06-23 13:48:34 +02:00
Scott Thomas
7e1f7c51c8 Allow underscores in cast type names
Underscores might exist in cast type names, for example if the type being cast is an enum (which can have arbitrary names).

Fixes #1378
2022-06-23 13:43:21 +02:00
Joakim Soderlund
6de6457d65 Fix minor casing issue in intro docs 2022-06-23 13:33:51 +02:00
Athos Ribeiro
248c2f709a Force libcrypto reload in src/hooks.lisp
cl+ssl::libcrypto is also loaded at startup. If it is not properly closed in
the hooks together with libssl, it will be reloaded at startup, and if
the first cl+ssl file alternative fails, the debugger is invoked.

* Fixes #1370
2022-06-23 13:28:35 +02:00
Christoph Berg
a94a0a3327 debian/tests/ssl: Force md5 auth if cl-postmodern is too old. 2021-12-22 18:08:21 +01:00
Christoph Berg
0925960989 pgloader 3.6.3 2021-12-22 17:27:14 +01:00
Christoph Berg
3b6fa226b8 Debian: Revert temporarily ignored file 2021-12-22 17:25:47 +01:00
Christoph Berg
4f1650f084 Debian: Remove obsolete lintian overrides 2021-12-22 17:24:50 +01:00
Christoph Berg
11dc31d05c docs/ref/pgsql-redshift.rst: Fix too short title underline 2021-12-22 17:07:31 +01:00
Christoph Berg
a8c50e37f8 Fix places2k.zip location
Close #375
2021-12-21 10:36:14 +01:00
ChristophKaser
40f6ba1ff4 Fixes typo in MS SQL Migration Options docs 2021-12-21 10:20:39 +01:00
Dmitry Shvetsov
2e728f5754 Fix typo in the log message 2021-12-21 10:10:38 +01:00
Christoph Berg
92dfb3f706 Debian: Remove cl-pgloader, deprecated upstream. 2021-12-21 10:10:23 +01:00
Phil Ingram
f49252d6b4 Add openssl-devel dependency to pgloader.spec 2021-12-09 16:56:09 +01:00
Scott Markwell
4bd4c0ef08 Spelling correction 2021-12-09 16:54:12 +01:00
marcel-tomes
a8512e60fa Fix typo 2021-12-09 16:53:25 +01:00
Nitish
3047c9afe1
Typo fix in MS SQL documentation (#1242) 2020-12-14 01:04:47 +01:00
Rodolphe Quiédeville
63e4eea5f0
Add postgres version 13 to Travis (#1235)
Update the travis-ci configuration to run tests on the latest stable version, i.e. 13
2020-11-21 20:13:08 +01:00
Jeffrey van den Hondel
45a4b6f353
chore: create workflow docker-ci (#1217)
* chore: create workflow docker-ci

* fix: missing closing fi

Co-authored-by: Jeffrey van den Hondel <jeffrey.vandenhondel@autoflex.nl>
2020-10-19 17:44:48 +02:00
kocio-pl
b60c5feedd
Update index.rst (#1213)
Fix the header "On error stop / On error resume next"
2020-10-19 17:26:15 +02:00
lukesilvia
48d8ed0613
fix: ranged load does not load last record. (#1203) 2020-08-31 20:12:32 +02:00
Aleksi Kinnunen
9788cc64ee
Add MySQL unsigned int casting rules (#1200)
Fixes dimitri/pgloader#1186
2020-08-28 11:24:36 +02:00
Dimitri Fontaine
e388909f0c Implement a retry loop when SQLite database is "BUSY".
It turns out that you can't do some operations on SQLite from several
concurrent connections, such as a "pragma encoding" query.

Fixes #1193.
2020-07-27 21:12:55 +02:00
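The retry-on-busy behavior described in the commit above can be sketched in Python. This is a hypothetical illustration using the stdlib `sqlite3` module, not pgloader's Lisp implementation; `execute_with_retry` is an invented helper:

```python
import sqlite3
import time

def execute_with_retry(conn, sql, retries=5, delay=0.1):
    """Retry a statement while SQLite reports the database as busy/locked."""
    for attempt in range(retries):
        try:
            return conn.execute(sql)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) and "busy" not in str(e):
                raise  # not a contention error: re-raise immediately
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise RuntimeError("database still busy after %d retries" % retries)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
execute_with_retry(conn, "INSERT INTO t VALUES (1)")
print(conn.execute("SELECT count(*) FROM t").fetchone()[0])
```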
Christoph Berg
49e5877853 Merge branch 'debian' 2020-07-14 17:35:53 +02:00
Christoph Berg
11e6627ea8 releasing package pgloader version 3.6.2-1 2020-07-14 17:02:51 +02:00
Christoph Berg
455800139f debian: Note that we need cl-plus-ssl 20190204 or later. 2020-07-14 16:52:17 +02:00
Christoph Berg
c1d58b6dd9 debian: Note that we need cl-csv 20180712 or later. 2020-07-14 16:15:45 +02:00
Christoph Berg
9c2d8d2baa debian: Note that we need cl-plus-ssl 20180328 or later. 2020-07-14 15:33:31 +02:00
Christoph Berg
63274dbec4 debian/rules: Skip dh_dwz like dh_strip as it fails on buster. 2020-07-14 15:08:41 +02:00
Christoph Berg
c8ceb1cf8f debian: Add patch to remove sphinx theme options on bionic 2020-07-13 17:17:09 +02:00
Christoph Berg
2100690402 debian: Clean buildapp log 2020-07-13 16:48:05 +02:00
Christoph Berg
5bfa5430cf debian/rules: Print actual compiler log. 2020-07-13 16:30:12 +02:00
Dimitri Fontaine
f5139cbf29 Add support for DATETIME() function call as a SQLite default values.
Fixes #1177.
2020-07-03 18:39:37 +02:00
Christoph Berg
164726eab9 DH 13. 2020-06-10 15:28:27 +02:00
Christoph Berg
d024552f56 Bump required cl-db3 version to 20200212. 2020-06-10 15:27:50 +02:00
Christoph Berg
a06900e898 New upstream version. 2020-06-10 14:32:40 +02:00
Christoph Berg
f8ef9c2dc3 Merge tag 'v3.6.2' into debian 2020-06-10 14:26:33 +02:00
Dimitri Fontaine
689dd4a806 Simplify Postgres version string parsing further. 2020-06-06 17:07:06 +02:00
Dimitri Fontaine
cc8975bb88 Improve Postgres version string parsing.
Turns out we have way more variety in the field than what I anticipated...

Should fix #992.
2020-06-05 18:47:32 +02:00
Dimitri Fontaine
2189acfb63 Attempt at fixing #1060. 2020-06-05 18:22:47 +02:00
Drew Repasky
a76f7e1e8c
typo (#1152) 2020-05-26 21:06:17 +02:00
Dimitri Fontaine
1bdc0ee5f4 Allow SQLite type names such as "double precision".
The parsing of the type names from the SQLite catalogs needs to allow for
names with spaces in them, because SQLite allows that too.

Fixes #921.
2020-05-09 23:55:59 +02:00
Semen Miroshnichenko
11d926126e
Keep name casing for foreign keys on mssql (#1145) 2020-05-09 17:49:04 +02:00
Dimitri Fontaine
38a62a7143 Attempt to add Github sponsors to the pgloader home page. 2020-05-05 18:51:18 +02:00
Dimitri Fontaine
d5314a6640 Add support for type "signed long" in SQLite.
Fixes #1126.
2020-04-11 18:33:21 +02:00
Ian L
cb989e1155
Fix MySQL query to compute number of rows per table (#1128)
The CAST should target `unsigned` rather than `integer`, so that we are compatible with MySQL 5.7. Also empty tables might have NULL entries, which we transform to zero entries here, as expected by the Lisp code.

Fixes #1127.
2020-04-11 18:22:02 +02:00
Dimitri Fontaine
86b6a5cb80 We COPY the MS SQL data in the MS SQL column ordering.
Fix #1124.
2020-04-05 15:12:46 +02:00
Dimitri Fontaine
11970bbca8
Implement tables row count ordering for MySQL. (#1120)
This should help optimise the duration of migrating databases with very big
tables and lots of smaller ones. It might be a little too naive as far as
the optimisation goes, while still being an improvement on the default
alphabetical one.

Fixes #1099.
2020-04-04 16:40:53 +02:00
Dimitri Fontaine
14fb15bfbd Force the summary file to be opened in UTF-8.
After all, the default summary output contains the “✓” character, which
won't fit in the ASCII external format.

Fixes #1103.
2020-04-03 23:49:20 +02:00
Dimitri Fontaine
49910027c5 Typo fix in the MySQL default casting rules documentation.
See #1123.
2020-04-03 23:34:53 +02:00
Michał "phoe" Herda
7b47c00ea7
Delete pgloader.lisp (#1119)
Fixes #1115
2020-03-28 16:53:19 +01:00
Michał "phoe" Herda
bd9cdcea82
Update pgloader.spec for 3.6.2 release (#1118)
Fixes #1116.
2020-03-28 16:53:04 +01:00
Michał "phoe" Herda
bab6aaf890
Simplify and fix errors/warnings/notes on Travis configuration (#1112)
Fixes #1111.

We remove unnecessary .travis.sh code for removing old Postgres versions.
2020-03-28 13:42:18 +01:00
Michał "phoe" Herda
5e7de5d68d
Change Travis build matrix (#1110)
We remove the Travis jobs without the PGLOADER environment variable set to fix #1109 and add Postgres versions 10, 11, and 12 to build and test pgloader against them.
2020-03-28 12:22:02 +01:00
Dimitri Fontaine
64643bff83 Fix MS SQL bigint casting to numeric.
Fixes #937.
2020-03-28 00:00:21 +01:00
Rudi Bruchez
c2b9f79413
handling mssql datetimeoffset (#1113) 2020-03-27 23:50:35 +01:00
Dimitri Fontaine
b3cd5f28d6 Set the MS SQL port in the environment.
As our API to connect to MS SQL does not provide a facility to set the
target port number, place it in the TDSPORT environment variable, which is
reported to have the expected impact.

Should fix #1094.
2020-03-27 23:24:16 +01:00
Svante von Erichsen
8c59f8c9f9
fix typo that chooses wrong option rule (#1107) 2020-03-27 23:14:58 +01:00
Michał "phoe" Herda
6b111ba483
Begin v3.6.3 development 2020-03-22 22:38:46 +01:00
Michał "phoe" Herda
484d3e1dd4
Release v3.6.2 2020-03-22 22:26:18 +01:00
Michał "phoe" Herda
e235c6049d
Update Quicklisp dist to 2020-02-18 (#1106)
#1092 requires us to switch to a new Quicklisp distribution in order to bump CFFI to version 0.21.0. This commit switches to the newest available QL dist (2020-02-18) to achieve this.
2020-03-22 16:59:19 +01:00
Jeff Fendley
be43a49646
Fix unfortunate typo in Redshift doc (#1075) 2020-03-22 16:42:48 +01:00
Dimitri Fontaine
c899d3b5c4 Fix zero-dates-to-null.
Fixes #1098.
2020-03-22 15:56:57 +01:00
Nicolas Delperdange
cc2dc8d671
fix(mysql): use new st_astext function (#1100) 2020-03-22 15:47:57 +01:00
Dimitri Fontaine
ebc72c454e Add a MySQL use case.
See #1102.
2020-03-22 15:20:19 +01:00
Dimitri Fontaine
0daace9d70 Allow source type names to be double-quoted in CAST rules.
Fixes #1015.
2020-03-21 17:37:54 +01:00
Michał "phoe" Herda
94d0612c12
Do not reload pgloader in Makefile (#1091) 2020-03-21 12:36:31 +01:00
Dimitri Fontaine
30376b2cfe Add support for "datetime2" data type in MS SQL.
That's a blind fix, or almost blind: we're given some information but I'm
not in a position to test the fix myself. Hope it works.

Fixes #1036.
Fixes #1018.
2020-02-29 22:23:58 +01:00
Dimitri Fontaine
df94340396 Avoid parser look-ahead when not necessary.
Fixes #1082 where letters from the URI are doubled because of the
look-ahead in the previous parser's implementation.

Thanks @svantevonerichsen6906 for the fix!
2020-02-29 22:04:06 +01:00
Dimitri Fontaine
3b5c29b030 Attempt to fix foreign-key creation to tables that have been filtered out.
See #1016 where we try to build the DDL for a foreign key that references
tables that are not found in our catalogs. We should probably just ignore
those foreign keys, as we might have a partial load to implement.
2020-02-12 21:50:18 +01:00
Phil
2e8ce7a83c
Update rpm spec file (#1039)
* Update pgloader.spec
- Update from 3.3.2 to 3.6.1
- Use Requires and BuildRequires
- Variablise Source0
- Fix Prep and Files to match source tarball
- Update spec file Changelog

* link to install documentation for RedHat/CentOS. fix tab indentation of debian code block

* Update install instructions for RHEL/CentOS
2020-02-12 21:02:49 +01:00
Dimitri Fontaine
bbcce92418 Allow underscores in SQLite type names.
Fixes #1049.
2020-02-12 00:07:58 +01:00
Dimitri Fontaine
d4da90648e Implement proper hostname parsing, following labels specifications.
Bug report #1053 was the occasion to wander in the specification for DNS
hostnames and their label components, and the syntactic rules for those. It
turns out that my implementation was nothing like the specs:

  https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax

Fixes #1053.
2020-02-12 00:01:07 +01:00
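The label rules referenced in the commit above can be sketched as a simplified validator. This is a hedged Python illustration of the LDH (letters, digits, hyphen) label syntax; it ignores some finer points of the full specification:

```python
import re

# LDH rule: letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def valid_hostname(name: str) -> bool:
    """Check a hostname against DNS label syntax (simplified sketch)."""
    if len(name) > 253:  # overall length limit on a domain name
        return False
    labels = name.rstrip(".").split(".")
    return all(LABEL.match(label) for label in labels)

print(valid_hostname("db-1.example.com"))   # True
print(valid_hostname("-bad.example.com"))   # False: leading hyphen
```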
Dimitri Fontaine
e551099463 Attempt at fixing a MS SQL bug.
That's another blind fix, but it looks like this should be doing it.
Hopefully. Chances to break pgloader even more are quite slim, so let's try
it in the master branch.

Fixes #1069.
2020-02-11 22:43:02 +01:00
Dimitri Fontaine
2fef253d28 Implement AFTER CREATE SCHEMA DO for more sources.
It was only implemented for Postgres sources even though the implementation
is generic enough to be shared. It's only a matter of instructing our parser
about the new facility, which this patch does.

Fixes #1062.
2020-02-11 22:20:08 +01:00
Dimitri Fontaine
8a13c02561 Review our Postgres catalog usage for default values.
The pg_catalog.pg_attribute column adsrc is populated only once, and then
not maintained by the server. The function pg_get_expr(adbin, adrelid) is
what to use and has been here since release 8.0, so that's what pgloader
should be using.

Fixed #1934.
2020-02-11 21:46:10 +01:00
Dimitri Fontaine
26cc9ca79f Implement support for DBF deleted records.
Skip over deleted records rather than blindly importing them. Requires an
update of the underlying DBF library.

Fixes #985.
2019-06-10 22:58:12 +02:00
Dimitri Fontaine
d8b0bd5145 Allow casting rules to guard on signed data types.
It used to be that our casting rules mechanism would allow matching
unsigned data types only, and we sometimes need to apply special
behavior to signed data types.

In particular, a signed bigint(20) in MySQL has the same values range as a
PostgreSQL bigint, so we don't need to target a numeric in that case. It's
only when the bigint is unsigned that we need to target a numeric.

In passing update some of the default casting rules documentation to match
the code.

Fix #982.
2019-06-04 15:22:25 +02:00
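The range argument in the commit above can be checked with a couple of constants (a Python illustration, nothing pgloader-specific):

```python
# MySQL's bigint is 64-bit; its signed range matches PostgreSQL's bigint
# exactly, but the unsigned maximum overflows it, which is why only an
# unsigned bigint needs to be cast to numeric.
PG_BIGINT_MAX = 2**63 - 1

mysql_signed_bigint_max = 2**63 - 1
mysql_unsigned_bigint_max = 2**64 - 1

print(mysql_signed_bigint_max <= PG_BIGINT_MAX)    # True: bigint fits
print(mysql_unsigned_bigint_max <= PG_BIGINT_MAX)  # False: needs numeric
```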
Dimitri Fontaine
b8da7dd2e9
Generic Function API for Materialized Views support. (#970)
Implement a generic-function API to discover the source database schema and
populate pgloader's internal version of the catalogs. Cut three copies of
roughly the same code-path down to a single shared one, thanks to applying
some amount of OOP to the code.
2019-05-20 19:28:38 +02:00
Elias Carter
ee75bc4765 Fix ccl docker images not having libssl (#968)
pgloader:ccl.latest throws an error: Shared library not open: "libssl.so.1.1". This commit adds libssl1.1 to the docker image which fixes the issue.
2019-05-17 09:47:18 +02:00
Dimitri Fontaine
12e788094b Improve sexp parser and standard symbols support.
Also add split-sequence to the list of special cases that we can use even
though it is not found in the pgloader.transforms package.

Fixes #965.
2019-05-14 15:49:24 +02:00
Dimitri Fontaine
501cbed745 Quote database name in ALTER DATABASE "..." SET search_path TO
Fixes #933.
2019-05-12 00:50:59 +02:00
Dimitri Fontaine
06216eea99 Refrain from using fancy schema names in the regression tests.
Otherwise, being compatible with PostgreSQL 9.6 is too much trouble.
2019-05-12 00:10:28 +02:00
Dimitri Fontaine
e5f78d978e Remove added DBF tests from the Travis target.
Clozure doesn't have the CP866 encoding that the DBF files are using, and
then PostgreSQL 9.6 doesn't have "create schema if not exists", which makes
the tests fail on Travis.
2019-05-11 23:52:47 +02:00
Dimitri Fontaine
98b465fbef Add the new DBF tests in the test suite.
All with expected results so that we can track regressions there.
2019-05-11 22:13:18 +02:00
Dimitri Fontaine
350cffffad Improve DBF support.
The cl-db3 lib just got improvements for new dBase file types and field
types; reflect those in pgloader.

Also, cl-db3 can now read the encoding of the file (language driver)
directly from the header, meaning we can rely on that metadata by default
and only override it when the user tells us to.

See #961.
2019-05-11 20:50:52 +02:00
Dimitri Fontaine
a51819f874 Have db3-numeric-to-pgsql-integer accept integer values.
The cl-db3 driver now parses type I fields and emits them as native
integers. We need to deal with that.

See #929.
2019-05-10 22:09:52 +02:00
Dimitri Fontaine
954eca02d0 Optimize Travis-CI workload.
Before this patch Travis would build the pgloader image twice: once when
running the `make clones save` command and again when running the main
command `make check`, which depends on Make targets that `make save` does
not build, such as buildapp.

Also, fix `make check-saved` to stop requiring that we save pgloader again.

Finally, use uiop:featurep to clean-up the code in the src/save.lisp file.
2019-05-09 10:52:09 +02:00
Dimitri Fontaine
351ce3faaf Rename pgloader's own DB3 field structure.
That's to avoid a name conflict with the cl-db3 package. In SBCL 1.5.2 on
Travis this conflict is a hard error and provokes a failure to build the
pgloader binary.
2019-05-09 10:41:40 +02:00
Dimitri Fontaine
ca92cdbf20 Introduce core-compression discovery in make save to fix Travis builds. 2019-05-09 10:27:20 +02:00
Dimitri Fontaine
781e586816 Fix the SBCL 1.5.2 checksum.
Now that we know that the checksum facility actually works and fails when
given the wrong checksum, install the right checksum. Ahem.
2019-05-09 00:46:11 +02:00
Dimitri Fontaine
ede385bce7 Upgrade travis testing to SBCL 1.5.2.
That's needed for current ironclad.
2019-05-09 00:36:58 +02:00
Dimitri Fontaine
0643cf0869 Remove the old mention about building the docs in the README.
It's been a while since we switched to the Read The Docs format; the
ruby-ronn manpage is a thing of the past now.
2019-05-09 00:28:01 +02:00
Dimitri Fontaine
3118602702 Attempt to add a Read The Docs badge on the GitHub README for pgloader. 2019-05-09 00:26:05 +02:00
Dimitri Fontaine
1be0f02057 Attempt to get Travis back on its feet again.
The current problem with the Travis build is that it's using the Quicklisp
version of the cl-db3 library rather than the current master branch
version, which the current sources of pgloader depend on.

In the Dockerfiles we use the alternate build method for preparing the lisp
image, and invoke the `make clones` target explicitly in order to make sure
that we have the right set of dependencies at build time.

In this patch we apply the same approach to Travis, hoping to solve the
current build issues there.
2019-05-09 00:18:11 +02:00
Dimitri Fontaine
291af994ba Let's see how those badges look when on the same line. 2019-05-09 00:14:43 +02:00
Dimitri Fontaine
f17562f62c Attempt to add DockerHub Build Status to the README at GitHub. 2019-05-09 00:13:30 +02:00
Dimitri Fontaine
2b6fb3e6c2 Add a casting rule for the I data type in DBF files.
The I data type is mapped to a PostgreSQL integer.

See #929.
2019-05-08 23:36:17 +02:00
Dimitri Fontaine
7d2e5ae941 Allow lambda expression as cast rules using functions.
Before this, it was necessary to install a function in the lisp environment,
either in the source itself in src/utils/transforms.lisp, or in a lisp file
loaded with --load-lisp-file (or -l for short).

While this could be good enough, sometimes a very simple combination of
existing features is all a transformation needs, so being able to do some
lisp coding directly in the load command is a nice-to-have.

Fixes #961.
2019-05-08 19:02:18 +02:00
Mathieu Santostefano
6aa42ec68f Remove invalid WITH default option (#960)
It seems like `create schema` option for WITH statement is invalid.
2019-05-07 23:06:02 +02:00
Dimitri Fontaine
b1d55e07d6 Implement header parsing for fixed file formats.
Fixed file formats might contain a header line with column names and a hint
of the size of each column. While it might be a long shot that we can
actually use that as a proper fixed-format specification, this patch
implements a guess mode that also outputs the parsed header.

In cases where the parsing is wrong in some level of detail, it might
actually be a good start to copy/paste from the command output and go from
there.

Fixes #958.
2019-05-07 22:57:55 +02:00
Dimitri Fontaine
27b1a83b9f Allow quoted table and column names in user-defined CASTing rules.
Fixes #955.
2019-04-30 16:55:26 +02:00
Dimitri Fontaine
d4369cc605 Move DB3 transformation functions into the pgloader.transforms package.
That makes them available for user defined cast rules.

Fix #929.
2019-04-29 16:52:24 +02:00
Dimitri Fontaine
febb2c11be Update HTTP/ZIP/DBF test case.
The URL of the test case source has changed; use the new one. Also set the
encoding properly: the client_encoding trick has been deprecated for a while
now, as pgloader only talks to Postgres in UTF-8.
2019-04-28 23:34:31 +02:00
Dimitri Fontaine
6dbe62af1c Attempt to fix MS SQL drop view for Mat Views facility.
It seems that the current open-coding of the materialized views data
structure is still causing some confusion, so use the same code as in the
MySQL support in the MS SQL parts of the code when trying to fix #950.

See #950.
2019-04-28 22:53:08 +02:00
Dimitri Fontaine
70b6845852 Apply identifier case to MS SQL column names in fkey definitions.
This is a blind fix: it looks like we forgot to take care of identifier case
when adding columns to foreign key definitions in MS SQL catalogs.

Fixes #953.
2019-04-28 22:48:07 +02:00
Dimitri Fontaine
513455f552 Implement support for MySQL bitstrings.
We migrate bit(xx) to the same PostgreSQL data type bit(xx), where in
Postgres we can use bit strings as documented at the following URL. In
particular the COPY syntax accepts the notation Xabcd for the values, which
is quite nice when MySQL sends the data to us as a byte vector:

  https://www.postgresql.org/docs/current/datatype-bit.html

Fixes #943.
2019-04-19 12:52:04 +02:00
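The byte-vector-to-bit-string formatting described in the commit above can be sketched as follows. This is a hypothetical Python helper mirroring the X-prefixed hex notation the commit mentions, not pgloader's actual transform:

```python
def bytes_to_copy_bit(value: bytes) -> str:
    """Format a byte vector using the hex (X-prefixed) bit-string
    notation for PostgreSQL bit(n) values, as described above."""
    return "X" + value.hex()

# a MySQL bit(16) value arriving as two bytes
print(bytes_to_copy_bit(b"\xab\xcd"))  # Xabcd
```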
Dimitri Fontaine
a9133256a7 Change the main Dockerfile for sbcl to use make save too.
That building script does not rely on buildapp, which might be an advantage
in docker environments: first it's one less dependency, second, we have seen
weird error cases when using buildapp.
2019-04-17 14:56:54 +03:00
Dimitri Fontaine
739be3a730 Fix Dockerfile.ccl
Stop installing sbcl in the CCL image, that's not needed, and also make sure
we use a git clone of some of the libs rather than the current QL version of
them.

Fixes #926.
2019-04-17 14:51:11 +03:00
Dimitri Fontaine
4b9cbcbce3 Fix MySQL processing of Mat Views, again.
The previous fix left something to be desired, in that it didn't update the
code's expectations about what a view-name looks like in fetch-metadata. In
particular we would use (cons NIL "table-name") in the only-table facility,
which expects non-qualified names as strings.

Switch to using the :including filters facility instead, as we do in MS SQL.
Later, we might want to deprecate our internal :only-tables facility.

Fix #932. Again.
2019-04-15 23:06:57 +02:00
Dimitri Fontaine
1a4ce4fb46 Re-indent block before editing.
It turns out that the current Emacs settings don't agree with the file's
indentation; let's clean that up before modifying the file, so that later
reviews are easier.
2019-04-15 23:05:28 +02:00
Dimitri Fontaine
701d54bfdf Bypass using buildapp in the Docker build for CCL. 2019-04-15 12:23:49 +02:00
Dimitri Fontaine
bc1167d3e3 Implement support for DB3 MEMO files.
The support code is actually found in cl-db3, the changes that are necessary
for pgloader are only about updating to the new API.

Fixes #926.
2019-04-14 22:22:33 +02:00
Dimitri Fontaine
7b10fabd94 Update DB3 documentation to include user-defined casting rules.
See #927, #938.
2019-04-14 21:56:04 +02:00
Dimitri Fontaine
39fc78e08f Implement user-defined casting rules support for DB3.
The casting support for DB3 was hand-crafted and didn't get upgraded to
using the current CAST grammar and facilities, for no other reason than
lack of time and interest. It so happens that implementing it now fixes two
bug reports.

Bug #938 is about conversions defaulting to "not null" columns, and that's due
to the usage of the internal pgloader catalogs, where the target column's
nullable field is NIL by default, which doesn't make much sense. With
support for user-defined casting rules, the default is nullable columns, so
that's kind of a free fix.

Fixes #927.
Fixes #938.
2019-04-14 21:46:04 +02:00
Dimitri Fontaine
efe70ba3c3 Fix MySQL Materialized Views support, recently broken.
When adding support for Mat Views to MS SQL, we added support for the view
names to be fully qualified (with their schema), using a cons to host
the (schema . name) data.

Well, turns out the MySQL side of things didn't get the memo.

Blind attempt at fixing #932, see also #918.
2019-04-14 00:37:40 +02:00
Dimitri Fontaine
c83a0375a0 Fix glitch in Materialized Views support for MSSQL.
Thanks to @sorix6 for a bug report with a fix!

Fixes #928.
2019-04-14 00:23:37 +02:00
Dimitri Fontaine
957caa877e Add a new transform function for byte-vector to hexadecimal string.
In some cases when migrating from MySQL we want to transform data from a
binary representation to a hexadecimal number. One such case is going from
MySQL binary(16) to the PostgreSQL UUID data type.

Fixes #904.
2019-03-18 14:21:33 +01:00
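A minimal Python sketch of such a transform (illustrative only; pgloader implements this in Common Lisp):

```python
import uuid

def byte_vector_to_hexstring(value: bytes) -> str:
    """Format a byte vector as a hexadecimal string."""
    return value.hex()

def binary16_to_uuid(value: bytes) -> str:
    """Format a 16-byte MySQL binary(16) value as a PostgreSQL uuid literal."""
    return str(uuid.UUID(bytes=value))
```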
Dimitri Fontaine
4d005b5c9c Improve Postgres connection string parsing.
Add support for single-quoting database name as in the Postgres
documentation about connection string at:

  https://www.postgresql.org/docs/10/libpq-connect.html#LIBPQ-CONNSTRING

In passing, allow for ipv6 as well as ipv4 host ip addresses.

Fixes #923.
2019-03-17 17:25:41 +01:00
Andreas Hildebrandt
4ec8613884 Correctly parse PostgreSQL version string on Windows.
The regex used in parse-postgresql-version-string failed to handle
"PostgreSQL 10.6, compiled by Visual C++ build 1800, 64-bit"
correctly.
2019-02-25 15:48:40 +01:00
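The tolerant matching can be sketched in Python (an illustration of the idea, not the actual Lisp regex):

```python
import re

def parse_postgresql_version(banner: str) -> tuple:
    """Extract (major, minor) from a PostgreSQL version banner, tolerating
    whatever text follows the version number."""
    match = re.search(r"PostgreSQL (\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError(f"cannot parse version string: {banner!r}")
    return int(match.group(1)), int(match.group(2) or 0)
```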
Dimitri Fontaine
0081fb6560 Oops, fix trailing closing parens.
When using interactive recompiling of the code in Emacs/SLIME, extra closing
parens are just ignored in Emacs before sending the current form to the CL
compiler. When compiling from the source files, of course, that doesn't
work.

See #910.
2019-02-19 12:50:56 +01:00
Dimitri Fontaine
fbdc95ede6 Fix parsing of MS GUID with mixed endianness.
It turns out that MS SQL Server uses its own representation for GUIDs,
with mixed endianness. In this blind patch we attempt to parse the binary
vector the right way when building our internal representation of a
UUID, before making a string out of it for Postgres, which doesn't use the
same mixed-endianness format.

Fixes #910. Maybe?
2019-02-18 19:37:42 +01:00
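Python's standard library happens to model exactly this layout (the first three GUID fields little-endian, the rest big-endian), which makes the idea easy to sketch — again an illustration, not pgloader's code:

```python
import uuid

def mssql_guid_to_uuid_string(raw: bytes) -> str:
    """Reinterpret a 16-byte MS SQL GUID, whose first three fields are
    little-endian, as the big-endian string form PostgreSQL's uuid expects."""
    return str(uuid.UUID(bytes_le=raw))
```

Note how the first four bytes come out reversed in the string form.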
Dimitri Fontaine
7c146c46b9 Citus support small bug fixes.
When calling the create_distributed_table() function, the column name is
given as a literal parameter to the function and should be quoted that way,
with single quotes. In particular, if our column-name is already
double-quoted, we need to get rid of those extra quotes.

Also, the source-table-name might be a cons object when qualified, or a
plain string when not schema-qualified. Adjust the citus-find-table code to
take that into account.
2019-02-16 15:48:46 +01:00
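The quoting fix can be sketched like this (a hypothetical Python helper mirroring the logic described above):

```python
def citus_column_literal(column_name: str) -> str:
    """Turn a possibly double-quoted column name into the single-quoted
    string literal that create_distributed_table() expects."""
    unquoted = column_name.strip('"')     # drop extra double quotes, if any
    return "'" + unquoted.replace("'", "''") + "'"
```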
Dimitri Fontaine
2e6a941d25 Restore --list-encodings option.
It got broken, somehow. Let's move it further up the chain of command line
option processing.

Fixed #909.
2019-02-15 21:18:38 +01:00
Dimitri Fontaine
69d9b381dc Fix previous patch: also process the bad row.
The bad row needs to go to the reject file, and its condition error message
to the reject log. Ooops, forgot.

See #836.
2019-02-15 00:07:39 +01:00
Dimitri Fontaine
632f7f5b4e Implement COPY error handling for non-parsable error messages.
pgloader parses the COPY error messages to find out the line number where
we have a problem in the batch, allowing for a quite efficient recovery
mechanism where it's easy enough to just skip the known faulty input.

Now, some error messages do not contain a COPY line number, such as fkey
violation messages:

  Database error 23503: insert or update on table "produtos" violates
  foreign key constraint "produtos_categorias_produtos_fk"

In that case, rather than failing the whole batch at once (before the
previous commit, we would just fail badly), we can retry the
batch one row at a time until we find our culprit, and then continue one
input row at a time.

Fixes #836.
2019-02-15 00:05:47 +01:00
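The recovery strategy for unparsable error messages can be sketched as follows (a simplified Python rendition; pgloader's actual retry logic lives in its Lisp batch machinery):

```python
def retry_batch_row_by_row(rows, copy_rows, reject):
    """Fallback when a COPY error carries no line number: retry the batch one
    row at a time, diverting each failing row to the reject handler."""
    ok = 0
    for row in rows:
        try:
            copy_rows([row])          # re-send a single-row batch
            ok += 1
        except Exception as condition:
            reject(row, condition)    # bad row and its error go to the reject files
    return ok
```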
Dimitri Fontaine
8eea90bb51 Improve Foreign Key error handling.
We don't know how to parse the PostgreSQL condition sent when there is a
fkey error... and the message would not contain the row number where that
error happened anyway.

At the moment it means that the retry-batch facility errors out for failing
to realize that NIL isn't a number on which we can do arithmetic, which in
itself is a little sad.

In this patch we install a condition handler that knows how to deal with
retry-batch failing, so that pgloader may try and continue rather than
appear locked to the user, when I suspect that the debugger is waiting for
input.

See #836, where that's the first half of the fix. The real fix is to handle
foreign key errors correctly of course.
2019-02-14 23:10:12 +01:00
Dimitri Fontaine
2cbf716112 Ensure column names are double-quoted in the COPY command.
In some cases we have to quote column names and it's not been done yet, for
instance when dealing with PostgreSQL as a source database.

Patch mostly from @m0n5t3r, only cosmetic changes applied. Thanks!

Fixes #905.
2019-02-14 22:16:53 +01:00
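Identifier quoting for COPY can be sketched as (hypothetical helper names, not pgloader's API):

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def copy_column_list(columns) -> str:
    """Build the quoted column list used in a COPY command."""
    return ", ".join(quote_ident(c) for c in columns)
```

Quoting matters as soon as a source column is named after a keyword, such as "order".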
Dimitri Fontaine
0caa9c30ce Fix Postgres to Postgres migrations with schema name change.
When migrating from PostgreSQL, pgloader takes the index and foreign key
definitions from the server directly, using pg_get_indexdef() and other
catalog functions. That's very useful in that it embeds all the necessary
quoting of the objects, and schema-qualify them.

Of course we can't use the SQL definition as on the source system when we
target a schema name that is different from the source system, which the
code didn't realize before this patch. Here we simply invalidate the
pre-computed SQL statement and resort to using the classic machinery to
build the statement from pieces again.

Fixes #903.
2019-02-14 18:24:50 +01:00
Dimitri Fontaine
213edbe930 Compare type names case-insensitively in casting rules.
The types TEXT and text are the same, and should match both when used as a
casting rule and when checking a catalog merge.

See #132.
2019-02-05 18:26:40 +03:00
ChristophKaser
36fbadded6 Allow "preserve index names" for MSSQL (#902)
With this change, preserve index names is also supported for MSSQL-connections.
2019-02-05 11:08:23 +03:00
Christoph Berg
a6df4e9807 debian/rules: Sync loaded systems with Makefile.
Cf. 25c937879a
2019-01-22 10:42:56 +01:00
Christoph Berg
398802a1f0 debian/tests/ssl: Add --debug to get backtraces.
Cf. #893.
2019-01-21 22:03:35 +01:00
William Hakizimana
be2815fda2 Typo: Change issing to issuing (#895) 2019-01-21 18:13:52 +01:00
Dimitri Fontaine
eafaf80b3c Back to non-release development.
Master's branch is now preparing for 3.6.2.
2019-01-21 17:51:44 +01:00
William Hakizimana
3208145e46 Typo: change directly to directory (#894) 2019-01-21 17:50:30 +01:00
Christoph Berg
15106489d6 New upstream version.
* New upstream version.
* SSL is always enabled now, drop our patch.
* Add B-D on python3-sphinx-rtd-theme.
2019-01-21 16:19:50 +01:00
Christoph Berg
de38a4473a Merge branch 'master' into debian 2019-01-21 16:09:05 +01:00
Dimitri Fontaine
25c937879a Fix building for 3.6.1.
The pgloader-image feature must be added in the lisp image before
reading/compiling the pgloader sources for it to be useful.
2019-01-21 15:02:39 +01:00
Dimitri Fontaine
dae5dec03c Allow fields/columns projections when parsing header.
When using a CSV header, we might find fields in a different order than the
target table columns, and maybe not all of the fields are going to be read.
Take account of the header we read rather than expecting the header to look
like the target table definition.

Fix #888.
2019-01-15 22:39:08 +01:00
Dimitri Fontaine
1306b4c953 Desultory improvements.
Killing tasks in the error handling must be done carefully, and given this
testing session it seems better to refrain from doing it when erroring out
at COPY init time (missing column is an example of that). The approach
around that is still very much ad-hoc rather than systematic.

In passing, improve the `make save` option for producing a binary image: have
the make recipe respect the CL variable. The command line option
differences were already accounted for.
2019-01-09 18:57:33 +01:00
Dimitri Fontaine
2147a1d07b Implement ALTER TABLE ... SET TABLESPACE ... as a pgloader clause.
This allows creating tables in any target tablespace rather than the default
one, and is supported for the various sources having support for the ALTER
TABLE clause already.
2019-01-08 22:50:24 +01:00
Dimitri Fontaine
f28f8e577d Review log-level for stored procedures.
Some MySQL schema level features (on update current_timestamp) are migrated
to stored procedures and triggers. We would log the CREATE PROCEDURE
statements as LOG level entries instead of SQL level entries, most likely a
stray devel/debug choice.
2019-01-08 22:44:07 +01:00
Dimitri Fontaine
44514deaa7 Improve ALTER TABLE documentation. 2019-01-08 22:09:13 +01:00
Dimitri Fontaine
a4a52db594 Improve SQLite support for autoincrement and sequences.
It turns out that SQLite only creates an entry in its sqlite_sequence
catalogs when some data make it to a table using a sequence, not at create
table time. It means that pgloader must do some more catalog querying to
figure out if a column is "autoincrement", and apparently the only way to
get to the information is to parse the SQL statement given in the
sqlite_master table.

Fixes #882.
2019-01-07 23:52:29 +01:00
Dimitri Fontaine
204a0119cd Add another debugging guard #+pgloader-image. 2019-01-07 23:51:58 +01:00
Dimitri Fontaine
e4a4edb276 Make interactive debugging easier.
It's fair game to handle errors and issue log entries instead when using the
pgloader binary image, as the interactive debugger distracts users a lot. That
said, as a developer the interactive debugger is very useful.

In passing install some experimental thread killing behavior in case of
errors and using on-error-stop setting (default for database migrations).
2019-01-07 20:44:16 +01:00
Dimitri Fontaine
9ce4088b48 Improvements to the make save facility. 2019-01-07 20:44:16 +01:00
Stéphane Wirtel
13bdb2d118 Fix section in rest (#883) 2019-01-07 20:24:59 +01:00
Dimitri Fontaine
b8e8cf7d18 Fix bugs in the recent extended support for materialized views.
Materialized views without an explicit schema name are supported, but then
would raise an error when trying to use destructuring-bind on a string
rather than the (cons schema-name table-name). This patch fixes that.
2018-12-28 10:53:01 +01:00
Dimitri Fontaine
65d323e4a3 Refrain from matching typemod expression to NIL typemod.
Fixes #879.
2018-12-23 20:51:36 +01:00
Dimitri Fontaine
3d08996777 Review the new documentation material. 2018-12-20 10:05:54 +01:00
Dimitri Fontaine
eab1cbf326 More docs improvements.
Explain the feature list of pgloader better, to improve discoverability of
what can be achieved with our nice little tool.
2018-12-19 22:40:32 +01:00
Dimitri Fontaine
ec071af0ad Add a Feature Matrix to the documentation.
That gives both an overview of what pgloader is capable of doing with
a database migration, and also documents that some sources don't have
full support for some features yet.
2018-12-19 15:31:25 +01:00
Dimitri Fontaine
2cafa8360c Document newly added MATERIALIZE VIEWS for new sources.
Now it's possible to use this clause with a PostgreSQL or an MS SQL database
source.

Fixes #817.
2018-12-19 10:51:04 +01:00
Dimitri Fontaine
c019c16113 Implement MATERIALIZE VIEWS support for MS SQL, and distribute.
The latter is not tested yet, but should have no impact if not used. Given
how rare it is that I get a chance to play around with a MS SQL instance
anyway, it might be better to push blind changes for it when it doesn't
impact existing features…
2018-12-19 01:25:27 +01:00
Dimitri Fontaine
bda06f8ac0 Implement Citus support from a MySQL database. 2018-12-17 16:31:47 +01:00
Dimitri Fontaine
290ad68d61 Implement materialize views in PostgreSQL source support. 2018-12-16 23:17:37 +01:00
Dimitri Fontaine
007003647d Improve Redshift support documentation. 2018-12-14 18:21:34 +09:00
Dimitri Fontaine
f72afeeae7 Switch the documentation to the ReadTheDocs template. 2018-12-12 09:34:20 +09:00
Dimitri Fontaine
b6de8f1ead Improve Citus documentation. 2018-12-12 09:34:05 +09:00
Dimitri Fontaine
56d24de67a Update documentation with new features.
We have a lot of new features to document. This is a first patch about that,
some more work is to be done. That said, it's better than nothing already.
2018-12-11 14:25:08 +09:00
Dimitri Fontaine
af2995b918 Apply quoting rules to SQLite index column names.
The previous fix was wrong for missing the point: rather than unquote column
names in the table definition when matching the column names in the index
definition, we should in the first place have quoted the index column names
when needed.

Fixes #872 for real this time.
2018-12-02 00:17:26 +01:00
Dimitri Fontaine
a939d20dff Unquote names when searching for an index column name in its table.
If the source database is using a keyword (such as "order") as a column
name, then pgloader is going to quote this column name in its internal
catalogs. In that case, unquote the column in the pgloader catalogs when
matching it against the unquoted column name we have in the index
definition.

Fixes #872.
2018-12-01 21:27:26 +01:00
Dimitri Fontaine
ab2cadff24 Simplify the regular expression parsing the PostgreSQL version string.
The debian/Ubuntu packaging would defeat the quite simple regexp parsing the
PostgreSQL version string that we have in pgloader. To make it more robust,
make it more open to unforeseen strings.

See #800, see #810.
2018-11-30 15:39:27 +01:00
Dimitri Fontaine
801d8a6e09 Add support for MS SQL time data type.
As for the other datetime types we have to use CONVERT at the SQL level in
order to get a format that PostgreSQL understands. This time the magic
number for it is 114.
2018-11-23 10:43:58 +01:00
Dimitri Fontaine
6e325f67e0 Implement the save.lisp idea for the bundle.
This should make it easier to build pgloader with CCL rather than SBCL, all
from the bundle distribution, and also easier to support windows.

In passing, add a new file in the bundle distribution: version.sexp should
contain a CL string containing the pgloader version string.
2018-11-21 21:44:56 +01:00
Dimitri Fontaine
18bcf10903 Blind fix for a strange use-case.
A user reported a case where pgloader fails to find the table an index has
been created on in pgloader catalogs. That's a weird case. For now, just
issue a warning about the situation and skip the index.
2018-11-21 18:17:34 +01:00
Dimitri Fontaine
4ab26e5387 Handle other conditions in process-catalogs.
It might be that some random condition is signaled during process-catalogs,
causing the errors reported so far and that I can't reproduce. Let's add
some handler-case protection to have more clues about what could be
happening.

See #865, #800, #810, #859, #824.
2018-11-21 17:31:11 +01:00
Dimitri Fontaine
743769d750 Improve handling of errors when fetching the source catalogs.
We might have MS SQL failures at this stage, or even Redshift or other
PostgreSQL variants failing to execute our catalog queries. Handle
conditions by cleanly logging them and returning from copy-database without
doing anything. That's the best we can do here.

Fixes #605, fixes #757.
2018-11-21 10:38:19 +01:00
Dimitri Fontaine
1c18b41cd7 Implement a new way of building pgloader: make save.
This time we directly call into the save-lisp-and-die feature of the
implementation. As pgloader only supports SBCL and CCL for the time being,
doing things without an abstraction layer is easy enough.

This needs more testing and a special version for the bundle case too. One
step at a time, etc.
2018-11-20 22:59:43 +01:00
Dimitri Fontaine
3f2f10eef1 Finish implementation of CAST rules for PostgreSQL source databases.
Add a link to the table from the internal catalogs for columns so that we
can match table-source-name in cast rules when migrating from PostgreSQL.
2018-11-19 19:33:37 +01:00
Dimitri Fontaine
aa8ae159e2 Improve error handling when applying Citus distribution rules.
Make it so that we generate a proper error message to the user when failing
to figure out the PATH to the distribution key, rather than failing with an
internal error about The value NIL is not of type PGLOADER.CATALOG:TABLE.
2018-11-18 18:21:51 +01:00
Dimitri Fontaine
f07ac61269 Fix default/serial handling of pgsql as a source.
In the recent patch that added support for Redshift "identity" columns, we
broke support for PostgreSQL sequences. Unbreak that.
2018-11-18 17:46:41 +01:00
Dimitri Fontaine
1fd0576ace Fix Citus support related debug print instructions. 2018-11-16 00:08:27 +01:00
Dimitri Fontaine
8b1acbae87 Make sure the image knows how to print circular data structures.
Our catalogs representation is designed to be circular, which helps
navigating the graph from anywhere when processing it. This means that we
need to have *print-circle* set to t in the pgloader image, otherwise we
might run into Control stack exhausted when trying to print out debug
information...

Fixes #865, #800, #810, #859, #824.
2018-11-16 00:03:31 +01:00
Dimitri Fontaine
e291c502ba Install a call to cl+ssl:reload at image startup time, again.
Testing shows that it's not just debian which needs it, it's always
necessary. Just re-add our tweak now.

See #866, see #816, see #807, #794.
2018-11-15 23:59:51 +01:00
Dimitri Fontaine
16dda01f37 Deal with SSL verify error the wrong way.
This patch adds an option --no-ssl-cert-verification that allows bypassing
OpenSSL server certificate verification. It's hopefully a temporary measure
that we set up in order to make progress when confronted with:

  SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY

The real solution is of course to install the SSL certificates at a place
where pgloader will look for them, which defaults to
~/.postgresql/postgresql.crt at the moment. It's not clear what the story is
with the defaults from /etc/ssl, or how to make things happen in a better
way.

See #648, See #679, See #768, See #748, See #775.
2018-11-15 00:13:21 +01:00
Dimitri Fontaine
5ecf04acb9 Implement null if support as a WITH option.
This gives a default "null if" option to all the input columns at once, and
it's still possible to override the default per column.

In passing, fix project-fields declarations that SBCL now complains about
when they're not true, such as declaring a vector when we might have :null
or nil. As a result, remove the (declare (optimize speed)) in the generated
field processing code.
2018-11-13 21:41:27 +01:00
Dimitri Fontaine
a6ef7a56a9 Implement ipv6 hostname support in .pgpass rules.
A hostname can be written [::1] in .pgpass, without having to escape the
colon characters, and with a proper enclosing in square brackets, as is common
for ipv6 addresses.

Fixes #837.
2018-11-10 21:01:30 +01:00
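A sketch of the matching idea in Python (illustrative only; pgloader's parser is written in Lisp, and this ignores the backslash-escape rules for the other fields):

```python
def split_pgpass_line(line: str):
    """Split a host:port:database:user:password line, accepting an ipv6
    hostname enclosed in square brackets without escaped colons."""
    if line.startswith("["):
        end = line.index("]")
        host, rest = line[1:end], line[end + 2:]   # skip over "]:"
    else:
        host, rest = line.split(":", 1)
    port, database, user, password = rest.split(":", 3)
    return host, port, database, user, password
```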
Dimitri Fontaine
656bf85075 Review field to column projection code emitted.
The code emitted by pgloader to transform input fields into PostgreSQL
column values was using too many optimization declarations, some of which
SBCL failed to follow through on for lack of type markings in the generated
code.

As SBCL doesn't have enough information to be optimizing anyway, at least we
can make it so that we don't have a warning about it. The new code does that.

Fixes #803.
2018-11-10 20:22:04 +01:00
Dimitri Fontaine
6eaad0621b Desultory code maintenance for MS SQL identity support.
The code nowadays expects the keyword :auto-increment rather than a string
when processing a column's extra bits of information as meaning that we
want to cast to a serial/bigserial data type.
2018-11-09 22:42:31 +01:00
Dimitri Fontaine
6c80404249 Implement support for Redshift "identity" columns.
At this stage we don't even parse the details of the Redshift identity such
as the seed and step values and consider them the same as a MySQL
auto_increment extra description field.

Fixes #860 (again).
2018-11-09 22:41:14 +01:00
Dimitri Fontaine
794bc7fc64 Improve redshift support: string_agg() doesn't exist there.
Neither do array_agg(), unnest() and other very useful PostgreSQL
functions. Redshift is from 8.0 times, so do things the old way: parse the
output of the index definition that we get from calling pg_index_def().

For that, this patch introduces the notion of SQL support that depends on
PostgreSQL major version. If no major-version specific query is found in the
pgloader source tree, then we use the generic one.

Fixes #860.
2018-11-07 21:23:56 +01:00
Dimitri Fontaine
207cd82726 Improve SQLite type names parsing.
Allow spaces in more random places, as SQLite doesn't seem to normalize the
user input. Fixes #548 again.
2018-11-07 11:01:06 +01:00
Dimitri Fontaine
f8460c1705 Allow usernames and dbnames starting with digits (again).
It turns out that the rules about the names of users and databases are more
lax than pgloader knew, so it might be a good move for our DSN parsing
to accept more values and then let the source/target systems complain
when something goes wrong.

See #230, which got broken again somewhere.
2018-10-20 19:28:19 +02:00
Jason Rigby
6e7ea90806 add cl-ironclad and cl-babel dependencies to docker builds (#854) 2018-10-18 18:56:40 +02:00
Larry Gebhardt
0e6f599282 Add Docker build instructions (#853) 2018-10-18 18:55:56 +02:00
Dimitri Fontaine
7b487ddaca Add a Citus distribution test case, from the citus tutorial. 2018-10-18 15:42:17 +02:00
Dimitri Fontaine
d3b21ac54d Implement automatic discovery of the Citus distribution rules.
With this patch, the following distribution rule

   distribute companies using id

is equivalent to the following distribution rule set, given foreign keys in
the source schema:

   distribute companies using id
   distribute campaigns using company_id
   distribute ads using company_id from campaigns
   distribute clicks using company_id from ads, campaigns
   distribute impressions using company_id from ads, campaigns

In the current code (of this patch) pgloader walks the foreign-keys
dependency tree and knows how to automatically derive distribution rules
from a single rule and the foreign keys.
2018-10-18 15:31:29 +02:00
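The derivation described above can be sketched as a walk over the foreign-key graph (a Python toy model with hypothetical data shapes, not pgloader's catalogs):

```python
def derive_distribution_rules(root, root_key, fkeys):
    """Expand one `distribute <root> using <key>` rule over a foreign-key map
    {child_table: (referenced_table, fk_column)}; each derived rule carries
    the distribution column and the join path back toward the root."""
    rules = {root: (root_key, [])}
    progress = True
    while progress:                       # keep walking until no new table qualifies
        progress = False
        for child, (parent, fk_column) in fkeys.items():
            if parent in rules and child not in rules:
                parent_key, parent_path = rules[parent]
                if parent == root:
                    rules[child] = (fk_column, [])
                else:
                    rules[child] = (parent_key, [parent] + parent_path)
                progress = True
    return rules
```

Run over the schema from the commit message, this reproduces the expanded rule set, e.g. clicks distributed using company_id from ads, campaigns.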
Dimitri Fontaine
8112a9b54f Improve Citus Distribution Support.
With this patch it's now actually possible to backfill the data on the fly
when using the new "distribute" commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
2018-10-16 18:53:41 +02:00
Dimitri Fontaine
760763be4b Use the constraint name when we have it.
That's important for Citus, which doesn't know how to ADD a constraint
without a name.
2018-10-10 15:44:21 -07:00
Dimitri Fontaine
381ac9d1a2 Add initial support for Citus distribution from pgloader.
The idea is for pgloader to tweak the schema from a description of the
sharding model, the distribute clause. Here's an example of such a clause:

   distribute company using id
   distribute campaign using company_id
   distribute ads using company_id from campaign
   distribute clicks using company_id from ads, campaign

Given such commands, pgloader adds the distribution key to the table when
needed, to the primary key definition of the table, and also to the foreign
keys that point to the changed primary key.

Then when SELECTing the data from the source database, the idea is for
pgloader to automatically JOIN the base table with the table in which the
distribution key is to be found, in case it was just added in the schema.

Finally, pgloader also calls the following Citus commands:

  SELECT create_distributed_table('company', 'id');
  SELECT create_distributed_table('campaign', 'company_id');
  SELECT create_distributed_table('ads', 'company_id');
  SELECT create_distributed_table('clicks', 'company_id');
2018-10-10 14:35:12 -07:00
Dimitri Fontaine
344d0ca61b Implement AFTER SCHEMA sql code blocks.
This allows pgloader users to run SQL commands in between pgloader's schema
creation and the actual loading of the data.
2018-10-10 11:08:28 -07:00
Jon Snell
0957bd0efa Fix pgloader bug #844 by adding support for mssql real types (#845) 2018-10-05 12:47:54 +02:00
Dimitri Fontaine
d356bd501b Accept even more ragged date format input.
When parsing a date string from a date format, accept that the ms or us part
may be completely missing, rather than just missing some digits.

Fixed #828.
2018-09-10 19:37:36 +02:00
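The relaxed fractional-seconds handling can be sketched as (a Python illustration with hypothetical names, not pgloader's date-format parser):

```python
import re
from datetime import datetime

TS_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})(?:\.(\d*))?$")

def parse_ragged_timestamp(text: str) -> datetime:
    """Parse a timestamp whose fractional-seconds part may be short or absent."""
    match = TS_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse timestamp: {text!r}")
    # zero-pad (or truncate) the fraction to exactly six digits of microseconds
    frac = (match.group(7) or "").ljust(6, "0")[:6]
    y, mo, d, h, mi, s = (int(g) for g in match.groups()[:6])
    return datetime(y, mo, d, h, mi, s, int(frac))
```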
Dimitri Fontaine
5119d864f4 Assorted bug fixes in the context of Redshift support as a source.
The catalog queries used in pgloader have to be adjusted for Redshift
because this thing forked PostgreSQL 8.0, which is a long time ago now.
Also, we had a couple bugs here and there that were not really related to
Redshift support but were shown in that context.

Fixes #813.
2018-09-04 11:49:21 +02:00
Dimitri Fontaine
0f58a3c84d Assorted fixes: catalogs SQLtypes and MySQL decoding as.
It turns out that when trying to debug "decoding as", the SQLtype listing
support in sqltype-list was found broken, so this patch fixes it. It then goes
on to fix the DECODING AS filters support, which we had switched to using
the better regexp-or-string filter struct but forgot to update the matching
code accordingly.

Fixes #665.
2018-08-31 22:51:41 -07:00
Dimitri Fontaine
4fbfd9e522 Refrain from using regexp_match() function, introduced in Pg10.
Instead use the substring() function which has been there all along.

See #813.
2018-08-22 10:52:01 +02:00
Dimitri Fontaine
c9b905b7ac Simplify our ASD system definition by using :serial t.
This allows us to drop the manually maintained list of file dependencies,
instead implying them from the order in which we list the files.
2018-08-20 11:55:47 +02:00
Dimitri Fontaine
cb633aa092 Refrain from some introspections on non-PGDG PostgreSQL variants.
When dealing with PostgreSQL protocol compatible databases, often enough
they don't support the same catalogs as PostgreSQL itself. Redshift for
instance lacks foreign key support.
2018-08-20 11:52:59 +02:00
Dimitri Fontaine
d3bfb1db31 Bugfix previous commit: filter list format changed.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
2018-08-20 11:50:50 +02:00
Dimitri Fontaine
fc3a1949f7 Add support for PostgreSQL as a source database.
It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user-defined
cast rules at COPY time, or just moving from one hosted solution to another.
2018-08-20 11:09:52 +02:00
Dimitri Fontaine
1ee389d121 Fix parsing empty hostname fields in pgpass.
Fixes #823.
2018-08-14 10:07:05 +03:00
uniquestring
34cc25383a Improved Dockerfiles/docker image size (#821)
* Add dockerfiles to .dockerignore

Otherwise changes in the dockerfiles would invalidate the cache

* Rewrite Dockerfile

- Fix deprecated MAINTAINER instruction
- Move maintainer label to the bottom (improving cache)
- Tidy up apt-get
- Use COPY instead of ADD
  see https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy
- Remove WORKDIR instruction (we don't really need this)
- Combine remaining RUN layers to reduce layer count
- Move final binary instead of copying (reduce image size)

* Use -slim image and multistage build

Reduce size by using multistage builds and the -slim image.
Use debian:stable instead of a specific code name (future proof).

* [cosmetic] indent Dockerfile instructions

Make it easier to see where a new build stage begins

* Rewrite Dockerfile.ccl

Apply the same changes to Dockerfile.ccl as we did for Dockerfile
2018-08-11 01:08:00 +02:00
Christoph Berg
1a811707c6 releasing package pgloader version 3.5.2-3 2018-07-31 16:24:41 +02:00
alexknips
5ca3ee8aad Fix documentation of default MySQL cast rules (#815)
The default rule is `type int to bigint    when  (>= 10 precision)`.
2018-07-20 14:38:06 +02:00
Dimitri Fontaine
46d14af0d3 Add more default rules to MySQL datetime handling.
Given the variety of ways to setup default behavior for datetime and
timestamp data types in MySQL, we need yet more default casting rules. It
might be time to think about a more principled way to solve the problem, but
on the other hand, this ad-hoc one also comes with full overriding
flexibility for the end user.

Fixes #811.
2018-07-08 20:37:06 +02:00
Christoph Berg
1844823bce Rename regress test to ssl
And move ca-certificates dependency to correct test
2018-07-05 21:52:54 +02:00
Christoph Berg
a199db1ae4 Debian: Make cl-pgloader test depend on ca-certificates
Make cl-pgloader test depend on ca-certificates so the snakeoil certificate is
recognized as a valid CA. (Needs the /etc/ssl/certs/*.0 file.)
2018-07-05 19:07:56 +02:00
Dimitri Fontaine
1b150182dc Fix cl-csv delimiter type.
Travis spotted a bug with CCL that I failed to see, and that happens with
Clozure-CL but not with SBCL apparently:

2018-07-03T21:04:11.053795Z FATAL The value "\\\"", derived from the initarg :DELIMITER, can not be used to set the value of the slot CL-CSV::DELIMITER in #<CL-CSV::READ-DISPATCH-TABLE-ENTRY #x30200143DDCD>, because it is not of type (VECTOR (OR (MEMBER T NIL) CHARACTER)).

To fix, prefer the syntax #(#\\ #\") rather than "\\\"".
2018-07-04 01:32:40 +02:00
Christoph Berg
4eb8c7367f releasing package pgloader version 3.5.2-2 2018-07-03 22:53:02 +02:00
Christoph Berg
852b3bc888 debian: Test cl-pgloader through sbcl --eval. 2018-07-03 22:49:27 +02:00
Christoph Berg
647bf4cb86 debian/rules: invoke help2man without path 2018-07-03 22:22:01 +02:00
Christoph Berg
d46c3b8c59 debian/rules: Properly format buildapp invocation 2018-07-03 20:16:29 +02:00
Christoph Berg
bba850479b debian: Skip building and manpage generation in arch-indep builds. 2018-07-03 20:13:15 +02:00
Christoph Berg
ded148228d debian: Install pgloader.asd into correct location. (Closes: #857226) 2018-07-03 18:48:04 +02:00
Christoph Berg
4f5e426fc7 debian: #864309 was fixed in 3.5.2-1 2018-07-03 18:27:51 +02:00
Christoph Berg
8263e587f0 debian/source/options: Ignore changes in src/params.lisp (release vs non-release). 2018-07-03 17:17:28 +02:00
Christoph Berg
906fd96bf1 debian: Build manpage using help2man. 2018-07-03 17:13:35 +02:00
Christoph Berg
b4fae61d41 debian/copyright: syntax fixups 2018-07-03 17:13:35 +02:00
Dimitri Fontaine
8537bd661f Back to not being a release.
Maybe I should find a way to avoid this extra back-and-forth commit.
Someday.
2018-07-03 17:11:38 +02:00
Dimitri Fontaine
63af7e7373 Release 3.5.2.
This release fixes debian packaging, includes support for Redshift as a
target, and also fixes some bugs.
2018-07-03 16:58:55 +02:00
Christoph Berg
cb528c2e19 All included test data has been verified as free, stop building a +dfsg tarball. 2018-07-03 16:37:43 +02:00
Christoph Berg
f19e301c81 debian: Build sphinx docs
While we are at it, remove built docs on clean
2018-06-25 15:02:32 +02:00
Christoph Berg
7a974d712e docs: Remove sidebar_collapse: false
Sphinx's alabaster module on Debian stretch doesn't support
sidebar_collapse yet; remove the setting so the docs build everywhere
2018-06-25 14:48:29 +02:00
Christoph Berg
a1d42028a3 Build and install new sphinx docs instead. 2018-06-25 12:47:20 +02:00
Dimitri Fontaine
9661c5874d Fix previous patch.
It's easy to avoid the warning about an unused lexical variable with the
proper declaration, which I failed to install before because of a syntax
error when I tried. Let's fix it now that I realise what was wrong.
2018-06-23 00:50:35 +02:00
Dimitri Fontaine
8930734bea Ensure unquoted file names for logs and data.
The previous code could create files having as an example the following,
unhelpful name: \"errors\"/\"err\".\"errors\".log.

Fix #808.
2018-06-22 23:02:07 +02:00
Christoph Berg
ee44f19815 debian: Enable SSL in src/hooks.lisp. 2018-06-22 14:35:59 +02:00
Christoph Berg
2160d0abb2 debian: force SSL usage in test via PGSSLMODE 2018-06-22 14:25:12 +02:00
Dimitri Fontaine
047cf84341 Add support for PGSSLMODE environment variable.
PostgreSQL supports many environment variables to drive its connection
behavior, as documented at the following reference:

  https://www.postgresql.org/docs/current/static/libpq-envars.html

We don't yet support everything, adding them one piece at a time.
2018-06-22 14:13:15 +02:00
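The libpq-style environment fallback described in that commit can be sketched as follows. This is an illustrative Python sketch (pgloader itself is Common Lisp); the helper name and the default table are assumptions, with defaults taken from libpq's documented behavior:

```python
import os

# Map libpq environment variables to connection settings; a missing
# variable falls back to the given default, mirroring how pgloader
# consults PGSSLMODE when the connection string leaves sslmode unset.
LIBPQ_ENV_DEFAULTS = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGSSLMODE": "prefer",
}

def pg_setting(name):
    """Return the libpq setting from the environment, else its default."""
    return os.environ.get(name, LIBPQ_ENV_DEFAULTS[name])
```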
Christoph Berg
7b4821e26c debian: Depend on the libssl version cl-plus-ssl depends on. 2018-06-22 10:55:16 +02:00
Christoph Berg
eaf3370a16 debian/rules: remove orig target 2018-06-22 10:30:32 +02:00
Christoph Berg
ae26ca5c1d debian: remove debian/home on clean 2018-06-22 10:15:20 +02:00
Christoph Berg
12af803612 debian: test installed pgloader binary 2018-06-22 10:13:32 +02:00
Christoph Berg
20e5b0bf2a debian/tests: Depend on postgresql, and run test as root
Prepare to test SSL connections, but do not exercise the test yet,
cl-plus-ssl is still broken.
2018-06-21 15:36:52 +02:00
Dimitri Fontaine
7d8ac3b352 Reproduce a test case from issue #795. 2018-06-15 22:14:04 +02:00
Dimitri Fontaine
a0bac47101 Refrain from TRUNCAT'ing an empty list of tables.
Fixed #789.
2018-06-15 17:46:31 +02:00
Dimitri Fontaine
8c2cda75e5 Allow more punctuation signs in the parsers: dollar and percent.
For some reason, some people might use those in their connection strings, as
part of a username or such.

Fixes #809.
2018-06-15 17:26:51 +02:00
Christoph Berg
7220fc2038 debian: Add watch file. 2018-06-04 12:58:35 +02:00
Christoph Berg
0f1da26a27 Run only allcols.load test for now 2018-06-04 12:52:30 +02:00
Christoph Berg
30f90cb848 test/Makefile: Allow configuring the superuser database name
Also, don't ignore errors while setting up the database
2018-06-04 10:52:14 +02:00
Dimitri Fontaine
dfedce2aba Fix support for discovery of DBF target table name.
When the LOAD command does not provide the name of the target table for a
DBF source, we can get the name of the table from the DBF file itself. That
feature got broken, here's a fix.

Fix #805.
2018-06-01 11:23:51 -04:00
Christoph Berg
4cd26c09fd Add new B-D cl-zs3 2018-05-27 21:37:06 +02:00
Björn Häuser
ad7ce4a66b Fix documentation for binary datatype changes (#801)
When reading http://pgloader.readthedocs.io/en/latest/ref/mysql.html I came across the section on binary. In the documentation the indentation is off and it's kinda hard to read :)
2018-05-26 13:54:28 +02:00
Dimitri Fontaine
bcf9cf9bf4 Redshift doesn't have support for the COPY format.
Instead, it needs to parse CSV files. On the other hand, as we don't have to
implement the COPY protocol from within pgloader for Redshift (because it's
using S3 as the data source, and not STDIN), we don't need the level of
control that we are using when talking to a normal PostgreSQL.
2018-05-23 13:45:16 +02:00
Dimitri Fontaine
3db3ecf81b Review Redshift data type dumb-down choices.
It's a little more involved than what was done previously. In particular we
need to pay attention to MySQL varchar(x) and transform them into something
big enough when counting bytes rather than chars, like varchar(3x).

Then there's the "text" datatype to take into account, and some more.
2018-05-23 13:43:28 +02:00
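The byte-counting concern above can be illustrated with a short sketch. This is a hypothetical Python helper, assuming the factor-of-3 widening the commit mentions (3 bytes covers most multi-byte UTF-8 characters):

```python
def redshift_varchar_size(char_length, bytes_per_char=3):
    """Redshift varchar(n) counts bytes, while MySQL varchar(n) counts
    characters; multi-byte UTF-8 text therefore needs a wider column.
    Mirrors the varchar(x) -> varchar(3x) rule from the commit."""
    return char_length * bytes_per_char
```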
Dimitri Fontaine
05b4c7c978 Fix default MySQL casting rules for on update current timestamp.
Due to how type casting matching is implemented in pgloader, we need to add
two more MySQL casting rules in the default pgloader set to handle
specically the case when a datetime or timestamp column in MySQL has the
"extra" bit of information "on update current timestamp".

That's because for a casting rule to match, both the type definition and the
casting rule must have the :on-update-current-timestamp property positionned
the same, so that the existing default rules would not apply.
2018-05-23 10:34:34 +02:00
Dimitri Fontaine
9ac400b623 Implement copying data through S3 for Redshift.
Also add some schema-level support by disabling our usual index and
constraint support when the target is Redshift, because it doesn't support
those parts of SQL.

The S3 parameters are read from either the process environment variables or
from the AWS configuration files in ~/.aws.
2018-05-21 21:22:15 +02:00
Dimitri Fontaine
d4dc4499a8 Add schema migration support for Redshift as a target.
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.

When connected to Redshift, we then dumb-down our target catalogs to the
subset of data types that Redshift actually does support.

Also, some catalog queries can't be done in Redshift, and 8.0 didn't have
fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.

COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also doesn't support the COPY FROM STDIN form. COPY sources
are limited, and another patch will have to be cooked to prepare the data
from pgloader into a format and location that Redshift knows how to handle.

At least, it's possible to migrate a database schema to Redshift already.
2018-05-19 19:16:58 +02:00
Dimitri Fontaine
8fce6c84fc Move all typemod functions at the same place.
Having the parse-column-typemod function in the pgloader.transforms package
makes it available from everywhere in the pgloader code base.
2018-05-19 19:15:30 +02:00
Dimitri Fontaine
1f354131d0 Release pgloader 3.5.1.
Lots of bug fixes did happen, time to release.
2018-05-17 10:41:40 +02:00
Dimitri Fontaine
f30f596eca Review bundle and regression test facilities.
Some path computation didn't work when trying to regression test the
produced bundle.

Also, the bundle building steps would use the pgloader system definition and
dependencies from what's currently available in Quicklisp rather than from
the local pgloader.asd being built.
2018-05-17 10:39:32 +02:00
Dimitri Fontaine
1fe835d31b Add sample data for fields escaped by double-quote.
See #787.
2018-04-29 19:05:52 +02:00
Dimitri Fontaine
a392328dad Allow any ordering of guards and extra cast rule clauses.
It used to be that extras were forced to be parsed before guards, but
there's no reason why a user wouldn't think to write the clauses the other
way round, so add support for that as well.

See #779.
2018-04-29 19:00:20 +02:00
Dimitri Fontaine
01f877bad7 Testing a change in the way we load CL+SSL.
Apparently cl+ssl needs to be reloaded a very specific way at image startup
time, and provides a function to do just that. Let's try and use this piece
of magic rather than calling cffi:load-foreign-library directly.
2018-04-16 15:46:16 +02:00
Dimitri Fontaine
cb9e01f4d9 Code review for previous commit.
See #771.
2018-03-27 14:55:31 +02:00
Goo
c6271506ab Add a new transformation function: hex-to-dec
Closes #771
2018-03-27 14:51:34 +02:00
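A minimal Python equivalent of such a transformation function might look like this (the real hex-to-dec lives in pgloader's Common Lisp transforms package; this sketch only illustrates the behavior, including NULL pass-through, which is an assumption):

```python
def hex_to_dec(value):
    """Convert a hexadecimal string such as 'ff' or '0xff' to its
    decimal integer value; pass NULLs (None) through untouched."""
    if value is None:
        return None
    return int(value, 16)  # base 16 accepts an optional 0x prefix
```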
Dimitri Fontaine
792c0d0357 Typo fix in docs about concurrency settings. 2018-03-26 14:05:05 +02:00
Dimitri Fontaine
e4dca1a086 Implement support for MySQL useSSL=true|false option.
The MySQL connection string parameter for SSL usage is useSSL, so map an
option name to our expected values for sslmode in database connection
strings.

See #748.
2018-03-16 16:41:40 +01:00
Denis Ivanov
a7fd776ecd Update mssql.rst (#762)
Typo in like word
2018-03-13 17:08:51 +01:00
Andy Chosak
ceb5c85f56 fix minor error in docs about logfile location (#761)
The default logfile location seems to be `/tmp/pgloader/pgloader.log`,
not `/tmp/pgloader.log` as currently documented. This is observable in
practice and also in [the source
code](5b227200a9/src/main.lisp (L110)).
2018-03-13 10:37:48 +01:00
Dimitri Fontaine
3112adea6f Fix date-with-no-separator transform.
The expected string length was hard-coded, which is not a good idea given
the support for custom date formats.
2018-03-07 23:07:00 +01:00
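The fix replaces a hard-coded string length with widths derived from the date format itself. A sketch of the idea in Python, with a hypothetical format notation (`YYYY`, `MM`, `DD` tokens are assumptions, not pgloader's actual syntax):

```python
def date_with_no_separator(value, fmt="YYYYMMDD"):
    """Split a separator-free date string into parts whose positions and
    widths come from the format string rather than being hard-coded, so
    custom formats such as 'DDMMYYYY' also work."""
    parts = {}
    for token in ("YYYY", "MM", "DD"):
        idx = fmt.index(token)                 # position of this field
        parts[token] = value[idx:idx + len(token)]
    return "{YYYY}-{MM}-{DD}".format(**parts)  # ISO-8601 output
```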
Dimitri Fontaine
42c9ccfbb3 DB3: pick user's choice of schema name when given.
We would hard-code the schema name into the table's name in the DB3 case on
the grounds that a db3/dbf file doesn't have a notion of a schema. But when
the user wants to add data into an existing target table, then we merge the
catalogs and must keep the given target schema and table name.

Fix #701.
2018-02-25 23:39:52 +01:00
Dimitri Fontaine
784aff6ed5 Handle parsing errors in pgpass gracefully.
Accept empty password lines in ~/.pgpass files, and when pgloader otherwise
fails to parse or process the file, log a warning and return a nil password.

See #748.
2018-02-25 00:12:06 +01:00
Dimitri Fontaine
bd7eb38720 Review Dockerfiles.
Upgrade to stretch in the docker builds and improve disk footprint to some
degree, using classic docker tricks.

See #748.
2018-02-25 00:00:42 +01:00
Dimitri Fontaine
5c10f12a07 Fix duplicate package names.
In a previous commit we re-used the package name pgloader.copy for the now
separated implementation of the COPY protocol, but this package was already
in use for the implementation of the COPY file format as a pgloader source.

Oops.

And CCL was happily doing its magic anyway, so that I've been blind to the
problem.

To fix, rename the new package pgloader.pgcopy, and to avoid having to deal
with other problems of the same kind in the future, rename every source
package pgloader.source.<format>, so that we now have pgloader.source.copy
and pgloader.pgcopy, two visibly different packages to deal with.

This light refactoring came with a challenge, though. The split between the
pgloader.sources API and the rest of the code involved some circular
dependencies in the namespaces. CL is pretty flexible here because it can
reload code definitions at runtime, but it was still a mess. To untangle it,
implement a new namespace, the pgloader.load package, where we can use the
pgloader.sources API and the pgloader.connection and pgloader.pgsql APIs
too.

A little problem gave birth to quite a massive patch. As it happens when
refactoring and cleaning-up the dirt in any large enough project, right?

See #748.
2018-02-24 19:24:22 +01:00
Dimitri Fontaine
4301503df2 Add a new test case for {{ENVVAR}} template support.
See #555.
2018-02-20 14:45:07 +01:00
Dimitri Fontaine
48af01dbbc Fix implementation of foreign keys in data only mode.
In data-only mode, the foreign keys parameter (which defaults to True) means
something special: we remove the fkey definitions prior to the data only
load then re-install the fkeys.

This got broken in a previous commit, the WITH clause option being processed
like the other DDL ones that only make sense when creating the schema. While
fixing the setting in copy-database, we also have to fix a nesting bug in
complete-pgsql-database that would prevent fkeys from being installed again
at the end of the load.

This patch not only fixes that choice, but also reviews the implementation of
the drop-pgsql-fkeys support function to use a more modern internal API,
preparing a list of SQL statements to be sent to the psql-execute level.

Fixes #745.
2018-02-19 22:07:43 +01:00
Dimitri Fontaine
e129e77eb6 Fix SQL execute counters maintenance. 2018-02-19 22:06:51 +01:00
Dimitri Fontaine
957c975b9b Improve summary reporting of errors.
Not all error paths are counted correctly at this point, this commit
improves the situation in passing. A thorough review should probably be
planned sometime.
2018-02-19 22:05:53 +01:00
Dimitri Fontaine
4fed8c5eca Fix support for newid() from MS SQL.
Several places in the code are involved to deal with the default values from
MS SQL. The catalog query is dealing with strange quoting rules on the
source side and used to fill in directly the PostgreSQL expected value. But
then the quoting of a function call wasn't properly handled.

Rather than coping with the quoting rules here, have the catalog query
return a pgloader specific placeholder "GENERATE_UUID". Then the MS SQL
specific code can normalize that to the symbol :generate_uuid. Then the
generic PostgreSQL DDL code can implement the proper replacement for that
symbol, not having to know where it comes from.

Fix #742.
2018-02-17 00:25:33 +01:00
Dimitri Fontaine
0a315214f3 Some improvements on the GitHub issue template.
Well, let's be more direct to the user.
2018-02-17 00:18:42 +01:00
Dimitri Fontaine
a4a9fdf668 Add a GitHub issue template. 2018-02-17 00:12:36 +01:00
Dimitri Fontaine
5e3acbb462 When merging catalogs, treat "float" and "double precision" as the same type.
PostgreSQL understands both spellings of the data type name and implements
float as being a double precision value, so we should refrain from any
warning about that non-discrepancy when doing a data-only load.

Should fix #746.
2018-02-16 23:42:46 +01:00
Dimitri Fontaine
67a1b1d408 Fix SQLite SQL queries.
Some copy-paste errors made their way to those queries and prevented usage
of pgloader, but I missed that because I was using a previous version of the
query text files in my interactive environment.

Also, SQLite doesn't like the queries finishing with a semi-colon, so remove
them.

Fixes #747.
2018-02-16 17:51:58 +01:00
Christoph Berg
8e3ebd5f1e Remove obsolete "make docs" target
pgloader.1.md doesn't exist anymore
2018-02-13 21:02:59 +01:00
Xavier Noria
d0fbd2bf5c Typo fix in the README (MacOSX, macOs)
updates the name of Mac OS X (#743)
2018-02-13 11:28:26 +01:00
Dimitri Fontaine
ea6c91b429 Fix "drop default" casting rules for all databases.
The support for drop default in (user defined) casting rules was completely
broken in SQLite, because the code didn't even bother looking at what's
returned after applying the casting rules.

This patch fixes the code so that it uses the pgcol instance's default
value, as computed after applying casting rules. The bug also existed in a subtle
form for MySQL and MS SQL, but would only show up there when the default
value is spelled using a known variation of “current timestamp”.
2018-02-08 23:33:51 +01:00
Dimitri Fontaine
29506e6fa6 Assorted fixes for SQLite.
First review the `sqlite_sequence` support so that we can still work with
databases that don't have this catalog, which doesn't always exist -- it
might depend on the SQLite version though.

Then while at it use the sql macro to host the SQLite “queries” in their own
files, enhancing the hackability of the system to some degree. Not that
much, because we have to use a lot of PRAGMA commands and then the column
output isn't documented with the query text itself.
2018-02-08 22:55:15 +01:00
Dimitri Fontaine
20d7858e27 Implement SQLite casting rule for “decimal”.
Fix #739.
2018-02-07 20:47:47 +01:00
Dimitri Fontaine
976e4c1c1d Fix SQLite processing of columns with a sequence attached.
The handling of the SQLite catalogs was fixed in a previous patch, but
either it's been broken in between or it never actually worked (oops).

Moreover, the recent patch about :on-update-current-timestamp changed the
casting rules matching code and we should position :auto-increment from the
SQLite module rather than "auto_increment" as before. That's better, but
wasn't done.

Fix #563 again, tested with a provided test-case (thanks!).
2018-01-31 22:49:10 +01:00
Dimitri Fontaine
4612e68435 Implement support for new casting rules guards and actions.
Namely the actions are “keep extra” and “drop extra” and the casting rule
guard is “with extra on update current timestamp”. Having support for those
elements in the casting rules allow such a definition as the following:

      type timestamp with extra on update current timestamp
        to "timestamp with time zone" drop extra

The effect of such a cast rule would be to ignore the MySQL extra
definition and then prevent pgloader from creating the PostgreSQL triggers
that implement the same behavior.

Fix #735.
2018-01-31 15:17:05 +01:00
Dimitri Fontaine
5ecd03ceba Don't push-row a nil value.
In case of a failure to pre-process or transform values in the row that has
been read, we need to refrain from pushing the row into our next batch.

See #726, that got hit by the recent bug in the middle of something else
entirely.
2018-01-25 23:53:11 +01:00
Dimitri Fontaine
25152f6054 Add a restart-case for interactive debugging.
When dealing with MATERIALIZING VIEWS test cases and failing in the middle
of them, as it happens when fixing bugs, then it was tedious (to say the
least) to clean-up manually the view each time.

That said, for end-users, doing it automatically would risk cleaning-up the
wrong view definition if they had a typo in their pgloader command, say.

Common Lisp helps a lot here: we simply create a restart that is only
available interactively for the developers of pgloader!
2018-01-25 23:38:59 +01:00
Dimitri Fontaine
7b08b6e3d3 Refrain from creating tables in “data only” operations.
We forgot that rule in the case of creating the target tables for the
materializing views commands, which led to surprising and wrong behavior.

Fix #721, and add a new test case while at it.
2018-01-25 23:32:31 +01:00
Dimitri Fontaine
5ba42edb0c Review misleading error message with schema not found.
It might be that the schema exists but we didn't find what we expected to
in there, so that it didn't make it to pgloader's internal catalogs. Be
friendly to the user with a better error message.

Fix #713.
2018-01-25 23:29:36 +01:00
Dimitri Fontaine
a603cd8882 Step back on (safety 0) optimization.
It doesn't appear worth it at this time yet, too risky.
2018-01-24 23:26:37 +01:00
Dimitri Fontaine
6ae3bd1862 Docs cleanup.
Don't maintain generated files in git, it's useless (thanks mainly to
readthedocs), also remove the previous format of the docs.
2018-01-24 22:47:37 +01:00
Dimitri Fontaine
f86371970f Review the pgloader COPY implementation further.
Refactor file organisation further to allow for adding a “direct stream”
option when the on-error-stop behavior has been selected. This happens
currently by default for databases sources.

Introduce the new WITH option “on error resume next” which forces the
classic behavior of pgloader. The option “on error stop” already existed,
its implementation is new.

When this new behavior is activated, the data is sent to PostgreSQL
directly, without intermediate batches being built. It means that the whole
operation fails at the first error, and we don't have any information in
memory to try replaying any COPY of the data. It's gone.

This behavior should be fine for database migrations as you don't usually
want to fix the data manually in intermediate files, you want to fix the
problem at the source database and do the whole dance all-over again, up
until your casting rules are perfect.

This patch might also incur some performance benefits in terms of both
timing and memory usage, though the local testing didn't show much of
anything for the moment.
2018-01-24 22:45:23 +01:00
Dimitri Fontaine
8ee799070a Simplify format-vector-row a lot.
Copy some code over from cl-postgres-trivial-utf-8 and add the support for
PostgreSQL COPY escaping right at the same place, allowing to allocate our
formatted utf-8 buffer only once, with the escaping already installed.

This patch was expected to be more about perfs, but it's actually only about
code cleaning it seems, as it doesn't make a big difference in the testing I
could do here.

That said, getting rid of one intermediate buffer should be nice in terms of
memory management.
2018-01-24 00:10:40 +01:00
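The escaping that format-vector-row folds into its single buffer pass follows the PostgreSQL COPY text format rules. A minimal Python sketch of that escaping (helper names are assumptions; the escape table covers the core COPY special characters):

```python
# Escapes required by the PostgreSQL COPY text format; applying them
# while building the output lets a single formatted buffer be
# allocated, as the commit describes.
COPY_ESCAPES = {
    "\\": "\\\\",
    "\t": "\\t",
    "\n": "\\n",
    "\r": "\\r",
}

def copy_escape(field):
    """Return a COPY-text-safe version of a string field; None becomes
    the COPY null marker \\N."""
    if field is None:
        return "\\N"
    return "".join(COPY_ESCAPES.get(ch, ch) for ch in field)
```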
Dimitri Fontaine
adf03c47ad Clean up source code organisation.
The copy format and batch facilities are no longer the meat of the
PostgreSQL support in the src/pgsql directory, so have them live in their
own space.
2018-01-23 19:52:13 +01:00
Dimitri Fontaine
3bb128c5db Review format-vector-row.
This function prepares the data to be sent down to PostgreSQL as a clean
COPY text with unicode handled correctly. This commit is mainly a clean-up
of the function, and also adds some smarts to try and make it faster.

In testing, the function is now tangentially faster than before, but not by
much. The hope here is that it's now easier to optimize it.
2018-01-22 21:37:14 +01:00
Dimitri Fontaine
ba2d8669c3 Add support for the newer Qmynd error handling.
We now have a qmynd-impl::decoding-error condition to deal with, which has
very good error reporting, so that we don't need to poke into babel details
anymore. The error message adds the column name, type and collation to the
output, too.

We keep the babel handlers for a while until people have all migrated to
using the patch in qmynd.

With the fix to Qmynd, Fix #716.
2018-01-22 16:14:05 +01:00
Dimitri Fontaine
572f6a3dbe Fix CSV separator parsing.
The previous patch introduced parser conflicts and we couldn't parse some
expressions any more, such as the following:

        fields escaped by '\',

It's now possible to represent single quote as either '''', '\'', or '0x27'
and we still can parse '\' as being a single backslash character.

See #705.
2018-01-14 15:33:47 +01:00
Julien Danjou
bb6c3d0a32 doc: fix a few link format (#711)
They are still in Markdown format, remove or move to rst.
2018-01-09 19:22:21 +01:00
Olivier Macchioni
b683292784 Fix broken link to https://pgloader.io/ (#706) 2017-12-28 18:59:50 +01:00
Dimitri Fontaine
81be9ae60e Implement support for \' as the CSV separator.
The option "fields optionally enclosed by" was missing a way to easily
specify a single quote as the quoting character. Add '\'' to the existing
solution '0x27' which isn't as friendly.

See #705.
2017-12-26 21:04:06 +01:00
Dimitri Fontaine
07cdf3e7e5 Use MySQL column names in MySQL queries.
The query for concurrency-support didn't get the memo that we should ignore
PostgreSQL identifier-case when querying the source MySQL database. Fix the
query string to include column names as given by the MySQL catalogs.

In bug report #703, the problem is found in PostgreSQL queries. This has
been fixed before already. Trying to reproduce the bug produced an error in
the concurrency-support query instead, so let's fix this one.

Fix #703.
2017-12-22 14:15:46 +01:00
Dimitri Fontaine
25c79dfebc Switch the documentation to using Sphinx.
The website is moving to pgloader.org and readthedocs.io is going to be
integrated. Let's see what happens. The docs build fine locally with the
sphinx tools and the docs/Makefile.

Having separate files for the documentation should help ease the maintenance
and add new topics, such as support for Common Lisp Hackers level docs,
which are currently missing.
2017-12-21 17:45:09 +01:00
Dimitri Fontaine
21f8baabab Update CNAME 2017-12-21 17:21:19 +01:00
Dimitri Fontaine
62b45e4d16 Fix log type output for summary files.
A summary file could be requested in a format that is none of csv, json or
copy. In that case, use the text format.

Fix #695.
2017-12-06 20:57:19 +01:00
Dimitri Fontaine
b7d87a9eb1 Fix MySQL bit(1) casting function.
When this function was written, pgloader would get an array of numbers over
the wire, nowadays it looks like it's receiving an array of characters
instead (in other words, a string).

Improve the `bits-to-boolean` function to accept either input, and raise an
error in another case.

My theory is that something changed either in MySQL (with version 10) or in
the Qmynd driver somehow... but tonight we just go easy and fix the bug
locally rather than try and understand where it might be coming from.

Fixes #684.
2017-12-03 23:06:54 +01:00
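The permissive input handling this commit describes can be sketched in Python (pgloader's actual `bits-to-boolean` is Common Lisp; this hypothetical version only mirrors the accept-numbers-or-characters behavior):

```python
def bits_to_boolean(value):
    """Cast a MySQL bit(1) value to a boolean, accepting either a
    sequence of integers (old wire format) or a string of characters
    (newer behavior); anything else raises an error."""
    if value is None:
        return None
    first = value[0]
    if isinstance(first, int):     # e.g. [1] or b"\x01"
        return first != 0
    if isinstance(first, str):     # e.g. "\x00" or "\x01"
        return ord(first) != 0
    raise ValueError("unexpected bit(1) representation: %r" % (value,))
```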
Dimitri Fontaine
c05183fcba Implement support for Foreign Tables and Partitionned Tables.
Due to the way pgloader queries the PostgreSQL catalogs, it restricted the
target table to be “ordinary” tables, as per the relkind description in the
https://www.postgresql.org/docs/current/static/catalog-pg-class.html
PostgreSQL documentation.

Extend this to support relkind of 'r', 'f' and 'p'.

Fixes #587, fixes #690.
2017-12-01 22:13:47 +01:00
Dimitri Fontaine
52f13456d9 Rewrite the SQLite type name parsing.
SQLite being very very liberal in type names (I think it accepts anything
and everything actually), our simple approach of tokenizing the input and
discarding noise words is not enough.

In this patch, we implement a new light parser for the SQLite type names to
better cope with noise words and random spacing of the catalog values that
SQLite failed to normalize. Well it didn't attempt, apparently.

Fix #548.
2017-11-28 18:19:12 +01:00
Dimitri Fontaine
2b861a3e96 New SQLite test cases. 2017-11-25 16:31:42 -08:00
Dimitri Fontaine
87f35e8852 Refrain from loading incomplete foreign key references in SQLite.
Given INCLUDING and EXCLUDING support it might be possible that we migrate a
table from SQLite without having selecting tables pointed to by foreign
keys. In that case, pgloader should still be able to load the data
definition and content fine, just skipping the incomplete fkey definitions.

That's implemented in this patch, which has been tested thanks to a
reproducible data set being made available!

Fixes #681.
2017-11-25 16:31:41 -08:00
Olleg Samoylov
62d776f5e8 Uppercase the SQL queries for MS SQL
In cases where the MS SQL database is set up with a case-sensitive collation, it would not find the catalog objects referenced from the query. To fix, just use UPPERCASE names, as they work in both case-insensitive and case-sensitive collations.

In passing, add `system-index.txt` to `.gitignore` (generated by make).

Fixes #651.
2017-11-25 02:23:25 +01:00
Dimitri Fontaine
d69b72053a Implement default unsigned casting rules for MySQL.
The following casting rules are now the default for MySQL:

  - type tinyint   when unsigned to smallint drop typemod
  - type smallint  when unsigned to integer  drop typemod
  - type mediumint when unsigned to integer  drop typemod
  - type integer   when unsigned to bigint   drop typemod

Fixes #678.
2017-11-22 10:29:11 -08:00
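The widening rules listed above amount to a simple type lookup. A sketch, assuming a plain dictionary representation (pgloader actually expresses these as default casting rules in its own DSL):

```python
# Unsigned MySQL integer types need one size up on the PostgreSQL side,
# since PostgreSQL integers are always signed; table from the commit.
UNSIGNED_CASTS = {
    "tinyint": "smallint",
    "smallint": "integer",
    "mediumint": "integer",
    "integer": "bigint",
}

def cast_unsigned(mysql_type):
    """Return a PostgreSQL type wide enough for the unsigned range."""
    return UNSIGNED_CASTS[mysql_type.lower()]
```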
Dimitri Fontaine
5c60f8c35c Implement a new type casting guard: unsigned.
MySQL allows using unsigned data types and pgloader should then target a
signed type of a larger capacity so that values can fit. For example, the
data definition “smallint(5) unsigned” should be casted to “integer”.

This patch allows user defined cast rules to be written against “unsigned”
data types as per their MySQL catalog representation.

See #678.
2017-11-22 10:26:03 -08:00
Dimitri Fontaine
6964764fb4 Find schema names unquoted.
When doing a MySQL to PostgreSQL migration in data only mode, pgloader
matches schema names found on both source and target database, and much like
with table names must do so ensuring unquoted schema names.

Otherwise we fail to find the schema name again, because one spelling has
the quotes, but not the other one, when using the “quote identifiers”
option.

Fix #659, at least some forms of it.
2017-11-19 17:12:21 +01:00
Dimitri Fontaine
1d7706c045 Fix the MySQL encoding error handling.
The error handling would try and read past the error buffer in some cases,
when the BABEL lib would give a position that's after the buffer read.

Fix #661.
2017-11-13 11:27:47 +01:00
Christoph Berg
5c4a64197d Run regression tests via autopkgtest 2017-11-12 21:05:48 +01:00
Christoph Berg
78df9c314a Sync Depends to cl-pgloader.
* Sync Depends to cl-pgloader.
* Priority: optional, move cl-pgloader to Section: lisp.
* Update S-V.
2017-11-11 17:14:18 +01:00
Christoph Berg
3002f4d30e Add new B-D cl-mustache and cl-yason. 2017-11-11 16:47:08 +01:00
Christoph Berg
28ea825d85 Run wrap-and-sort -st. 2017-11-11 16:39:19 +01:00
Dimitri Fontaine
db7a91d6c4 Add the MySQL target schema to the search_path.
In the next release, pgloader defaults to targeting a new schema named the
same as the MySQL database, because that's what makes more sense. But people
are used to having 'public' in the search_path and everything in there.

So when creating our target schema, when migrating from MySQL, arrange it so
that the new schema is in the search_path by issuing a command like:

  ALTER DATABASE plop SET search_path TO public, f1db;

And make this command visible in verbose (NOTICE) mode too, so that user can
see what happens.

Fix #654. I think.
2017-11-02 12:40:21 +01:00
Dimitri Fontaine
6b6c1c7d34 Add log entries for connection strings.
It helps a lot to debug what's happening, and it seems that we lost the
information when cleaning up the log levels in recent efforts to unclutter
the default output.
2017-11-02 12:38:45 +01:00
Dimitri Fontaine
501762d2f5 Update the website with the new Gumroad id. 2017-11-01 16:38:54 +01:00
Dimitri Fontaine
dd401c57f3 Fix a latent bug discovered in local testing with CCL.
It turns out that when using *print-pretty* in CCL we then have CL reader
references in the output, such as in the following example:

  QUERY: comment on table mysql.base64 is $#1=DXIDC_EMLAQ$Test decoding base64 documents$#1#$

Of course that's wrong, so prevent this from happening by
forcing *print-pretty* to nil in a top-level function. We still turn this on
in the monitor thread when printing error messages as those might contain
recursive data structures.
2017-10-21 21:06:35 +02:00
Dimitri Fontaine
0a88645eb5 Fix time measurements of the write activity.
When using --verbose or more detailed log messages, the summary prints
timings for both read and write operations separately. The write summary
timing took into account only the PostgreSQL batch activity, discarding the
formatting of the data done by pgloader.

As this formatting is quite heavy at the moment, the results are pretty
misleading without that information.
2017-10-21 21:04:55 +02:00
Dimitri Fontaine
a9afddf8ed Accept quoted namestrings as target type names for cast rules.
This allows passing "double precision" rather than float8, for example.

Fix #650.
2017-10-21 21:03:58 +02:00
Dimitri Fontaine
a28e9b3556 Prevent evaluating unused arguments in log-message.
A stop-gap has been installed to prevent sending too much traffic to the
monitor, but the log-message arguments were still evaluated, and the :data
level output from format-row-in-batch is pretty costly.
2017-10-16 17:26:07 +02:00
Dimitri Fontaine
b36f36b74e Add a (local) test case. 2017-10-16 17:25:44 +02:00
Dimitri Fontaine
9b80d2914c List files to load for system.
Install a new function in the hooks file. This function might help fix
--self-upgrade later, we keep it around for when we'll have time to see
about that.
2017-10-16 17:24:47 +02:00
Dimitri Fontaine
52720a5e6f Prefer QL overrides to ASDF setup.
The ql:*local-project-directories* is a much better facility for us to load
pgloader from the local PWD rather than from the QL distribution. It looks
like the previous method worked by accident, for once, and also downloaded
pgloader from QL, unnecessarily (we have the sources locally).
2017-10-03 13:47:48 +02:00
Dimitri Fontaine
5b227200a9 Fix error handling at monitor thread startup.
Errors such as failing to open the log file (maybe because of bad
permissions) weren't correctly handled. This fixes the problem by handling
the conditions at the lparallel task handler level and signaling a brand new
condition up to the main outside handler.

Fixes #638.
2017-10-03 01:23:59 +02:00
Dimitri Fontaine
2595ddaae3 Fix total-line in reporting.
We did it correctly for the bytes, and we need to apply the same logic to
the other metrics: the relevant information in the total summary line is the
sum from the data parts, not the sum from the postload parts.
2017-09-19 12:28:24 +02:00
Dimitri Fontaine
460fe6cc77 Fix quoting of default values for MariaDB 10 support.
The default values quoting changed in MariaDB 10, and we need to adjust in
pgloader: extra '' chars could defeat the default matching logic:

  "'0000-00-00'" is different from "0000-00-00"
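The normalization can be sketched like so (a Python illustration only; pgloader itself is Common Lisp, and the helper name is made up):

```python
def strip_default_quotes(default):
    """Strip one layer of single quotes from a MariaDB 10 default value,
    so that "'0000-00-00'" compares equal to "0000-00-00"."""
    if default and len(default) >= 2 \
            and default.startswith("'") and default.endswith("'"):
        return default[1:-1]
    return default
```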
2017-09-19 11:29:53 +02:00
Dan
62991bd5c5 Add missing column to GROUP BY. (#633) 2017-09-16 21:15:11 +02:00
Dimitri Fontaine
8a361a0ff8 Add support for multiple on update columns per table.
The MySQL special syntax "on update current_timestamp()" used to support
only a single column per table (in MySQL), and so did pgloader. In MariaDB
version 10 it's now possible to have several columns with that special
treatment, so adapt pgloader to migrate that too.

What pgloader does is recognize that several columns are to receive the same
pre-update processing, and creates a single function that handles both of
them, as in the following example, from pgloader logs in a test case:

    CREATE OR REPLACE FUNCTION mysql.on_update_current_timestamp_onupdate()
      RETURNS trigger
      LANGUAGE plpgsql
      AS
    $$
    BEGIN
       NEW.update_date = now();
       NEW.calc_date = now();
       RETURN NEW;
    END;
    $$;
    CREATE TRIGGER on_update_current_timestamp
            BEFORE UPDATE ON mysql.onupdate
          FOR EACH ROW
      EXECUTE PROCEDURE mysql.on_update_current_timestamp_onupdate();

Fixes #629.
2017-09-15 01:04:57 +02:00
Dimitri Fontaine
b7347a567c Add test cases for MySQL.
At the moment it's a very manual process, and it might get automated
someday. Meanwhile it's still useful to have.

See #569 for an issue that got a test case added.
2017-09-14 15:59:10 +02:00
Dimitri Fontaine
a498313074 Implement support for MySQL FULLTEXT indexes.
PostgreSQL btree indexes are limited in the size of the values they can
index: values must fit in an index page (8kB). So when porting a MySQL full
text index over full documents, we might get into an error like the
following:

  index row size 2872 exceeds maximum 2712 for index "idx_5199509_search"

To fix, query MySQL for the index type which is FULLTEXT rather than BTREE
in those cases, and port it over to a PostgreSQL Full Text index with a
hard-coded 'simple' configuration, such as the following test case:

  CREATE INDEX idx_75421_search ON mysql.fcm_batches USING gin(to_tsvector('simple', raw_payload));

Of course users might want to use a better configuration, including a proper
dictionary for the documents. When using PostgreSQL each document may have
its own configuration attached and yet they can all get indexed into the
same index, so that's a task for the application developers, not for
pgloader.

In passing, fix the list-typenames-without-btree-support.sql query to return
separate entries for each index type rather than an {array,representation}
of the result, as Postmodern won't turn the PostgreSQL array into a Common
Lisp array by default. I'm left wondering how it worked before.

Fix #569.
2017-09-14 15:40:34 +02:00
Dimitri Fontaine
987c0703ad Some default values come properly quoted from MariaDB now.
Adjust the default value formatting to check whether the default value is
already single-quoted, and only add new 'single quotes' when it's not the case.

Apparently ENUM default values in MariaDB 10 are now properly single quoted.
2017-09-14 15:39:04 +02:00
Dimitri Fontaine
dfac729daa Refrain from querying the catalogs again.
When we already have the information in the pgloader internal catalogs,
don't issue another MySQL query. In this case, it's been used to fetch the
list of columns and their data types so that we can choose to send either
`colname` or maybe astext(`colname`) as `colname` for some geographic types.

That's one less MySQL query per table.
2017-09-14 15:35:45 +02:00
Dimitri Fontaine
181f344159 Add support for current_timestamp() default spelling.
That's new in MariaDB 10 apparently.
2017-09-14 15:33:18 +02:00
Dimitri Fontaine
f921658866 Remove useless noise in the logs.
The individual CAST decisions are visible in the CREATE TABLE statements
that are logged a moment later. Also, calling `format-create-sql' on a
column definition that hasn't finished being cast will process default
values before they get normalized, and issue a WARNING to the poor user.

Not helpful. Bye.
2017-09-14 15:30:29 +02:00
Dimitri Fontaine
dbadab9e9e Implement a new “snake_case” quoting rule.
In passing, add the identifiers case option to SQLite support, which makes
it easier to test here, and add a table named "TableName" to our local test
database.

Fix #631.
2017-09-13 22:55:10 +02:00
Dimitri Fontaine
d2d4be2ed0 Fix test/csv-guess.load for old PostgreSQL.
In the Travis environment we still test with PostgreSQL 9.1 and 9.6, and
there's no reason for this test to use a modern spelling of create schema,
after all.

It works because the test/csv-before-after.load creates the schema and is
run before test/csv-guess.load. That's good enough for now.
2017-09-09 00:59:39 +02:00
Dimitri Fontaine
38712d98e0 Fix regression testing.
The previous patch made obvious some regression failures that had been
hidden by strange bugs with CCL.

One such regression was introduced in commit
ab7e77c2d00decce64ab739d0eb3d2ca5bdb6a7e where we played with the complex
code generation for field projection, where the following two cases weren't
cleanly processed anymore:

  column text using "constant"
  column text using "field-name"

In the first case we want to load a user-defined constant in the column, in
the second case we want to load the value of the field "field-name" in the
column --- we just have different source and target names.

Another regression was introduced in the recent commit
01e5c2376390749c2b7041b17b9a974ee8efb6b2 where the create-table function was
called too early, before we had fetched *pgsql-reserved-keywords*. As a
consequence table names weren't always properly quoted as shown in the
test/csv-header.load file which targets a table named "group".

Finally, skip the test/dbf.load regression test when using CCL as this
environment doesn't have the necessary CP850 code page / encoding.
2017-09-09 00:51:07 +02:00
Dimitri Fontaine
ebf9f7a6a9 Review and cleanup the logging monitor thread.
Due to errors in regression testing when using CCL, review this part of
pgloader. It turns out that cl-log:stop-messenger on a text-stream-messenger
closes the stream, which isn't a good idea when given *standard-output*.

At least it makes CCL choke when it then wants to output something of its
own, such as when running in --batch mode (which is nice because it outputs
more diagnostic information).

To solve that problem, initialize the text-stream-messenger with a broadcast
stream made from *standard-output*, which we now may close at will.
2017-09-08 23:03:41 +02:00
Dimitri Fontaine
e7f6505d7d Review compile time dependencies.
The parser files don't depend on the sources, it's the other way round
nowadays. Also, the responsibility to decipher the *sections* context should
be restricted to the monitor.lisp file, which is now the case.

And this time, fix #628 for real.
2017-09-08 15:38:32 +02:00
Dimitri Fontaine
9be130cdbe Fix symbol export hacks to execute at load time.
It seems that when compiling with CCL in “batch” mode, that is using
buildapp, the local symbol exporting facility didn't work at all. It needs
to be run at load time so that the compiler sees the symbols.

Fix #628.
2017-09-08 12:33:24 +02:00
Dimitri Fontaine
a9e8bfd4d7 Support for colon characters in PostgreSQL socket path.
Google Cloud SQL instances are now using the following format for the name
of their socket: <PROJECT-ID>:<REGION>:<INSTANCE_NAME>. We support that by
allowing a colon in the socket directory name to be escaped by doubling it, as
in the username field. This also allows accepting any character in the socket
directory name, which is a good cleanup.
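The escaping rule amounts to the following sketch (Python for illustration; pgloader itself is Common Lisp, and the helper name is hypothetical):

```python
def unescape_socket_directory(name):
    """A doubled colon stands for a literal ':' inside the socket
    directory name, the same convention as in the username field."""
    return name.replace('::', ':')
```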

Fix #621.
2017-08-30 15:22:42 +02:00
Dimitri Fontaine
d5072d11e5 Implement support for a pgpass file.
The implementation follows the PostgreSQL specification as closely as
possible, with the escaping rules and the matching rules. The default paths
where to find the .pgpass file (or pgpass.conf on Windows) are as documented
in PostgreSQL too. Only the file permissions check is missing.
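The field splitting and matching rules can be sketched like this (a Python illustration of the documented .pgpass format; the helper names are hypothetical, pgloader itself is Common Lisp):

```python
def parse_pgpass_line(line):
    """Split a .pgpass line into its five fields
    (hostname:port:database:username:password), honoring backslash
    escapes of ':' and '\\' as PostgreSQL documents them."""
    fields, current, chars = [], [], iter(line)
    for ch in chars:
        if ch == '\\':
            # escaped character: keep the next char literally
            current.append(next(chars, ''))
        elif ch == ':' and len(fields) < 4:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields

def pgpass_match(fields, host, port, dbname, user):
    """A field matches when it is '*' or equal to the connection value."""
    values = [host, str(port), dbname, user]
    return all(f == '*' or f == v for f, v in zip(fields, values))
```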

Fix #460.
2017-08-29 03:16:35 +02:00
Dimitri Fontaine
bcc934d7aa Cleanup.
Some code was pasted twice in src/api.lisp, and a defstruct with no slots
isn't spelled the way I did in previous patches. We use a defstruct with no
slots for defining a hierarchy on which to dispatch our pretty-print
function.
2017-08-26 20:31:24 +02:00
Dimitri Fontaine
33ab9bcdd5 Typo Fix. oops. 2017-08-25 22:21:34 +02:00
Dimitri Fontaine
01e5c23763 Add support for explicit TARGET TABLE clause in load commands.
It used to be that you would give the target table name as an option to the
PostgreSQL connection string, which is distasteful:

   load ... into pgsql://user@host/dbname?tablename=foo.bar ...

Or even, for backwards compatibility:

   load ... into pgsql://user@host/dbname?foo.bar ...

The new syntax makes provision for a separate clause for the target table
name, possibly schema-qualified:

   load ... into pgsql://user@host/dbname target table foo.bar ...

Which is much better, in particular when used together with the target
columns clause.

Implementing this seemingly quite small feature had an impact on many
parsing-related features of pgloader, such as the regression testing
facility. So much so that some extra refactoring got in its way here, around the
lisp-code-for-loading-from-<source> functions and their usage in
`load-data'.

While at it, this patch simplifies the `load-data' function a lot by making
good use of &allow-other-keys and :allow-other-keys t.

Finally, this patch splits main.lisp into main.lisp and api.lisp, with the
latter intended to contain functions for Common Lisp programs wanting to use
pgloader as a library. The API itself is still the same as before this
patch, tho. Just in another file for clarity.
2017-08-25 01:57:54 +02:00
Dimitri Fontaine
72c58306ba Fix the previous fix.
See #614. Again. Should be ok now.
2017-08-25 01:56:34 +02:00
Dimitri Fontaine
f20a5a0667 Fix schema name comparing with quoted schema names.
In the previous commit we introduced support for database names including
spaces, which means that by default pgloader creates a target schema in
PostgreSQL with a space in its name. That works well as soon as you always
double-quote the schema name, which pgloader does.

Now, in our internal catalogs, we keep the schema name double-quoted. And
when comparing that schema names with quotes to the raw schema name from
PostgreSQL, they won't match, and pgloader tries to create the schema again:

  ERROR Database error 42P06: schema "my sql" already exists

Fix the comparing to compare unquoted schema name, fix #614 again: the
previous fix would only work the first time.
2017-08-25 01:47:49 +02:00
Dimitri Fontaine
9d4743f598 Allow database names to contain spaces.
Then they must be quoted (single or double quotes accepted), of course.

Fix #614.
2017-08-24 23:05:26 +02:00
Dimitri Fontaine
9263baeb49 Implement sslmode for MySQL connections.
This allows bypassing SSL when you don't need it, like over localhost for
instance. It takes the same syntax as the PostgreSQL sslmode connection
string parameter.
2017-08-24 14:56:59 +02:00
Dimitri Fontaine
b685c8801d Improve guessing of CSV parameters.
In this commit we fail the guess faster, allowing us to test a much larger
sample. The sample is still hard-coded, but this time to 1000 lines.

Also add a test case, see #618.
2017-08-24 13:30:14 +02:00
Dimitri Fontaine
8004a9dd59 Improve report output with bytes information.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes they represent. We add that information
now in the output.

The number of bytes presented is computed from the unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
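In other words, something like this sketch (Python for illustration; pgloader itself is Common Lisp, and the helper name is made up):

```python
def batch_bytes(copy_rows):
    """copy_rows are rows already rendered in COPY text format; the byte
    count is taken on their UTF-8 encoding, i.e. what is actually sent
    down the wire to PostgreSQL."""
    return sum(len(row.encode('utf-8')) for row in copy_rows)
```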
2017-08-24 12:45:51 +02:00
Dimitri Fontaine
3b93ffa37a Rewrite the reporting support entirely.
Use a generic function protocol in order to implement the human readable,
verbose, csv, copy and json reporting output formats. This is much cleaner
and extensible than the previous way.

Use that new power to implement a real JSON output from the internal state
object.
2017-08-24 12:33:51 +02:00
Dimitri Fontaine
4fcb24f448 Reintroduce manual Garbage Collect in SBCL.
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375kB (up to
about 3 MB per datum), it allows much larger batch size and prefetch rows
settings without entering lldb.
2017-08-23 16:27:14 +02:00
Dimitri Fontaine
4f9eb8c06b Track bytes sent to PostgreSQL.
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per seconds).
2017-08-23 11:55:49 +02:00
Dimitri Fontaine
1f242cd29e Fix comment support to schema qualify target tables. 2017-08-23 11:26:08 +02:00
Dimitri Fontaine
a849f893a6 Implement a base64-decode transformation function. 2017-08-21 17:06:06 +02:00
Dimitri Fontaine
c62f4279c0 Be more verbose with long-running loads.
Add a message every 20 batches so that the user knows it's still going on.
Also, in passing, fix some messages: the present tense is not precise enough
to decide whether the log refers to an event that just finished or one that
is starting next.
2017-08-21 16:50:16 +02:00
Dimitri Fontaine
28db6b9f13 Desultory cleanup of a useless declaim. 2017-08-21 16:46:32 +02:00
Dimitri Fontaine
03a8d57a50 Review --verbose log message.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes to be built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE both for primary keys and foreign keys, again because
the user might have to wait for quite some time.
2017-08-21 15:27:13 +02:00
Dimitri Fontaine
f719d2976d Implement a template system for pgloader commands.
This feature has been asked several times, and I can't see any way to fix
the GETENV parsing mess that we have. In this patch the GETENV support is
retired and replaced with a templating system, using the Mustache syntax.

To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.

Fixes #555, Fixes #609.
See #500, #477, #278.
2017-08-16 01:33:11 +02:00
Dimitri Fontaine
e21ce09ad7 Implement support for MySQL linestring data type.
This data type is now converted automatically to a PostgreSQL path data
type, using the open path notation with square brackets:

  https://www.postgresql.org/docs/current/static/datatype-geometric.html#AEN7103

Fix #445.
2017-08-15 15:26:06 +02:00
Dimitri Fontaine
20a85055f4 Implement support for MS SQL set parameters.
It is sometimes needed to tweak MS SQL server parameters, such as the
textsize parameter, which allows fetching the whole content of a text or
binary column (not kidding).

Now it's possible to add such a line in the load file:

  set mssql parameters textsize to '104857600'

Fixes #603.
2017-08-12 23:43:22 +02:00
Dimitri Fontaine
30f359735c Make it easier to test “main” code.
This code path is exercised from the command line only, which means I don't
get to run it that often. And it's a pain to debug. So make it easier to run
`process-source-and-target` from the REPL.
2017-08-10 21:58:53 +02:00
Dimitri Fontaine
773dcaeca3 Fix a race condition in the monitor thread.
Startup log messages could be lost because the monitor would be started but
not ready to process messages. Fix that by “warming up” the monitoring
thread, having it execute a small computation and more importantly wait for
the result to be received back, blocking.

See #599 where parsing errors from a wrong URL were missed in the command
line output, quite disturbingly.
2017-08-10 21:51:55 +02:00
Dimitri Fontaine
370038a74e Fix the PostgreSQL URL in the MySQL howto.
See #599 again, wherein I missed that the URL error was not a copy-paste'o
but rather an error in the documentation itself…
2017-08-10 21:49:51 +02:00
Dimitri Fontaine
952e7da191 Bug fix CREATE TYPE in schema (previous patch).
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
2017-08-10 21:19:25 +02:00
Dimitri Fontaine
073a5c1e37 Fix Ergast link in MySQL howto.
See #599.
2017-08-10 20:58:24 +02:00
Dimitri Fontaine
5a65da2147 Create new types in the proper schema.
Previously to this patch, pgloader wouldn't care about which schema it
creates extra types in. Extra types are mainly ENUM and SET support from
MySQL. Now, pgloader creates those extra PostgreSQL ENUM types in the same
schema as the table using them, which is a more sound default.
2017-08-10 18:57:09 +02:00
Dimitri Fontaine
981b801ce7 Fix user defined rules to cast ENUM to Text.
MySQL enums are cast to PostgreSQL enum types just fine, but sometimes
that's not what the user wants. In cases where we have a CAST rule for an
ENUM column, recognize the fact and respect the user's choice.

Fixes #608.
2017-08-10 18:01:17 +02:00
Dimitri Fontaine
049a1199c2 Implement support for SQLite current_date default value.
The spelling in SQLite for the default value is "current_date", instruct
pgloader about that. This commit also adds a test case in our sqlite.db
unit tests database.

Fixes #607.
2017-08-08 21:55:15 +02:00
Luke Snape
ecd6a8e25c Ignore nulls in varbinary-to-string transform (#606) 2017-08-07 21:37:37 +02:00
Dimitri Fontaine
38a6b4968d Improve bundle building.
Now when building a bundle file for source distribution of pgloader, always
test it by building a binary image from the bundle tarball in a test
directory. Also make it easy to target "latest" Quicklisp distribution with
the following spelling:

    make BUNDLEDIST=latest bundle
2017-08-01 19:20:15 +02:00
Dimitri Fontaine
72431d4708 Improve the Quicklisp dist support for bundles.
When distributing a pgloader bundle we're using the ql-dist facility. In
recent commit we hand-picked the last known working distribution of
quicklisp for pgloader. Make it easy to target "latest" known distribution
or hard-code one from the Makefile or the bundle/ql.lisp file.
2017-08-01 18:48:20 +02:00
Dimitri Fontaine
5c1c4bf3ff Fix MySQL Enum parsing.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single-quote (') and uses
two of them ('') for escaping single-quotes when found in the ENUM value
itself.

Fixes #597.
2017-08-01 18:40:27 +02:00
Dimitri Fontaine
3103b0dc72 Escape SQL identifiers in SQLite catalog queries.
SQLite supports the backtick escaping for SQL identifiers and we'd rather
use it. Fixes #600.
2017-07-31 23:11:29 +02:00
Dimitri Fontaine
d37ad27754 Handle empty tables in concurrency support for MySQL.
When the table is empty we get nil for min and max values of the id column.
In that case we don't compute a set of ranges and “cancel” concurrency
support for the empty table.

Fixes #596.
2017-07-18 13:35:01 +02:00
Dimitri Fontaine
b1fa3aec3c Implement a separate switch to drop the schemas.
The with option "include drop" used to also apply to schemas, which is not
that useful, and is problematic when trying to DROP SCHEMA public, because
you might not connect as the owner of that schema.

Even if we don't target the public schema by default, users can choose to do
so thanks to our ALTER SCHEMA ... RENAME TO ... command.

Fixes #594.
2017-07-18 13:13:36 +02:00
Dimitri Fontaine
ae0c6ed119 Add support for preserving index names in SQLite.
See #187.
2017-07-17 11:04:12 +02:00
Dimitri Fontaine
cf6182fafa Add a notice message with guessed parameters.
We might have to help users debug our decision, and I expect we will have to
improve our guess “engine” here.
2017-07-07 02:34:23 +02:00
Dimitri Fontaine
471f2b6d88 Implement automagic guessing of CSV parameters.
As we know how many columns we expect from the input file, it's possible to
read a sample (10 lines as of this patch) and try many different CSV reader
parameter combinations until we find one that works: it returns the right
number of fields.

It is still possible of course to specify parameters on the command line or
in a load file if necessary, but it makes the simple case even simpler. As
simple as:

  pgloader file.csv pgsql:///pgloader?tablename=target
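The guessing loop amounts to something like this sketch (Python for illustration; pgloader itself is Common Lisp, and the function name and candidate lists are hypothetical):

```python
import csv
import io
import itertools

def guess_csv_params(sample_text, expected_cols,
                     separators=',;\t|', quotes='"\''):
    """Try separator/quote combinations on a small sample and keep the
    first one for which every sample line yields the expected number of
    fields -- the same idea pgloader applies to its sample."""
    lines = sample_text.splitlines()[:10]
    for sep, quote in itertools.product(separators, quotes):
        rows = list(csv.reader(io.StringIO('\n'.join(lines)),
                               delimiter=sep, quotechar=quote))
        if rows and all(len(row) == expected_cols for row in rows):
            return {'separator': sep, 'quote': quote}
    return None
```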
2017-07-07 02:16:53 +02:00
Dimitri Fontaine
14e1830b77 Fix CLI insistence on --field.
From a load file, as soon as pgloader can retrieve the schema of the target
table the source field list defaults to the target column list. Let's apply
the same rules to the command line.
2017-07-07 01:00:55 +02:00
Dimitri Fontaine
154c74f85e Update online docs with new release.
The docs/ directory goes to http://pgloader.io.
2017-07-06 17:07:55 +02:00
Dimitri Fontaine
64959595fc Back to development release in the master's branch. 2017-07-06 16:55:56 +02:00
Dimitri Fontaine
d71da6ba66 Release pgloader 3.4.1 2017-07-06 16:53:29 +02:00
Adrian Vondendriesch
058f9d5451 Debian (#578)
* debian: Bump compat version to 9.

* debian: Bump Standards-Version to 3.9.8
2017-07-06 15:38:14 +02:00
Dimitri Fontaine
7a371529be Implement "drop indexes" option for MySQL and MSSQL too.
It was only offered for SQLite without good reason really, and tests show
that it works as well with MySQL of course. Offer the option there too.

See 3eab88b1440a8166786e90b95f563d153e2ba4dc for details.
2017-07-06 10:06:03 +02:00
Dimitri Fontaine
2363d8845f Fix create schema handling in data only scenarios.
In b301aa93948f05b5189382f641cf1e040fc655f2 the "create schema" default
changed to true, which is a good idea. As a consequence pgloader should
consider this operation only when "create tables" is set: we don't want to
start with creating target schemas in a target database that is said to be
ready to host the data.
2017-07-06 09:48:03 +02:00
Dimitri Fontaine
dfe5c38185 Fix quoting policy in PostgreSQL DDL formatting.
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format string: refrain from using ~s.
2017-07-06 09:47:48 +02:00
Dimitri Fontaine
9da012ca51 Fix identifier quoting when reading PostgreSQL catalogs.
We sure can trust PostgreSQL to use names it knows how to handle. Still, it
will be happy to store in its catalogs names containing upper case, and in
that case we must quote them.
2017-07-06 03:16:06 +02:00
Dimitri Fontaine
e87477ed31 Restrict condition handling to relevant conditions.
In md-methods copy-database function, don't pretend we are able to handle
any condition when preparing the PostgreSQL schema, database-error is all we
are dealing with there really.
2017-07-06 03:16:05 +02:00
Dimitri Fontaine
d3d40cd47d Have git ignore local desktop files. 2017-07-06 03:16:05 +02:00
Dimitri Fontaine
e37cb3a9e7 Split SQL queries into their own files.
This change was long overdue. Ideally we would use something like the YeSQL
library for Clojure, but it seems like the cl-yesql equivalent is not ready
yet, and it depends on an experimental build system...

So this patch introduces an URL abstraction built on-top of a hash table.
You can then reference src/pgsql/sql/list-all-columns.sql as

  (sql "pgsql/list-all-columns.sql")

in the source code directly.

So for now the templating system is CL's format language. It is still an
improvement from embedded string. Again, one step at a time.
2017-07-06 03:16:05 +02:00
Dimitri Fontaine
d50ed64635 Defensive programming, afterthought.
It might be that a column-type-name is actually an sqltype instance, and
then #'string= won't be happy. Prevent that now by discarding any smarts
when the type name does not satisfy stringp.
2017-07-06 00:59:36 +02:00
Dimitri Fontaine
26d372bca3 Implement support for non-btree indexes (e.g. MySQL spatial keys).
When pgloader fetches the index list from a source database, it doesn't
fetch information about access methods for the indexes: I don't even know if
the overlap between index access methods from one RDBMS to another covers
more than just btree...

It could happen that MySQL indexes a "geometry" column tho. This datatype is
converted automatically to "point" by pgloader, which is good. But the index
creation would fail with the following error message:

  Database error 42704: data type point has no default operator class for access method "btree"

In this patch when setting up the target schema we issue a PostgreSQL
catalog query to dynamically list those datatypes without btree support and
fetch their opclasses, with a hard-coded preference for GiST, then GIN, so
as to be able to automatically use the proper access method when btree isn't
available. And now pgloader transparently issues the proper statement:

  CREATE INDEX idx_168468_idx_location ON pagila.address USING gist(location);

Currently this exploration is limited to indexes with a single column. To
implement the general case we would need a more complex lookup: we would
have to find the intersection of all the supported access methods for all
involved columns.

Of course we might need to do that someday. One step at a time is plenty
good enough tho.
2017-07-06 00:42:43 +02:00
Dimitri Fontaine
8405c331a9 Error handling improvements for PostgreSQL schema.
In the complete PostgreSQL schema step, an error would be logged as you
expect but poorly handled: it would have the whole transaction rolled back,
meaning that a single Primary Key definition failure would cancel all the
others, plus the foreign keys, and also the triggers and comments.

It happens that other systems allow a primary column to contain NULL values,
which is forbidden in the standard and enforced by PostgreSQL, so that's not
a theoretical concern here.
2017-07-05 17:53:33 +02:00
Dimitri Fontaine
bae40d40c3 Fix identifier quoting corner cases.
In cases when pgloader needs to build a new identifier from existing
ones (mainly for renaming indexes, because they are unique per-table in the
source database and unique per-schema in PostgreSQL), and we compose the new
name from already quoted strings, pgloader was doing the wrong thing.

Fix that by having a build-identifier function that may unquote parts then
re-quote properly (if needed) the new identifier.
2017-07-05 15:37:21 +02:00
Dimitri Fontaine
f6cb428c6d Check empty strings in DB3 numeric fields.
Another blind attempt at fixing pgloader from a bug report on gitter, see
2017-07-04 23:15:47 +02:00
Dimitri Fontaine
652e435843 Only catch thread errors in pgloader-image.
In the REPL we're going to have all errors pop in the interactive debugger,
and that should be what we want...
2017-07-04 01:55:27 +02:00
Dimitri Fontaine
3f7853491f Refactor PostgreSQL error handling.
The code was too complex and the transaction / connection handling wasn't
good enough, too many reconnections when a ROLLBACK; is all we need to be
able to continue our processing.

Also fix some stats counters about errors handled, and improve the error
message by adding PostgreSQL explicitly, and the name of the table where the
error comes from.
2017-07-04 01:41:08 +02:00
Dimitri Fontaine
3eab88b144 Add a new "drop indexes" option for databases.
This allows using a combination of "data only, drop indexes" so that when
the target database already exists, pgloader will use the existing schema
and still DROP INDEX before loading the data and do the CREATE INDEX dance
in parallel and all at the end of it.

Also, as I could reproduce neither #539 (which is good, it's supposed to
be fixed now) nor #550 (that was open due to a regression): fixes #550.
2017-07-04 00:15:58 +02:00
Dimitri Fontaine
fc01c7acc9 Fix DBF handling of "empty" date strings.
Blind code a fix for an error when parsing empty date strings in a DBF file.
The small amount of information is surprising, I can't quite figure out
which input string can produce " - - " with the previous coding of
db3-date-to-pgsql-date.

Anyway, it seems easy enough to add some checks to a very optimistic
function and return nil when our checks aren't met.
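The checks amount to something like this sketch (Python for illustration, mirroring the Lisp db3-date-to-pgsql-date named above; the exact behavior is an assumption based on this description):

```python
def db3_date_to_pgsql_date(value):
    """Turn a DBF YYYYMMDD string into YYYY-MM-DD, returning None for
    empty or malformed input instead of producing ' - - '."""
    if value and len(value) == 8 and value.isdigit():
        return '%s-%s-%s' % (value[0:4], value[4:6], value[6:8])
    return None
```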

Fixes #589, hopefully.
2017-06-30 13:37:36 +02:00
Dimitri Fontaine
1e436555a8 Refactor PostgreSQL conditions.
Use a single deftype postgresql-unavailable rather than copy/pasting the
same list of conditions in several places.
2017-06-29 14:08:52 +02:00
Dimitri Fontaine
60c1146e18 Assorted fixes.
Refrain from killing the Common Lisp image when doing interactive regression
testing if we typo'ed the regression test file name...
2017-06-29 12:35:40 +02:00
Dimitri Fontaine
cea82a6aa8 Reconnect to PostgreSQL in case of connection lost.
It may happen that PostgreSQL is restarted while pgloader is running, or
that for some other reason we lose the connection to the server, and in most
cases we know how to gracefully reconnect and retry, so just do so.

Fixes #546 initial report.
2017-06-29 12:34:34 +02:00
Dimitri Fontaine
f0d1f4ef8c Fix reduce usage with max function.
The (reduce #'max ...) requires an initial value to be provided, as the max
function wants at least 1 argument, as we can see here:

CL-USER> (handler-case (reduce #'max nil) (condition (e) (format t "~a" e)))
Too few arguments in call to #<Compiled-function MAX #x300000113C2F>:
0 arguments provided, at least 1 required.
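Python's functools.reduce behaves the same way on an empty sequence, which makes the fix easy to illustrate (hypothetical helper name; pgloader itself is Common Lisp):

```python
from functools import reduce

def safe_max(ids, initial=0):
    """Mirror of the fix: give reduce an initial value so the empty
    sequence no longer blows up, just like (reduce #'max nil) does in
    Common Lisp without :initial-value."""
    return reduce(max, ids, initial)
```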
2017-06-28 16:37:27 +02:00
Dimitri Fontaine
17a63e18ed Review "main" error handling.
The "main" function only gets used at the command line, and errors were not
cleanly reported to the users, mainly because I almost never get to play
with pgloader that way, preferring a load command file and the REPL
environment; but that's not even acceptable as an excuse.

Now the binary program should be able to exit cleanly in all situations. In
testing, it may happen in unexpected erroneous situations that we quit
before printing all the messages in the monitoring queue, but at least now
we quit cleanly and with a non-zero exit status.

Fix #583.
2017-06-28 16:36:08 +02:00
Dimitri Fontaine
0549e74f6d Implement multiple reader per table for MySQL.
Experiment with the idea of splitting the read work in several concurrent
threads, where each reader is reading portions of the target table, using a
WHERE id <= x and id > y clause in its SELECT query.

For this to kick in, a number of conditions need to be met, as described in
the documentation. The main interest might not be faster queries to overall
fetch the same data set, but better concurrency with as many readers as
writers, each couple with its own dedicated queue.
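The range computation can be sketched as follows (a Python illustration with a hypothetical helper; pgloader itself is Common Lisp):

```python
def split_ranges(min_id, max_id, concurrency):
    """Split [min_id, max_id] into `concurrency` half-open (low, high]
    ranges, one per reader thread issuing
    SELECT ... WHERE id > low AND id <= high."""
    if min_id is None or max_id is None:   # empty table: no concurrency
        return []
    step = max(1, (max_id - min_id + concurrency) // concurrency)
    ranges, low = [], min_id - 1
    while low < max_id:
        high = min(low + step, max_id)
        ranges.append((low, high))
        low = high
    return ranges
```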
2017-06-28 16:23:18 +02:00
Dimitri Fontaine
6d66280fa5 Review parallelism and memory behavior.
The previous patch made format-vector-row allocate its memory in one go
rather than byte after byte with vector-push-extend. In this patch we review
our usage of batches and parallelism.

Now the reader pushes each row directly to the lparallel queue and writers
concurrently consume from it, cook batches in COPY format, and then send
that chunk of data down to PostgreSQL. When looking at runtime profiles, the
time spent writing in PostgreSQL is a fraction of the time spent reading
from MySQL, so we consider that the writing thread has enough time to do the
data mungling without slowing us down.

The most interesting factor here is the memory behavior of pgloader, which
seems more stable than before, and easier to cope with for SBCL's GC.

Note that batch concurrency is no more, replaced by prefetch rows: the
reader thread no longer builds batches, and the count of items in the reader
queue is now a number of rows, not of batches of them.

Anyway, with this patch in, I can't reproduce the following issues:

Fixes #337, Fixes #420.
2017-06-27 23:10:33 +02:00
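The reader/writer arrangement the commit describes can be sketched with a bounded queue. This is a minimal Python analogue, not pgloader's lparallel code: the queue bound plays the role of "prefetch rows" (counted in rows, not batches), and the writers cook batches as they consume.

```python
import queue
import threading

def reader(rows, q, writers):
    # The reader pushes rows one by one; the queue bound is a number
    # of rows (the "prefetch rows" setting), not a number of batches.
    for row in rows:
        q.put(row)
    for _ in range(writers):
        q.put(None)                    # one end-of-stream sentinel per writer

def writer(q, batch_size, sink):
    batch = []
    while True:
        row = q.get()
        if row is None:
            break
        batch.append(row)
        if len(batch) >= batch_size:
            sink.append(list(batch))   # cook a batch and "COPY" it out
            batch.clear()
    if batch:
        sink.append(list(batch))

q = queue.Queue(maxsize=1000)          # prefetch rows
sink = []                              # stands in for PostgreSQL
threads = [threading.Thread(target=writer, args=(q, 25, sink))
           for _ in range(2)]
for t in threads:
    t.start()
reader(range(100), q, writers=2)
for t in threads:
    t.join()
print(sum(len(b) for b in sink))       # → 100
```

Both writers drain the same queue concurrently, so the reader never waits on a slow batch being written out.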
Dimitri Fontaine
7f737a5f55 Reduce memory allocation in format-vector-row.
This function is used on every bit of data we send down to PostgreSQL, so I
have good hopes that reducing its memory allocation will have an impact on
loading times, in particular for sizeable data sets.
2017-06-27 15:31:49 +02:00
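The allocate-once idea can be illustrated outside Common Lisp. This Python sketch (hypothetical helpers, not pgloader's code) contrasts growing a buffer byte by byte, vector-push-extend style, with computing the row's size first and writing into a single preallocated buffer; the escaping rules of the real COPY text format are omitted.

```python
def row_bytes_appending(fields):
    # vector-push-extend style: grow the buffer byte after byte
    buf = bytearray()
    for f in fields:
        for b in f:
            buf.append(b)
        buf.append(0x09)  # tab separator between fields
    buf[-1] = 0x0A        # newline terminates the row
    return bytes(buf)

def row_bytes_preallocated(fields):
    # allocate once: total payload plus one separator per field
    size = sum(len(f) for f in fields) + len(fields)
    buf = bytearray(size)
    pos = 0
    for f in fields:
        buf[pos:pos + len(f)] = f
        pos += len(f)
        buf[pos] = 0x09
        pos += 1
    buf[-1] = 0x0A
    return bytes(buf)

fields = [b"42", b"hello", b"world"]
assert row_bytes_appending(fields) == row_bytes_preallocated(fields)
print(row_bytes_preallocated(fields))  # → b'42\thello\tworld\n'
```

The preallocated variant does one allocation per row instead of one growth step per byte, which is the saving the commit is after.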
Dimitri Fontaine
46d6f339df Add a user friendly message about what's happening...
Still in the abnormal termination case: pgloader might get stuck, and if the
user knows it's waiting for threads to complete, they might be less worried
about the situation and less tempted to kill pgloader...
2017-06-27 11:19:07 +02:00
Dimitri Fontaine
2341ef195d Review abnormal termination code path.
In case of an exceptional condition leading to termination of the pgloader
program, we tried to use log-message after the monitor should have been
closed. Also, the 0.3s delay to let the latest messages out looks like poor
design.

This patch attempts to remedy both aspects of the situation: refrain from
using a closed-down monitoring thread, and properly wait until it's done
before returning to the shell.

See #583.
2017-06-27 10:45:54 +02:00
Dimitri Fontaine
352f4adc8d Implement support for MySQL SET parameters.
pgloader had support for PostgreSQL SET parameters (gucs) from the
beginning, and in the same vein it might be necessary to tweak MySQL
connection parameters, and allow pgloader users to control them.

See #337 and #420 where net_read_timeout and net_write_timeout might need to
be set in order to be able to complete the migration, due to high volumes of
data being processed.
2017-06-27 10:00:47 +02:00
Otheus
b5a593af14 Update INSTALL.md (#585)
Add instructions for RedHat/CentOS 7
2017-06-21 21:27:27 +02:00
Dimitri Fontaine
a222a82f66 Improve docs on pgloader.io.
In the SQLite and MySQL cases, expand on the simple case before detailing
the command language. With our solid defaults, most times a single command
line with the source and target connection strings is going to be all you
need.
2017-06-20 16:24:25 +02:00
Dimitri Fontaine
cae86015a0 Let's be more specific about the license.
Upon GitHub's suggestion, add a LICENSE file to make it clear we are using
The PostgreSQL License. Assign the copyright to The PostgreSQL Global
Development Group as it's done for PostgreSQL, as it seems to be the right
thing to do.
2017-06-17 19:21:33 +02:00
Dimitri Fontaine
e11ccf7bb7 Fix on-error-stop signaling.
To properly handle on-error-stop condition, make it a specific pgloader
condition with a specific handling behavior. In passing add some more log
messages for surprising conditions.

Fix #546.
2017-06-17 19:02:05 +02:00
Dimitri Fontaine
5faf8605ce Fix corner cases and how we log them.
In the prepare-pgsql-database method we were logging too many details, such
as DDL warnings on if-not-exists for successful queries. Those logs are
to be found in the PostgreSQL server logs anyway.

Also fix trying to create or drop a "nil" schema.
2017-06-17 18:16:18 +02:00
Dimitri Fontaine
6c931975de Refrain from pushing too much logging traffic.
In this patch we hard-code some cases where we know the log message won't be
displayed anywhere, so as to avoid sending it to the monitor thread. It
certainly is a modularity violation, but given the performance impact...
2017-06-17 18:12:33 +02:00
Dimitri Fontaine
422fab646a Typo-level fix.
Using ~s with extra quotes is quite disturbing in this place (logs only, but
still).
2017-06-17 17:23:44 +02:00
Dimitri Fontaine
7f55b21044 Improve support for http(s) resources.
The code used to take the content-length HTTP header into account to load
that number of bytes in memory from the remote server. Not only is it better
to use a fixed-size, allocated-once buffer for that (now 4k), but doing so
also allows downloading content whose content-length you don't know.

In passing tell the HTTP-URI parser rule that we also accept https:// as a
prefix, not just http://.

This allows running pgloader in such cases:

  $ pgloader https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite pgsql:///chinook

And it just works!
2017-06-17 16:48:15 +02:00
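The fixed-buffer download loop can be sketched as follows. This is a Python illustration under stated assumptions, not pgloader's code: `copy_stream` is a hypothetical helper, and an in-memory stream stands in for the HTTP response body.

```python
import io

def copy_stream(src, dst, bufsize=4096):
    """Read with a fixed-size, allocated-once buffer so the total size
    (the content-length) need not be known up front."""
    total = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * 10_000)   # stands in for an HTTP response body
dst = io.BytesIO()
print(copy_stream(src, dst))      # → 10000
```

Because the loop stops on end-of-stream rather than on a byte count, it works whether or not the server sent a content-length header.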
Dimitri Fontaine
b301aa9394 Review create-schemas default behavior.
Get back in line with what the documentation says, and also fix the case for
default MySQL migrations now that we target a PostgreSQL schema with the
same name as the MySQL database name.

Open question yet: should we also register the new schema on the search_path
by default?

  ALTER DATABASE ... SET search_path TO public, newschema, ...;

Is it more of a POLA violation to alter the search_path or to not do it?

Fix #582.
2017-06-16 09:01:18 +02:00
Peter Matseykanets
cd16faee8a brew insists on capital --HEAD
Reword our documentation to use the uppercase variant so that users may copy-paste and benefit.

Fix #581.
2017-06-16 08:43:54 +02:00
Dimitri Fontaine
b3cb7b256d Travis: let's actually use the new EXTRA_OPTS. 2017-06-14 21:32:41 +02:00
Dimitri Fontaine
c02defa5f0 Travis: explicitly pass down the CL variable.
It seems that the test/Makefile didn't get the memo.
2017-06-14 21:25:18 +02:00
Dimitri Fontaine
1469789ede Try to get more information from CCL in testing.
The “magic” options --batch and --heap-reserve will be processed by CCL
itself before pgloader gets to see them, so try that in the testing
environment.
2017-06-14 21:12:54 +02:00
Dimitri Fontaine
de9b43c332 Add support for the MS-SYBDATE datatype.
Fixes #568, thanks to a test case being provided!
2017-06-14 21:02:00 +02:00
Dimitri Fontaine
2c644d55f2 Add --batch to CCL run options.
This option provides lots of information when CCL crashes, and should help us
with understanding Travis and DockerHub errors with CCL.
2017-06-14 11:49:22 +02:00
Adrian Vondendriesch
90a33b4b4c MSSQL: Add ON UPDATE / DELETE support for fkeys (#580)
The former query to find foreign key constraints doesn't consider ON
UPDATE and ON DELETE rules.
2017-06-13 12:03:51 +02:00
Adrian Vondendriesch
d966d37579 MSSQL: Fix Default value translation for getdate() (#576) (#577)
Currently the default value getdate() is replaced by 'now', which creates
statements like:
  CREATE TABLE ... created timestamp 'now' ...
which leads to table definitions like:
  default '2017-06-12 17:54:04.890129'::timestamp without time zone

This is because 'now' is evaluated when creating the table.

This commit fixes the issue by using CURRENT_TIMESTAMP as default
instead of 'now'.
2017-06-12 21:42:35 +02:00
Chris Bandy
b59421cb48 Multiple Travis CI jobs (#575)
* Test with multiple versions of PostgreSQL in Travis

* Test with multiple implementations of Lisp in Travis
2017-06-11 13:25:48 +02:00
Chris Bandy
1a5194de1d Repair Travis CI builds (#574)
* Uninstall PostgreSQL during Travis setup

Recent build images start with PostgreSQL running. Uninstall it so we
can install and run a specific version.

* Skip authentication in Travis builds

The wrong combination of connection parameters will cause psql and other
utilities to prompt for a password, stalling the Travis build
indefinitely.

* Move PostgreSQL setup for Travis builds into script

* Use a newer version of SBCL in Travis builds

Recent versions of bordeaux-threads require ASDF >= 3.1
2017-06-10 12:06:40 +02:00
Adrian Vondendriesch
1f3659941e MSSQL: fix typmod conversion for "max" typemods (#573)
The previous commit makes it possible to convert typemods for various
text types. In MSSQL it's possible to create a column like varchar(max).
Internally this is reported as varchar(-1), which results in a CREATE
TABLE statement that contains e.g. varchar(-1).

This patch drops the typemod if it's -1 (max).

It's based on Dimitri's patch, slightly modified by myself.
2017-06-09 16:12:17 +02:00
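The -1 special case is simple to state in code. This is a hypothetical Python sketch of the rule, not the actual cast implementation: `text_type` is an invented helper that renders the type name, dropping the typemod when it is -1 (how MS SQL encodes varchar(max) internally).

```python
def text_type(name, typmod):
    """Render an MS SQL text type for PostgreSQL, dropping the
    typemod when it is -1, i.e. varchar(max)."""
    if typmod is None or typmod == -1:
        return name
    return "%s(%d)" % (name, typmod)

print(text_type("varchar", 25))   # → varchar(25)
print(text_type("varchar", -1))   # → varchar
```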
Adrian Vondendriesch
d0c273a512 MSSQL: Add support for typemods of various text types (#572)
If a custom CAST rule is defined (e.g. CAST type varchar to varchar) the
original typemod gets lost. This commit is based on Dimitri's patch from
#571 and adds typmod support for "char", "nchar", "varchar", "nvarchar"
and "binary".
2017-06-09 11:25:02 +02:00
Dimitri Fontaine
355aedfd72 Fix "drop default" casting rule.
The previous coding would discard any work done at the apply-casting-rules
step when adding source-specific smarts about handling defaults, because of
what looks like negligence and bad tests. A test case scenario exists but
was not exercised :(

Fix that by defaulting the default value to the one given back at the
apply-casting-rules stage, where we apply the "drop default" clause.
2017-06-08 21:39:06 +02:00
Dimitri Fontaine
d74c9625a3 Clarify the code.
(unless (remove-if-not ...)) reads better as (when (notany ...)) so rewrite
the test that way. For reference and context see #563.
2017-06-08 13:39:38 +02:00
Dimitri Fontaine
776f8fcf6f Add support for SQLite implicit pkeys.
It turns out that in some cases SQLite will not list its primary key indexes
in the pragma index_list(), and that's related to an implementation detail
about using its internal btree structure when rowid can be exploited as the
integer pkey.

Reverse engineer that behavior in pgloader so that the PostgreSQL schema
contains the primary keys even when they are implicit. In doing that we must
be careful that the pkeys might also appear explicitly, in which case we
refrain from declaring them twice.

SQLite catalogs are not my favorite, not to say hackish as hell.

Fixes #563.
2017-06-08 12:11:38 +02:00
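The implementation detail the commit reverse engineers is easy to observe from Python's bundled sqlite3 module. An INTEGER PRIMARY KEY column is an alias for the internal rowid, so SQLite keeps no separate index for it and `pragma index_list()` reports nothing, while the pk flag in `pragma table_info()` still exposes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, name text)")

# No separate btree index backs the rowid-aliased primary key,
# so index_list() comes back empty.
indexes = conn.execute("pragma index_list(t)").fetchall()
print(indexes)  # → []

# The implicit primary key is still visible via the pk flag
# (column 5) of table_info():
pk_cols = [row[1] for row in conn.execute("pragma table_info(t)") if row[5]]
print(pk_cols)  # → ['id']
```

This is why pgloader has to consult table_info rather than rely on index_list alone when reconstructing primary keys.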
Dimitri Fontaine
25e5ea9ac3 Refactor error handling in complete-pgsql-database.
Given the new SQLite test case from issue #563 we see that pgloader doesn't
handle errors gracefully in the post-copy stage. That's because the API was
not properly defined; we should use pgsql-execute-with-timing rather than
another construct here, because it allows the "on error resume next"
behavior we want with after-load DDL statements.

See #563.
2017-06-08 12:09:11 +02:00
Dimitri Fontaine
9fb37bf513 Fix SQLite usage of sqlite_sequence catalog.
It turns out that this catalog table in SQLite may or may not exist
depending on whether the current schema actually uses the feature. So we
need to learn to query the catalogs about the catalogs before querying the
catalogs to learn about the current schema. Thanks SQLite.
2017-06-02 22:34:58 +02:00
Dimitri Fontaine
57a7353a94 Add a log entry before executing SQL blocks.
That seems helpful.
2017-06-02 22:34:25 +02:00
Dimitri Fontaine
c6b634caad Provide "on error stop" as a WITH option.
As seen in #546 it would be easier to be able to specify the option in the
load command directly rather than only at the command line. Here we go!
2017-06-01 16:43:09 +02:00
Dimitri Fontaine
45719645da Implement GumRoad payment for the Moral License.
Let's make it easy for interested parties to actually buy a license, and
offer several levels of “partnership” with the project. Be a Sponsor, a
Partner or even consider having pgloader custom development made for you.
Oracle™ support anyone?
2017-05-31 15:27:06 +02:00
Dimitri Fontaine
21208980fd Improve readme content about building from source.
As reported, we still have some complex situations to deal with when
building from source in some environments: Windows and macOS MacPorts are
not in a very good shape at the moment.

Add some good content in the README to better help people in those
environments get their build.

Fix #444.
2017-05-29 13:01:40 +02:00
Dimitri Fontaine
8f92cc5a7d Fix our SQL parser.
In order to support custom SQL files with several queries and psql-like
advanced features such as \i, we have our own internal SQL parser in pgloader. The
PostgreSQL variant of SQL is pretty complex and allows dollar-quoting and
other subtleties that we need to take care of.

Here we fix the case when we have a dollar sign ($) as part of a single
quoted text (such as a regexp), so both not a new dollar-quoting tag and a
part of a quoted text being read.

In passing we also fix reading double-quoted identifiers, even when they
contain a dollar sign. After all, the following is totally supported by
PostgreSQL:

  create table dollar("$id" serial, foo text);
  select "$id", foo from dollar;

Fix #561.
2017-05-29 12:42:12 +02:00
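The quoting rules the commit fixes can be sketched with a small scanner. This is a minimal Python illustration of the idea, not pgloader's esrap grammar: it splits on top-level semicolons while honoring single quotes, double-quoted identifiers, and $tag$ dollar quoting, so that a $ inside a quoted string never opens a dollar-quote; escaped quotes ('') are deliberately ignored in this sketch.

```python
import re

DOLLAR_TAG = re.compile(r"\$[A-Za-z_]*\$")

def split_statements(sql):
    """Split SQL on semicolons that sit outside of 'single quotes',
    "double-quoted identifiers" and $tag$ dollar quoting."""
    stmts, start, i, n = [], 0, 0, len(sql)
    while i < n:
        c = sql[i]
        if c in ("'", '"'):
            j = sql.find(c, i + 1)          # skip to the closing quote
            i = n if j == -1 else j + 1
        elif c == "$":
            m = DOLLAR_TAG.match(sql, i)
            if m:                            # skip to the matching tag
                j = sql.find(m.group(0), m.end())
                i = n if j == -1 else j + len(m.group(0))
            else:
                i += 1
        elif c == ";":
            stmts.append(sql[start:i].strip())
            start, i = i + 1, i + 1
        else:
            i += 1
    return stmts

parts = split_statements("select '$x'; select $q$a;b$q$;")
print(parts)
```

The $ inside the single-quoted string is skipped with the string itself, while the $q$...$q$ block protects the semicolon it contains.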
Dimitri Fontaine
1f9a0b6391 Handle SQLite Sequences (auto_increment).
It turns out that the SQLite catalogs have a way of registering whether a
column is an auto_increment. You have to look into the sqlite_sequence
catalog for the name of the table the column is found in, and when there's a
non-zero entry there and the table has a single pk-id column of integer
type, then it's safe to assume that we have an "auto_increment" column.

The current way to handle this property is suboptimal in pgloader too where
we need to attach an "extra" slot to our column representation with a value
of the "auto_increment" string. But at least it's easy to hack the code
here... anyway.

Fix #556.
2017-05-21 18:43:10 +02:00
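The catalog behavior described above can be demonstrated with Python's bundled sqlite3 module. Note that sqlite_sequence only exists once some table uses AUTOINCREMENT, so a robust version queries sqlite_master first:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table t (id integer primary key autoincrement, v text)")
db.execute("insert into t (v) values ('a')")

# sqlite_sequence only exists once a table uses AUTOINCREMENT,
# so ask the catalog about the catalog before querying it:
has_seq = db.execute(
    "select count(*) from sqlite_master where name = 'sqlite_sequence'"
).fetchone()[0]
print(has_seq)  # → 1

# A non-zero entry for a table with a single integer pk column marks
# that column as auto_increment:
rows = db.execute("select name, seq from sqlite_sequence").fetchall()
print(rows)  # → [('t', 1)]
```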
Dimitri Fontaine
3b4af49e22 Implement ALTER SCHEMA and ALTER TABLE for SQLite.
It turns out we forgot to add support for internal catalog munging clauses
to SQLite support. The catalogs being normalized means there's no extra work
here other than allowing the parser to accept those clauses and then pass
them over to our generic `copy-database' method implementation.

It is to be noted that SQLite has no support for schemas as per the standard
and PostgreSQL, so when we inspect the database schema we create a nil
entry here. It's then not possible to ALTER SCHEMA nil RENAME TO 'target',
unfortunately, but it's easy enough to SET search_path to 'target' anyway,
as shown in the modified test case.

Fix #552.
2017-05-14 20:47:01 +02:00
Dimitri Fontaine
65cdc48c1e Fix error conditions for unrecognized source.
When the source is not recognized, pgloader would not always output a useful
error message with the hint about using --type, as per #248. Here we fix the
conditions applying to this error message.
2017-05-12 13:08:30 +02:00
Dimitri Fontaine
fe2e1ee956 Fix comments handling in IXF support.
There's no need to add empty string comments, guard against doing that.
2017-05-06 15:10:40 +02:00
Dimitri Fontaine
e7afe993fa Review database migration logic and options again.
In #539 it appears that we fail to honor some actions, and it seems to be
because of sloppy reasoning in how to make sense of all of them. It could
well be that we have too many options for them to make sense in all possible
combinations, but for now it looks safe to bet on an easy fix...
2017-05-06 14:45:05 +02:00
Mikael Sand
3c4e64ed26 Fix spelling error in Windows-only code path (#545)
Fix spelling error for uiop:make-pathname* key by changing :direction to :directory
2017-04-30 19:53:53 +02:00
Dimitri Fontaine
8254d63453 Fix incorrect per-table total time metrics.
The concurrent nature of pgloader made it non-obvious where to implement
the timers properly, and as a result the tracking of how long it took to
actually transfer the data was... just wrong.

Rather than trying to measure the time spent in any particular piece of the
code, we now emit "start" and "stop" stats messages to the monitor thread at
the right places (which are way easier to find, in the worker threads) and
have the monitor figure out how long it took really.

Fix #506.
2017-04-30 18:09:50 +02:00
Dimitri Fontaine
20ea1d78c4 Improve default summary readability.
Now that we have fixed the output of the per-table total timing, we can
show only that timing by default. With more verbosity pgloader will add the extra
columns, and in computer oriented formats (json, csv, copy) all the details
are always provided of course.

See #506.
2017-04-30 18:09:50 +02:00
geethaRam
0e12d77a7f Update pgloader.spec (#537)
Updated version in spec
2017-04-17 23:20:14 +02:00
Dimitri Fontaine
9b4bbdfef7 Review --load-lisp-file error handling.
The handler-case form installed would catch any non-fatal warning and would
also fail to display any error to the user. Both are wrong behaviors that
this patch fixes, using *error-output* (that's stderr) explicitly for
anything that may happen while loading the user-provided code.

Fix #526.
2017-04-16 21:22:46 +02:00
Dimitri Fontaine
538464f078 Avoid operator is not unique errors.
When the intarray extension is installed our PostgreSQL catalog query fails
because we now have more than one operator solving smallint[] <@ smallint[].
It is easy to avoid that problem by casting to integer[], smallint being an
implementation detail here anyway.

Fix #532.
2017-04-06 23:55:06 +02:00
Dimitri Fontaine
0219f55071 Review DROP INDEX objects quoting.
Force double-quoting of object names in DROP INDEX commands by using the
format directive ~s. The names of the objects we are dropping usually come
from a PostgreSQL catalog, but might still contain force-quote conditions
like starting with a number, as shown in #530.

This fix certainly means we will have to review all the DDL formatting we do
in pgloader and apply a single method of quoting all along. The simplest one
is of course to force-quote every object name in "", but it might not be the
smartest one (what if some sources are sending already quoted object names?
that needs a check), and it's certainly not the prettiest way to go at it:
people usually like to avoid unnecessary quotes, calling them clutter.

Fix #530.
2017-04-01 22:37:26 +02:00
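The force-quote condition the commit mentions (names starting with a number) is easy to sketch. This is a hypothetical Python helper illustrating the rule, not pgloader's quoting code: quote only when the name wouldn't survive unquoted, and double any embedded double quotes.

```python
import re

def quote_ident(name):
    """Double-quote an identifier when PostgreSQL would require it,
    e.g. when it starts with a digit or contains unusual characters."""
    if re.fullmatch(r"[a-z_][a-z0-9_$]*", name):
        return name
    return '"%s"' % name.replace('"', '""')

print(quote_ident("users_email_idx"))  # → users_email_idx
print(quote_ident("2017_sales_idx"))   # → "2017_sales_idx"
```

This is the "quote only when needed" alternative to force-quoting everything, the trade-off discussed in the commit message.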
Dimitri Fontaine
e2bc7e4fd4 Fix github language statistics.
As documented in https://github.com/github/linguist#overrides and from the
conversation in https://github.com/github/linguist/issues/3540 we add a
.gitattributes file to the project wherein we pretend that the sql files are
all vendored-in.

This should allow GitHub to realize that pgloader is all about Common Lisp
and not a single line of PLpgSQL...
2017-03-26 11:26:29 +02:00
Dimitri Fontaine
b2f9590f58 Add support for MS SQL XML data type.
Given a test case and some reading of the FreeTDS source code, it appears
that the XML data type is sent on the wire as (unicode) text. This patch
makes pgloader aware of that and also revisits the choice of casting XML to
PostgreSQL XML data type (thanks to the test case where we see it just works
without surprise).

Fix #503.
2017-03-25 21:26:16 +01:00
Dimitri Fontaine
296e571e27 Fix MS SQL tinyint identity support.
Add a cast rule to support tinyint being an “identity” type in MS SQL, which
means using a sequence to derive its values from. We didn't address the
whole MS SQL integer type tower here, and suspect we will have to add more
in the future.

Fix #528 where I could have access to a test case and easily reproduce the
bug, thanks!
2017-03-22 11:38:40 +01:00
Dimitri Fontaine
940fc63a5e Distribute *root-dir* to all threads.
The creation of the reject and data files didn't happen under the
right *root-dir* setting, for lack of passing the main thread's value to
the worker threads.
2017-03-18 18:51:22 +01:00
Dimitri Fontaine
ab7e77c2d0 Fix double transformation call in CSV projections.
In advanced projections it could be that we call the transformation function
for some input fields twice. This is a bug that manifests in particular when
the output of the transformation can't be used/parsed again by the same
function, as shown in the bug report.

Fix #523.
2017-03-04 15:55:08 +01:00
Dimitri Fontaine
3fac222432 Fix MSSQL index column names quoting.
We have to pay attention that column names in MS SQL don't follow the same
rules as in PostgreSQL and may e.g. begin with numbers. Apply identifier
case and rules to index column names too.
2017-03-03 21:30:58 +01:00
Dimitri Fontaine
1023577f50 Review internal database migration logic.
Many options are now available to pgloader users, including shortcuts that
were not defined clearly enough. That could result in stupid things being
done at times.

In particular, when picking the "data only" option, indexes are not to
be dropped before loading the data, but pgloader would still try and create
them again at the end of the load, because the option that controls that
behavior defaults to true and is not impacted by the "data only" choice.

In this patch we review the logic and ensure it's applied in the same
fashion in the different phases of the database migration: preparation,
copying, rebuilding of indexes and completion of the database model.

See also 96b2af6b2a2b163f7e9e3c0ba744da1733b23979 where we began fixing
oddities but didn't go far enough.
2017-02-26 14:48:36 +01:00
Dimitri Fontaine
8ec2ea04db Add support for MySQL geometry points.
The new version of the sakila database uses geometry-typed columns that
contain POINT data. Add support for that kind of data by copying what we did
already for the POINT datatype.
2017-02-25 21:52:41 +01:00
Dimitri Fontaine
9e2b95d9b7 Implement support for PostgreSQL storage parameters.
In PostgreSQL it is possible at CREATE TABLE time to set some extra storage
parameters, the most useful of them in the context of pgloader being the
FILLFACTOR. For the setting to be useful, it needs to be positionned at
CREATE TABLE time, before we load the data.

The BEFORE LOAD clause of the pgloader command allows running SQL scripts
that will be executed before the load, and even before the creation of the
target schema when pgloader does that, which is nice for other use cases.

Here we implement a new `ALTER TABLE` rule that one can set in the pgloader
command in order to change storage parameters at CREATE TABLE time:

  ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40')

Fix #516.
2017-02-25 21:49:06 +01:00
Dimitri Fontaine
57dd9fcf47 Add int as an alias for integer.
We cast the MS SQL "int" type to "integer" in PostgreSQL, so add an entry in
our type name mapping where they are known to be equivalent, to avoid
WARNINGs about the situation in DATA ONLY loads.
2017-02-25 17:54:57 +01:00
Dimitri Fontaine
5fd1e9f3aa Fix catalog merge hazards.
When reading table names from PostgreSQL, we might find some that need
systematic quoting (such as names that begin with a digit). In that case,
when later comparing the catalogs to match source database table names
against PostgreSQL catalog table names, we need to unquote the PostgreSQL
table name we are using.

In passing, force the *identifier-case* to :none when reading object names
from the PostgreSQL catalogs.
2017-02-25 17:53:08 +01:00
Dimitri Fontaine
96b2af6b2a Fix a hang scenario in schema-only.
The parallelism in pgloader is now smart enough to begin fetching data from
the next table while the previous one is still not done being written down
to PostgreSQL, but when doing so I introduced a bug in the way indexes are
taken care of.

Specifically, in schema-only mode of operations, we would wait for indexes
we skipped creating. The skipping is the bug here, so make sure we create
indexes even when we don't want to copy any data over.
2017-02-25 17:09:07 +01:00
Dimitri Fontaine
2f7169e286 Fix MS SQL N'' default values.
MS SQL apparently sends default values as Nvarchar, which means in this case
we have to deal with the N'' representation of it ourselves.
2017-02-25 16:14:26 +01:00
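Handling the N'' representation amounts to stripping the N prefix from the literal. This is a hypothetical Python sketch of that normalization, not pgloader's actual function:

```python
def normalize_default(value):
    """MS SQL sends Nvarchar defaults as N'...'; strip the N prefix
    so the value becomes a regular single-quoted literal."""
    if value.startswith("N'") and value.endswith("'"):
        return value[1:]
    return value

print(normalize_default("N'pending'"))  # → 'pending'
print(normalize_default("getdate()"))   # → getdate()
```

Non-string defaults (function calls, numbers) pass through untouched.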
Jan Moringen
57bc1ca886 Shadow symbols NAMESTRING, NUMBER and INLINE in pgloader.parser package (#515)
Defining rules on standard symbols like CL:NAMESTRING is a bad idea
since other systems may do the same, inadvertently overwriting each
other's rules.

Furthermore, future esrap versions will probably prevent defining
rules whose names are symbols in locked packages, making this change
mandatory.
2017-02-12 15:14:34 +01:00
Dimitri Fontaine
024579c60d Fix SBCL version requirements.
As we now depend on a recent enough version of ASDF in some of our build
dependencies, that raises the bar to SBCL 1.2.5 or newer.

Fixes #497.
2017-01-28 18:17:19 +01:00
Dimitri Fontaine
6bd17f45da Add support for the MS SQL smalldatetime data type.
Availability of a test case for MS SQL allows us to make progress on this
limitation and add support for the smalldatetime data type. It is
converted server-side with the same CONVERT expression as the longer
datetime datatype.

Fixes #431.
2017-01-28 18:08:55 +01:00
Dimitri Fontaine
a799cd5f5f Improve error handling for MS SQL.
In particular, implement more solid handling of poorly encoded data or
badly setup connections, by reporting the error and continuing the load.
2017-01-28 17:47:44 +01:00
Dimitri Fontaine
ddda2f92ca Force column ordering in SQLite support.
In the case of targeting an already existing PostgreSQL database,
columns might have been reordered. Add the column name list to the COPY
command we send so that we figure the mapping out automatically.

Fixes #509.
2017-01-28 17:45:33 +01:00
Dimitri Fontaine
b54ca576cb Raise some log messages.
We should be able to follow the progress more easily at the log level
NOTICE, so raise some log messages from INFO to NOTICE.
2017-01-28 17:44:18 +01:00
Dimitri Fontaine
ed217b7b28 Add some docs about FreeTDS and encoding.
It turns out that it's possible and not too complex, when using the
FreeTDS driver, to enforce the client encoding for MS SQL to be utf-8.
Document how to tweak ~/.freetds.conf to that end.
2017-01-27 22:16:59 +01:00
Dimitri Fontaine
1d025bcd5a Fix log levels.
It looks like we missed the INFO level messages in the DEBUG output.
2017-01-23 21:52:38 +01:00
Dimitri Fontaine
bd84c6fec9 Fix default value handling in MS SQL.
When the column is known to be non-nullable, refrain from adding a null
default value to it. This also fixes the case of casting from an

  [int] IDENTITY(1,1) NOT NULL

That otherwise did get transformed into a

  bigserial not null default NULL

Causing then the error

  Database error 42601: multiple default values specified for column ... of table ...
2017-01-23 21:50:16 +01:00
Dimitri Fontaine
c0f9569ddd In passing aesthetic concerns. 2017-01-23 21:49:42 +01:00
Dimitri Fontaine
1d35290914 Assorted fixes for MS SQL support.
When updating the catalog support we forgot to fix the references to the
index and fkey name slots that are now provided centrally for the
catalog of all database source types.

Again, we don't have unit test cases for MS SQL, so that's a blind
fix (but at least it compiles).

See #343.
2017-01-10 21:19:29 +01:00
Dimitri Fontaine
dbf7d6e48f Don't double-quote identifiers in catalog queries.
Avoid double quoting the schema names when used in PostgreSQL catalog
queries, where the identifiers are used as literal values and need to be
single-quoted.

Fix #476, again.
2017-01-10 21:12:34 +01:00
Dimitri Fontaine
8da09d7bed Log PostgreSQL Catalog queries at SQL log level.
See #476 where it would have been helpful to see the PostgreSQL catalog
queries with `--log-min-messages sql` in the bug report. Also more
generally useful.
2017-01-10 21:12:34 +01:00
Dimitri Fontaine
17536e84a4 Create CNAME 2017-01-06 21:37:59 +01:00
Dimitri Fontaine
effa916b31 Improve parallelism setup documentation.
The code comment displayed in the release notes for 3.3.1 is reported to
be better at explaining the concurrency control than what we had in the
main documentation, so add it there.

Fix #496.
2017-01-03 23:13:01 +01:00
Dimitri Fontaine
21a10235db Refrain from issuing the summary twice.
Now that we have a proper flush system for reporting the summary at the
proper time (see 7c5396f0975be405910d66f4b5aedc89acd75c1d), refrain from
also taking care of the reporting when stopping the monitor.

Adapt the regression driver code to flush the summary after loading the
expected data, which also provides better output.

When the summary output is sent to a file, that would also create a
backup file and replace our summary with an empty new file at monitor
stop...

Fixes #499.
2017-01-03 23:07:58 +01:00
Dimitri Fontaine
b239e6b556 Fix #498. 2017-01-03 22:28:58 +01:00
Dimitri Fontaine
381ba18b50 Add a new log level: SQL.
This sits between NOTICE and INFO, allowing a complete log of
the SQL queries sent to the server while avoiding the very verbose
traffic of the DEBUG log level.

See #498.
2017-01-03 22:27:17 +01:00
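The idea of a level wedged between two existing ones can be illustrated with Python's logging module. Python has no NOTICE level, so the numeric values below are purely illustrative; only the ordering (NOTICE above SQL above INFO) mirrors what the commit describes.

```python
import logging

NOTICE, SQL = 25, 23                 # illustrative values only
logging.addLevelName(NOTICE, "NOTICE")
logging.addLevelName(SQL, "SQL")

log = logging.getLogger("pgloader-demo")
log.setLevel(SQL)

# At minimum level SQL: queries and NOTICE messages pass,
# while INFO and DEBUG chatter is filtered out.
print(log.isEnabledFor(SQL), log.isEnabledFor(logging.INFO))  # → True False
```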
Dimitri Fontaine
4931604361 Allow ALTER SCHEMA command for MySQL.
This pgloader command allows migrating tables while changing the schema
they are found in between their MySQL source database and their
PostgreSQL target database.

This changes the default behavior of pgloader with MySQL from always
targeting the 'public' schema to targeting by default a schema named
the same as the MySQL database. You can revert to the old behavior by
adding a rule:

   ALTER SCHEMA 'dbname' RENAME TO 'public'

We might want to add a patch to re-install the default behavior later.

Also see #489 where it used not to be possible to rename the schema at
migration time, causing strange errors (you need to spot NIL as the
schema name in the "failed to find target table" messages).
2016-12-18 19:31:21 +01:00
Dimitri Fontaine
bdaacae3e7 Fix Primary Keys count.
That was broken in a recent patch refactoring the PostgreSQL SQL execute
API that now accepts a list of commands to execute.
2016-12-18 19:31:21 +01:00
Dimitri Fontaine
320a545533 Fix SQL types creation: consider views too.
When migrating views from e.g. MySQL it is necessary to consider the
user defined SQL types (ENUMs) those views might be using.
2016-12-18 19:31:21 +01:00
Dimitri Fontaine
ad56cf808b Fix PostgreSQL index naming.
A PostgreSQL index is always created in the same schema as the table it
is defined against, and the CREATE INDEX command doesn't accept schema
qualified index names.
2016-12-18 19:31:21 +01:00
Andy Freeland
9a0c50f700 Make sure EPEL is enabled when installing SBCL (#494) 2016-12-17 16:30:57 +01:00
Dimitri Fontaine
1c927beb81 Fix cl-postgres packaging (typo). 2016-12-12 12:04:40 +01:00
Dimitri Fontaine
37fc4ba550 Back to development mode. 2016-12-04 14:09:36 +01:00
Dimitri Fontaine
ac202dc70e Prepare release 3.3.2. 2016-12-03 17:38:52 +01:00
Dimitri Fontaine
db9fa2f001 Improve docs for connection strings.
Some parts of the connection strings might be provided from the
environment, such as in the MySQL case. Fix #485.
2016-12-03 15:51:39 +01:00
Dimitri Fontaine
6eef0c6c00 Improve docs with default parallelism settings.
Fix #442 by adding the default values of concurrency and workers.
2016-12-03 15:30:34 +01:00
Dimitri Fontaine
7c5396f097 Review fatal errors handling.
Make it so that fatal errors are printed only once, and when possible
included in the usual log format as handled by our monitoring thread.
Also, improve error and summary reporting when we load from several
sources on the same command line.

All this work has been triggered because of an edge case where the OS
return value of the pgloader command was 0 (zero, success) although the
given file on the command line did not exist.

Fixes #486.
2016-11-27 23:58:50 +01:00
Dimitri Fontaine
2dc733c4d6 Fix corner case in creating indexes again.
When the option "drop indexes" is in use in loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.

The CREATE is done in parallel, and we create an lparallel kernel for
that. The kernel must have a worker-count of at least 1, and we were
not considering the case of 0 indexes on the target table.

Fix #484.
2016-11-20 17:17:15 +01:00
Christoph Berg
27b67c6cf6 Add tzdata to build-depends (Closes: #839468) (#483) 2016-11-20 16:58:30 +01:00
Dimitri Fontaine
526fafb4b7 Allow quoting identifiers in db uri tablename.
As shown in #476, it is sometimes needed to be able to quote the
identifier names even when loading from a file, that is when specifying
the target table name in the database uri.

To that end, allow the option "identifier case" to be used in the
file-based cases too. Fixes #476.
2016-11-13 22:14:48 +01:00
Dimitri Fontaine
e5c8b8d159 Fix Advanced Howto for Geolite.
The example was still using a very old syntax for per-field options, and
even the current debian package doesn't support this syntax anymore...
Update the docs to use current syntax.

Fix #475.
2016-11-13 21:54:09 +01:00
Elias Pipping
af46dc280f Use uiop:run-program ... :directory ... (#473) 2016-11-06 22:40:12 +01:00
Dimitri Fontaine
0bd4488d78 Update scripts to reference freetds-devel.
I'm not sure if anyone is using those scripts anymore, but I suppose
keeping them known broken isn't helping anyone either. This is a blind
fix in reaction to latest comment in bug #131.
2016-11-01 21:08:31 +01:00
Dimitri Fontaine
43bb87943f Fix compilation error.
Introduced recently when refactoring the match rules, forgot to update
all call sites, and the bug went unnoticed for a while, oops. Not sure
the fix is all we need to get back a working feature (alter schema
rename to), but it allows the code to compile, and that's all I have
time to handle today.

See #466.
2016-10-25 11:54:56 +02:00
Dimitri Fontaine
4a431498ca Typo fix... 2016-10-02 20:09:55 +02:00
Dimitri Fontaine
ac91ea97d0 Add iwoca as a sponsor to pgloader.
Thanks guys ;-)
2016-10-02 16:28:43 +02:00
Dimitri Fontaine
d7d36c5766 Review identifier case :quote.
We added some confusion about who's responsible for quoting the SQL object
names between src/utils/quoting.lisp and src/pgsql/pgsql-ddl.lisp, and
as a result some migrations from MySQL with identifier case set to quote
were broken, as in #439.

To fix, remove any use of the format directive ~s in the PostgreSQL ddl
output methods: we consider that the quoting of ~s is to be decided in
apply-identifier-case. We then use ~a instead of ~s.

Fix #439.
2016-09-17 22:45:45 +02:00
Dimitri Fontaine
8fb542bc90 Improve INCLUDING rule matching for MySQL.
In the MySQL source we have explicit support for both string equality
and regexps for the INCLUDING and EXCLUDING clauses. This got broken
when moved to be shared with the ALTER TABLE implementation, because
we were no longer using the type system in the same way in all places.

To fix, create new abstractions for strings and regexps and use those
new structs in the proper way (thanks to defstruct and CLOS).

Fixes #441.
2016-09-10 18:54:11 +02:00
Dimitri Fontaine
5b6adb02b0 Implement and use DROP ... IF EXISTS.
In cases where we have a WITH include drop option, we are generating
lots of SQL DROP statements. We may be running an empty target database
or in other situations where the target object of the DROP command might
not exist. Add support for that case.
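
A minimal sketch of the behavior (the object names here are illustrative):

```sql
-- Without IF EXISTS, dropping a missing object aborts with an error;
-- with it, PostgreSQL only emits a notice and carries on, which is
-- what "with include drop" needs against an empty target database.
DROP INDEX IF EXISTS some_table_idx;
DROP TABLE IF EXISTS some_table CASCADE;
```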
2016-09-10 18:01:04 +02:00
Dimitri Fontaine
5ad21bdbfb Add stats about how many files we processed.
In the FILENAME MATCHING case it might be good to have the information,
which can also explain some of the timing spent. The example in
test/bossa.load currently loads data from 296 files total...
2016-09-02 13:39:21 +02:00
Dimitri Fontaine
3569980378 Fix error reporting of catalogs.
The internal catalog representation is deeply recursive in order to
make it easy to traverse the catalog both downwards (catalog to schema
to tables) and upward (table to its schema to its catalog).

In consequence we need to set *print-circle* to non-nil when we're
going to log the catalogs, so turn it to non-nil before generating the
log messages.

While at it, add logging of such catalogs in the :data log verbosity
mode. The catalog output is very verbose, but it's easy to copy/paste it
from a bug report into being a live object we can inspect in the REPL,
thanks to Common Lisp's notion of a reader and readable printer!
2016-08-30 23:25:35 +02:00
Dimitri Fontaine
7070f82976 Back to development mode, not a release anymore.
The next version is going to be either 3.3.2 or 3.4.0.50 depending on
whether we have mainly bug fixes or new features.
2016-08-30 23:17:03 +02:00
Dimitri Fontaine
cb30891fbb Release pgloader v3.3.1. 2016-08-28 20:31:31 +02:00
Dimitri Fontaine
f2dcf982d8 Fix stats collections in some cases.
Calling a -with-timing from within a with-stats-collection macro is
redundant and will have the numbers counted twice. In this case that
didn't happen because the stats label was manually copied, but one copy
was borked with a typo.
2016-08-28 20:29:53 +02:00
Dimitri Fontaine
0b06bc6ad6 Update an old archive test case. 2016-08-28 20:29:30 +02:00
Dimitri Fontaine
9e574ce884 Rename web/ into docs/
This allows us to benefit from GitHub Pages without having to maintain a
separate orphaned branch.
2016-08-19 20:55:29 +02:00
Dimitri Fontaine
a86a606d55 Improve existing PostgreSQL database handling.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of the indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.

We used to use the CASCADE option when dropping the indexes, which hides
a bug: if we exclude from the load tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.

To prevent that from happening, pgloader now queries the PostgreSQL
pg_depend system catalog to list the “missing” foreign keys and adds them
to our internal catalog representation, from which we know to DROP then
CREATE the SQL object at the proper times.

See #400 as this was an oversight in fixing this issue.
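
A hedged sketch of the kind of pg_depend lookup involved; this is not
pgloader's actual query, and the index name is made up:

```sql
-- Find foreign key constraints that depend on a given unique index,
-- so they can be dropped before the load and recreated afterwards.
SELECT con.conname,
       con.conrelid::regclass AS referencing_table,
       pg_get_constraintdef(con.oid) AS condef
  FROM pg_depend dep
  JOIN pg_constraint con ON con.oid = dep.objid
 WHERE dep.refobjid = 'some_table_pkey'::regclass
   AND con.contype = 'f';
```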
2016-08-10 22:02:06 +02:00
Dimitri Fontaine
43261e0016 Fix double-counting of fkeys in stats reports. 2016-08-08 21:09:15 +02:00
Dimitri Fontaine
53924fab01 Fix foreign key definition formatting.
When we do have a condef (constraint definition in the PostgreSQL
catalog slang), use it rather than trying to invent it again from the
bits and pieces. See #400, which it actually fixes now...
2016-08-08 01:18:36 +02:00
Dimitri Fontaine
210664fff5 Fix typo: Performance, singular.
Fixed #432.
2016-08-07 21:40:28 +02:00
Dimitri Fontaine
ffef9bc28e Improve pgloader bundle distribution.
Include the local git clones in the bundle so that git is not needed at
build time for consumers of the bundle. Fixes #428.
2016-08-07 21:30:48 +02:00
Dimitri Fontaine
c2c98b8b42 Allow any character in a quoted CSV field name.
We used to force overly strict rules for a quoted field name in a CSV
load file, now accept any character but a quote to be part of the field
name.

Fixes #416.
2016-08-07 20:35:37 +02:00
Dimitri Fontaine
70572a2ea7 Implement support for existing target databases.
Also known as the ORM case: it happens that other tools are used to
create the target schema. In that case pgloader's job is to fill the
existing target tables with the data from the source tables.

We still focus on load speed and pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.

This behavior is activated when using the “create no tables” option as
in the following test-case setup:

  with create no tables, include drop, truncate

Fixes #400, for which I got a test-case to play with!
2016-08-06 20:19:15 +02:00
Dimitri Fontaine
2d47c4f0f5 Use internal catalog when loading from files.
Replace the ad-hoc code that was used before in the load from file code
path to use our full internal catalog representation, and adjust APIs to
that end.

The goal is to use catalogs everywhere in the PostgreSQL target API,
allowing us to reason explicitly about source and target catalogs;
see #400 for the main use case.
2016-08-05 11:42:06 +02:00
Dimitri Fontaine
42c8012e94 Cleanup: remove the now unused file! 2016-08-05 11:39:16 +02:00
Dimitri Fontaine
2aedac7037 Improve our internal catalog representation.
First, add indexes and foreign keys to the list of objects supported by
the shared catalog facility, where they were only found in the pgsql
schema specific package for historical reasons.

Then also add to our catalog internal structures the notion of a trigger
and a stored procedure, allowing for cleaner advanced default values
support in the MySQL cast functions.

Once we now have a proper and complete catalog, review the pgsql module
DDL output function in terms of the catalog and rewrite the schema
creation support so that it takes direct benefit of our internal
catalogs representation.

In passing, clean-up the code organisation of the pgsql target support
module to be easier to work with.

Next step consists of getting rid of src/pgsql/queries.lisp: this
facility should be replaced by the usage of a target catalog that we
fetch the usual way, thanks to the new src/pgsql/pgsql-schema.lisp file
and list-all-* functions.

That will in turn allow for an explicit step of merging the pre-existing
PostgreSQL catalog when it's been created by other tools than pgloader,
that is when migrating with the help of an ORM. See #400 for details.
2016-08-01 23:14:58 +02:00
Dimitri Fontaine
87f6d3a0a0 Clean-up overloaded parse rule for numbers.
The MSSQL index filters parser needs to parse digits and keep them as
text, but was piggybacking on the main parsers and the fixed file format
positions parser by re-using the rule name "number".

My understanding was that by calling `defrule' in different packages one
would create a separate set of rules. It might have been wrong from the
beginning or just changed in newer versions of esrap. Will have to
investigate more.

This fixes #434 while not applying suggested code: the comment about
where to fix the bug is spot on.

Also, it should be noted that the regression tests framework seems to be
failing us and returns success in that error case, despite code
installed to properly handle the situation. This will also need to be
investigated.
2016-07-31 23:54:18 +02:00
Dimitri Fontaine
7daee9405f Fix column names quoting in reset-all-sequences.
The other user-provided names (schema and table) were already quoted
using the quote_ident() PostgreSQL function, but the column name (attname
in the catalogs) was not.

Blind attempt to fix #425.
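
For reference, quote_ident() only adds double quotes when they are
needed, which is exactly what mixed-case or keyword-like column names
require:

```sql
SELECT quote_ident('simple'),     -- simple
       quote_ident('mixedCase'),  -- "mixedCase"
       quote_ident('order');      -- "order" (reserved keyword)
```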
2016-06-20 20:52:24 +02:00
Gert Van Gool
3109ba14dc Update bootstrap CentOS scripts (#424)
* Corrects CentOS7 instruction (incorrect group name)

* Update CentOS 6 bootstrap info

- More recent SBCL (1.1 -> 1.3)
- Missing freetds dependency
2016-06-17 23:15:29 +02:00
Krzysztof Jurewicz
fa9f437095 Override encoding in every testing connection (#410)
Also: reuse connection in process-regression-test.

Fix #408.
2016-05-31 23:12:41 +02:00
Krzysztof Jurewicz
1378949eee Fix docs about char and varchar casting in MySQL (#409) 2016-05-18 21:55:36 +02:00
Krzysztof Jurewicz
13f5821547 Add the “set not null” cast option for MySQL (#407)
Use case: Django dissuades setting NULL “on string-based fields […]
because empty string values will always be stored as empty strings, not
as NULL. If a string-based field has null=True, that means it has two
possible values for »no data«: NULL, and the empty string. In most
cases, it’s redundant to have two possible values for »no data«; the
Django convention is to use the empty string, not NULL.”.

pgloader already supports custom transformations which can be used to
replace NULL values in string-based columns with empty strings. Setting
NOT NULL constraint on those columns could possibly be achieved by
running a database query to extract their names and then generating
relevant ALTER TABLE statements, but a cast option in pgloader is a more
convenient way.
2016-05-18 21:50:09 +02:00
Dimitri Fontaine
7344e1d81e Improve docs for FILENAMES MATCHING support.
This format of source file specifications is available for CSV, COPY and
FIXED formats but was only documented for the CSV one. The paragraph is
copy/pasted around in the hope to produce per-format man pages and web
documentation in a fully automated way sometime.

Fix #397.
2016-05-18 11:07:28 +02:00
alex
49c9a2f016 add the postgres debian ppa key in the correct way (#406)
* add the postgres debian ppa key in the correct way

* experimental: remove dist-upgrade

* experimental: install asdf/sbcl via apt
2016-05-16 20:19:55 +02:00
alex
09c178c33b makefile: perform shallow clones when cloning deps (#405) 2016-05-16 20:17:47 +02:00
leonardsson
826d975985 Fix bug in max-parallel-create-index (#398)
Fixes #395
2016-05-05 22:00:38 +02:00
porshkevich
65e08fe187 fix type drop to cascade (#393)
If you have a function or an operator that uses a type which is removed, you will get this error:

error: cannot drop type because other objects depend on it
2016-04-27 21:43:02 +02:00
Dimitri Fontaine
44b9ec81c9 Fix non-deterministic projection in MySQL query.
In MySQL the information_schema.statistics table lists all indexes and
has a row per index column, which means that the index level properties
are duplicated on every row of the view.

Our query against that catalog was lazily assuming the classic and
faulty MySQL behavior where GROUP BY would allow non-aggregated columns
to be reported even when the result isn't deterministic. This patch
fixes that by using a trick: the NON_UNIQUE column is 0 for a unique
index and 1 otherwise, so we sum the numbers and test for equality with 0.

Fix #345 again.
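
A simplified version of that kind of query (column list abridged; not
the exact query pgloader uses):

```sql
SELECT table_name, index_name,
       sum(non_unique) = 0 AS is_unique,  -- deterministic aggregate trick
       group_concat(column_name ORDER BY seq_in_index) AS cols
  FROM information_schema.statistics
 WHERE table_schema = 'mydb'
 GROUP BY table_name, index_name;
```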
2016-04-27 21:14:59 +02:00
Dimitri Fontaine
7af6c7ac41 Filter out incomplete foreign key definitions.
It's possible that in MySQL a foreign key constraint definition is
pointing to a non-existing table. In such a case, issue an error message
and refrain from trying to then reinstall the faulty foreign key
definition.

The lack of error handling at this point led to a frozen instance of
pgloader apparently, I think because it could not display the
interactive debugger at the point where the error occurs.

See #328, also #337 that might be fixed here.
2016-04-19 17:23:05 -04:00
Dimitri Fontaine
42e9e521e0 Add option "max parallel create index".
By default, pgloader will start as many parallel CREATE INDEX commands
as the maximum number of indexes you have on any single table that takes
part in the load.

As this number might be so great as to exhaust the target PostgreSQL
server (e.g. maintenance_work_mem), we add an option to limit that to
something reasonable when the source schema isn't.

Fix #386 in which 150 indexes are found on a single source table.
2016-04-11 17:40:52 +02:00
Dimitri Fontaine
31f8b5c5f0 Set application_name to 'pgloader' by default.
It's always been possible to set application_name to anything really,
making it easier to follow the PostgreSQL queries made by pgloader.
Force that setting to 'pgloader' by default.

Fix #387.
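
This makes pgloader's sessions easy to spot on the server; for instance:

```sql
-- Equivalent of what the default setting does at connection time:
SET application_name TO 'pgloader';

-- Server side, list pgloader's backends:
SELECT pid, state, query
  FROM pg_stat_activity
 WHERE application_name = 'pgloader';
```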
2016-04-11 17:14:38 +02:00
Dimitri Fontaine
0805ee32b8 Add a CCL dockerfile.
For some reason, with the default DYNSIZE and even when using the 64
bits Clozure-CL variant, I get a series of error messages like the one
below, so that I had to restrict myself to using 256 MB only:

  Fatal error in "buildapp" : Fault during read of memory address #x7F8C37522668
  Fatal error in "buildapp" : Fault during
  Fatal error in "buildapp" : Stack overflow on temp stack.
  Fatal error in "buildapp" : Fault during read of memory address #x7F8C37522668

It's worth trying something else as the limitation might be tied to my
local virtual build environment.

See #327 where the SBCL Garbage Collector is introducing problems which
might not appear at all when compiling with Clozure-CL instead.
2016-04-08 10:44:04 +02:00
Dimitri Fontaine
6d09de995b Add FusionBox to the sponsor page.
FusionBox bought a Moral License and helped test case pgloader against a
test instance of SQL Server with which it was easy to reproduce bugs.
Those got fixed thanks to their support!
2016-04-02 20:46:16 +02:00
Dimitri Fontaine
c439ea4b9c Explicitly allow for Return as whitespace.
Windows default end of line is #\Return then #\Newline and the parser
gets to see both of them, so it needs to be prepared. See #159 which is
all about windows support.
2016-03-30 10:41:50 +02:00
Dimitri Fontaine
7fc0812f79 Can't reduce an empty list with the max function.
The max function requires at least 1 argument to be given, and in the
case where we have no table to load, it then fails badly, as shown here:

  CL-USER> (handler-case
               (reduce #'max nil)
             (condition (c)
               (format nil "~a" c)))
  "invalid number of arguments: 0"

Of course Common Lisp comes with a very easy way around that problem:

  CL-USER> (reduce #'max nil :initial-value 0)
  0

Fix #381.
2016-03-29 21:02:31 +02:00
deepy
dcc926e90c Adds cast for image -> bytea 2016-03-29 20:54:27 +02:00
Dimitri Fontaine
177f48863b Fix regression testing.
It's been broken by a recent commit where we did force the internal
table representation to always be an instance of the table structure,
which wasn't yet true for regression testing.

In passing, re-indent a large portion of the function, which accounts
for most of the diff.
2016-03-27 21:28:51 +02:00
Dimitri Fontaine
b1d4e94f2a Fix integer parsing support for SQLite.
The function needs to return a string to be added to the COPY stream, and
we still need to make sure whatever is given here looks like an integer.
Given the very dynamic nature of data types in SQLite, the
integer-to-string function was already the default, but its fixed version
somehow failed to be published before.
2016-03-27 20:42:40 +02:00
Dimitri Fontaine
fe3601b04c Fix SQLite index support, add foreign keys support.
It turns out recent changes broke the SQLite index support (from adding
support for MS SQL partial/filtered indexes), so fix it by using the
pgsql-index structure rather than the specific sqlite-idx one.

In passing, improve detection of PRIMARY KEY indexes, which was still
lacking. This work showed that the introspection done by pgloader was
wrong, it's way more crazy that we though, so adjust the code to loop
over PRAGMA calls for each object we inspect.

While adding PRAGMA calls, add support for foreign keys too, we have the
code infrastructure that makes it easy now.
2016-03-27 20:39:13 +02:00
Dimitri Fontaine
cdc5d2f06b Review on update CURRENT_TIMESTAMP support.
Make it work on the second run, when the triggers and functions have
already been deployed, by doing the DROP function and trigger before we
CREATE the table, then CREATE them again: we need to split the list
again.
2016-03-27 19:13:33 +02:00
Dimitri Fontaine
45924be87d Add support for MS SQL newid() function.
The newid() function seems to be equivalent to the newsequentialid() one
if I'm to believe issue #204, so let's just add that assumption in the
code.

Fix #204.
2016-03-27 01:09:22 +01:00
Dimitri Fontaine
d72c711b45 Implement support for on update CURRENT_TIMESTAMP.
That's the MySQL slang for a simple ON UPDATE trigger, and that's what
pgloader now translates the expression to. Fix #195.
2016-03-27 01:01:40 +01:00
Dimitri Fontaine
156f5a4418 Merge pull request #378 from pylaligand/esrap
Removed reference to defunct build target.
2016-03-26 20:51:10 +01:00
Dimitri Fontaine
35155654df Allow ALTER TABLE ... IN SCHEMA.
That brings the ALTER TABLE feature to MS SQL source.
2016-03-26 20:50:05 +01:00
Dimitri Fontaine
fcc6e8f813 Implement ALTER SCHEMA ... RENAME TO...
That's only available for MS SQL as of now, as it's the only source
database we have where the notion of a schema makes sense. Fix #224.
2016-03-26 20:25:03 +01:00
Dimitri Fontaine
3d061a5f88 Improve regression tests to detect more errors.
By changing the order of the relations in the EXCEPT query, we can now
detect when the target table is loaded empty.
2016-03-26 20:23:12 +01:00
P.Y. Laligand
8523410555 Removed reference to defunct build target. 2016-03-26 12:19:54 -07:00
Dimitri Fontaine
7b33b9c853 Switch back again to the main esrap code.
The WIP branch about better error messages made its way through the main
code, so switch back to the mainline as available directly in Quicklisp.

See https://github.com/nikodemus/esrap/issues/26.
2016-03-26 18:36:04 +01:00
Dimitri Fontaine
787be7f188 Review fixed source import.
The clone method was missing specific slots of fixed-copy class.
2016-03-26 18:33:04 +01:00
Dimitri Fontaine
6f078daeb9 Ensure logging of errors.
The first error of a batch was lost somewhere in the recent changes. My
current best guess is that the rewrite of the copy-batch function made
the handler-bind form setup by the handling-pgsql-notices macro
ineffective, but I can't see why that is.

See #85.
2016-03-26 17:51:38 +01:00
Dimitri Fontaine
d1cfe90f5d Another MS SQL index filter fix.
The common lisp default printer is nice enough to know how to print
symbols as strings, but that won't cut it when the symbol :is-not-null
needs to be printed out as "is not null", without the dashes.

See #365.
2016-03-22 00:37:07 +01:00
Dimitri Fontaine
e2fcd86868 Handle failure to convert index filters gracefully.
We should not block any processing just because we can't parse an index.
The best we can do just tonight is to try creating the index without the
filter, ideally we would have to skip building the index entirely.
That's for a later effort though, it's running late here.

See #365.
2016-03-22 00:29:25 +01:00
Dimitri Fontaine
ac7f326447 Fix support for <> in MS SQL filter parsing.
Beware of the order of the parser attempts...

See #365.
2016-03-22 00:26:43 +01:00
Dimitri Fontaine
44660326d7 Review previous patch.
The only case with a test is the "([deleted]=(0))" case, which showed a
tad too much in the current implementation of the MS SQL index filters
parsing. Try to prepare better for next filters.

Next step: adding some test cases.

See #365.
2016-03-22 00:17:13 +01:00
Dimitri Fontaine
5e18cfd7d4 Implement support for partial indexes.
MS SQL has a notion of a "filtered index" that matches the notion of a
PostgreSQL partial index: the index only applies to the rows matching
the index WHERE clause, or filter.

The WHERE clause in both cases is limited to simple expressions over a
base table's row at a time, so we implement a limited WHERE clause
parser for MS SQL filters and a transformation routine to rewrite the
clause in PostgreSQL slang.

In passing, we transform the filter constants using the same
transformation functions as in the CAST rules, so that e.g. a MS SQL
bit(1) value that got transformed into a PostgreSQL boolean is properly
translated, as in the following example:

  MS SQL:     "([deleted]=(0))"  (that's from the catalogs)
  PostgreSQL: deleted = 'f'

Of course the parser is still very badly tested, let's see what happens
in the wild now.

(Should) Fix #365.
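
The end result is an ordinary PostgreSQL partial index; carrying the
example through (names are illustrative):

```sql
-- MS SQL filtered index, filter as stored in the catalogs:
--   CREATE INDEX ix_foo_deleted ON dbo.foo (name) WHERE ([deleted]=(0))
-- PostgreSQL translation, after the bit -> boolean cast rule applies:
CREATE INDEX ix_foo_deleted ON foo (name) WHERE deleted = 'f';
```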
2016-03-21 23:39:45 +01:00
Dimitri Fontaine
8fc9a474d9 Document --dry-run and --on-error-stop options. 2016-03-21 21:24:39 +01:00
Dimitri Fontaine
1ed07057fd Implement --on-error-stop command line option.
The implementation uses the dynamic binding *on-error-stop* so it's also
available when pgloader is used as a Common Lisp library.
The (not-all-that-) recent changes made to the error handling make that
implementation straightforward enough, so let's finally do it!

Fix #85.
2016-03-21 20:52:50 +01:00
Dimitri Fontaine
8476c1a359 Allow setting search_path with multiple schemas.
The PostgreSQL search_path allows multiple schemas and might even need
it to be able to reference types and other tables. Allow setting more
than one schema by using the fact that PostgreSQL schema names don't
need to be individually quoted, and passing down the exact content of
the SET search_path value down to PostgreSQL.

Fix #359.
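
The value is handed to PostgreSQL verbatim, so a comma-separated list
works just as in plain SQL:

```sql
-- Both schemas are then searched when resolving tables and types:
SET search_path TO archive, public;
```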
2016-03-20 20:54:08 +01:00
Dimitri Fontaine
63c3b3b1c7 Fix MS SQL text values processing.
The previous code required non-zero data length for all MS SQL returned
values, which makes no sense for text-like values (an empty string is
ok). Also, the code was trimming spaces from both ends on syb-char data,
and in testing that return type is used for varchar too.

Fix #366. Fix #368.
2016-03-20 20:15:02 +01:00
Dimitri Fontaine
4155d06ae5 Improve support for MS SQL multicolumn indexes.
Once more we can't use an aggregate over a text column in MS SQL to
build the index definition from its catalog structure, so we have to do
that in the lisp part of the code.

Multi-column indexes are now supported, but filtered indexes still are a
problem: the WHERE clause in MS SQL is not compatible with the
PostgreSQL syntax (because of [names] and type casting).

For example we cast MS SQL bit to PostgreSQL boolean, so

  WHERE ([deleted]=(0))

should be translated to

  WHERE not deleted

And the code to do that is not included yet.

The following documentation page offers more examples of WHERE
expression we might want to support:

  https://technet.microsoft.com/en-us/library/cc280372.aspx

  WHERE EndDate IS NOT NULL
    AND ComponentID = 5
    AND StartDate > '01/01/2008'

  EndDate IN ('20000825', '20000908', '20000918')

It might be worth automating the translation to PostgreSQL syntax and
operators, but it's not done in this patch.

See #365, where the created index will now be as follows, which is a
problem because of being UNIQUE: some existing data won't reload fine.

  CREATE UNIQUE INDEX idx_<oid>_foo_name_unique ON dbo.foo (name, type, deleted);
2016-03-18 11:01:06 +01:00
Dimitri Fontaine
d2a1ac639f Fix MS SQL foreign key support.
Avoid registering the first column name twice in the foreign key
definition.
2016-03-16 22:01:01 +01:00
Dimitri Fontaine
4cb83ec6a5 DEBUG mode should list all SQL queries sent.
Even for MS SQL source.
2016-03-16 21:55:40 +01:00
Dimitri Fontaine
3e8b7df0d3 Improve column formatting.
Have a pretty-print option where we try to be nice for the reader, and
don't use it in the CAST debug messages. Also allow working with the
real maximum length of column names rather than hardcoding 22 cols...
2016-03-16 21:46:41 +01:00
Dimitri Fontaine
f1fe9ab702 Assorted fixes to MS SQL support.
Having been given a test instance of an MS SQL database allowed quickly
fixing a series of assorted bugs related to schema handling of MS SQL
databases. As it's the only source with a proper notion of schema that
pgloader supports currently, it's not a surprise we had them.

Fix #343. Fix #349. Fix #354.
2016-03-16 21:43:04 +01:00
Dimitri Fontaine
c1fc4f0879 Review MySQL foreign key introspection SQL query.
It turns out sloppy SQL code made its way into pgloader, wherein the GROUP
BY clause of the foreign key listing wasn't referencing the whole set of
non-aggregated output columns.

Thanks to thiagokronig for the new query, which fixes #345.
2016-03-09 18:36:44 +01:00
Dimitri Fontaine
b7a873c03f Drop default value on bigserial CAST in MS SQL.
This is a blind attempt to fix #354.
2016-03-09 18:30:18 +01:00
Dimitri Fontaine
57f7fd1d4e Find foreign keys with #'string= by default.
Blind attempt at fixing #343 and #330, which now is on at the same
level.
2016-03-09 16:33:44 +01:00
Dimitri Fontaine
c724018840 Implement ALTER TABLE clause for MySQL migrations.
The new ALTER TABLE facility allows acting on tables found in the MySQL
database before the migration happens. In this patch the only provided
actions are RENAME TO and SET SCHEMA, which fixes #224.

In order to be able to provide the same option for MS SQL users, we will
have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO
...) and modify the internal schema-struct so that the schema slot of
our table instances are a schema instance rather than its name.

Lacking MS SQL test database and instance, the facility is not yet
provided for that source type.
2016-03-06 21:51:33 +01:00
Dimitri Fontaine
d4737a39ca Leave ssl lib alone in src/hooks.lisp.
That means we no longer eagerly load it when we think we will need it,
and also refrain from unloading it from the binary at image saving time.

In my local tests, doing so fixes #330 by avoiding the error entirely in
the docker image, where obviously the libs found at build-time are found
again at the same place at run time.
2016-03-05 22:45:59 +01:00
Dimitri Fontaine
68aa205db5 Also commit SQLite test case changes.
See #351 for context, this adds a proper test case.
2016-03-03 14:59:57 +01:00
Dimitri Fontaine
486be8c068 SQLite integer default values might be quoted.
Fix #351 by having a new transformation function to process SQLite
integers, that may be quoted...
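
For context, SQLite accepts DDL like the following and then reports the
default as the string '0' in its catalogs (table and column names are
illustrative):

```sql
CREATE TABLE t (counter integer DEFAULT '0');
```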
2016-03-03 14:59:27 +01:00
Dimitri Fontaine
62edd5a2c8 Register "nocase" as a SQLite noise word.
SQLite types include "text nocase" apparently, so add "nocase" as one of
the managed noise words. It might be time we handle those the other way
round, with a whitelist of expected tokens somewhere in the type
definition rather than a blacklist of unknown words to exclude...

Anyway, fix #350.
2016-03-03 00:21:43 +01:00
Dimitri Fontaine
b026a860c1 Fix MS SQL fetch metadata function.
It should return the fetched catalog rather than the count of objects,
which is only used for statistics purposes. Fix #349.

This problem once again shows that we lack proper testing environment
for MS SQL source :/
2016-03-02 16:20:55 +01:00
Dimitri Fontaine
eaa5807244 Adapt to CURRENT_TIMESTAMP(x) default values.
We target CURRENT_TIMESTAMP as the PostgreSQL default value for columns
when it was different before on the grounds that the type casting in
PostgreSQL is doing the job, as in the following example:

    pgloader# create table test_ts(ts timestamptz(6) not null default CURRENT_TIMESTAMP);
    CREATE TABLE
    pgloader# insert into test_ts VALUES(DEFAULT);
    INSERT 0 1
    pgloader# table test_ts;
                  ts
    -------------------------------
     2016-02-24 18:32:22.820477+01
    (1 row)

    pgloader# drop table test_ts;
    DROP TABLE
    pgloader# create table test_ts(ts timestamptz(0) not null default CURRENT_TIMESTAMP);
    CREATE TABLE
    pgloader# insert into test_ts VALUES(DEFAULT);
    INSERT 0 1
    pgloader# table test_ts;
               ts
    ------------------------
     2016-02-24 18:32:44+01
    (1 row)

Fix #341.
2016-02-24 18:30:16 +01:00
Dimitri Fontaine
40c1581794 Review transaction and error handling in COPY.
The PostgreSQL COPY protocol requires an explicit initialization phase
that may fail, and in this case the Postmodern driver transaction is
already dead, so there's no way we can even send ABORT to it.

Review the error handling of our copy-batch function to cope with that
fact, and add some logging of non-retryable errors we may have.

Also improve the thread error reporting when using a binary image from
where it might be difficult to open an interactive debugger, while still
having the full blown Common Lisp debugging experience for the project
developers.

Add a test case for a missing column as in issue #339.

Fix #339, see #337.
2016-02-21 15:56:06 +01:00
Dimitri Fontaine
9512ab187e Fix the fix, see #343.
Someday I should either stop working on pgloader in between other things
or have a better test suite, including MS SQL and all. Probably both.
And read compiler notes and warnings too, while at that...
2016-02-20 14:15:13 +01:00
Dimitri Fontaine
197258951c Improve MS SQL usage of the schema structs.
The function qualify-name is not in use anymore, but the MSSQL parts
didn't get the memo... fix #343.
2016-02-19 17:55:54 +01:00
Dimitri Fontaine
765bbb70aa Fix auto_increment support in cast rules.
This fixes #141 again when users are forcing MySQL bigint(20) into
PostgreSQL bigint types so that foreign keys can be installed. To this
effect, a cast rule such as the following is needed:

   cast type bigint when (= 20 precision) to bigint drop typemod

Before this patch, this user provided cast rule would also match against
MySQL types "with extra auto_increment", and it should not.

If you're having the problem that this patch fixes on an older pgloader
that you can't or won't upgrade, consider the following user provided
set of cast rules to achieve the same effect:

   cast type bigint with extra auto_increment to bigserial drop typemod,
        type bigint when (= 20 precision) to bigint drop typemod
2016-02-05 21:26:31 +01:00
Dimitri Fontaine
c108b85290 Allow package prefix in CAST ... USING clause.
Also, in passing, add a new transformation function for MySQL that
transforms varbinary to text.
2016-02-04 16:09:22 +01:00
Dimitri Fontaine
782561fd4e Handle default value transforms errors, fix #333.
It turns out that the MySQL catalog always stores default values as
strings, even when the column itself is of type bytea. In some cases,
it's then impossible to transform the expected bytea from such a string.

In passing, move some code around to fix dependencies and make it
possible to issue log warnings from the default value printing code.
2016-02-03 12:27:58 +01:00
Dimitri Fontaine
e7771ff3d8 Remove platform specific tar options. 2016-02-02 15:28:00 +01:00
Dimitri Fontaine
029ea0027a Upgrade version string.
We just tagged the repository as version 3.3.0.50 to be able to release
an experimental pgloader bundle. The first commit after that should then
change the version string.
2016-01-31 21:49:43 +01:00
Dimitri Fontaine
1280ae0b8c Add a bundle distribution.
Using Quicklisp bundle facility it is possible to prepare a
self-contained archive of all the code needed to build pgloader.

Doing that should allow users to easily build pgloader when they are
behind a restrictive proxy, and packagers to work from a source tarball
that has very limited build dependencies.
2016-01-31 21:47:14 +01:00
Dimitri Fontaine
76668c2626 Review package dependencies.
The decision to use lots of different packages in pgloader has quite
strong downsides at times, and the manual management of dependencies is
one of them, in particular how to avoid circular ones.
2016-01-31 18:42:01 +01:00
Dimitri Fontaine
64ab4d28dc Error out when using ignored options.
On the theory that it's a better service to the user to refuse doing
anything at all rather than ignore his/her commands, print out FATAL
errors when options are used that are incompatible with a load command
file.

See #327 for a case where this did happen.

In passing, tweak our report code to avoid printing the footer when we
didn't print anything at all previously.
2016-01-25 11:46:36 +01:00
Dimitri Fontaine
4e36bd3c55 Improve threads error handling.
See #328 where we are lacking useful stack trace in a --debug run
because of the previous talk-handler-bind coding, which was there to
avoid drowning the users in too many details. Let's try another
approach here.
2016-01-24 21:43:46 +01:00
Dimitri Fontaine
b2ec66c84b Force external-format of the logs files, see #328.
In the issue #328 the --debug level output is not helpful because of an
encoding error in the logfile. Let's see about forcing the log file
external format to utf-8 then.
2016-01-20 21:53:13 +01:00
Dimitri Fontaine
4c84954a0d Merge pull request #329 from maksimf/patch-1
Fix typo in documentation
2016-01-20 21:38:23 +01:00
Maxim Filippov
6d02591e9c Fix typo in documentation 2016-01-20 12:41:50 +03:00
Dimitri Fontaine
327745110a MySQL bytea default value can be "". Fix 291.
Thanks to a reproducible test case we can see that the MySQL default for a
varbinary column is an empty string, so tweak the transform function
byte-vector-to-bytea in order to cope with that.
2016-01-18 21:55:01 +01:00
Dimitri Fontaine
d9d9e06c0f Another attempt at fixing #323.
Rather than trying hard to have PostgreSQL fully qualify the index name
with tricks around search_path setting at the time ::regclass is
executed, simply join on pg_namespace to retrieve that schema in a new
slot in our pgsql-index structure so that we can then reuse it when
needed.

Also add a test case for the scenario, including both a UNIQUE
constraint and a classic index, because the DROP and CREATE/ALTER
instructions differ.
2016-01-17 01:54:36 +01:00
Dimitri Fontaine
7dd69a11e1 Implement concurrency and workers for files sources.
More than the syntax and API tweaks, this patch also makes it so that a
multi-file specification (using e.g. ALL FILENAMES IN DIRECTORY) can be
loaded with several files in the group in parallel.

To that effect, tweak again the md-connection and md-copy
implementations.
2016-01-16 22:53:55 +01:00
Dimitri Fontaine
aa8b756315 Fix when to create indexes.
In the recent refactoring and improvements of parallelism the indexes
creation would kick in before we know that the data is done being copied
over to the target table.

Fix that by maintaining a writers-count hashtable and only starting to
create indexes when that count reaches zero, meaning all the concurrent
tasks started to handle the COPY of the data are now done.
2016-01-16 19:50:21 +01:00
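That gating can be sketched as a per-table countdown (illustrative Python with hypothetical names; the real code keeps a Common Lisp hashtable of writer counts):

```python
from collections import defaultdict

class IndexGate:
    """Only start CREATE INDEX for a table once every concurrent COPY
    writer working on that table has reported completion."""
    def __init__(self):
        self.writers = defaultdict(int)
        self.indexed = []   # tables ready for index creation

    def writer_started(self, table):
        self.writers[table] += 1

    def writer_done(self, table):
        self.writers[table] -= 1
        if self.writers[table] == 0:
            # all COPY tasks for this table are done: safe to index
            self.indexed.append(table)

gate = IndexGate()
gate.writer_started("t")
gate.writer_started("t")   # two concurrent COPY workers on table "t"
gate.writer_done("t")      # first writer done: indexing must wait
first = list(gate.indexed)
gate.writer_done("t")      # last writer done: indexing may start
```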
Dimitri Fontaine
dcc8eb6d61 Review api around worker-count.
It was worker-count and it's now exposed as the worker in the WITH
clause, but we can actually keep it as worker-count in the internal API,
and it feels better that way.
2016-01-16 19:49:52 +01:00
Dimitri Fontaine
eb45bf0338 Expose concurrency settings to the end users.
Add the workers and concurrency settings to the LOAD commands for
database sources so that users can tweak them now, and add mentions of
them in the documentation too.

From the documentation string of the copy-from method as found in
src/sources/common/methods.lisp:

   We allow WORKER-COUNT simultaneous workers to be active at the same time
   in the context of this COPY object. A single unit of work consist of
   several kinds of workers:

     - a reader getting raw data from the COPY source with `map-rows',
     - N transformers preparing raw data for PostgreSQL COPY protocol,
     - N writers sending the data down to PostgreSQL.

   The N here is setup to the CONCURRENCY parameter: with a CONCURRENCY of
   2, we start (+ 1 2 2) = 5 concurrent tasks, with a CONCURRENCY of 4 we
   start (+ 1 4 4) = 9 concurrent tasks, of which only WORKER-COUNT may be
   active simultaneously.

Those options should find their way in the remaining sources, that's for
a follow-up patch tho.
2016-01-15 23:22:32 +01:00
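The task arithmetic quoted from the documentation string is simply:

```python
def tasks_per_copy(concurrency):
    """One reader, plus CONCURRENCY transformers and CONCURRENCY
    writers, started for each COPY object; only WORKER-COUNT of them
    may be active simultaneously."""
    return 1 + concurrency + concurrency
```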
Dimitri Fontaine
fb40a472ab Simplify database WITH option handling.
Share more code by having a common flattening function as a semantic
predicate in the grammar.
2016-01-15 22:34:27 +01:00
Dimitri Fontaine
bfdbb2145b Fix with drop index option, fix #323.
Have PostgreSQL always fully qualify the index related objects and SQL
definition statements when fetching the list of indexes of a table, by
playing with an empty search_path.

Also improve the whole index creation by passing the table object as the
context where to derive the table-name from, so that schema qualified
tables are taken into account properly.
2016-01-15 15:04:07 +01:00
Dimitri Fontaine
1ff204c172 Typo fix. 2016-01-15 14:45:19 +01:00
Dimitri Fontaine
44a2bd14d4 Fix custom CAST rules with expressions, fix #322.
In a previous commit the typemod matching code had been broken, and we
failed to notice that until now. Thanks to bug report #322 we just got
the memo...

Add a test case in the local-only MySQL database.

The regression testing facilities should be improved to be able to test
a full database, and then to dynamically create said database from code
or something to ease test coverage of those cases.
2016-01-12 14:55:17 +01:00
Dimitri Fontaine
2c200f5747 Improve error handling for pkeys creation.
When creating the primary keys on top of the unique indexes, we might
still have errors (e.g. with NULL values). Make it so that a failure in
one pkey doesn't fail every other one, by having them all run within a
single connection rather than a single transaction.
2016-01-12 14:53:42 +01:00
Dimitri Fontaine
133028f58d Desultory review code indentation. 2016-01-12 14:52:44 +01:00
Dimitri Fontaine
ee69b8d4ce Randomly tweak batch sizes.
In order to avoid all concurrently prepared batches of rows to get sent
to PostgreSQL COPY command at the same time exactly, randomly vary the
size of each batch between -30% and +30% of the batch rows parameter.
2016-01-11 21:29:29 +01:00
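A sketch of that jitter (illustrative Python; `batch_rows` stands for the batch rows parameter):

```python
import random

def batch_size(batch_rows, jitter=0.3, rng=random):
    """Pick a per-batch size between -30% and +30% of the configured
    batch rows, so that concurrently prepared batches don't all reach
    the PostgreSQL COPY command at exactly the same time."""
    factor = 1.0 + rng.uniform(-jitter, jitter)
    return max(1, int(batch_rows * factor))
```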
Dimitri Fontaine
f256e12a4f Review load parallelism settings.
pgloader parallel workload is still hardcoded, but at least the code now
uses clear parameters as input so that it will be possible in a later
patch to expose them to the end-user.

The notions of workers and concurrency are now handled as follows:

  - concurrency is how many tasks are allowed to happen at once, by
    default we have a reader thread, a transformer thread and a COPY
    thread all actives for each table being loaded,

  - worker-count is how many parallel threads are allowed to run
    simultaneously and defaults to 8 currently, which means that in a
    typical migration from a database source, given the default
    concurrency of 1 (3 threads), we might be loading up to 3 different
    tables at any time.

The idea is to expose those settings to the user in the load file and as
command line options (such as --jobs) and see what it gives us. It might
help e.g. use more cores in loading a single CSV file.

As of this patch, there still can only be only one reader thread and the
number of transformer threads must be the same as the number of COPY
threads.

Finally, the user-defined projections for CSV-like files are now handled
in the transformation threads rather than in the reader thread...
2016-01-11 01:43:38 +01:00
Dimitri Fontaine
94ef8674ec Typo fix (of sorts)
Some API didn't get the table-name to table memo...
2016-01-11 01:42:18 +01:00
Dimitri Fontaine
a3fd22acd3 Review pgloader encoding story.
Thanks to Common Lisp character data type, it's easy for pgloader to
enforce always speaking to PostgreSQL in utf-8, and that's what has been
done from the beginning actually.

Now, without good reason, the first example of a SET clause added to the
docs was about how to set client_encoding, which should NOT be done.

Fix that at the user level by removing the bad example from the docs and
adding a WARNING whenever the client_encoding is set to a known bad
value. It's a WARNING because we then simply force 'utf-8' anyway.

Also, review completely the format-vector-row function to avoid doing
double work with the Postmodern facilities we piggyback on. This was
done halfway through and the utf-8 conversion was actually done twice.
2016-01-11 01:27:36 +01:00
Dimitri Fontaine
cf73a0e6c0 Merge pull request #318 from richardkmichael/detect-sbcl-core-compression
Detect sbcl core compression and Makefile gardening.
2016-01-10 17:53:48 +01:00
Richard Michael
6dcdf4711b Easier install by detecting SBCL core-compression.
Various Linux distributions provide SBCL without core-compression
enabled. On the other hand, Mac OS X (at least via `homebrew`) provides
SBCL with core-compression enabled. To make installation easier, teach
the make process to detect core-compression, and use it if possible.
2016-01-09 22:17:02 -05:00
Dimitri Fontaine
d60b64c03b Implement MS SQL newsequentialid() default value.
We convert the default value call to newsequentialid() into a call to
the PostgreSQL uuid-ossp uuid_generate_v1() which seems like the
equivalent function.

The extension "uuid-ossp" needs to be installed in the target database.

(Blind) Fix #246.
2016-01-08 22:43:38 +01:00
Dimitri Fontaine
8a596ca933 Move connection into utils.
There's no reason why this file should be in the src/ top-level.
2016-01-07 16:42:43 +01:00
Dimitri Fontaine
d1a2e3f46b Improve the Dockerfile and the versioning.
When building from sources within the git environment, the version
number is ok, but it was wrong when building in the docker image. Fix
the version number to 3.3.0.50 to show that we're talking about a
development snapshot that is leading to version 3.3.1.

Yeah, 4 parts version numbers. That happens, apparently.
2016-01-07 10:21:52 +01:00
Dimitri Fontaine
ee2a68f924 Improve Dockerfile.
It was quite idiotic to RUN a git clone rather than just use the files
from the docker context...
2016-01-05 11:28:19 +01:00
Dimitri Fontaine
286a39f6e6 Proof read of the README.md file.
Some advice was pretty ancient, and we should now mention debian
packaging support and the docker hub image.
2016-01-04 23:22:52 +01:00
Dimitri Fontaine
f8cb7601c5 Implement a Dockerfile.
Apparently it's quite common nowadays for people to use docker to build
and run software in a contained way, so provide users with the facility
they need in order to do that.
2016-01-04 21:05:46 +01:00
Dimitri Fontaine
1bbbf96ba7 Fix minor API glitch/typo. 2016-01-04 21:01:15 +01:00
Dimitri Fontaine
a7291e9b4b Simplify copy-database implementation further.
Following-up to the recent refactoring effort, the IXF and DB3 source
classes didn't get the memo that they could piggyback on the generic
copy-database implementation. This patch implements that.

In passing, also simplify the instanciate-table-copy-object method for
copy subclasses that need specialization here, by using change-class and
call-next-method so as to reuse the generic code as much as possible.
2016-01-01 14:28:09 +01:00
Dimitri Fontaine
24cd0de9f7 Install the :create-schemas option back.
In the previous refactoring patch that option mistakenly went away,
although it is still needed for MS SQL and it is planned to make use of
it in the other source types too...

See #316 for reference.
2016-01-01 13:35:35 +01:00
Dimitri Fontaine
9e4938cea4 Implement PostgreSQL catalogs data structure.
In order to share more code in between the different source types,
finally have a go at the quite horrible mess of anonymous data
structures floating around.

Having a catalog and schema instances not only allows for code cleanup,
but will also allow to implement some bug fixes and wishlist items such
as mapping tables from a schema to another one.

Also, supporting database sources having a notion of "schema" (in
between "catalog" and "table") should get easier, including getting
on par with MySQL in the MS SQL support (materialized views have been
asked for already).

See #320, #316, #224 for references and a notion of progress being made.

In passing, also clean up the copy-databases methods for database source
types, so that they all use a fetch-metadata generic function and a
prepare-pgsql-database and a complete-pgsql-database generic function.
Actually, a single method does the job here.

The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn will call the specialized versions of
list-all-columns and friends implementations. Once the catalog has been
fetched, an explicit CAST call is then needed before we can continue.

Finally, the fields/columns/transforms slots in the copy objects are
still being used by the operative code, so the internal catalog
representation is only used up to starting the data copy step, where the
copy class instances are then all that's used.

This might be refactored again in a follow-up patch.
2015-12-30 21:53:01 +01:00
Dimitri Fontaine
d84ec3f808 Add SQLite test case for before/after load commands.
See bug #321, this change should have been part of previous commit.
2015-12-23 21:58:56 +01:00
Dimitri Fontaine
8355b2140e Implement before/after load support for SQLite, fix #321.
If there ever was a good reason not to implement before/after support
for SQLite, it's no longer valid: done.
2015-12-23 21:56:10 +01:00
Dimitri Fontaine
72e7c2af70 At long last, log cast rule choices, see #317.
To help debug the casting rule choices, output a line for each decision
that is made with the input and the output of the decision.
2015-12-08 21:27:33 +01:00
Dimitri Fontaine
735cdc8fdc Document the remove-null-characters transform.
Both as a new transformation function available, and as the default for
Text conversions when coming from MySQL. See #258, Fixes #219.
2015-12-08 21:04:47 +01:00
Aliaksei Urbanski
3a55d80411 Default cast rules for MySQL's text types fixed, see #219 2015-12-08 20:59:29 +01:00
Dimitri Fontaine
b4bfa18877 Fix more table name quoting, fix #163 again.
Now that we can have several threads doing COPY, each of them need to
know about the *pgsql-reserved-keywords* list. Make sure that's the case
and in passing fix some call sites to apply-identifier-case.

Also, more disturbingly, fix the code so that TRUNCATE is called from
the main thread before giving control to the COPY threads, rather than
having two concurrent threads doing the TRUNCATE twice. It's rather
strange that we got no complaint from the field on that part...
2015-12-08 11:52:43 +01:00
Dimitri Fontaine
dca3dacf4b Don't issue useless MySQL catalog queries...
When the option "WITH no foreign keys" is in use, it's not necessary to
go read the foreign key information_schema bits at all, so just don't
issue the query, and same thing with the "create no indexes" option.

In not-that-old versions of MySQL, the referential_constraints table of
information_schema doesn't exist, so this should make pgloader
compatible with MySQL 5.0 something and earlier.
2015-12-03 19:24:00 +01:00
Dimitri Fontaine
7c64a713d0 Fix PostgreSQL write times in the summary.
It turns out the summary write times included time spent waiting for
batches to be ready, which isn't fair to PostgreSQL COPY implementation,
and moreover doesn't help figuring out the bottlenecks...
2015-11-29 23:23:30 +01:00
Dimitri Fontaine
cca44c800f Simplify batch and transformation handling.
Make batches of raw data straight from the reader output (map-rows) and
have the transformation worker focus on changing the batch content from
raw rows to copy strings.

Also review the organisation of responsibilities in the code, allowing
to move queue.lisp into utils/batch.lisp, renaming it as its scope has
been reduced to only care about preparing batches.

This came out of trying to have multiple workers concurrently processing
the batches from the reader and feeding the hardcoded 2 COPY workers,
but it failed for multiple reasons. All is left as of now is this
cleanup, which seems to be on the faster side of things, which is always
good.
2015-11-29 17:35:25 +01:00
Dimitri Fontaine
2dd7f68a30 Fix index completion management in MySQL and SQLite.
We used to wait for the wrong number of workers, meaning the rest of the
code began running before the indexes were all available. A user report
where one of the indexes takes a very long time to compute made it
obvious.

In passing, also improve reporting of those rendez-vous sections.
2015-11-29 17:29:57 +01:00
Dimitri Fontaine
af9e423f0b Fix errors counting, see #313.
It came pretty obvious that the error counting was broken, it happens
that I forgot to pass the information down to the state handling parts
of the code.

In passing improve and fix CSV parse errors counting and fatal errors
reporting.
2015-11-27 11:51:48 +01:00
Dimitri Fontaine
533a49a261 Handle PostgreSQL notifications early, fix #311.
In some cases, like when client_min_messages is set to debug5,
PostgreSQL might send notification messages to the connecting client
even while opening a connection. Those are still considered WARNINGs by
the Postmodern driver...

Handle those warnings by just printing them out in the pgloader logs,
rather than considering those conditions as hard failures (signaling a
db-connection-error).
2015-11-24 10:52:25 +01:00
Dimitri Fontaine
93b6be43d4 Travis: adapt to PostgreSQL 9.1, again.
We didn't have CREATE SCHEMA IF NOT EXISTS at the time...
2015-11-23 22:09:08 +01:00
Dimitri Fontaine
f109c3fdb4 Travis: prepare an "err" schema.
The test/errors.load set the search_path to include the 'err' schema,
which is to be created by the test itself. PostgreSQL 9.1 raises an
error where 9.4 and following just accept the setting, and Travis runs a
9.1 PostgreSQL.

Let's just create the schema before-hand so that we can still run tests
against SET search_path from the load file.
2015-11-23 15:26:18 +01:00
Dimitri Fontaine
4e23de1b2b Missing file from previous commit.
Somehow it still happens :/
2015-11-22 23:32:22 +01:00
Dimitri Fontaine
973339abc8 Add a SQLite test case from #310. 2015-11-22 22:16:05 +01:00
Dimitri Fontaine
e23de0ce9f Improve SQLite table names filtering.
Filter the list of tables we migrate directly from the SQLite query,
avoiding returning useless data. To do that, use the LIKE pattern
matching supported by SQLite, where the REGEX operator is only available
when extra features are loaded apparently.

See #310 where filtering out the view still caused errors in the
loading.
2015-11-22 22:10:26 +01:00
Dimitri Fontaine
a81f017222 Review SQLite integration with recent changes.
The current way to do parallelism in pgloader was half baked in the
SQLite source implementation, get it up to speed again.
2015-11-22 21:30:20 +01:00
Dimitri Fontaine
5f60ce3d96 Travis: create the "expected" schema.
As we don't use the `make -C test prepare` target, reproduce a missing
necessary precondition to running the unit tests.
2015-11-21 21:19:37 +01:00
Dimitri Fontaine
bc870ac96c Use 2 copy threads per target table.
It's been proven by Andres Freund benchmarks that the best number of
parallel COPY threads concurrently active against a single table is 2 as
of PostgreSQL current development version (up to 9.5 stable, might still
apply to 9.6 depending on how we might solve the problem).

Henceforth hardcode 2 COPY threads in pgloader. This also has the
advantage that in the presence of lots of bad rows, we should sustain a
better throughput and not stall completely.

Finally, also improve the MySQL setup to use 8 threads by default, that
is be able to load two tables concurrently, each with 2 COPY workers, a
reader and a transformer thread.

It's all still experimental as far as performances go, next patches
should bring the capability to configure the parallelism wanted from the
command line and load command tho.

Also, other source types will want to benefit from the same
capabilities, it just happens that it's easier to play with MySQL first
for some reasons here.
2015-11-17 17:06:36 +01:00
Dimitri Fontaine
150d288d7a Improve our regression testing facility.
Next parallelism improvements will allow pgloader to use more than one
COPY thread to load data, with the impact of changing the order of rows
in the database.

Rather than doing a copy out and `diff` of the data just loaded, load
the reference data and do the diff in SQL:

          select * from loaded.data
  except
          select * from expected.data

If such a query returns any row, we know we didn't load what was
expected and the regression test is failing.

This regression testing facility should also allow us to finally add
support for multiple-table regression tests (sqlite, mysql, etc).
2015-11-17 17:03:08 +01:00
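The same order-insensitive check, sketched in Python over row tuples (illustrative only; the real test runs the EXCEPT query inside PostgreSQL):

```python
def regression_diff(loaded_rows, expected_rows):
    """Order-insensitive equivalent of
        SELECT * FROM loaded.data EXCEPT SELECT * FROM expected.data
    Any row present in loaded but missing from expected means the
    regression test is failing; an empty result means success."""
    return set(loaded_rows) - set(expected_rows)
```

Because the comparison is set-based, the row order produced by several concurrent COPY threads no longer matters, which is the point of the change.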
Dimitri Fontaine
6ca376ef9b Simplify the main function (refactor).
Move some code away in its own function for easier review and
modifications of the main entry point.
2015-11-16 16:01:25 +01:00
Dimitri Fontaine
da05782002 Allow date formats to miss time parts.
In case the seconds field are not provided just use "00" rather than NIL
as currently...
2015-11-14 21:14:20 +01:00
Dimitri Fontaine
6cbec206af Turns out SSL key/crt file paths should be strings.
Our PostgreSQL driver uses CFFI to load the SSL support from OpenSSL,
and as a result the certificate and key file names should be strings
rather than pathnames. Should fix #308 again...
2015-11-11 23:10:29 +01:00
Dimitri Fontaine
f8ae9f22b9 Implement support for SSL client certificates.
This fixes #308 by automatically using the PostgreSQL Client Side SSL
files as documented in the following reference:

  http://www.postgresql.org/docs/current/static/libpq-ssl.html#LIBPQ-SSL-FILE-USAGE

This uses the Postmodern special support for it. Unfortunately I
couldn't test it locally beyond checking that it doesn't break non-SSL
connections. Pushing to get user feedback.
2015-11-09 11:32:17 +01:00
Dimitri Fontaine
e3cc76b2d4 Export copy-column-list from pgloader.sources.
That allows the copy-column-list specific method for MySQL to be a
method of the common pgloader.sources::copy-column-list generic
function, and then to be called again when needed.

This fixes an oversight in #41e9eeb and fixes #132 again.
2015-11-08 18:48:54 +01:00
Dimitri Fontaine
042045c0a6 Clozure already provides a getenv function.
Trying to provide a new one fails with an error, that I missed because I
must have forgotten to `make clean` when adding the previous #+ccl
variant here...

This alone doesn't fix the build for CCL but already improves
the situation as reported at #303. Next failure is something I fail to
understand tonight:

  Fatal SIMPLE-ERROR:
  Compilation failed: In MAKE-DOUBLE-FLOAT: Type declarations violated in (THE FIXNUM 4294967295) in /Users/dim/dev/pgloader/build/quicklisp/local-projects/qmynd/src/common/utilities.lisp
2015-11-08 18:26:58 +01:00
Dimitri Fontaine
3673c5c341 Add a Travis CI Build badge. 2015-10-31 17:48:30 +01:00
Dimitri Fontaine
20ce095384 Merge pull request #305 from gitter-badger/gitter-badge
Add a Gitter chat badge to README.md
2015-10-31 17:46:20 +01:00
The Gitter Badger
56a60db146 Add Gitter badge 2015-10-31 16:40:52 +00:00
Dimitri Fontaine
478d24f865 Fix root-dir initialization for ccl, see #303.
When using Clozure Common Lisp apparently a :absolute directory
component for make-pathname is supposed to contain a single path
component, fix by using parse-native-namestring instead.

In case it's needed, the following spelling seems portable enough:

  CL-USER> (uiop:merge-pathnames*
            (uiop:make-pathname* :directory '(:relative "pgloader"))
            (uiop:make-pathname* :directory '(:absolute "tmp")))
  #P"/tmp/pgloader/"
2015-10-24 22:22:24 +02:00
Dimitri Fontaine
4df3167da1 Introduce another worker thread: transformers.
We used to have a reader and a writer cooperating concurrently into
loading the data from the source to PostgreSQL. The transformation of the
data was then the responsibility of the reader thread.

Measurements showed that the PostgreSQL processes were mostly idle,
waiting for the reader to produce data fast enough.

In this patch we introduce a third worker thread that is responsible for
processing the raw data into pre-formatted batches, allowing the reader
to focus on extracting the data only. We now have two lparallel queues
involved in the processing, the raw queue contains the vectors of raw
data directly, and the processed-queue contains batches of properly
encoded strings for the COPY text protocol.

On the test laptop the performance gain isn't noticeable yet, it might
be that we need much larger data sets to see a gain here. At least the
setup isn't detrimental to performances on smaller data sets.

Next improvements are going to allow more features: specialized batch
retry thread and parallel table copy scheduling for database sources.
Let's also continue caring about performances and play with having
several worker and writer threads for each reader. In later patches.

And some day, too, we will need to make the number of workers a user
defined variable rather than something hard coded as today. It's on the
todo list, meanwhile, dear user, consider changing the (make-kernel 6)
into (make-kernel 12) or something else in src/sources/mysql/mysql.lisp,
and consider enlightening me with whatever it is you find by doing so!
2015-10-23 00:17:58 +02:00
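A toy version of that three-stage pipeline with its two queues (illustrative Python; pgloader uses lparallel queues and the COPY text protocol):

```python
import queue
import threading

raw_q, ready_q = queue.Queue(), queue.Queue()
DONE = object()   # end-of-stream sentinel

def reader(rows):
    # the reader now focuses on extracting raw data only
    for row in rows:
        raw_q.put(row)
    raw_q.put(DONE)

def transformer():
    # turn raw rows into pre-formatted COPY text-protocol lines
    while (row := raw_q.get()) is not DONE:
        ready_q.put("\t".join(map(str, row)) + "\n")
    ready_q.put(DONE)

def writer(out):
    # would stream batches to PostgreSQL COPY; here we just collect
    while (line := ready_q.get()) is not DONE:
        out.append(line)

out = []
threads = [threading.Thread(target=reader, args=([(1, "a"), (2, "b")],)),
           threading.Thread(target=transformer),
           threading.Thread(target=writer, args=(out,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```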
Dimitri Fontaine
4f3b3472a2 Cleanup and timing display improvements.
Have each thread publish its own start-time so that the main thread may
compute time spent in source and target processing, in order to fix the
crude hack of taking (max read-time write-time) in the total time column
of the summary.

We still have some strange artefacts here: we consider that the full
processing time is bound to the writer thread (:target), because it
needs to have the reader done already to be able to COPY the last
batch... but in testing I've seen some :source timings higher than the
:target ones...

Let's solve problems one at a time tho, I guess multi-threading and
accurate wall clock times aren't to be expected to mix and match that
easily anyway (multi cores, single RTC and all that).
2015-10-22 22:36:40 +02:00
Dimitri Fontaine
933d1c8d6b Add test case for #302. 2015-10-22 22:35:32 +02:00
Dimitri Fontaine
88bb4e0b95 Register "auto_increment" as a SQLite noise word.
As seen in #302 it's possible to define a SQLite column of type "integer
auto_increment". In my testing tho, it doesn't mean a thing. Worse than
that, apparently when an integer column is created that is also used as
the primary key of the table, the notation "integer auto_increment
primary key" disables the rowid behavior that is certainly expected.

Let's not yet mark the bug as fixed as I suppose we will have to do
something about this rowid mess. Thanks again SQLite.
2015-10-22 21:55:34 +02:00
Dimitri Fontaine
f654f10d0d Fix CSV summary format string.
This got broken when adding read/write separate stats in the reporting.
2015-10-20 23:56:20 +02:00
Dimitri Fontaine
2ed14d595d Trick the reporting to show non-zero timings.
When calling lparallel:receive-results from the main thread we lose
the ability to measure proper per-table processing times, because it all
happens in parallel and the main thread seems to receive events in the
same millisecond as when the worker is started, meaning it's all 0.0s.

So when we don't have "secs" stats, pick the greatest of read or write
time, which we do have from the worker threads themselves.

The numbers are still wrong, but less so than the "0.0s" displayed before
this patch.
2015-10-20 23:39:15 +02:00
Dimitri Fontaine
69b8b0305d Fix reporting in case of missing values. 2015-10-20 23:24:03 +02:00
Dimitri Fontaine
1fb69b2039 Retry connecting to PostgreSQL in some cases.
Now that we can setup many concurrent threads working against the
PostgreSQL database, and before we open the number of workers to our
users, install a heuristic to manage the PostgreSQL error classes “too
many connections” and “configuration limit exceeded” so that pgloader
waits for some time (*retry-connect-delay*) then tries connecting again.

It's quite simplistic but should cover lots of border-line cases way
more nicely than just throwing the interactive debugger at the end user.
2015-10-20 23:15:05 +02:00
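A sketch of the retry heuristic (illustrative Python; SQLSTATE 53300 and 53400 are PostgreSQL's too_many_connections and configuration_limit_exceeded codes, and `delay` stands in for *retry-connect-delay*):

```python
import time

# error classes worth waiting out rather than aborting on
RETRYABLE = {"53300", "53400"}

def connect_with_retry(connect, retries=3, delay=0.0, sleep=time.sleep):
    """Call connect(); on a retryable connection error, wait for the
    configured delay and try again, up to `retries` attempts."""
    for attempt in range(retries):
        try:
            return connect()
        except ConnectionError as e:
            if str(e) not in RETRYABLE or attempt == retries - 1:
                raise   # not retryable, or out of attempts
            sleep(delay)
```

Simplistic, as the commit says, but far friendlier than throwing the interactive debugger at the end user.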
Dimitri Fontaine
633067a0fd Allow more parallelism in database migrations.
The newly added statistics are showing that read+write times are not
enough to explain how long we wait for the data copying, so it must be
the workers setup rather than the workers themselves.

From there, let lparallel work its magic in scheduling the work we do in
parallel in pgloader: rather than doing blocking receive-result calls
for each table, only receive-result at the end of the whole
copy-database processing.

On test data here on the laptop we go from 6s to 3s to migrate the
sakila database from MySQL to PostgreSQL: that's because we have lots of
very small tables, so the cost of waiting after each COPY added up quite
quickly.

In passing, stop sharing the same connection object between parallel
workers that used to run one at a time; see the new API
clone-connection (which takes over new-pgsql-connection).
2015-10-20 22:15:55 +02:00
Dimitri Fontaine
187565b181 Add read/write separate stats.
Add metrics to devise where the time is spent in current pgloader code
so that it's possible to then optimize away the batch processing as we
do it today.

Given the following extract of the measures, it seems that doing the
data transformations in the reader thread isn't so bright an idea. More
to come.

          table name         total time       read     write
   -----------------     --------------  --------- ---------
             extract             2.014s
         before load             0.050s
               fetch             0.000s
   -----------------     --------------  --------- ---------
    geolite.location            16.090s    15.933s    5.732s
      geolite.blocks            28.896s    28.795s    5.312s
   -----------------     --------------  --------- ---------
          after load            37.772s
   -----------------     --------------  --------- ---------
   Total import time          1m25.082s    44.728s   11.044s
2015-10-11 21:35:19 +02:00
Dimitri Fontaine
c3726ce07a Refrain from starting the logger twice in load-data. 2015-10-05 21:27:48 +02:00
Dimitri Fontaine
41e9eebd54 Rationalize common generic API implementation.
When devising the common API, the first step was to implement
specific methods for each generic function of the protocol. It now
appears that in some cases we don't need the extra level of flexibility:
every API change had to be propagated to all the specific methods
anyway, so use a single generic definition where possible.

In particular, introduce a new intermediate class for COPY subclasses,
allowing more common code to be shared in the method implementations
rather than copy/pasting and maintaining several versions of the same
code.

It would be good to be able to centralize more code for the database
sources and how they are organized around metadata/import-data/complete
schema, but it doesn't look obvious how to do it just now.
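The refactoring pattern can be sketched in Python (class names hypothetical): a shared intermediate class carries the one implementation, so each source subclass no longer needs its own copy.

```python
class Copy:
    # generic protocol: every source must provide copy_table
    def copy_table(self):
        raise NotImplementedError

class DBCopy(Copy):
    # hypothetical intermediate class: one shared implementation
    # instead of near-identical per-source copies
    def copy_table(self):
        return f"copying via {type(self).__name__}"

class MySQLCopy(DBCopy): pass
class SQLiteCopy(DBCopy): pass

out = [MySQLCopy().copy_table(), SQLiteCopy().copy_table()]
```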
2015-10-05 21:25:21 +02:00
Dimitri Fontaine
0d9c2119b1 Send one update-stats message per batch.
Updating the stats used to be a simple incf, and doing it once per
row read was good enough; but now that it involves sending a message to
the monitor thread, send only one message per batch, reducing the
communication load here.
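The batching idea, sketched in Python (the message shape and names are hypothetical): accumulate the count locally with cheap increments, then send a single message per batch.

```python
import queue

monitor = queue.Queue()  # stand-in for the monitor thread's mailbox

def process_batch(rows, table):
    read = 0
    for _ in rows:
        read += 1  # cheap local increment, one per row
    # one message per batch, not one per row
    monitor.put(("update-stats", table, read))

process_batch(range(25_000), "geolite.blocks")
event = monitor.get_nowait()
```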
2015-10-05 18:04:08 +02:00
Dimitri Fontaine
6bf26c52ec Implement a TimeZone option for IXF loading.
The local-time:encode-timestamp function takes a default timezone and it
is necessary to have control over it when loading from pgloader. Hence,
add a timezone option to the IXF option list, that is now explicit and
local to the IXF parser rather than shared with the DBF option list.
2015-10-05 16:46:15 +02:00
Dimitri Fontaine
7b9b8a32e7 Move sexp parsing into its own file.
After all, it's shared between the CSV command parsing and the Cast
Rules parsing. src/parsers/command-csv.lisp still contains lots of
facilities shared between the file-based sources and will need another
series of splits.
2015-10-05 11:39:44 +02:00
Dimitri Fontaine
f1df6ee89a Forgot a new file, thanks Travis.
Someday I will learn not to code that late at night.
2015-10-05 02:21:59 +02:00
Dimitri Fontaine
c880f86bb6 Fix user defined casting rules.
Commit 598c860cf5013d52399c07c7f18f1daf0227d305 broke user defined
casting rules by interning "precision" and "scale" in the
pgloader.user-symbols package: those symbols need to be found in the
pgloader.transforms package instead.

Luckily enough the infrastructure to do that was already in place for
cl:nil.
2015-10-05 02:11:23 +02:00
Dimitri Fontaine
96a33de084 Review the stats and reporting code organisation.
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that gets passed log messages so that the
output buffer is not subject to race conditions; extend its use to also
handle statistics messages.

In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.

Also, as a nice side effect of the code simplification and refactoring,
this fixes #283, wherein the before/after sections of individual CSV
files within an ARCHIVE command were not counted in the reporting.
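The single-consumer design can be sketched in Python (names hypothetical): one monitor thread drains a queue of log and stats events, so neither the output buffer nor the counters are subject to races.

```python
import queue
import threading

events = queue.Queue()
tally = {}

def monitor():
    # single consumer: no race conditions on output or counters
    while True:
        kind, payload = events.get()
        if kind == "stop":
            break
        if kind == "log":
            pass  # would write to the output buffer here
        elif kind == "stats":
            table, n = payload
            tally[table] = tally.get(table, 0) + n

t = threading.Thread(target=monitor)
t.start()
events.put(("stats", ("geolite.location", 10)))
events.put(("stats", ("geolite.location", 5)))
events.put(("stop", None))
t.join()
```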
2015-10-05 01:46:29 +02:00
Dimitri Fontaine
bc9d2d8962 Monitor events are now structures.
This allows using typecase to dispatch events in the main loop and
avoids destructuring-bind, as we now have properly typed events.
2015-10-04 18:55:10 +02:00
Dimitri Fontaine
38a725fe74 Add support for IXF blobs to bytea.
A quick test shows it should work, so push that too.
2015-09-24 17:47:57 +02:00
Dimitri Fontaine
bd44e6423b Add support for IXF CLOB data type.
In passing, fix the condition message to read the unknown IXF data types
as decimal numbers (rather than hexadecimal) as they are documented that
way in the IBM reference documentation.
2015-09-23 23:15:48 +02:00
Dimitri Fontaine
598c860cf5 Improve user code parsing, fix #297.
To be able to use "t" (or "nil") as a column name, pgloader needs to be
able to generate lisp code where those symbols are available. It's
simple enough in that a Common Lisp package that doesn't :use :cl
fulfills the condition, so intern user symbols in a specially crafted
package that doesn't :use :cl.

Now, we still need to be able to run transformation code that is using
the :cl package symbols and the pgloader.transforms functions too. In
this commit we introduce a heuristic to pick symbols either as functions
from pgloader.transforms or anything else in pgloader.user-symbols.

And so that user code may use NIL too, we provide an override mechanism
to the intern-symbol heuristic and use it only when parsing user code,
not when producing Common Lisp code from the parsed load command.
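The heuristic is loosely analogous to resolving names against two namespaces, preferring known transform functions and interning everything else as a user symbol. A rough Python analogy (the actual mechanism is Common Lisp package interning; these names are hypothetical):

```python
# user symbols live in their own namespace, so names like "t" or
# "nil" are plain user identifiers, not language constants
user_symbols = {}                  # like a package that doesn't :use :cl
transforms = {"upper": str.upper}  # known transform functions

def resolve(name):
    # heuristic: prefer a transform function, otherwise intern
    # the name as a user symbol
    if name in transforms:
        return transforms[name]
    return user_symbols.setdefault(name, name)

assert resolve("upper") is str.upper
assert resolve("t") == "t"  # usable as a column name
```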
2015-09-21 13:23:21 +02:00
Dimitri Fontaine
fe812061c4 Update README file for build instructions, fix #296.
When building from source you should really build from the current HEAD
of the git master branch...

In passing, comment out the --self-update paragraph as it's known to be
broken unless you still have all the source dependencies in the right
place for ASDF to find them... making the feature developer-only.
2015-09-17 20:27:55 +02:00
Dimitri Fontaine
f6aa8210b9 Load libsybdb even in --dry-run mode, fix #295. 2015-09-16 20:39:42 +02:00
Dimitri Fontaine
78c6bf097a Fix the build again.
Once more I changed a test's data file and forgot to commit the matching
changes to the regression test's expected file.
2015-09-12 00:40:15 +02:00
Dimitri Fontaine
98f18c4877 Improve CSV date format, fix #293.
The date format wouldn't allow using a colon (:) in its noise parts,
and would also insist that milliseconds be exactly 4 digits and
microseconds exactly 6 digits. Allow for "ragged" input and take however
many digits we actually find in the input.
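The "ragged" behaviour can be sketched with a regular expression that accepts however many fractional digits are present and right-pads them (a Python sketch, not pgloader's actual parser):

```python
import re

# accept 1 to 6 fractional-second digits after the seconds field
STAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{1,6})"
)

def parse_fraction(text):
    m = STAMP.match(text)
    # right-pad to a fixed microsecond width, whatever we actually got
    return int(m.group(7).ljust(6, "0"))

assert parse_fraction("2015-09-12 00:35:14.5") == 500000
assert parse_fraction("2015-09-12 00:35:14.123456") == 123456
```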
2015-09-12 00:35:14 +02:00
Dimitri Fontaine
a195ac6dd4 Adapt to the new cl-ixf API.
This allows fixing bugs in processing the IXF files, which pgloader
directly benefits from.
2015-09-12 00:19:02 +02:00
Dimitri Fontaine
a0dc59624c Fix schema qualified table names usage again.
When the list of columns of the PostgreSQL target table isn't given in
the load command, pgloader will happily query the system catalogs to get
that information. The list-columns query didn't get the memo about the
qualified table name format and the with-schema macro... fix #288.
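The fix amounts to qualifying the catalog lookup with the schema; a sketch of such a list-columns query (illustrative SQL against `information_schema`, not pgloader's exact query, and real code would pass the values as bound parameters):

```python
def list_columns_query(schema, table):
    # qualify the lookup with the schema name, not just the table name
    return (
        "select column_name, data_type "
        "from information_schema.columns "
        f"where table_schema = '{schema}' and table_name = '{table}' "
        "order by ordinal_position"
    )

q = list_columns_query("sakila", "actor")
```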
2015-09-11 11:53:28 +02:00
Dimitri Fontaine
e054eb3838 Travis: set PGTZ in regress.sh
The TimeZone parameter should be set both for input and for output in
order to match our expected result file. Let's try to set PGTZ in the
shell environment...
2015-09-07 20:24:00 +02:00
Dimitri Fontaine
60d58a96b8 Travis: let's try to force client timezone.
The csv-parse-date test is failing on Travis because the server up there
in the Cloud isn't using the same timezone as my local machine. Let's
just force the timezone in the SET clause...
2015-09-07 20:00:05 +02:00
Dimitri Fontaine
3f539b7384 Travis: update expected output file.
Forgot to update the expected output file in the previous commit, which
Travis is rightfully complaining about...
2015-09-07 17:47:03 +02:00
Dimitri Fontaine
04b2779239 Allow date format parsing to support time.
A useful use case for date parsing at the input level is to parse
time (hour, minutes, seconds) rather than a full date (timestamp).
Improve the code so that it's possible to use the date format facility
even when the data field lacks the year/month/day information.

Fix #288.
2015-09-07 17:05:10 +02:00
Dimitri Fontaine
bd50ba45ea Make it easier to contact me from the moral license. 2015-09-06 22:00:02 +02:00
352 changed files with 29131 additions and 22969 deletions

5
.dockerignore Normal file

@ -0,0 +1,5 @@
.git
.vagrant
build
Dockerfile
Dockerfile.ccl

1
.gitattributes vendored Normal file

@ -0,0 +1 @@
test/**/*.sql linguist-vendored

1
.github/FUNDING.yml vendored Normal file

@ -0,0 +1 @@
github: dimitri

33
.github/workflows/debian-ci.yml vendored Normal file

@ -0,0 +1,33 @@
name: Debian Autopkgtest
on:
pull_request: {}
push: {}
jobs:
debian-build:
# focal is too old, use jammy
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install postgresql-common
run: sudo apt-get install -y postgresql-common
- name: Install pgapt repository
run: sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y
- name: Install build-dependencies
run: sudo apt-get build-dep -y .
- name: Build pgloader.deb
run: dpkg-buildpackage --no-sign --buildinfo-option=--version -b
- name: Install autopkgtest
run: sudo apt-get install -y autopkgtest
- name: Autopkgtest
run: sudo autopkgtest ./ ../pgloader_*_amd64.deb -- null

100
.github/workflows/docker-publish.yml vendored Normal file

@ -0,0 +1,100 @@
name: Docker
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.
on:
push:
branches: [ master ]
# Publish semver tags as releases.
tags: [ 'v*.*.*' ]
pull_request:
branches: [ master ]
env:
# Use docker.io for Docker Hub if empty
REGISTRY: ghcr.io
# github.repository as <account>/<repo>
IMAGE_NAME: ${{ github.repository }}
jobs:
build:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
# This is used to complete the identity challenge
# with sigstore/fulcio when running outside of PRs.
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v2
# Install the cosign tool (not used on PR, still installed)
# https://github.com/sigstore/cosign-installer
- name: Install cosign
uses: sigstore/cosign-installer@main
with:
cosign-release: 'v2.2.3'
- name: Check cosign version
run: cosign version
# Workaround: https://github.com/docker/build-push-action/issues/461
- name: Setup Docker buildx
uses: docker/setup-buildx-action@79abd3f86f79a9d68a23c75a09a9a85889262adf
# Login against a Docker registry except on PR
# https://github.com/docker/login-action
- name: Log into registry ${{ env.REGISTRY }}
if: github.event_name != 'pull_request'
uses: docker/login-action@28218f9b04b4f3f62068d7b6ce6ca5b26e35336c
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# Extract metadata (tags, labels) for Docker
# https://github.com/docker/metadata-action
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v3.6.2
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=raw,value=latest,enable=${{ endsWith(github.ref, github.event.repository.default_branch) }}
type=semver,pattern={{version}}
# Build and push Docker image with Buildx (don't push on PR)
# https://github.com/docker/build-push-action
- name: Build and push Docker image
id: build-and-push
uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
# Sign the resulting Docker image digest except on PRs.
# This will only write to the public Rekor transparency log when the Docker
# repository is public to avoid leaking data. If you would like to publish
# transparency data even for private images, pass --force to cosign below.
# https://github.com/sigstore/cosign
- name: Sign the published Docker image
if: ${{ github.event_name != 'pull_request' }}
# This step uses the identity token to provision an ephemeral certificate
# against the sigstore community Fulcio instance.
run: cosign sign --yes ${TAGS}
env:
TAGS: ${{ steps.meta.outputs.tags }}
# should use @${{ steps.build-and-push.outputs.digest }}
# but that leads to "entity not found in registry"
COSIGN_EXPERIMENTAL: "true"

5
.gitignore vendored

@ -2,6 +2,7 @@
local-data
pgloader.html
pgloader.pdf
debian/home/
debian/pgloader.debhelper.log
debian/pgloader.substvars
debian/pgloader/
@ -14,3 +15,7 @@ web/howto/mysql.html
web/howto/pgloader.1.html
web/howto/quickstart.html
web/howto/sqlite.html
.DS_Store
system-index.txt
buildapp.log
docs/_build

11
.readthedocs.yaml Normal file

@ -0,0 +1,11 @@
version: 2
# Build from the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# Explicitly set the version of Python and its requirements
python:
version: 3.7
install:
- requirements: docs/requirements.txt

72
.travis.sh Executable file

@ -0,0 +1,72 @@
#!/bin/bash
set -eu
lisp_install() {
case "$LISP" in
ccl)
ccl_checksum='08e885e8c2bb6e4abd42b8e8e2b60f257c6929eb34b8ec87ca1ecf848fac6d70'
ccl_version='1.11'
remote_file "/tmp/ccl-${ccl_version}.tgz" "https://github.com/Clozure/ccl/releases/download/v${ccl_version}/ccl-${ccl_version}-linuxx86.tar.gz" "$ccl_checksum"
tar --file "/tmp/ccl-${ccl_version}.tgz" --extract --exclude='.svn' --directory '/tmp'
sudo mv --no-target-directory '/tmp/ccl' '/usr/local/src/ccl'
sudo ln --no-dereference --force --symbolic "/usr/local/src/ccl/scripts/ccl64" '/usr/local/bin/ccl'
;;
sbcl)
sbcl_checksum='22ccd9409b2ea16d4be69235c5ad5fde833452955cb24483815312d3b1d7401c'
sbcl_version='1.5.2'
remote_file "/tmp/sbcl-${sbcl_version}.tgz" "http://prdownloads.sourceforge.net/sbcl/sbcl-${sbcl_version}-x86-64-linux-binary.tar.bz2" "$sbcl_checksum"
tar --file "/tmp/sbcl-${sbcl_version}.tgz" --extract --directory '/tmp'
( cd "/tmp/sbcl-${sbcl_version}-x86-64-linux" && sudo ./install.sh )
;;
*)
echo "Unrecognized Lisp: '$LISP'"
exit 1
;;
esac
}
pgdg_repositories() {
local sourcelist='sources.list.d/pgdg.list'
sudo tee "/etc/apt/$sourcelist" <<-repositories
deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main
deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg-testing main 10
repositories
sudo apt-key adv --keyserver 'hkp://ha.pool.sks-keyservers.net' --recv-keys 'ACCC4CF8'
sudo apt-get -o Dir::Etc::sourcelist="$sourcelist" -o Dir::Etc::sourceparts='-' -o APT::Get::List-Cleanup='0' update
}
postgresql_install() {
if [ -z "${PGVERSION:-}" ]; then
echo 'PGVERSION environment variable not set.';
exit 1
fi
xargs sudo apt-get -y install <<-packages
postgresql-${PGVERSION}
postgresql-${PGVERSION}-ip4r
packages
sudo tee /etc/postgresql/${PGVERSION}/main/pg_hba.conf > /dev/null <<-config
local all all trust
host all all 127.0.0.1/32 trust
config
sudo service postgresql restart
}
remote_file() {
local target="$1" origin="$2" sum="$3"
local check="shasum --algorithm $(( 4 * ${#sum} )) --check"
local filesum="$sum $target"
curl --location --output "$target" "$origin" && $check <<< "$filesum"
}
$1


@ -1,22 +1,38 @@
language: common-lisp
language: shell
os: linux
dist: xenial
env:
matrix:
- LISP=ccl PGVERSION=9.6
- LISP=ccl PGVERSION=10
- LISP=ccl PGVERSION=11
- LISP=ccl PGVERSION=12
- LISP=ccl PGVERSION=13
- LISP=sbcl PGVERSION=9.6
- LISP=sbcl PGVERSION=10
- LISP=sbcl PGVERSION=11
- LISP=sbcl PGVERSION=12
- LISP=sbcl PGVERSION=13
install:
- wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
- echo "deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
- sudo apt-get update
- sudo DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" dist-upgrade
- wget http://pgsql.tapoueh.org/sbcl/sbcl_1.2.0-1_amd64.deb
- sudo dpkg -i sbcl_1.2.0-1_amd64.deb
- sudo apt-get install -f
- sudo apt-get install sbcl unzip libsqlite3-dev gawk freetds-dev
- sudo apt-get install postgresql-9.1-ip4r
- ./.travis.sh lisp_install
- ./.travis.sh pgdg_repositories
- ./.travis.sh postgresql_install
- sudo apt-get install -y unzip libsqlite3-dev gawk freetds-dev
before_script:
- sudo -u postgres createuser -S -R -D -E -l pgloader
- sudo -u postgres createdb -E UTF8 -O pgloader -hlocalhost pgloader
- sudo -u postgres psql -h localhost -d pgloader -c "create extension ip4r;"
- PGUSER=postgres createuser -S -R -D -E -l pgloader
- PGUSER=postgres createdb -E UTF8 -O pgloader pgloader
- PGUSER=postgres psql -d pgloader -c "create extension ip4r;"
- PGUSER=pgloader psql -d pgloader -c "create schema expected;"
- PGUSER=pgloader psql -d pgloader -c "create schema err;"
- make --version
- make
- make "CL=$LISP" clones save
script:
- PGUSER=pgloader make check
- PGUSER=pgloader make "CL=$LISP" check-saved
notifications:
email:
- dim@tapoueh.org
- dim@tapoueh.org

53
Dockerfile Normal file

@ -0,0 +1,53 @@
FROM debian:bookworm-slim AS builder
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bzip2 \
ca-certificates \
curl \
freetds-dev \
gawk \
git \
libsqlite3-dev \
libssl3 \
libzip-dev \
make \
openssl \
patch \
sbcl \
time \
unzip \
wget \
cl-ironclad \
cl-babel \
&& rm -rf /var/lib/apt/lists/*
COPY ./ /opt/src/pgloader
ARG DYNSIZE=16384
RUN mkdir -p /opt/src/pgloader/build/bin \
&& cd /opt/src/pgloader \
&& make DYNSIZE=$DYNSIZE clones save
FROM debian:bookworm-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates \
curl \
freetds-dev \
gawk \
libsqlite3-dev \
libzip-dev \
make \
sbcl \
unzip \
&& update-ca-certificates \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/src/pgloader/build/bin/pgloader /usr/local/bin
ADD conf/freetds.conf /etc/freetds/freetds.conf
LABEL maintainer="Dimitri Fontaine <dim@tapoueh.org>"

53
Dockerfile.ccl Normal file

@ -0,0 +1,53 @@
FROM debian:bookworm-slim as builder
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bzip2 \
ca-certificates \
curl \
freetds-dev \
gawk \
git \
libsqlite3-dev \
libssl3 \
libzip-dev \
make \
openssl \
patch \
time \
unzip \
wget \
cl-ironclad \
cl-babel \
&& rm -rf /var/lib/apt/lists/*
RUN curl -SL https://github.com/Clozure/ccl/releases/download/v1.12/ccl-1.12-linuxx86.tar.gz \
| tar xz -C /usr/local/src/ \
&& mv /usr/local/src/ccl/scripts/ccl64 /usr/local/bin/ccl
COPY ./ /opt/src/pgloader
ARG DYNSIZE=256
RUN mkdir -p /opt/src/pgloader/build/bin \
&& cd /opt/src/pgloader \
&& make CL=ccl DYNSIZE=$DYNSIZE clones save
FROM debian:bookworm-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
curl \
freetds-dev \
gawk \
libsqlite3-dev \
libssl3 \
libzip-dev \
make \
sbcl \
unzip \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/src/pgloader/build/bin/pgloader /usr/local/bin
LABEL maintainer="Dimitri Fontaine <dim@tapoueh.org>"


@ -2,7 +2,7 @@
pgloader version 3.x is written in Common Lisp.
## The lisp parts
## Dependencies
The steps depend on the OS you are currently using.
@ -23,7 +23,49 @@ You will note in particular:
We need a recent enough [SBCL](http://sbcl.org/) version and that means
backporting the one found in `sid` rather than using the very old one found
in current *stable* debian release. See `bootstrap-debian.sh` for details
about how to backport a recent enough SBCL here (1.1.14 or newer).
about how to backport a recent enough SBCL here (1.2.5 or newer).
### Redhat / CentOS
To build and install pgloader, the Steel Bank Common Lisp (sbcl) package from
EPEL and the freetds packages are required.
With RHEL/CentOS 6, if the packaged version of sbcl isn't >=1.3.6, you'll need
to build it from source.
It is recommended to build the RPM yourself, see below, to ensure that all installed
files are properly tracked and that you can safely update to newer versions of
pgloader as they're released.
To do an ad hoc build and install, run `bootstrap-centos.sh` for CentOS 6 or
`bootstrap-centos7.sh` for CentOS 7 to install the required dependencies.
[Build pgloader](INSTALL.md#building-pgloader).
#### rpmbuild
The spec file in the root of the pgloader repository can be used to build your
own RPM. For production deployments it is recommended that you build this RPM on
a dedicated build box and then copy the RPM to your production environment for
use; it is considered bad practice to have compilers and build tools present in
production environments.
1. Install the [EPEL repo](https://fedoraproject.org/wiki/EPEL#Quickstart).
1. Install rpmbuild dependencies:
sudo yum -y install yum-utils rpmdevtools @"Development Tools"
1. Install pgloader build dependencies:
sudo yum-builddep pgloader.spec
1. Download pgloader source:
spectool -g -R pgloader.spec
1. Build the source and binary RPMs (see `rpmbuild --help` for other build options):
rpmbuild -ba pgloader.spec
### Mac OS X
@ -60,9 +102,9 @@ Now that the dependencies are installed, just type make.
make
If using Mac OS X, and depending on how you did install `SBCL` and which
version you have (the brew default did change recently), you might need to
ask the Makefile to refrain from trying to compress your binary image:
If your `SBCL` supports core compression, the make process will use it
to generate a smaller binary. To force disabling core compression, you
may use:
make COMPRESS_CORE=no
@ -92,65 +134,12 @@ Now the `./build/bin/pgloader` that you get only uses 1GB.
## Building a docker image
We start with a `debian` image:
A `Dockerfile` is provided, to use it:
docker run -it debian bash
docker build -t pgloader:debian .
docker run --rm --name pgloader pgloader:debian bash -c "pgloader --version"
And then run the following steps:
# apt-get update
# apt-get install -y wget curl make git bzip2 time libzip-dev openssl-dev
# apt-get install -y patch unzip libsqlite3-dev gawk freetds-dev
# useradd -m -s /bin/bash dim
# su - dim
Install a binary version on SBCL, which unfortunately has no support for
core compression, so only use it to build another SBCL version from sources
with proper options:
$ mkdir sbcl
$ cd sbcl
$ wget http://prdownloads.sourceforge.net/sbcl/sbcl-1.2.6-x86-64-linux-binary.tar.bz2
$ wget http://prdownloads.sourceforge.net/sbcl/sbcl-1.2.6-source.tar.bz2?download
$ mv sbcl-1.2.6-source.tar.bz2\?download sbcl-1.2.6-source.tar.bz2
$ tar xf sbcl-1.2.6-x86-64-linux-binary.tar.bz2
$ tar xf sbcl-1.2.6-source.tar.bz2
$ exit
Install SBCL as root
# cd /home/dim/sbcl/sbcl-1.2.6-x86-64-linux
# bash install.sh
Now back as the unprivileged user (dim) to compile SBCL from sources:
# su - dim
$ cd sbcl/sbcl-1.2.6
$ sh make.sh --with-sb-core-compression --with-sb-thread > build.out 2>&1
$ exit
And install the newly compiled SBCL as root:
# cd /home/dim/sbcl/sbcl-1.2.6
# sh install.sh
Now build pgloader from sources:
# su - dim
$ git clone https://github.com/dimitri/pgloader
$ cd pgloader
$ make
$ ./build/bin/pgloader --help
$ exit
Now install pgloader in `/usr/local/bin` to make it easy to use:
# cp /home/dim/pgloader/build/bin/pgloader /usr/local/bin
# pgloader --version
Commit the docker instance and push it, from the host:
$ docker login
$ docker ps -l
$ docker commit <id> dimitri/pgloader-3.1.cd52654
$ docker push dimitri/pgloader-3.1.cd52654
The `build` step installs build dependencies in a debian jessie container,
then `git clone`s and builds `pgloader` in `/opt/src/pgloader`, and finally
copies the resulting binary image to `/usr/local/bin/pgloader` so that it's
easily available.

96
ISSUE_TEMPLATE.md Normal file

@ -0,0 +1,96 @@
Thanks for contributing to [pgloader](https://pgloader.io) by reporting an
issue! Reporting an issue is the only way we can solve problems, fix bugs,
and improve both the software and its user experience in general.
The best bug reports follow those 3 simple steps:
1. show what you did,
2. show the result you got,
3. explain how the result is not what you expected.
In the case of pgloader, here's the information I will need to read in your
bug report. Having all of this is a big help, and often means the bug you
reported can be fixed very efficiently as soon as I get to it.
Please provide the following information:
<!-- delete text above this line -->
- [ ] pgloader --version
```
<fill pgloader version here>
```
- [ ] did you test a fresh compile from the source tree?
Compiling pgloader from sources is documented in the
[README](https://github.com/dimitri/pgloader#build-from-sources), it's
easy to do, and if patches are to be made to fix your bug, you're going
to have to build from sources to get the fix anyway…
- [ ] did you search for other similar issues?
- [ ] how can I reproduce the bug?
Include a self-contained pgloader command file.
If you're loading from a database, consider attaching a database dump to
your issue. For MySQL, use `mysqldump`. For SQLite, just send over your
source file, that's easy. Not the one with your production data, of
course; just the one with the sample of data that allows me to reproduce
your bug.
When using a proprietary database system as a source, consider creating
a sample database on some Cloud service or somewhere you can then give
me access to, and see my email address on my GitHub profile to send me
the credentials. Still open a public issue for tracking and as
documentation for other users.
```
--
-- EDIT THIS FILE TO MATCH YOUR BUG REPORT
--
LOAD CSV
FROM INLINE with encoding 'ascii'
INTO postgresql:///pgloader
TARGET TABLE jordane
WITH truncate,
fields terminated by '|',
fields not enclosed,
fields escaped by backslash-quote
SET work_mem to '128MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists jordane; $$,
$$ CREATE TABLE jordane
(
"NOM" character(20),
"PRENOM" character(20)
)
$$;
BORDET|Jordane
BORDET|Audrey
LASTNAME|"opening quote
BONNIER|testprenombe~aucouptroplong
JOURDAIN|héhé¶
```
- [ ] pgloader output you obtain
```
PASTE HERE THE OUTPUT OF THE PGLOADER COMMAND
```
- [ ] data that is being loaded, if relevant
```
PASTE HERE THE DATA THAT HAS BEEN LOADED
```
- [ ] How the data is different from what you expected, if relevant

9
LICENSE Normal file

@ -0,0 +1,9 @@
pgloader
Copyright (c) 2005-2017, The PostgreSQL Global Development Group
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

130
Makefile

@ -1,17 +1,19 @@
# pgloader build tool
APP_NAME = pgloader
VERSION = 3.2.1.preview
VERSION = 3.6.10
# use either sbcl or ccl
CL = sbcl
# default to 4096 MB of RAM size in the image
DYNSIZE = 4096
# default to 16 GB of RAM size in the image
DYNSIZE = 16384
LISP_SRC = $(wildcard src/*lisp) \
$(wildcard src/monkey/*lisp) \
$(wildcard src/utils/*lisp) \
$(wildcard src/load/*lisp) \
$(wildcard src/parsers/*lisp) \
$(wildcard src/pg-copy/*lisp) \
$(wildcard src/pgsql/*lisp) \
$(wildcard src/sources/*lisp) \
pgloader.asd
@ -22,6 +24,12 @@ QLDIR = $(BUILDDIR)/quicklisp
MANIFEST = $(BUILDDIR)/manifest.ql
LATEST = $(BUILDDIR)/pgloader-latest.tgz
BUNDLEDIST = 2022-02-20
BUNDLENAME = pgloader-bundle-$(VERSION)
BUNDLEDIR = $(BUILDDIR)/bundle/$(BUNDLENAME)
BUNDLE = $(BUILDDIR)/$(BUNDLENAME).tgz
BUNDLETESTD= $(BUILDDIR)/bundle/test
ifeq ($(OS),Windows_NT)
EXE = .exe
COMPRESS_CORE = no
@ -35,27 +43,26 @@ BUILDAPP_CCL = $(BUILDDIR)/bin/buildapp.ccl$(EXE)
BUILDAPP_SBCL = $(BUILDDIR)/bin/buildapp.sbcl$(EXE)
ifeq ($(CL),sbcl)
BUILDAPP = $(BUILDAPP_SBCL)
CL_OPTS = --no-sysinit --no-userinit
BUILDAPP = $(BUILDAPP_SBCL)
BUILDAPP_OPTS = --require sb-posix \
--require sb-bsd-sockets \
--require sb-rotate-byte
CL_OPTS = --noinform --no-sysinit --no-userinit
else
BUILDAPP = $(BUILDAPP_CCL)
CL_OPTS = --no-init
endif
COMPRESS_CORE ?= yes
ifeq ($(CL),sbcl)
COMPRESS_CORE ?= $(shell $(CL) --noinform \
--quit \
--eval '(when (member :sb-core-compression cl:*features*) (write-string "yes"))')
endif
# note: on Windows_NT, we never core-compress; see above.
ifeq ($(COMPRESS_CORE),yes)
COMPRESS_CORE_OPT = --compress-core
else
COMPRESS_CORE_OPT =
endif
endif
ifeq ($(CL),sbcl)
BUILDAPP_OPTS = --require sb-posix \
--require sb-bsd-sockets \
--require sb-rotate-byte
endif
DEBUILD_ROOT = /tmp/pgloader
@ -63,25 +70,21 @@ DEBUILD_ROOT = /tmp/pgloader
all: $(PGLOADER)
clean:
rm -rf $(LIBS) $(QLDIR) $(MANIFEST) $(BUILDAPP) $(PGLOADER)
docs:
ronn -roff pgloader.1.md
rm -rf $(LIBS) $(QLDIR) $(MANIFEST) $(BUILDAPP) $(PGLOADER) \
buildapp.log build/bundle/* build/pgloader-bundle* build/quicklisp.lisp docs/_build
$(MAKE) -C test clean
$(QLDIR)/local-projects/qmynd:
git clone https://github.com/qitab/qmynd.git $@
git clone --depth 1 https://github.com/qitab/qmynd.git $@
$(QLDIR)/local-projects/cl-ixf:
git clone https://github.com/dimitri/cl-ixf.git $@
git clone --depth 1 https://github.com/dimitri/cl-ixf.git $@
$(QLDIR)/local-projects/cl-db3:
git clone https://github.com/dimitri/cl-db3.git $@
git clone --depth 1 https://github.com/dimitri/cl-db3.git $@
$(QLDIR)/local-projects/cl-csv:
git clone https://github.com/AccelerationNet/cl-csv.git $@
$(QLDIR)/local-projects/esrap:
git clone -b wip-better-errors https://github.com/scymtym/esrap.git $@
git clone --depth 1 https://github.com/AccelerationNet/cl-csv.git $@
$(QLDIR)/setup.lisp:
mkdir -p $(BUILDDIR)
@ -96,13 +99,14 @@ quicklisp: $(QLDIR)/setup.lisp ;
clones: $(QLDIR)/local-projects/cl-ixf \
$(QLDIR)/local-projects/cl-db3 \
$(QLDIR)/local-projects/cl-csv \
$(QLDIR)/local-projects/qmynd \
$(QLDIR)/local-projects/esrap ;
$(QLDIR)/local-projects/qmynd ;
$(LIBS): $(QLDIR)/setup.lisp clones
$(CL) $(CL_OPTS) --load $(QLDIR)/setup.lisp \
--eval '(push "$(PWD)/" asdf:*central-registry*)' \
--eval '(ql:quickload "pgloader")' \
$(LIBS): $(QLDIR)/setup.lisp
$(CL) $(CL_OPTS) --load $(QLDIR)/setup.lisp \
--eval '(push :pgloader-image *features*)' \
--eval '(setf *print-circle* t *print-pretty* t)' \
--eval '(push "$(PWD)/" ql:*local-project-directories*)' \
--eval '(ql:quickload "pgloader")' \
--eval '(quit)'
touch $@
@ -141,8 +145,11 @@ $(PGLOADER): $(MANIFEST) $(BUILDAPP) $(LISP_SRC)
--manifest-file $(MANIFEST) \
--asdf-tree $(QLDIR)/dists \
--asdf-path . \
--load-system $(APP_NAME) \
--load-system cffi \
--load-system cl+ssl \
--load-system mssql \
--load src/hooks.lisp \
--load-system $(APP_NAME) \
--entry pgloader:main \
--dynamic-space-size $(DYNSIZE) \
$(COMPRESS_CORE_OPT) \
@ -161,20 +168,59 @@ pgloader-standalone:
--dynamic-space-size $(DYNSIZE) \
$(COMPRESS_CORE_OPT) \
--output $(PGLOADER)
test: $(PGLOADER)
$(MAKE) PGLOADER=$(realpath $(PGLOADER)) -C test regress
$(MAKE) PGLOADER=$(realpath $(PGLOADER)) CL=$(CL) -C test regress
save: ./src/save.lisp $(LISP_SRC)
$(CL) $(CL_OPTS) --load ./src/save.lisp
check-saved:
$(MAKE) PGLOADER=$(realpath $(PGLOADER)) CL=$(CL) -C test regress
clean-bundle:
rm -rf $(BUNDLEDIR)
rm -rf $(BUNDLETESTD)/$(BUNDLENAME)/*
$(BUNDLETESTD):
mkdir -p $@
$(BUNDLEDIR): quicklisp
mkdir -p $@
$(CL) $(CL_OPTS) --load $(QLDIR)/setup.lisp \
--eval '(defvar *bundle-dir* "$@")' \
--eval '(defvar *pwd* "$(PWD)/")' \
--eval '(defvar *ql-dist* "$(BUNDLEDIST)")' \
--load bundle/ql.lisp
$(BUNDLEDIR)/version.sexp: $(BUNDLEDIR)
echo "\"$(VERSION)\"" > $@
$(BUNDLE): $(BUNDLEDIR) $(BUNDLEDIR)/version.sexp
cp bundle/README.md $(BUNDLEDIR)
cp bundle/save.lisp $(BUNDLEDIR)
sed -e s/%VERSION%/$(VERSION)/ < bundle/Makefile > $(BUNDLEDIR)/Makefile
git archive --format=tar --prefix=pgloader-$(VERSION)/ master \
| tar -C $(BUNDLEDIR)/local-projects/ -xf -
make QLDIR=$(BUNDLEDIR) clones
tar -C build/bundle \
--exclude bin \
--exclude test/sqlite \
-czf $@ $(BUNDLENAME)
bundle: clean-bundle $(BUNDLE) $(BUNDLETESTD)
tar -C $(BUNDLETESTD) -xf $(BUNDLE)
make -C $(BUNDLETESTD)/$(BUNDLENAME)
$(BUNDLETESTD)/$(BUNDLENAME)/bin/pgloader --version
test-bundle:
$(MAKE) -C $(BUNDLEDIR) test
deb:
# intended for use on a debian system
mkdir -p $(DEBUILD_ROOT) && rm -rf $(DEBUILD_ROOT)/*
rsync -Ca --exclude 'build' \
--exclude '.vagrant' \
--exclude 'test/sqlite-chinook.load' \
--exclude 'test/sqlite' \
--exclude 'test/data/2013_Gaz_113CDs_national.txt' \
--exclude 'test/data/reg2013.dbf' \
--exclude 'test/data/sakila-db.zip' \
./ $(DEBUILD_ROOT)/
cd $(DEBUILD_ROOT) && make -f debian/rules orig
cd $(DEBUILD_ROOT) && debuild -us -uc -sa
@@ -207,4 +253,4 @@ latest:
check: test ;
.PHONY: test pgloader-standalone
.PHONY: test pgloader-standalone docs bundle

README.md

@@ -1,5 +1,9 @@
# PGLoader
[![Build Status](https://travis-ci.org/dimitri/pgloader.svg?branch=master)](https://travis-ci.org/dimitri/pgloader)
[![Join the chat at https://gitter.im/dimitri/pgloader](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/dimitri/pgloader?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Read The Docs Status](https://readthedocs.org/projects/pgloader/badge/?version=latest&style=plastic)](http://pgloader.readthedocs.io/en/latest/)
pgloader is a data loading tool for PostgreSQL, using the `COPY` command.
Its main advantage over just using `COPY` or `\copy`, and over using a
@@ -16,140 +20,81 @@ being the transformation of MySQL datestamps `0000-00-00` and
`0000-00-00 00:00:00` to PostgreSQL `NULL` value (because our calendar
never had a *year zero*).
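In pgloader's command language that transformation is expressed as a cast rule; a minimal, illustrative sketch (the exact rule details vary by source type):

```
CAST type datetime to timestamptz
          drop default drop not null
          using zero-dates-to-null,
     type date drop not null drop default
          using zero-dates-to-null
```

Here `zero-dates-to-null` is the transformation function that maps MySQL's zero dates to SQL `NULL`.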
## Versioning
## Documentation
pgloader version 1.x is quite old and was developed in `TCL`.
When faced with maintaining that code, the new emerging development
team (hi!) picked `python` instead because that made sense at the
time. So pgloader version 2.x was written in python.
Full documentation is available online, including manual pages of all the
pgloader sub-commands. Check out
[https://pgloader.readthedocs.io/](https://pgloader.readthedocs.io/en/latest/).
The current version of pgloader is the 3.x series, which is written in
[Common Lisp](http://cliki.net/) for better development flexibility,
runtime performance, and support of real threading.
The versioning is now following the Emacs model, where any X.0 release
number means you're using a development version (alpha, beta, or release
candidate). The next stable versions are going to be `3.1` then `3.2` etc.
When using a development snapshot rather than a released version the version
number includes the git hash (in its abbreviated form):
- `pgloader version "3.0.99"`
Release candidate 9 for pgloader version 3.1, with a *git tag* named
`v3.0.99` so that it's easy to checkout the same sources as the
released code.
- `pgloader version "3.0.fecae2c"`
Development snapshot at *git hash* `fecae2c`. It's possible to get
the same sources on another setup by using the git command `git
checkout fecae2c`.
- `pgloader version "3.1.0"`
Stable release.
## LICENCE
pgloader is available under [The PostgreSQL Licence](http://www.postgresql.org/about/licence/).
## INSTALL
pgloader is now a Common Lisp program, tested using the
[SBCL](http://sbcl.org/) (>= 1.1.14) and
[Clozure CL](http://ccl.clozure.com/) implementations with
[Quicklisp](http://www.quicklisp.org/beta/).
$ apt-get install sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev
$ cd /path/to/pgloader
$ make pgloader
$ ./build/bin/pgloader --help
You can also fetch pre-made binary packages at
[pgloader.io](http://pgloader.io/download.html).
## Testing a new feature
Being a Common Lisp program, pgloader is able to *upgrade itself* at run
time, and provides the command-line option `--self-upgrade` that just does
that.
If you want to test the current repository version (or any checkout really),
it's possible to clone the sources then load them with an older pgloader
release:
$ /usr/bin/pgloader --version
pgloader version "3.0.99"
compiled with SBCL 1.1.17
$ git clone https://github.com/dimitri/pgloader.git /tmp/pgloader
$ /usr/bin/pgloader --self-upgrade /tmp/pgloader --version
Self-upgrading from sources at "/tmp/pgloader/"
pgloader version "3.0.fecae2c"
compiled with SBCL 1.1.17
Here, the code from the *git clone* will be used at run-time. Self-upgrade
is done first, then the main program entry point is called again with the
new code loaded in.
Please note that the *binary* file (`/usr/bin/pgloader` or
`./build/bin/pgloader`) is not modified in-place, so if you want to run
the same upgraded code again you will have to use the `--self-upgrade`
option again. It might warrant an option rename before the `3.1.0` stable
release.
## The pgloader.lisp script
Now you can use the `#!` script or build a self-contained binary executable
file, as shown below.
./pgloader.lisp --help
Each time you run the `pgloader` command line, it checks that all its
dependencies are installed and compiled, and if that's not the case it
fetches them from the internet and prepares them (thanks to *Quicklisp*).
So please be patient while that happens, and make sure the machine can
actually connect to the internet and download the dependencies.
## Build Self-Contained binary file
The `Makefile` target `pgloader` knows how to produce a self-contained
binary file for pgloader, named `pgloader.exe`:
$ make pgloader
By default, the `Makefile` uses [SBCL](http://sbcl.org/) to compile your
binary image, though it's possible to also build using
[CCL](http://ccl.clozure.com/).
$ make CL=ccl pgloader
Note that the `Makefile` uses the `--compress-core` option when using SBCL,
which requires core compression to be enabled in your local copy of `SBCL`.
If that's not the case, it's probably because you compiled and installed
`SBCL` yourself in order to have a decently recent version to use. Then you
need to compile it with the `--with-sb-core-compression` option.
You can also remove the `--compress-core` option that way:
$ make COMPRESS_CORE=no pgloader
The `--compress-core` option is unique to SBCL, so it is not used when `CL`
is set to something other than `sbcl`.
The `make pgloader` command when successful outputs a `./build/bin/pgloader`
file for you to use.
```
$ pgloader --help
pgloader [ option ... ] SOURCE TARGET
--help -h boolean Show usage and exit.
--version -V boolean Displays pgloader version and exit.
--quiet -q boolean Be quiet
--verbose -v boolean Be verbose
--debug -d boolean Display debug level information.
--client-min-messages string Filter logs seen at the console (default: "warning")
--log-min-messages string Filter logs seen in the logfile (default: "notice")
--summary -S string Filename where to copy the summary
--root-dir -D string Output root directory. (default: #P"/tmp/pgloader/")
--upgrade-config -U boolean Output the command(s) corresponding to .conf file for v2.x
--list-encodings -E boolean List pgloader known encodings and exit.
--logfile -L string Filename where to send the logs.
--load-lisp-file -l string Read user code from files
--dry-run boolean Only check database connections, don't load anything.
--on-error-stop boolean Refrain from handling errors properly.
--no-ssl-cert-verification boolean Instruct OpenSSL to bypass verifying certificates.
--context -C string Command Context Variables
--with string Load options
--set string PostgreSQL options
--field string Source file fields specification
--cast string Specific cast rules
--type string Force input source type
--encoding string Source expected encoding
--before string SQL script to run before loading the data
--after string SQL script to run after loading the data
--self-upgrade string Path to pgloader newer sources
--regress boolean Drive regression testing
```
## Usage
Give pgloader as many command files as you need:
You can either give a command file to pgloader or run it all from the
command line, see the
[pgloader quick start](https://pgloader.readthedocs.io/en/latest/tutorial/tutorial.html#pgloader-quick-start) on
<https://pgloader.readthedocs.io> for more details.
$ ./build/bin/pgloader --help
$ ./build/bin/pgloader <file.load>
See the documentation file `pgloader.1.md` for details. You can compile that
file into a manual page or an HTML page thanks to the `ronn` application:
$ apt-get install ruby-ronn
$ make docs
For example, for a full migration from SQLite:
$ createdb newdb
$ pgloader ./test/sqlite/sqlite.db postgresql:///newdb
Or for a full migration from MySQL, including schema definition (tables,
indexes, foreign keys, comments) and parallel loading of the corrected data:
$ createdb pagila
$ pgloader mysql://user@localhost/sakila postgresql:///pagila
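The same MySQL migration can also be driven by a command file; a minimal sketch, with placeholder connection strings and a made-up file name (`sakila.load`):

```
LOAD DATABASE
     FROM mysql://user@localhost/sakila
     INTO postgresql:///pagila

WITH include drop, create tables, create indexes, reset sequences;
```

Run it with `pgloader ./sakila.load`.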
## LICENCE
pgloader is available under [The PostgreSQL
Licence](http://www.postgresql.org/about/licence/).
## INSTALL
Please see full documentation at
[https://pgloader.readthedocs.io/](https://pgloader.readthedocs.io/en/latest/install.html).
If you're using Debian, it's already available:
$ apt-get install pgloader
If you're using Docker, you can use the latest version built by the CI at
each commit to the master branch:
$ docker pull ghcr.io/dimitri/pgloader:latest
$ docker run --rm -it ghcr.io/dimitri/pgloader:latest pgloader --version


@@ -1,24 +1,23 @@
#!/usr/bin/env bash
sudo yum -y install yum-utils rpmdevtools @development-tools \
sbcl sqlite-devel zlib-devel
SBCL_VERSION=2.2.5
# SBCL 1.1.14
# http://www.mikeivanov.com/post/66510551125/installing-sbcl-1-1-on-rhel-centos-systems
sudo yum -y groupinstall "Development Tools"
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -Uvh epel-release-6*.rpm
sudo yum install -y sbcl.x86_64
sudo yum -y install yum-utils rpmdevtools @"Development Tools" \
sqlite-devel zlib-devel
wget http://downloads.sourceforge.net/project/sbcl/sbcl/1.1.14/sbcl-1.1.14-source.tar.bz2
tar xfj sbcl-1.1.14-source.tar.bz2
cd sbcl-1.1.14
./make.sh --with-sb-thread --with-sb-core-compression > /dev/null 2>&1
# newer SBCL: we'll overwrite the repo version of sbcl with a more recent one
sudo yum -y install epel-release
sudo yum install -y sbcl.x86_64 --enablerepo=epel
wget http://downloads.sourceforge.net/project/sbcl/sbcl/$SBCL_VERSION/sbcl-$SBCL_VERSION-source.tar.bz2
tar xfj sbcl-$SBCL_VERSION-source.tar.bz2
cd sbcl-$SBCL_VERSION
./make.sh --with-sb-thread --with-sb-core-compression --prefix=/usr > /dev/null 2>&1
sudo sh install.sh
cd
# remove the old version that we used to compile the newer one.
sudo yum remove -y sbcl
# Missing dependencies
sudo yum -y install freetds-devel
# prepare the rpmbuild setup
rpmdev-setuptree


@@ -1,6 +1,6 @@
#!/usr/bin/env bash
sudo yum -y install yum-utils rpmdevtools @development-tools \
sudo yum -y install yum-utils rpmdevtools @"Development Tools" \
sqlite-devel zlib-devel
# Enable epel for sbcl
@@ -8,7 +8,7 @@ sudo yum -y install epel-release
sudo yum -y install sbcl
# Missing dependency
sudo yum install freetds -y
sudo yum install freetds freetds-devel -y
sudo ln -s /usr/lib64/libsybdb.so.5 /usr/lib64/libsybdb.so
# prepare the rpmbuild setup


@@ -27,12 +27,12 @@ sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xcbcb082a1bb943db
sudo add-apt-repository 'deb http://mirrors.linsrv.net/mariadb/repo/10.0/debian wheezy main'
sudo apt-get update
sudo apt-get install -y postgresql-9.3 postgresql-contrib-9.3 \
postgresql-9.3-ip4r \
sudo apt-get install -y postgresql-15 \
postgresql-15-ip4r \
sbcl \
git patch unzip \
devscripts pandoc \
libsqlite3-dev \
freetds-dev libsqlite3-dev \
gnupg gnupg-agent
sudo DEBIAN_FRONTEND=noninteractive \
@@ -40,11 +40,9 @@ sudo DEBIAN_FRONTEND=noninteractive \
# SBCL
#
# we need to backport SBCL from sid to have a recent enough version of the
# compiler and run time we depend on
sudo apt-get -y build-dep sbcl
sudo apt-get source -b sbcl > /dev/null 2>&1 # too verbose
sudo dpkg -i *.deb
# we used to need to backport SBCL; that's now only the case on wheezy, all
# the later distributions are up-to-date enough for our needs here.
sudo apt-get -y install sbcl
HBA=/etc/postgresql/9.3/main/pg_hba.conf
echo "local all all trust" | sudo tee $HBA

build/.gitignore

@@ -2,4 +2,5 @@
*
# Except this file
!bin
!bundle
!.gitignore

build/bundle/.gitignore

@@ -0,0 +1,4 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore

bundle/Makefile

@@ -0,0 +1,70 @@
# pgloader build tool for bundle tarball
# only supports SBCL
CL = sbcl
APP_NAME = pgloader
VERSION = %VERSION%
ifeq ($(OS),Windows_NT)
EXE = .exe
COMPRESS_CORE = no
DYNSIZE = 1024			# support for 32-bit Windows
else
DYNSIZE = 16384
EXE =
endif
BUILDDIR = bin
BUILDAPP = $(BUILDDIR)/buildapp$(EXE)
PGLOADER = ./bin/pgloader
SRCDIR = local-projects/pgloader-$(VERSION)
BUILDAPP_OPTS = --require sb-posix \
--require sb-bsd-sockets \
--require sb-rotate-byte
CL_OPTS = --noinform --no-sysinit --no-userinit
COMPRESS_CORE ?= $(shell $(CL) --noinform \
--quit \
--eval '(when (member :sb-core-compression cl:*features*) (write-string "yes"))')
ifeq ($(COMPRESS_CORE),yes)
COMPRESS_CORE_OPT = --compress-core
endif
pgloader: $(PGLOADER) ;
buildapp: $(BUILDAPP) ;
$(BUILDAPP):
mkdir -p $(BUILDDIR)
$(CL) $(CL_OPTS) --load bundle.lisp \
--eval '(asdf:load-system :buildapp)' \
--eval '(buildapp:build-buildapp "$@")' \
--eval '(quit)'
$(PGLOADER): $(BUILDAPP)
$(BUILDAPP) --logfile /tmp/pgloader-bundle-build.log \
$(BUILDAPP_OPTS) \
--sbcl $(CL) \
--asdf-tree . \
--load-system cffi \
--load-system cl+ssl \
--load-system mssql \
--load $(SRCDIR)/src/hooks.lisp \
--load-system $(APP_NAME) \
--eval '(setf pgloader.params::*version-string* "$(VERSION)")' \
--entry pgloader:main \
--dynamic-space-size $(DYNSIZE) \
$(COMPRESS_CORE_OPT) \
--output $@.tmp
# that's ugly, but necessary when building on Windows :(
mv $@.tmp $@
test: $(PGLOADER)
$(MAKE) PGLOADER=$(realpath $(PGLOADER)) -C $(SRCDIR)/test regress
save:
sbcl --no-userinit --load ./save.lisp
check: test ;

bundle/README.md

@@ -0,0 +1,26 @@
# pgloader source bundle
In order to ease building pgloader for non-lisp users, the *bundle*
distribution is a tarball containing pgloader and its build dependencies.
See the following documentation for more details:
<https://www.quicklisp.org/beta/bundles.html>
The *bundle* comes with a specific `Makefile` so that building it is as
simple as the following (which includes testing the resulting binary):
make
LANG=en_US.UTF-8 make test
The compilation might take a while: SBCL tries hard to generate run-time
binary code that is fast and efficient. You need to be in a unicode (UTF-8)
environment to run the test suite, so that it matches the encoding of the
test *.load files.
You can then package or use the pgloader binary:
./bin/pgloader --version
./bin/pgloader --help
Note that the SQLite test files are not included in the bundle, because
they weigh too much here.

bundle/ql.lisp

@@ -0,0 +1,30 @@
;;;
;;; Script used to prepare a pgloader bundle
;;;
;; fetch a list of recent candidates with
;; (subseq (ql-dist:available-versions (ql-dist:dist "quicklisp")) 0 5)
;;
;; the 2017-06-30 QL release is broken, avoid it.
;;
(defvar *ql-dist* :latest)
(defvar *ql-dist-url-format*
"http://beta.quicklisp.org/dist/quicklisp/~a/distinfo.txt")
(let ((pkgs (append '("pgloader" "buildapp")
(getf (read-from-string
(uiop:read-file-string
(uiop:merge-pathnames* "pgloader.asd" *pwd*)))
:depends-on)))
(dist (if (or (eq :latest *ql-dist*)
(string= "latest" *ql-dist*))
(cdr
;; available-versions is an alist of (date . url), and the
;; first one is the most recent one
(first
(ql-dist:available-versions (ql-dist:dist "quicklisp"))))
(format nil *ql-dist-url-format* *ql-dist*))))
(ql-dist:install-dist dist :prompt nil :replace t)
(ql:bundle-systems pkgs :to *bundle-dir*))
(quit)

bundle/save.lisp

@@ -0,0 +1,47 @@
;;;
;;; Create a build/bin/pgloader executable from the source code, using
;;; Quicklisp to load pgloader and its dependencies.
;;;
(in-package #:cl-user)
(require :asdf) ; should work in SBCL and CCL
(let* ((cwd (uiop:getcwd))
(bundle.lisp (uiop:merge-pathnames* "bundle.lisp" cwd))
(version-file (uiop:merge-pathnames* "version.sexp" cwd))
(version-string (uiop:read-file-form version-file))
(asdf:*central-registry* (list cwd)))
(format t "Loading bundle.lisp~%")
(load bundle.lisp)
(format t "Loading system pgloader ~a~%" version-string)
(asdf:load-system :pgloader :verbose nil)
(load (asdf:system-relative-pathname :pgloader "src/hooks.lisp"))
(let* ((pgl (find-package "PGLOADER"))
(version-symbol (find-symbol "*VERSION-STRING*" pgl)))
(setf (symbol-value version-symbol) version-string)))
(defun pgloader-image-main ()
(let ((argv #+sbcl sb-ext:*posix-argv*
#+ccl ccl:*command-line-argument-list*))
(pgloader::main argv)))
(let* ((cwd (uiop:getcwd))
(bin-dir (uiop:merge-pathnames* "bin/" cwd))
(bin-filename (uiop:merge-pathnames* "pgloader" bin-dir)))
(ensure-directories-exist bin-dir)
#+ccl
(ccl:save-application bin-filename
:toplevel-function #'cl-user::pgloader-image-main
:prepend-kernel t)
#+sbcl
(sb-ext:save-lisp-and-die bin-filename
:toplevel #'cl-user::pgloader-image-main
:executable t
:save-runtime-options t
:compression t))

conf/freetds.conf

@@ -0,0 +1,4 @@
[global]
tds version = 8.0
client charset = UTF-8

debian/changelog

@@ -1,3 +1,144 @@
pgloader (3.6.10-2) unstable; urgency=medium
* Limit architectures to those that have sbcl available and working thread
support (notably, this excludes armel and armhf).
-- Christoph Berg <myon@debian.org> Fri, 22 Mar 2024 14:59:27 +0100
pgloader (3.6.10-1) unstable; urgency=medium
* New upstream version.
* Bump ip4r dependencies to 16. (Closes: #1052837)
-- Christoph Berg <myon@debian.org> Thu, 02 Nov 2023 17:44:07 +0100
pgloader (3.6.9-1) unstable; urgency=medium
* New upstream version.
* Bump ip4r dependencies to 15. (Closes: #1022296)
-- Christoph Berg <myon@debian.org> Mon, 24 Oct 2022 12:58:09 +0200
pgloader (3.6.8-1) unstable; urgency=medium
* New upstream version.
* Depend on libsqlite3-0.
-- Christoph Berg <myon@debian.org> Mon, 26 Sep 2022 14:24:02 +0200
pgloader (3.6.7-1) unstable; urgency=medium
* New upstream version:
* Set SBCL dynamic space size to 16 GB on 64 bit architectures.
* Improve documentation with command lines and defaults.
* SBCL compiler notes should not be fatal to pgloader.
-- Christoph Berg <myon@debian.org> Sat, 13 Aug 2022 10:32:41 +0200
pgloader (3.6.6-1) unstable; urgency=medium
* New upstream version.
* Run tests at build-time as well.
-- Christoph Berg <myon@debian.org> Mon, 27 Jun 2022 11:03:00 +0200
pgloader (3.6.4-1) unstable; urgency=medium
* New upstream version.
* debian/tests/testsuite: Run regression tests.
-- Christoph Berg <myon@debian.org> Fri, 24 Jun 2022 14:32:54 +0200
pgloader (3.6.3-1) unstable; urgency=medium
* New upstream version.
* Remove cl-pgloader, deprecated upstream.
* debian/tests/ssl: Force md5 auth if cl-postmodern is too old.
-- Christoph Berg <myon@debian.org> Tue, 21 Dec 2021 10:09:53 +0100
pgloader (3.6.2-1) unstable; urgency=medium
* New upstream version.
* debian/tests/ssl: Add --debug to get backtraces.
* debian/rules: Sync loaded systems with Makefile.
* debian/rules: Print actual compiler log.
* debian/rules: Skip dh_dwz like dh_strip as it fails on buster.
* Bump required cl-db3 version to 20200212.
* Note that we need cl-plus-ssl 20190204 or later.
* Note that we need cl-csv 20180712 or later.
* DH 13.
-- Christoph Berg <myon@debian.org> Tue, 14 Jul 2020 17:02:30 +0200
pgloader (3.6.1-1) unstable; urgency=medium
* New upstream version.
* SSL is always enabled now, drop our patch.
* Add B-D on python3-sphinx-rtd-theme.
-- Christoph Berg <christoph.berg@credativ.de> Mon, 21 Jan 2019 16:09:17 +0100
pgloader (3.5.2-3) unstable; urgency=medium
* Make cl-pgloader test depend on ca-certificates so the snakeoil
certificate is recognized as a valid CA. (Needs the /etc/ssl/certs/*.0
file.)
-- Christoph Berg <christoph.berg@credativ.de> Tue, 31 Jul 2018 16:24:03 +0200
pgloader (3.5.2-2) unstable; urgency=medium
* Install pgloader.asd into correct location. (Closes: #857226)
* Test cl-pgloader through sbcl --eval.
* Skip building and manpage generation in arch-indep builds.
-- Christoph Berg <myon@debian.org> Tue, 03 Jul 2018 22:51:48 +0200
pgloader (3.5.2-1) unstable; urgency=medium
* New upstream version.
* All included test data has been verified as free, stop building a +dfsg
tarball.
* debian/source/options: Ignore changes in src/params.lisp (release vs
non-release).
* Enable SSL in src/hooks.lisp.
* Run wrap-and-sort -st.
* Add new B-D cl-mustache, cl-yason, cl-zs3, sync Depends to cl-pgloader.
* Depend on the libssl version cl-plus-ssl depends on. (Closes: #864309)
* Build and install new sphinx docs instead.
* Build manpage using help2man.
* Priority: optional, move cl-pgloader to Section: lisp.
* Update S-V.
* Add watch file.
-- Christoph Berg <christoph.berg@credativ.de> Tue, 03 Jul 2018 16:59:07 +0200
pgloader (3.4.1+dfsg-1) unstable; urgency=medium
* New release, bugfixes and new features
-- Dimitri Fontaine <dim@tapoueh.org> Thu, 06 Jul 2017 16:51:53 +0300
pgloader (3.3.2+dfsg-1) unstable; urgency=medium
* Fixes github issue 453 (Closes: #843555)
* Maintenance release.
-- Dimitri Fontaine <dim@tapoueh.org> Sat, 03 Dec 2016 19:36:56 +0300
pgloader (3.3.1+dfsg-2) unstable; urgency=medium
* Add tzdata to build-depends (Closes: #839468)
-- Christoph Berg <christoph.berg@credativ.de> Thu, 03 Nov 2016 14:32:28 +0100
pgloader (3.3.1+dfsg-1) unstable; urgency=medium
* New release, bugfixes and new features
-- Dimitri Fontaine <dim@tapoueh.org> Sun, 28 Aug 2016 21:07:47 +0300
pgloader (3.2.2+dfsg-1) unstable; urgency=medium
* New release, lots of bugfixes, some new features


@@ -1,2 +0,0 @@
usr/share/common-lisp/source/pgloader
usr/share/common-lisp/systems


@@ -1,2 +0,0 @@
README.md
pgloader.1.md


@@ -1,3 +0,0 @@
pgloader.asd usr/share/common-lisp/source/simple-date
pgloader.lisp usr/share/common-lisp/source/pgloader
src usr/share/common-lisp/source/pgloader


@@ -1 +0,0 @@
usr/share/common-lisp/source/pgloader/pgloader.asd usr/share/common-lisp/systems/pgloader.asd

debian/clean

@@ -0,0 +1 @@
buildapp.*

debian/compat

@@ -1 +0,0 @@
8

debian/control

@@ -1,34 +1,74 @@
Source: pgloader
Section: database
Priority: extra
Priority: optional
Maintainer: Dimitri Fontaine <dim@tapoueh.org>
Uploaders: Christoph Berg <myon@debian.org>
Build-Depends: debhelper (>= 8.0.0), sbcl (>= 1.1.13), ruby-ronn, buildapp (>= 1.5), cl-asdf (>= 3.0.3), cl-log, cl-postmodern, cl-simple-date, cl-qmynd, cl-split-sequence, cl-unicode, cl-interpol, cl-csv, cl-fad, cl-lparallel, cl-esrap, cl-alexandria, cl-drakma, cl-flexi-streams, cl-usocket, cl-local-time, cl-command-line-arguments, cl-abnf, cl-db3, cl-py-configparser, cl-sqlite, cl-trivial-backtrace, cl-markdown, cl-md5, cl-asdf-finalizers, cl-asdf-system-connections, cl-cffi (>= 1:0.12.0), cl-ixf, gawk, cl-bordeaux-threads (>= 0.8.3), cl-metabang-bind, cl-mssql, cl-uuid, cl-trivial-utf-8, cl-quri, cl-utilities
Standards-Version: 3.9.6
Uploaders:
Christoph Berg <myon@debian.org>,
Build-Depends:
buildapp (>= 1.5),
cl-abnf,
cl-alexandria,
cl-asdf (>= 3.0.3),
cl-asdf-finalizers,
cl-asdf-system-connections,
cl-bordeaux-threads (>= 0.8.3),
cl-cffi (>= 1:0.12.0),
cl-command-line-arguments,
cl-csv (>= 20180712),
cl-db3 (>= 20200212),
cl-drakma,
cl-esrap,
cl-fad,
cl-flexi-streams,
cl-interpol,
cl-ixf,
cl-local-time,
cl-log,
cl-lparallel,
cl-markdown,
cl-md5,
cl-metabang-bind,
cl-mssql,
cl-mustache,
cl-plus-ssl (>= 20190204),
cl-postmodern,
cl-ppcre,
cl-py-configparser,
cl-qmynd,
cl-quri,
cl-simple-date,
cl-split-sequence,
cl-sqlite,
cl-trivial-backtrace,
cl-trivial-utf-8,
cl-unicode,
cl-usocket,
cl-utilities,
cl-uuid,
cl-yason,
cl-zs3,
debhelper-compat (= 13),
gawk,
help2man,
libsqlite3-dev,
postgresql-16-ip4r <!nocheck> | postgresql-ip4r <!nocheck>,
python3-sphinx,
python3-sphinx-rtd-theme,
sbcl (>= 1.1.13),
tzdata,
Standards-Version: 4.6.0
Homepage: https://github.com/dimitri/pgloader
Vcs-Git: https://github.com/dimitri/pgloader.git
Vcs-Browser: https://github.com/dimitri/pgloader
Package: pgloader
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}, freetds-dev
Description: extract, transform and load data into PostgreSQL
pgloader imports data from different kinds of sources and COPYs it into
PostgreSQL.
.
The command language is described in the manual page and allows one to
describe where to find the data source, its format, and to describe data
processing and transformation.
.
Supported source formats include CSV, fixed width flat files, dBase3 files
(DBF), and SQLite and MySQL databases. In most of those formats, pgloader
is able to auto-discover the schema and create the tables and the indexes
in PostgreSQL. In the MySQL case it's possible to edit CASTing rules from
the pgloader command directly.
Package: cl-pgloader
Architecture: all
Depends: ${misc:Depends}, cl-asdf (>= 3.0.3), cl-log, cl-postmodern, cl-simple-date, cl-qmynd, cl-split-sequence, cl-unicode, cl-interpol, cl-csv, cl-fad, cl-lparallel, cl-esrap, cl-alexandria, cl-drakma, cl-flexi-streams, cl-usocket, cl-local-time, cl-command-line-arguments, cl-abnf, cl-db3, cl-py-configparser, cl-sqlite, cl-trivial-backtrace, cl-markdown, cl-md5, cl-asdf-finalizers, cl-asdf-system-connections, cl-cffi (>= 1:0.12.0), cl-bordeaux-threads (>= 0.8.3), cl-metabang-bind, cl-uuid, cl-trivial-utf-8, cl-quri, cl-utilities
Architecture: amd64 arm64 i386 ppc64el powerpc ppc64
Depends:
freetds-dev,
${misc:Depends},
${shlibs:Depends},
${sqlite:Depends},
${ssl:Depends},
Description: extract, transform and load data into PostgreSQL
pgloader imports data from different kinds of sources and COPYs it into
PostgreSQL.

debian/copyright

@@ -20,4 +20,76 @@ License: PostgreSQL
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON
AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
Files: test/sqlite/Chinook*
Copyright: Copyright (c) 2008-2017 Luis Rocha
License: MIT
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS
IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Files: test/data/2013_Gaz_113CDs_national.txt
Copyright: public domain
License: us-public-domain
All U.S. Census Bureau materials, regardless of the media, are entirely in the
public domain. There are no user fees, site licenses, or any special agreements
etc for the public or private use, and or reuse of any census title. As tax
funded product, it's all in the public record.
Files: test/data/reg2013.dbf
Copyright: public domain
License: fr-public-domain
Les publications et données mises à disposition sur le présent site sont
consultables et téléchargeables gratuitement. Sauf spécification contraire,
elles peuvent être réutilisées, y compris à des fins commerciales, sans licence
et sans versement de redevances autres que celles collectées par les sociétés
de perception et de répartition des droits d'auteur régies par le titre II du
livre III du code de la propriété intellectuelle. La réutilisation est
toutefois subordonnée au respect de l'intégrité de l'information et des données
et à la mention précise des sources.
.
https://www.insee.fr/fr/information/2008466
Files: test/data/sakila-db.zip
Copyright: Copyright © 2007, 2018, Oracle and/or its affiliates. All rights reserved.
License: new-bsd-license
The contents of the sakila-schema.sql and sakila-data.sql files are licensed
under the New BSD license.
.
Information on the New BSD license can be found at
http://www.opensource.org/licenses/bsd-license.php and
http://en.wikipedia.org/wiki/BSD_License.
.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
.
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

debian/patches/bionic-theme-options

@@ -0,0 +1,14 @@
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -92,11 +92,6 @@ html_theme = 'sphinx_rtd_theme'
#
# html_theme_options = {}
html_theme_options = {
- 'github_user': 'dimitri',
- 'github_repo': 'pgloader',
- 'description': 'your migration companion',
- 'travis_button': True,
- 'show_related': True,
#'sidebar_collapse': False,
}

debian/patches/series

@@ -0,0 +1 @@
#bionic-theme-options


@@ -1,3 +1,2 @@
README.md
pgloader.1.md
web/src/*.md
docs/_build/html


@@ -1 +0,0 @@
pgloader.1

debian/rules

@@ -1,55 +1,87 @@
#!/usr/bin/make -f
# -*- makefile -*-
# Sample debian/rules that uses debhelper.
# This file was originally written by Joey Hess and Craig Small.
# As a special exception, when this file is copied by dh-make into a
# dh-make output file, you may use that output file without restriction.
# This special exception was added by Craig Small in version 0.37 of dh-make.
# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1
include /usr/share/dpkg/pkg-info.mk
PKGVERS = $(shell dpkg-parsechangelog | awk -F '[:-]' '/^Version:/ { print substr($$2, 2) }')
EXCLUDE = --exclude-vcs --exclude=debian --exclude=build --exclude=.vagrant
# get libsqlite3 package name from libsqlite3-dev
LIBSQLITE := $(shell dpkg-query --showformat='$${Depends}' --show libsqlite3-dev | grep -o 'libsqlite[^ ]*')
# make pgloader depend on the libssl package cl-plus-ssl depends on
LIBSSL := $(shell dpkg-query --showformat='$${Depends}' --show cl-plus-ssl | grep -o 'libssl[^ ]*')
BITS = $(shell dpkg-architecture -qDEB_BUILD_ARCH_BITS)
ifeq ($(BITS),32)
SIZE=1024
else
SIZE=4096
SIZE=16384
endif
MAKEFILE_VERSION = $(shell awk '/^VERSION/ { print $$3 }' Makefile)
DOC_VERSION = $(shell awk '/^release/ { print $$3 }' docs/conf.py | tr -d "'")
SPECFILE_VERSION = $(shell awk '/^Version/ { print $$2 }' pgloader.spec)
DEBIAN_VERSION = $(shell dpkg-parsechangelog -SVersion | cut -d- -f 1)
PGLOADER_MAJOR_VERSION = $(shell awk '/^.defparameter .major-version/ { print $$3 }' src/params.lisp | grep -Eo '[0-9.]+')
PGLOADER_MINOR_VERSION = $(shell awk '/^.defparameter .minor-version/ { print $$3 }' src/params.lisp | grep -Eo '[0-9.]+')
# buildd provides a build environment where $HOME is not writable, but the
# CL compilers here will need to fill-in a per-user cache
export HOME = $(CURDIR)/debian/home
orig: clean
rm -rf $(HOME)
cd .. && tar czf pgloader_$(PKGVERS).orig.tar.gz $(EXCLUDE) pgloader
override_dh_auto_clean:
dh_auto_clean
rm -rf debian/home
# sanity checks on version number
[ "$(MAKEFILE_VERSION)" = "$(DOC_VERSION)" ] # Makefile = docs/conf.py version
[ "$(MAKEFILE_VERSION)" = "$(SPECFILE_VERSION)" ] # Makefile = pgloader.spec version
[ "$(MAKEFILE_VERSION)" = "$(DEBIAN_VERSION)" ] # Makefile = debian/changelog version
[ "$(MAKEFILE_VERSION)" = "$(PGLOADER_MAJOR_VERSION).$(PGLOADER_MINOR_VERSION)" ] # Makefile = src/params.lisp version
override_dh_auto_build:
make docs
mkdir -p build/bin
mkdir -p $(HOME)
buildapp --require sb-posix \
--require sb-bsd-sockets \
--load /usr/share/common-lisp/source/cl-asdf/build/asdf.lisp \
--asdf-path . \
--asdf-tree /usr/share/common-lisp/systems \
--load-system asdf-finalizers \
--load-system asdf-system-connections \
--load-system pgloader \
--load src/hooks.lisp \
--entry pgloader:main \
--dynamic-space-size $(SIZE) \
--compress-core \
--output build/bin/pgloader
override_dh_auto_test:
# no nothing
override_dh_strip:
override_dh_auto_build-indep:
# do nothing
override_dh_auto_build-arch:
mkdir -p build/bin
mkdir -p $(HOME)
buildapp --require sb-posix \
--require sb-bsd-sockets \
--load /usr/share/common-lisp/source/cl-asdf/build/asdf.lisp \
--asdf-path . \
--asdf-tree /usr/share/common-lisp/systems \
--load-system asdf-finalizers \
--load-system asdf-system-connections \
--load-system cffi \
--load-system cl+ssl \
--load-system mssql \
--load src/hooks.lisp \
--load-system pgloader \
--entry pgloader:main \
--dynamic-space-size $(SIZE) \
--compress-core \
--logfile buildapp.log \
--output build/bin/pgloader \
|| echo $$? > buildapp.fail
cat buildapp.log
test ! -f buildapp.fail
ls -l build/bin/pgloader
$(MAKE) -C docs html
override_dh_auto_test:
PATH=$(CURDIR)/build/bin:$(PATH) debian/tests/testsuite
override_dh_strip override_dh_dwz:
# do nothing, sbcl doesn't write any debug info
override_dh_installman-arch:
mkdir -p debian/pgloader/usr/share/man/man1/
PATH=debian/pgloader/usr/bin:$(PATH) \
help2man --version-string $(DEB_VERSION_UPSTREAM) \
--no-info \
--name "extract, transform and load data into PostgreSQL" \
pgloader > \
debian/pgloader/usr/share/man/man1/pgloader.1
override_dh_gencontrol:
dh_gencontrol -- \
-V"sqlite:Depends=$(LIBSQLITE)" \
-V"ssl:Depends=$(LIBSSL)"
%:
dh $@
dh $@

2
debian/source/options vendored Normal file

@ -0,0 +1,2 @@
# ignore release/non-release status
extend-diff-ignore=src/params.lisp

13
debian/tests/control vendored Normal file

@ -0,0 +1,13 @@
Depends:
ca-certificates,
cl-postmodern,
pgloader,
postgresql,
Tests: ssl
Restrictions: allow-stderr, needs-root
Depends:
pgloader,
postgresql-16-ip4r | postgresql-ip4r,
Tests: testsuite
Restrictions: allow-stderr

34
debian/tests/ssl vendored Executable file

@ -0,0 +1,34 @@
#!/bin/sh
# test needs root so we have a SSL certificate
set -eux
trap "rm -rf /tmp/pgloader" EXIT
# check if cl-postmodern is new enough to support scram-sha-256
postmodern=$(dpkg-query --show --showformat='${Version}' cl-postmodern)
if dpkg --compare-versions "$postmodern" lt 20200101; then
AUTH="-i--auth-local=trust -i--auth-host=md5"
fi
pg_virtualenv ${AUTH:-} <<-'EOF'
set -eux
# force SSL connection
HBA=$(psql -XAtc 'SHOW hba_file')
sed -i -e 's/^host/hostssl/' $HBA
psql -XAtc 'SELECT pg_reload_conf()'
createdb pgloader
export PGDATABASE=pgloader
psql -XAtc 'create schema expected'
# test UNIX socket
rm -rf /tmp/pgloader
PGHOST=/var/run/postgresql su -c 'pgloader --debug --regress test/allcols.load' postgres
# test SSL connection
rm -rf /tmp/pgloader
PGSSLMODE=require pgloader --debug --regress test/allcols.load
EOF

11
debian/tests/testsuite vendored Executable file

@ -0,0 +1,11 @@
#!/bin/sh
set -eux
case $USER in
root) PGSUPERUSER=postgres ;;
*) PGSUPERUSER=$USER ;;
esac
# use trust authentication to avoid scram failures on bullseye/buster/stretch/impish/focal/bionic
PGLOADER=pgloader PGSUPERUSER=$PGSUPERUSER pg_virtualenv -i'-Atrust' make -C test prepare regress

2
debian/watch vendored Normal file

@ -0,0 +1,2 @@
version=4
https://github.com/dimitri/pgloader/tags .*/v(.*).tar.gz

1
docs/CNAME Normal file

@ -0,0 +1 @@
pgloader.org

20
docs/Makefile Normal file

@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = pgloader
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

123
docs/batches.rst Normal file

@ -0,0 +1,123 @@
Batch Processing
================
To load data into PostgreSQL, pgloader uses the `COPY` streaming protocol.
While this is the fastest way to load data, `COPY` has an important drawback:
as soon as PostgreSQL emits an error with any bit of data sent to it,
whatever the problem is, the whole data set is rejected by PostgreSQL.
To work around that, pgloader cuts the data into *batches* of 25000 rows
each, so that when a problem occurs it only impacts that many rows of
data. Each batch is kept in memory while the `COPY` streaming happens, in
order to be able to handle errors should some happen.
When PostgreSQL rejects the whole batch, pgloader logs the error message
then isolates the bad row(s) from the accepted ones by retrying the batched
rows in smaller batches. To do that, pgloader parses the *CONTEXT* error
message from the failed COPY, as the message contains the line number where
the error was found in the batch, as in the following example::
CONTEXT: COPY errors, line 3, column b: "2006-13-11"
Using that information, pgloader will reload all rows in the batch before
the erroneous one, log the erroneous one as rejected, then try loading the
remainder of the batch in a single attempt, which may or may not contain
other erroneous data.
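The retry logic above can be sketched as follows. This is a minimal illustration in Python, not pgloader's actual Common Lisp code; the function name and row format are ours:

```python
import re

def split_batch_on_error(rows, context_message):
    """Split a failed COPY batch using the line number reported in the
    CONTEXT error message, as the docs describe (names are illustrative)."""
    # e.g. CONTEXT:  COPY errors, line 3, column b: "2006-13-11"
    match = re.search(r"line (\d+)", context_message)
    line = int(match.group(1))       # 1-based position of the bad row in the batch
    accepted = rows[:line - 1]       # rows before the error: safe to resend
    rejected = rows[line - 1]        # the erroneous row, logged as rejected
    remaining = rows[line:]          # retried in a single attempt; may fail again
    return accepted, rejected, remaining

accepted, rejected, remaining = split_batch_on_error(
    ["1,2006-10-11", "2,2006-11-11", "3,2006-13-11", "4,2006-12-11"],
    'CONTEXT:  COPY errors, line 3, column b: "2006-13-11"')
```

If `remaining` contains more bad rows, the same split is applied again on the next failed retry.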
At the end of a load containing rejected rows, you will find two files in
the *root-dir* location, under a directory named the same as the target
database of your setup. The filenames are the target table, and their
extensions are `.dat` for the rejected data and `.log` for the file
containing the full PostgreSQL client side logs about the rejected data.
The `.dat` file is formatted in the PostgreSQL text COPY format as documented
at http://www.postgresql.org/docs/9.2/static/sql-copy.html#AEN66609.
It is possible to use the following WITH options to control pgloader batch
behavior:
- *on error stop*, *on error resume next*
This option controls whether pgloader builds batches of data at
all. The batch implementation allows pgloader to recover from errors by
resending the data that PostgreSQL accepts, and by setting aside the
data that PostgreSQL rejects.
To enable retrying the data and loading the good parts, use the option
*on error resume next*, which is the default for file based data loads
(such as CSV, IXF or DBF).
When migrating from another RDBMS technology, it's best to have a
reproducible loading process. In that case it's possible to use *on
error stop* and fix either the casting rules, the data transformation
functions or in some cases the input data until your migration runs through
to completion. That's why *on error stop* is the default for SQLite,
MySQL and MS SQL source kinds.
A Note About Performance
------------------------
pgloader has been developed with performance in mind, to be able to cope
with ever growing needs in loading large amounts of data into PostgreSQL.
The basic architecture it uses is the old Unix pipe model, where a thread is
responsible for loading the data (reading a CSV file, querying MySQL, etc)
and fills a queue with pre-processed data. Another thread feeds from the
queue, applies some more *transformations* to the input data and streams the
end result to PostgreSQL using the COPY protocol.
When given a file that the PostgreSQL `COPY` command knows how to parse, and
if the file contains no erroneous data, then pgloader will never be as fast
as just using the PostgreSQL `COPY` command.
Note that while the `COPY` command is restricted to read either from its
standard input or from a local file on the server's file system, the command
line tool `psql` implements a `\copy` command that knows how to stream a
file local to the client over the network and into the PostgreSQL server,
using the same protocol as pgloader uses.
A Note About Parallelism
------------------------
pgloader uses several concurrent tasks to process the data being loaded:
- a reader task reads the data in and pushes it to a queue,
- at least one writer task feeds from the queue and formats the raw data into the
PostgreSQL COPY format in batches (so that it's possible to then retry a
failed batch without reading the data from source again), and then sends
the data to PostgreSQL using the COPY protocol.
The parameter *workers* controls how many worker threads are
allowed to be active at any time (that's the parallelism level); and the
parameter *concurrency* controls how many tasks are started to
handle the data (they may not all run at the same time, depending on the
*workers* setting).
We allow *workers* simultaneous workers to be active at the same time in the
context of a single table. A single unit of work consists of several kinds of
workers:
- a reader getting raw data from the source,
- N writers preparing and sending the data down to PostgreSQL.
The N here is set to the *concurrency* parameter: with a *concurrency* of
2, we start (+ 1 2) = 3 concurrent tasks, with a *concurrency* of 4 we start
(+ 1 4) = 5 concurrent tasks, of which only *workers* may be active
simultaneously.
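The task arithmetic above can be written down directly (a sketch; the helper name is ours, not pgloader's):

```python
def concurrent_tasks(concurrency):
    """Tasks started per table: one reader plus `concurrency` writers.
    Only up to `workers` of them may be active simultaneously."""
    return 1 + concurrency

tasks_for_2 = concurrent_tasks(2)   # 3 concurrent tasks
tasks_for_4 = concurrent_tasks(4)   # 5 concurrent tasks
```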
The defaults are `workers = 4, concurrency = 1` when loading from a database
source, and `workers = 8, concurrency = 2` when loading from something else
(currently, a file). Those defaults are arbitrary and waiting for feedback
from users, so please consider providing feedback if you play with the
settings.
As the `CREATE INDEX` threads started by pgloader are only waiting until
PostgreSQL is done with the real work, those threads are *NOT* counted into
the concurrency levels as detailed here.
By default, pgloader starts as many `CREATE INDEX` threads as the maximum
number of indexes per table found in your source schema. It is possible to
set the `max parallel create index` *WITH* option to another number in case
there are just too many of them to create.

49
docs/bugreport.rst Normal file

@ -0,0 +1,49 @@
Reporting Bugs
==============
pgloader is software and as such contains bugs. Most bugs are easy to
solve and are taken care of quickly. For this to be possible though,
bug reports need to follow these recommendations:
- include the pgloader version,
- include the problematic input and output,
- include a description of the output you expected,
- explain the difference between the output you have and the one you expected,
- include a self-reproducing test-case
Test Cases to Reproduce Bugs
----------------------------
Use the *inline* source type to help reproduce a bug, as in the pgloader tests::
LOAD CSV
FROM INLINE
INTO postgresql://dim@localhost/pgloader?public."HS"
WITH truncate,
fields terminated by '\t',
fields not enclosed,
fields escaped by backslash-quote,
quote identifiers
SET work_mem to '128MB',
standard_conforming_strings to 'on',
application_name to 'my app name'
BEFORE LOAD DO
$$ create extension if not exists hstore; $$,
$$ drop table if exists "HS"; $$,
$$ CREATE TABLE "HS"
(
id serial primary key,
kv hstore
)
$$;
1 email=>foo@example.com,a=>b
2 test=>value
3 a=>b,c=>"quoted hstore value",d=>other
4 baddata

380
docs/command.rst Normal file

@ -0,0 +1,380 @@
Command Syntax
==============
pgloader implements a Domain Specific Language allowing you to set up complex
data loading scripts handling computed columns and on-the-fly sanitization
of the input data. For more complex data loading scenarios, you will be
required to learn that DSL's syntax. It's meant to look familiar to DBAs by
being inspired by SQL where it makes sense, which is not that much after
all.
The pgloader commands follow the same global grammar rules. Each of them
might support only a subset of the general options and provide specific
options.
::
LOAD <source-type>
FROM <source-url>
[ HAVING FIELDS <source-level-options> ]
INTO <postgresql-url>
[ TARGET TABLE [ "<schema>" ]."<table name>" ]
[ TARGET COLUMNS <columns-and-options> ]
[ WITH <load-options> ]
[ SET <postgresql-settings> ]
[ BEFORE LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
[ AFTER LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
;
The main clauses are the `LOAD`, `FROM`, `INTO` and `WITH` clauses that each
command implements. Some commands then implement the `SET` clause, or some
specific clauses such as the `CAST` clause.
.. _common_clauses:
Command Clauses
---------------
The pgloader command syntax allows composing CLAUSEs together. Some clauses
are specific to the FROM source-type, most clauses are always available.
FROM
----
The *FROM* clause specifies where to read the data from, and each command
introduces its own variant of sources. For instance, the *CSV* source
supports `inline`, `stdin`, a filename, a quoted filename, and a *FILENAME
MATCHING* clause (see above); whereas the *MySQL* source only supports a
MySQL database URI specification.
INTO
----
The PostgreSQL connection URI must contain the name of the target table
where to load the data into. That table must have already been created in
PostgreSQL, and the name might be schema qualified.
The *INTO* option also supports an optional comma separated list of target
columns, which are either the name of an input *field* or the white space
separated list of the target column name, its PostgreSQL data type and a
*USING* expression.
The *USING* expression can be any valid Common Lisp form and will be read
with the current package set to `pgloader.transforms`, so that you can use
functions defined in that package, such as functions loaded dynamically with
the `--load` command line parameter.
Each *USING* expression is compiled at runtime to native code.
This feature allows pgloader to load any number of fields in a CSV file into
a possibly different number of columns in the database, using custom code
for that projection.
WITH
----
Set of options to apply to the command, using a global syntax of either:
- *key = value*
- *use option*
- *do not use option*
See each specific command for details.
All data sources specific commands support the following options:
- *on error stop*, *on error resume next*
- *batch rows = R*
- *batch size = ... MB*
- *prefetch rows = ...*
See the section BATCH BEHAVIOUR OPTIONS for more details.
In addition, the following settings are available:
- *workers = W*
- *concurrency = C*
- *max parallel create index = I*
See section A NOTE ABOUT PARALLELISM for more details.
SET
---
This clause allows you to specify session parameters to be set for all the
sessions opened by pgloader. It expects a comma separated list of entries,
each made of a parameter name, the equals sign, then the single-quoted value.
The names and values of the parameters are not validated by pgloader, they
are given as-is to PostgreSQL.
BEFORE LOAD DO
--------------
You can run SQL queries against the database before loading the data from
the `CSV` file. The most common SQL queries use `CREATE TABLE IF NOT EXISTS` so
that the data can be loaded.
Each command must be *dollar-quoted*: it must begin and end with a double
dollar sign, `$$`. Dollar-quoted queries are then comma separated. No extra
punctuation is expected after the last SQL query.
BEFORE LOAD EXECUTE
-------------------
Same behaviour as in the *BEFORE LOAD DO* clause. Allows you to read the SQL
queries from a SQL file. Implements support for PostgreSQL dollar-quoting
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
they are the same thing).
AFTER LOAD DO
-------------
Same format as *BEFORE LOAD DO*, the dollar-quoted queries found in that
section are executed once the load is done. That's the right time to create
indexes and constraints, or re-enable triggers.
AFTER LOAD EXECUTE
------------------
Same behaviour as in the *AFTER LOAD DO* clause. Allows you to read the SQL
queries from a SQL file. Implements support for PostgreSQL dollar-quoting
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
they are the same thing).
AFTER CREATE SCHEMA DO
----------------------
Same format as *BEFORE LOAD DO*, the dollar-quoted queries found in that
section are executed once the schema has been created by pgloader, and
before the data is loaded. It's the right time to ALTER TABLE or do some
custom implementation on-top of what pgloader does, like maybe partitioning.
AFTER CREATE SCHEMA EXECUTE
---------------------------
Same behaviour as in the *AFTER CREATE SCHEMA DO* clause. Allows you to read
the SQL queries from a SQL file. Implements support for PostgreSQL
dollar-quoting and the `\i` and `\ir` include facilities as in `psql` batch
mode (where they are the same thing).
Connection String
-----------------
The `<postgresql-url>` parameter is expected to be given as a *Connection URI*
as documented in the PostgreSQL documentation at
http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING.
::
postgresql://[user[:password]@][netloc][:port][/dbname][?option=value&...]
Where:
- *user*
Can contain any character, including colon (`:`) which must then be
doubled (`::`) and at-sign (`@`) which must then be doubled (`@@`).
When omitted, the *user* name defaults to the value of the `PGUSER`
environment variable, and if it is unset, the value of the `USER`
environment variable.
- *password*
Can contain any character, including the at sign (`@`) which must then
be doubled (`@@`). To leave the password empty, when the *user* name
ends with an at sign, you then have to use the syntax user:@.
When omitted, the *password* defaults to the value of the `PGPASSWORD`
environment variable if it is set, otherwise the password is left
unset.
When no *password* is found in either the connection URI or the
environment, then pgloader looks for a `.pgpass` file as documented at
https://www.postgresql.org/docs/current/static/libpq-pgpass.html. The
implementation is not that of `libpq` though. As with `libpq` you can
set the environment variable `PGPASSFILE` to point to a `.pgpass` file,
and pgloader defaults to `~/.pgpass` on unix like systems and
`%APPDATA%\postgresql\pgpass.conf` on windows. Matching rules and syntax
are the same as with `libpq`, refer to its documentation.
- *netloc*
Can be either a hostname in dotted notation, an IPv4 address, or a Unix
domain socket path. Empty is the default network location; on a
system providing *Unix domain sockets* that method is preferred, otherwise
the *netloc* defaults to `localhost`.
It's possible to force the *unix domain socket* path by using the syntax
`unix:/path/to/where/the/socket/file/is`, so to force a non default
socket path and a non default port, you would have:
postgresql://unix:/tmp:54321/dbname
The *netloc* defaults to the value of the `PGHOST` environment
variable, and if it is unset, to either the default `unix` socket path
when running on a Unix system, and `localhost` otherwise.
Socket paths containing colons are supported by doubling the colons
within the path, as in the following example:
postgresql://unix:/tmp/project::region::instance:5432/dbname
- *dbname*
Should be a proper identifier (letter followed by a mix of letters,
digits and the punctuation signs comma (`,`), dash (`-`) and underscore
(`_`)).
When omitted, the *dbname* defaults to the value of the environment
variable `PGDATABASE`, and if that is unset, to the *user* value as
determined above.
- *options*
The optional parameters must be supplied with the form `name=value`, and
you may use several parameters by separating them away using an
ampersand (`&`) character.
Only some options are supported here: *tablename* (which might be
qualified with a schema name), *sslmode*, *host*, *port*, *dbname*,
*user* and *password*.
The *sslmode* parameter values can be one of `disable`, `allow`,
`prefer` or `require`.
For backward compatibility reasons, it's possible to specify the
*tablename* option directly, without spelling out the `tablename=`
parts.
The options override the main URI components when both are given, and
using the percent-encoded option parameters allows using passwords
starting with a colon and bypassing other URI component parsing
limitations.
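The character-doubling rules above (colon and at-sign in the *user*, at-sign in the *password*) can be sketched as a small URI builder. This is an illustration of the rules as stated, not pgloader's own code; all function names are ours:

```python
def escape_user(user):
    # per the rules above: ':' and '@' in the user name are doubled
    return user.replace(":", "::").replace("@", "@@")

def escape_password(password):
    # in the password only the at sign needs doubling
    return password.replace("@", "@@")

def build_uri(user, password, host, port, dbname):
    """Assemble a pgloader-style PostgreSQL connection URI (illustrative)."""
    return "postgresql://%s:%s@%s:%d/%s" % (
        escape_user(user), escape_password(password), host, port, dbname)

uri = build_uri("dim@example", "p@ss", "localhost", 5432, "pgloader")
```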
Regular Expressions
-------------------
Several clauses listed in the following sections accept *regular expressions* with
the following input rules:
- A regular expression begins with a tilde sign (`~`),
- is then followed with an opening sign,
- then any character is allowed and considered part of the regular
expression, except for the closing sign,
- then a closing sign is expected.
The opening and closing signs are allowed in pairs; here's the complete list
of allowed delimiters::
~//
~[]
~{}
~()
~<>
~""
~''
~||
~##
Pick the set of delimiters that doesn't collide with the *regular expression*
you're trying to input. If your expression is such that none of the
solutions allows you to enter it, the places where such expressions are
allowed should allow for a list of expressions.
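The delimiter-pair rules above can be sketched as a tiny tokenizer. This is a minimal illustration of the stated rules, not pgloader's parser; names are ours:

```python
# the delimiter pairs listed above: opening sign -> expected closing sign
PAIRS = {"/": "/", "[": "]", "{": "}", "(": ")", "<": ">",
         '"': '"', "'": "'", "|": "|", "#": "#"}

def read_regex(token):
    """Extract the regular expression body from a token such as ~/abc/."""
    if not token.startswith("~") or len(token) < 3:
        raise ValueError("not a regular expression token")
    closing = PAIRS.get(token[1])
    if closing is None or not token.endswith(closing):
        raise ValueError("unknown or unbalanced delimiter")
    return token[2:-1]

expr = read_regex("~<a.c>")   # body between the delimiters
```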
Comments
--------
Any command may contain comments, following those input rules:
- the `--` delimiter begins a comment that ends with the end of the
current line,
- the delimiters `/*` and `*/` respectively start and end a comment, which
can be found in the middle of a command or span several lines.
Any place where you could enter a *whitespace* will accept a comment too.
Batch behaviour options
-----------------------
All pgloader commands have support for a *WITH* clause that allows for
specifying options. Some options are generic and accepted by all commands,
such as the *batch behaviour options*, and some options are specific to a
data source kind, such as the CSV *skip header* option.
The global batch behaviour options are:
- *batch rows*
Takes a numeric value as argument, used as the maximum number of rows
allowed in a batch. The default is `25 000` and can be changed in order
to obtain better performance characteristics or to control pgloader memory
usage;
- *batch size*
Takes a memory unit as argument, such as *20 MB*, its default value.
Accepted multipliers are *kB*, *MB*, *GB*, *TB* and *PB*. Case is
significant so as not to confuse bits with bytes; we're only
talking bytes here.
- *prefetch rows*
Takes a numeric value as argument, defaults to `100000`. That's the
number of rows that pgloader is allowed to read in memory in each reader
thread. See the *workers* setting for how many reader threads are
allowed to run at the same time.
Other options are specific to each input source, please refer to the specific
parts of the documentation for their listing and coverage.
A batch is then closed as soon as either the *batch rows* or the *batch
size* threshold is crossed, whichever comes first. In cases when a batch has
to be closed because of the *batch size* setting, a *debug* level log
message is printed with how many rows fit in the *oversized* batch.
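The dual-threshold close condition can be sketched as a batching loop: a batch is emitted as soon as either the row count or the byte size crosses its threshold, whichever comes first. A sketch of the described behaviour, not pgloader's implementation:

```python
def batches(rows, batch_rows=25_000, batch_size=20 * 1024 * 1024):
    """Group rows into batches, closing each one on the first threshold
    crossed: *batch rows* (count) or *batch size* (bytes)."""
    batch, size = [], 0
    for row in rows:
        batch.append(row)
        size += len(row)
        if len(batch) >= batch_rows or size >= batch_size:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch       # final, possibly partial batch

by_rows = list(batches(["ab"] * 5, batch_rows=2))          # closed on row count
by_size = list(batches(["x" * 10] * 3, batch_size=10))     # closed on byte size
```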
Templating with Mustache
------------------------
pgloader implements the https://mustache.github.io/ templating system so
that you may have dynamic parts of your commands. See the documentation for
this template system online.
A specific feature of pgloader is the ability to fetch a variable from the
OS environment of the pgloader process, making it possible to run pgloader
as in the following example::
$ DBPATH=sqlite/sqlite.db pgloader ./test/sqlite-env.load
or in several steps::
$ export DBPATH=sqlite/sqlite.db
$ pgloader ./test/sqlite-env.load
The variable can then be used in a typical mustache fashion::
load database
from '{{DBPATH}}'
into postgresql:///pgloader;
It's also possible to prepare an INI file such as the following::
[pgloader]
DBPATH = sqlite/sqlite.db
And run the following command, feeding the INI values as a *context* for
pgloader templating system::
$ pgloader --context ./test/sqlite.ini ./test/sqlite-ini.load
The mustache templates implementation with OS environment support replaces
the former `GETENV` implementation, which didn't work anyway.
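The variable lookup described above (an explicit *context* such as INI values, falling back to the OS environment) can be sketched with a minimal `{{name}}` substitution. This is an illustration only, not pgloader's mustache engine; the `DBPATH` value is a hypothetical example:

```python
import os
import re

def render(template, context=None):
    """Substitute {{name}} tags from the given context (e.g. values parsed
    from an INI file), falling back to the OS environment."""
    context = context or {}
    def lookup(match):
        name = match.group(1)
        return context.get(name, os.environ.get(name, ""))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)

os.environ["DBPATH"] = "sqlite/sqlite.db"   # hypothetical value for the example
command = render("from '{{DBPATH}}' into postgresql:///pgloader")
```

A key passed in the context takes precedence over the environment, matching the `--context` INI behaviour described above.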

118
docs/conf.py Normal file

@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# pgloader documentation build configuration file, created by
# sphinx-quickstart on Tue Dec 5 19:23:32 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- Project information -----------------------------------------------------
project = 'pgloader'
copyright = '2005-2022, Dimitri Fontaine'
author = 'Dimitri Fontaine'
version = '3.6'
release = '3.6.10'
# -- General configuration ------------------------------------------------
# The master toctree document.
master_doc = 'index'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
#html_theme = 'alabaster'
html_theme = 'sphinx_rtd_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
#html_static_path = ['_static']
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'pgloader.tex', 'pgloader Documentation',
'Dimitri Fontaine', 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'pgloader', 'pgloader Documentation',
[author], 1)
]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'pgloader', 'pgloader Documentation',
author, 'pgloader', 'One line description of project.',
'Miscellaneous'),
]

296
docs/index.rst Normal file

@ -0,0 +1,296 @@
.. pgloader documentation master file, created by
sphinx-quickstart on Tue Dec 5 19:23:32 2017.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to pgloader's documentation!
====================================
The `pgloader`__ project is an Open Source Software project. The development
happens at `https://github.com/dimitri/pgloader`__ and is public: everyone
is welcome to participate by opening issues, pull requests, giving feedback,
etc.
__ https://github.com/dimitri/pgloader
__ https://github.com/dimitri/pgloader
pgloader loads data from various sources into PostgreSQL. It can transform
the data it reads on the fly and submit raw SQL before and after the
loading. It uses the `COPY` PostgreSQL protocol to stream the data into the
server, and manages errors by filling a pair of *reject.dat* and
*reject.log* files.
Thanks to being able to load data directly from a database source, pgloader
also supports migrations from other products to PostgreSQL. In this
mode of operation, pgloader handles both the schema and data parts of the
migration, in a single unmanned command, allowing you to implement
**Continuous Migration**.
Features Overview
=================
pgloader has two modes of operation: loading from files and migrating
databases. In both cases, pgloader uses the PostgreSQL COPY protocol which
implements **streaming** to send data in a very efficient way.
Loading file content in PostgreSQL
----------------------------------
When loading from files, pgloader implements the following features:
Many source formats supported
Support for a wide variety of file based formats is included in
pgloader: the CSV family, fixed columns formats, dBase files (``db3``),
and IBM IXF files.
The SQLite database engine is accounted for in the next section:
pgloader considers SQLite as a database source and implements schema
discovery from SQLite catalogs.
On the fly data transformation
Often enough the data as read from a CSV file (or another format) needs
some tweaking and clean-up before being sent to PostgreSQL.
For instance in the `geolite
<https://github.com/dimitri/pgloader/blob/master/test/archive.load>`_
example we can see that integer values are being rewritten as IP address
ranges, allowing you to target an ``ip4r`` column directly.
Full Field projections
pgloader supports loading data into fewer fields than found in the file, or
more, doing some computation on the data read before sending it to
PostgreSQL.
Reading files from an archive
Archive formats *zip*, *tar*, and *gzip* are supported by pgloader: the
archive is extracted into a temporary directory and the expanded files are
then loaded.
HTTP(S) support
pgloader knows how to download a source file or a source archive using
HTTP directly. It might be better to use ``curl -O- http://... |
pgloader`` and read the data from *standard input*, thus allowing for
streaming of the data from its source down to PostgreSQL.
Target schema discovery
When loading in an existing table, pgloader takes into account the
existing columns and may automatically guess the CSV format for you.
On error stop / On error resume next
In some cases the source data is so damaged as to be impossible to
migrate in full, and when loading from a file the default for
pgloader is the ``on error resume next`` option, where the rows
rejected by PostgreSQL are saved away and the migration continues with
the other rows.
In other cases loading only a part of the input data might not be a
great idea, and in such cases it's possible to use the ``on error stop``
option.
Pre/Post SQL commands
This feature allows pgloader commands to include SQL commands to run
before and after loading a file. It might be about creating a table
first, then loading the data into it, and then doing more processing
on-top of the data (implementing an *ELT* pipeline then), or creating
specific indexes as soon as the data has been made ready.
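A hedged sketch of the clause shapes involved, assuming a hypothetical
``raw_data`` table::

    LOAD CSV
         FROM 'data.csv' (id, value)
         INTO postgresql:///db?raw_data (id, value)
         WITH fields terminated by ','
    BEFORE LOAD DO
         $$ create table if not exists raw_data (id integer, value text); $$
    AFTER LOAD DO
         $$ create index on raw_data (id); $$;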
One-command migration to PostgreSQL
-----------------------------------
When migrating a full database in a single command, pgloader implements the
following features:
One-command migration
The whole migration is started with a single command line and then runs
unattended. pgloader is meant to be integrated in a fully automated
tooling that you can repeat as many times as needed.
Schema discovery
The source database is introspected using its SQL catalogs to get the
list of tables, attributes (with data types, default values, not null
constraints, etc), primary key constraints, foreign key constraints,
indexes, comments, etc. This feeds an internal database catalog of all
the objects to migrate from the source database to the target database.
User defined casting rules
Some source databases have ideas about their data types that might not be
compatible with the PostgreSQL implementation of equivalent data types.
For instance, SQLite since version 3 has a `Dynamic Type System
<https://www.sqlite.org/datatype3.html>`_ which of course isn't
compatible with the idea of a `Relation
<https://en.wikipedia.org/wiki/Relation_(database)>`_. Or MySQL accepts
datetime for year zero, which doesn't exist in our calendar, and
doesn't have a boolean data type.
When migrating from another source database technology to PostgreSQL,
data type casting choices must be made. pgloader implements solid
defaults that you can rely upon, and a facility for **user defined data
type casting rules** for specific cases. The idea is to allow users to
specify how the migration should be done, in order for it to be
repeatable and included in a *Continuous Migration* process.
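For instance, casting rules close to the pgloader defaults for MySQL can be
made explicit in the load command, so that the migration specification
documents them (a sketch, to be adapted to your source data types)::

    CAST type datetime to timestamptz
              drop default drop not null
              using zero-dates-to-null,
         type date drop not null
              using zero-dates-to-null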
On the fly data transformations
The user defined casting rules come with on the fly rewrite of the data.
For instance zero dates (it's not just the year, MySQL accepts
``0000-00-00`` as a valid datetime) are rewritten to NULL values by
default.
Partial Migrations
It is possible to include only a partial list of the source database
tables in the migration, or to exclude some of the tables on the source
database.
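A sketch of the filtering clauses, assuming a MySQL ``sakila`` source where
the table names are illustrative::

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO postgresql:///pagila
    INCLUDING ONLY TABLE NAMES MATCHING ~/film/, 'actor';

The opposite filter is spelled *EXCLUDING TABLE NAMES MATCHING*.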
Schema only, Data only
This is the **ORM compatibility** feature of pgloader, where it is
possible to create the schema using your ORM and then have pgloader
migrate the data targeting this already created schema.
When doing this, it is possible for pgloader to *reindex* the target
schema: before loading the data from the source database into PostgreSQL
using COPY, pgloader DROPs the indexes and constraints, and reinstalls
the exact same definitions of them once the data has been loaded.
The reason for operating that way is of course data load performance.
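The two phases can be selected with *WITH* options; a sketch for the
data-only case, targeting a schema already created by an ORM::

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO postgresql:///pagila
    WITH data only, truncate;

Using ``schema only`` instead migrates the definitions without any of the
data.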
Repeatable (DROP+CREATE)
By default, pgloader issues DROP statements in the target PostgreSQL
database before issuing any CREATE statement, so that you can repeat the
migration as many times as necessary until migration specifications and
rules are bug free.
Then schedule the data migration to run every night (or even more often!)
for the whole duration of the code migration project. See the
`Continuous Migration <https://pgloader.io/blog/continuous-migration/>`_
methodology for more details about the approach.
On error stop / On error resume next
The default behavior of pgloader when migrating from a database is
``on error stop``. The idea is to let the user fix either the migration
specifications or the source data, and run the process again, until
it works.
In some cases the source data is so damaged as to be impossible to
migrate in full, and it might be necessary to then resort to the ``on
error resume next`` option, where the rows rejected by PostgreSQL are
saved away and the migration continues with the other rows.
Pre/Post SQL commands, Post-Schema SQL commands
While pgloader takes care of rewriting the schema to PostgreSQL
expectations, and even provides *user-defined data type casting rules*
support to that end, sometimes it is necessary to add some specific SQL
commands around the migration. It's of course supported right from
pgloader itself, without having to script around it.
Online ALTER schema
At times migrating to PostgreSQL is also a good opportunity to review
and fix bad decisions that were made in the past, or simply that are not
relevant to PostgreSQL.
The pgloader command syntax makes it possible to ALTER pgloader's internal
representation of the target catalogs so that the target schema can be
created a little different from the source one. Changes supported
include targeting a different *schema* or *table* name.
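A sketch of such a rename, mapping a MySQL source database into the
``public`` schema of the target (names are illustrative)::

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO postgresql:///pagila
    ALTER SCHEMA 'sakila' RENAME TO 'public';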
Materialized Views, or schema rewrite on-the-fly
In some cases the schema rewriting goes deeper than just renaming the
SQL objects, becoming a full normalization exercise, because PostgreSQL
is great at running a normalized schema in production under most
workloads.
pgloader implements full flexibility in on-the-fly schema rewriting, by
making it possible to migrate from a view definition. The view attribute
list becomes a table definition in PostgreSQL, and the data is fetched
by querying the view on the source system.
A SQL view allows implementing content filtering both at the column
level, using the SELECT projection clause, and at the row level, using
the WHERE restriction clause, as well as backfilling from reference
tables thanks to JOINs.
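A sketch of the clause, assuming views ``film_list`` and ``staff_list``
exist on the source database::

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO postgresql:///pagila
    MATERIALIZE VIEWS film_list, staff_list;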
Distribute to Citus
When migrating from PostgreSQL to Citus, an important part of the process
consists of adjusting the schema to the distribution key. Read
`Preparing Tables and Ingesting Data
<https://docs.citusdata.com/en/v8.0/use_cases/multi_tenant.html>`_ in
the Citus documentation for a complete example showing how to do that.
When using pgloader it's possible to specify the distribution keys and
reference tables and let pgloader take care of adjusting the table,
indexes, primary keys and foreign key definitions all by itself.
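A sketch of the distribution clauses, assuming a multi-tenant schema keyed
on ``company_id`` as in the Citus documentation example::

    LOAD DATABASE
         FROM pgsql://user@localhost/source
         INTO pgsql://user@localhost/citus
    DISTRIBUTE companies USING id
    DISTRIBUTE campaigns USING company_id;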
Encoding Overrides
MySQL doesn't actually enforce the encoding of the data in the database
to match the encoding known in the metadata, defined at the database,
table, or attribute level. Sometimes, it's necessary to override the
metadata in order to make sense of the text, and pgloader makes it easy
to do so.
Continuous Migration
--------------------
pgloader is meant to migrate a whole database in a single command line and
without any manual intervention. The goal is to be able to set up a
*Continuous Integration* environment as described in the `Project
Methodology <http://mysqltopgsql.com/project/>`_ document of the `MySQL to
PostgreSQL <http://mysqltopgsql.com/project/>`_ webpage.
1. Setup your target PostgreSQL Architecture
2. Fork a Continuous Integration environment that uses PostgreSQL
3. Migrate the data over and over again every night, from production
4. As soon as the CI is all green using PostgreSQL, schedule the D-Day
5. Migrate without surprise and enjoy!
In order to be able to follow this great methodology, you need tooling to
implement the third step in a fully automated way. That's pgloader.
.. toctree::
:hidden:
:caption: Getting Started
intro
quickstart
tutorial/tutorial
install
bugreport
.. toctree::
:hidden:
:caption: Reference Manual
pgloader
command
batches
ref/transforms
.. toctree::
:hidden:
:caption: Manual for file formats
ref/csv
ref/fixed
ref/copy
ref/dbf
ref/ixf
ref/archive
.. toctree::
:maxdepth: 2
:hidden:
:caption: Manual for Database Servers
ref/mysql
ref/sqlite
ref/mssql
ref/pgsql
ref/pgsql-citus-target
ref/pgsql-redshift
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Installing pgloader
===================
Several distributions are available for pgloader.
debian packages
---------------
You can install pgloader directly from `apt.postgresql.org`__ and from
official debian repositories, see `packages.debian.org/pgloader`__.
::
$ apt-get install pgloader
__ https://wiki.postgresql.org/wiki/Apt
__ https://packages.debian.org/search?keywords=pgloader
RPM packages
------------
The Postgres community repository for RPM packages is `yum.postgresql.org`__
and does include binary packages for pgloader.
__ https://yum.postgresql.org
Docker Images
-------------
Docker images are maintained for each tagged release at dockerhub, and also
built from the CI/CD integration on GitHub at each commit to the `main`
branch.
The DockerHub `dimitri/pgloader`__ repository is where the tagged releases
are made available. The image uses the Postgres version currently in debian
stable.
__ https://hub.docker.com/r/dimitri/pgloader
To use the ``dimitri/pgloader`` docker image::
$ docker run --rm -it dimitri/pgloader:latest pgloader --version
Or you can use the CI/CD integration that publishes packages from the main
branch to the GitHub docker repository::
$ docker pull ghcr.io/dimitri/pgloader:latest
$ docker run --rm -it ghcr.io/dimitri/pgloader:latest pgloader --version
$ docker run --rm -it ghcr.io/dimitri/pgloader:latest pgloader --help
Build from sources
------------------
pgloader is a Common Lisp program, tested using the `SBCL`__ (>= 1.2.5) and
`Clozure CL`__ implementations and with `Quicklisp`__ to fetch build
dependencies.
__ http://sbcl.org/
__ http://ccl.clozure.com/
__ http://www.quicklisp.org/beta/
When building from sources, you should always build from the current git
HEAD as it's basically the only source that is managed in a way to ensure it
builds against the current set of dependency versions.
The build system for pgloader uses a Makefile and the Quicklisp Common Lisp
packages distribution system.
The modern build system for pgloader is entirely written in Common Lisp;
the historical name for this operation is `save-lisp-and-die`, and it can
be used this way:
::
$ make save
The legacy build system also uses Buildapp and can be used this way:
::
$ make pgloader
Building from sources on debian
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Install the build dependencies first, then use the Makefile::
$ apt-get install sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev
$ cd /path/to/pgloader
$ make save
$ ./build/bin/pgloader --help
Building from sources on RedHat/CentOS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To build and install pgloader, the Steel Bank Common Lisp package (sbcl)
from EPEL and the freetds packages are required.
It is recommended to build the RPM yourself, see below, to ensure that all
installed files are properly tracked and that you can safely update to newer
versions of pgloader as they're released.
To do an ad hoc build and install, run ``bootstrap-centos.sh`` for CentOS 6 or
``bootstrap-centos7.sh`` for CentOS 7 to install the required dependencies.
Building a pgloader RPM from sources
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The spec file in the root of the pgloader repository can be used to build your
own RPM. For production deployments it is recommended that you build this RPM on
a dedicated build box and then copy the RPM to your production environment for
use; it is considered bad practice to have compilers and build tools present in
production environments.
1. Install the `EPEL repo <https://fedoraproject.org/wiki/EPEL#Quickstart>`_.
2. Install rpmbuild dependencies::
sudo yum -y install yum-utils rpmdevtools @"Development Tools"
3. Install pgloader build dependencies::
sudo yum-builddep pgloader.spec
4. Download pgloader source::
spectool -g -R pgloader.spec
5. Build the source and binary RPMs (see `rpmbuild --help` for other build
options)::
rpmbuild -ba pgloader.spec
Building from sources on macOS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We suppose you already have ``git`` and ``make`` available; if that's not
the case, now is the time to install those tools. The SQLite library that
comes with macOS is fine, no need for extra software here.
You will need to install either SBCL or CCL separately, and when using
`brew <http://brew.sh/>`_ it's as simple as:
::
$ brew install sbcl
$ brew install clozure-cl
NOTE: Make sure you installed the universal binaries of FreeTDS, so that
they can be loaded correctly.
::
$ brew install freetds --universal --build-from-source
Then use the normal build system for pgloader:
::
$ make save
$ ./build/bin/pgloader --version
Building from sources on Windows
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Building pgloader on Windows is supported (in theory), thanks to Common Lisp
implementations being available on that platform, and to the Common Lisp
Standard for making it easy to write actually portable code.
It is recommended to have a look at the `issues labelled with Windows
support`__ if you run into trouble when building pgloader, because the
development team is lacking Windows users and in practice we can't maintain
the support for that Operating System:
__ https://github.com/dimitri/pgloader/issues?utf8=✓&q=label%3A%22Windows%20support%22
If you need ``pgloader.exe`` on Windows, please consider contributing fixes
for that environment, and maybe longer-term support too. Specifically, a CI
integration with a Windows build host would allow ensuring that we continue
to support that target.
Building Docker image from sources
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can build a Docker image from source using SBCL by default::
$ docker build .
Or Clozure CL (CCL)::
$ docker build -f Dockerfile.ccl .
More options when building from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``Makefile`` target ``save`` knows how to produce a Self Contained
Binary file for pgloader, found at ``./build/bin/pgloader``::
$ make save
By default, the ``Makefile`` uses `SBCL`__ to compile your binary image,
though it's possible to build using `Clozure-CL`__.
__ http://sbcl.org/
__ http://ccl.clozure.com/
::
$ make CL=ccl64 save
It is possible to tweak the default amount of memory that the pgloader
image will allow itself to use when running through your data (don't ask
for more than your current RAM, though). At the moment only the legacy build system
includes support for this custom build::
$ make DYNSIZE=8192 pgloader
The ``make pgloader`` command when successful outputs a
`./build/bin/pgloader` file for you to use.

Introduction
============
pgloader loads data from various sources into PostgreSQL. It can
transform the data it reads on the fly and submit raw SQL before and
after the loading. It uses the `COPY` PostgreSQL protocol to stream
the data into the server, and manages errors by filling a pair of
*reject.dat* and *reject.log* files.
pgloader knows how to read data from different kind of sources:
* Files
* CSV
* Fixed Format
* Postgres COPY text format
* DBF
* IXF
* Databases
* SQLite
* MySQL
* MS SQL Server
* PostgreSQL
* Redshift
pgloader knows how to target different products using the PostgreSQL Protocol:
* PostgreSQL
* `Citus <https://www.citusdata.com>`_
* Redshift
The level of automation provided by pgloader depends on the data source
type. In the case of CSV and Fixed Format files, a full description of the
expected input properties must be given to pgloader. In the case of a
database, pgloader connects to the live service and knows how to fetch the
metadata it needs directly from it.
Features Matrix
---------------
Here's a comparison of the features supported depending on the source
database engine. Some features that are not supported can be added to
pgloader, it's just that nobody had the need to do so yet. Those features
are marked with ✗. Empty cells are used when the feature doesn't make sense
for the selected source database.
========================== ======= ====== ====== =========== =========
Feature SQLite MySQL MS SQL PostgreSQL Redshift
========================== ======= ====== ====== =========== =========
One-command migration ✓ ✓ ✓ ✓ ✓
Continuous Migration ✓ ✓ ✓ ✓ ✓
Schema discovery ✓ ✓ ✓ ✓ ✓
Partial Migrations ✓ ✓ ✓ ✓ ✓
Schema only ✓ ✓ ✓ ✓ ✓
Data only ✓ ✓ ✓ ✓ ✓
Repeatable (DROP+CREATE) ✓ ✓ ✓ ✓ ✓
User defined casting rules ✓ ✓ ✓ ✓ ✓
Encoding Overrides ✓
On error stop ✓ ✓ ✓ ✓ ✓
On error resume next ✓ ✓ ✓ ✓ ✓
Pre/Post SQL commands ✓ ✓ ✓ ✓ ✓
Post-Schema SQL commands ✗ ✓ ✓ ✓ ✓
Primary key support ✓ ✓ ✓ ✓ ✓
Foreign key support ✓ ✓ ✓ ✓
Online ALTER schema ✓ ✓ ✓ ✓ ✓
Materialized views ✗ ✓ ✓ ✓ ✓
Distribute to Citus ✗ ✓ ✓ ✓ ✓
========================== ======= ====== ====== =========== =========
For more details about what the features are about, see the specific
reference pages for your database source.
For some of the features, missing support only means that the feature is not
needed for the other sources, such as the capability to override MySQL
encoding metadata about a table or a column: MySQL is the only source in
this list that can't guarantee text encoding. Similarly, Redshift simply
doesn't have foreign keys.
Commands
--------
pgloader implements its own *Command Language*, a DSL that allows specifying
every aspect of the data load and migration to implement. Some of the
features provided in the language are only available for a specific source
type.
Command Line
------------
The pgloader command line accepts those two variants::
pgloader [<options>] [<command-file>]...
pgloader [<options>] SOURCE TARGET
Either you have a *command-file* containing migration specifications in the
pgloader *Command Language*, or you can give a *Source* for the data and a
PostgreSQL database connection *Target* where to load the data into.

Command Line
============
pgloader loads data from various sources into PostgreSQL. It can
transform the data it reads on the fly and submit raw SQL before and
after the loading. It uses the `COPY` PostgreSQL protocol to stream
the data into the server, and manages errors by filling a pair of
*reject.dat* and *reject.log* files.
pgloader operates either using commands which are read from files::
pgloader commands.load
or by using arguments and options all provided on the command line::
pgloader SOURCE TARGET
Arguments
---------
The pgloader arguments can be as many load files as needed, or a couple of
connection strings to a specific input file.
Source Connection String
^^^^^^^^^^^^^^^^^^^^^^^^
The source connection string format is as follows::
format:///absolute/path/to/file.ext
format://./relative/path/to/file.ext
Where format might be one of `csv`, `fixed`, `copy`, `dbf`, `db3` or `ixf`::
db://user:pass@host:port/dbname
Where db might be one of `sqlite`, `mysql` or `mssql`.
When using a file based source format, pgloader also supports natively
fetching the file from an http location and decompressing an archive if
needed. In that case it's necessary to use the `--type` option to specify
the expected format of the file. See the examples below.
Also note that some file formats require describing some implementation
details such as columns to be read and delimiters and quoting when loading
from csv.
For more complex loading scenarios, you will need to write a full-fledged
load command in the syntax described later in this document.
Target Connection String
^^^^^^^^^^^^^^^^^^^^^^^^
The target connection string format is described in detail later in this
document, see Section Connection String.
Options
-------
Inquiry Options
^^^^^^^^^^^^^^^
Use these options when you want to know more about how to use pgloader, as
those options will cause pgloader not to load any data.
--help
Show command usage summary and exit.
--version
Show pgloader version string and exit.
--with-encodings
List known encodings in this version of pgloader.
--upgrade-config
Parse given files in the command line as ``pgloader.conf`` files with
the INI syntax that was in use in pgloader versions 2.x, and output the
new command syntax for pgloader on standard output.
General Options
^^^^^^^^^^^^^^^
Those options are meant to tweak pgloader behavior when loading data.
--verbose
Be verbose.
--quiet
Be quiet.
--debug
Show debug level information messages.
--root-dir
Set the root working directory (defaults to ``/tmp/pgloader``).
--logfile
Set the pgloader log file (defaults to ``/tmp/pgloader/pgloader.log``).
--log-min-messages
Minimum level of verbosity needed for log message to make it to the
logfile. One of critical, log, error, warning, notice, info or debug.
--client-min-messages
Minimum level of verbosity needed for log message to make it to the
console. One of critical, log, error, warning, notice, info or debug.
--summary
A filename where to copy the summary output. When relative, the filename
is expanded into ``*root-dir*``.
The format of the filename defaults to being *human readable*. It is
possible to have the output in machine friendly formats such as *CSV*,
*COPY* (PostgreSQL's own COPY format) or *JSON* by specifying a filename
with the extension resp. ``.csv``, ``.copy`` or ``.json``.
--load-lisp-file <file>
Specify a lisp <file> to compile and load into the pgloader image before
reading the commands, allowing extra transformation functions to be defined.
Those functions should be defined in the ``pgloader.transforms``
package. This option can appear more than once in the command line.
--dry-run
Allow testing a ``.load`` file without actually trying to load any data.
It's useful to debug it until it's ok, in particular to fix connection
strings.
--on-error-stop
Alter pgloader behavior: rather than trying to be smart about error
handling and continue loading good data, separating away the bad one,
just stop as soon as PostgreSQL refuses anything sent to it. Useful to
debug data processing, transformation function and specific type
casting.
--self-upgrade <directory>
Specify a <directory> where to find pgloader sources so that one of the
very first things it does is to dynamically load in (and compile to
machine code) another version of itself, usually a newer one such as a very
recent git checkout.
--no-ssl-cert-verification
Uses the OpenSSL option to accept a locally issued server-side
certificate, avoiding the following error message::
SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
The right way to fix the SSL issue is to use a trusted certificate, of
course. Sometimes though it's useful to make progress with the pgloader
setup while the certificate chain of trust is being fixed, maybe by
another team. That's when this option is useful.
Command Line Only Operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Those options are meant to be used when using pgloader from the command line
only, rather than using a command file and the rich command clauses and
parser. In simple cases, it can be much easier to use the *SOURCE* and
*TARGET* directly on the command line, then tweak the loading with those
options:
--with <option>
Allows setting options from the command line. You can use that option as
many times as you want. The option arguments must follow the *WITH*
clause for the source type of the ``SOURCE`` specification, as described
later in this document.
--set
Allows setting PostgreSQL configuration from the command line. Note that
the option parsing is the same as when used from the *SET* command
clause, in particular you must enclose the guc value with single-quotes.
Use ``--set "guc_name='value'"``.
--field
Allows setting a source field definition. Fields are accumulated in the
order given on the command line. It's possible to either use a
``--field`` option per field in the source file, or to separate field
definitions by a comma, as you would do in the *HAVING FIELDS* clause.
--cast <rule>
Allows setting a specific casting rule for loading the data.
--type <csv|fixed|db3|ixf|sqlite|mysql|mssql>
Allows forcing the source type, in case when the *SOURCE* parsing isn't
satisfying.
--encoding <encoding>
Set the encoding of the source file to load data from.
--before <filename>
Parse given filename for SQL queries and run them against the target
database before loading the data from the source. The queries are parsed
by pgloader itself: they need to be terminated by a semi-colon (;) and
the file may include `\i` or `\ir` commands to *include* another file.
--after <filename>
Parse given filename for SQL queries and run them against the target
database after having loaded the data from the source. The queries are
parsed in the same way as with the `--before` option, see above.
More Debug Information
^^^^^^^^^^^^^^^^^^^^^^
To get the maximum amount of debug information, you can use both the
`--verbose` and the `--debug` switches at the same time, which is equivalent
to saying `--client-min-messages data`. Then the log messages will show the
data being processed, in the cases where the code has explicit support for
it.

Pgloader Quick Start
====================
In simple cases, pgloader is very easy to use.
CSV
---
Load data from a CSV file into a pre-existing table in your database::
pgloader --type csv \
--field id --field field \
Note also that the PostgreSQL URI includes the target *tablename*.
Reading from STDIN
------------------
File based pgloader sources can be loaded from the standard input, as in the
following example::
pgloader --type csv \
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
The dash (`-`) character as a source is used to mean *standard input*, as
usual in Unix command lines. It's possible to stream compressed content to
pgloader with this technique, using the Unix pipe::
gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo
Loading from CSV available through HTTP
---------------------------------------
The same command as just above can also be run if the CSV file happens to be
found on a remote HTTP location::
pgloader --type csv \
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
Again, the PostgreSQL target connection string must contain the *tablename*
option and you have to ensure that the target table exists and may fit the
data. Here's the SQL command used in that example in case you want to try it
yourself::
create table districts_longlat
(
Also notice that the same command will work against an archived version of
the same data.
Streaming CSV data from an HTTP compressed file
-----------------------------------------------
Finally, it's important to note that pgloader first fetches the content from
the HTTP URL to a local file, then expands the archive when it's
In some cases, either because pgloader has no direct support for your
archive format or maybe because expanding the archive is not feasible in
your environment, you might want to *stream* the content straight from its
remote location into PostgreSQL. Here's how to do that, using the old battle
tested Unix Pipes trick::
curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
| gunzip -c \
Now the OS will take care of the streaming and buffering between the network
and the commands and pgloader will take care of streaming the data down to
PostgreSQL.
Migrating from SQLite
---------------------
The following command will open the SQLite database, discover its tables
definitions including indexes and foreign keys, migrate those definitions
while *casting* the data type specifications to their PostgreSQL equivalent
and then migrate the data over::
createdb newdb
pgloader ./test/sqlite/sqlite.db postgresql:///newdb
Migrating from MySQL
--------------------
Just create a database where to host the MySQL data and definitions and have
pgloader do the migration for you in a single command line::
createdb pagila
pgloader mysql://user@localhost/sakila postgresql:///pagila
Fetching an archived DBF file from an HTTP remote location
----------------------------------------------------------
It's possible for pgloader to download a file from HTTP, unarchive it, and
only then open it to discover the schema then load the data::
createdb foo
pgloader --type dbf http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip postgresql:///foo

Archive (http, zip)
===================
This command instructs pgloader to load data from one or more files contained
in an archive. Currently the only supported archive format is *ZIP*, and the
archive might be downloaded from an *HTTP* URL.
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader archive.load
And the contents of the ``archive.load`` file could be inspired from the
following:
::
LOAD ARCHIVE
FROM /Users/dim/Downloads/GeoLiteCity-latest.zip
INTO postgresql:///ip4r
BEFORE LOAD
DO $$ create extension if not exists ip4r; $$,
$$ create schema if not exists geolite; $$,
EXECUTE 'geolite.sql'
LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
WITH ENCODING iso-8859-1
(
locId,
country,
region null if blanks,
city null if blanks,
postalCode null if blanks,
latitude,
longitude,
metroCode null if blanks,
areaCode null if blanks
)
INTO postgresql:///ip4r?geolite.location
(
locid,country,region,city,postalCode,
location point using (format nil "(~a,~a)" longitude latitude),
metroCode,areaCode
)
WITH skip header = 2,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ','
AND LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
WITH ENCODING iso-8859-1
(
startIpNum, endIpNum, locId
)
INTO postgresql:///ip4r?geolite.blocks
(
iprange ip4r using (ip-range startIpNum endIpNum),
locId
)
WITH skip header = 2,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ','
FINALLY DO
$$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
Archive Source Specification: FROM
----------------------------------
Filename or HTTP URI where to load the data from. When given an HTTP URL the
linked file will get downloaded locally before processing.
If the file is a `zip` file, the command line utility `unzip` is used to
expand the archive into files in `$TMPDIR`, or `/tmp` if `$TMPDIR` is unset
or set to a non-existing directory.
Then the following commands are used from the top level directory where the
archive has been expanded.
Archive Sub Commands
--------------------
- command [ *AND* command ... ]
A series of commands against the contents of the archive, at the moment
only `CSV`, `FIXED` and `DBF` commands are supported.
Note that commands are supporting the clause *FROM FILENAME MATCHING*
which allows the pgloader command not to depend on the exact names of
the archive directories.
The same clause can also be applied to several files with using the
spelling *FROM ALL FILENAMES MATCHING* and a regular expression.
The whole *matching* clause must follow the following rule::
FROM [ ALL FILENAMES | [ FIRST ] FILENAME ] MATCHING
Archive Final SQL Commands
--------------------------
- *FINALLY DO*
SQL Queries to run once the data is loaded, such as `CREATE INDEX`.

docs/ref/copy.rst Normal file
@ -0,0 +1,133 @@
COPY
====
This command instructs pgloader to load data from a file containing COPY TEXT
data, as described in the PostgreSQL documentation.
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader copy.load
And the contents of the ``copy.load`` file could be inspired by the following:
::
LOAD COPY
FROM copy://./data/track.copy
(
trackid, track, album, media, genre, composer,
milliseconds, bytes, unitprice
)
INTO postgresql:///pgloader
TARGET TABLE track_full
WITH truncate
SET work_mem to '14MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists track_full; $$,
$$ create table track_full (
trackid bigserial,
track text,
album text,
media text,
genre text,
composer text,
milliseconds bigint,
bytes bigint,
unitprice numeric
);
$$;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
COPY Formatted Files Source Specification: FROM
-----------------------------------------------
The filename to load the data from. This supports local files, HTTP URLs
and zip files containing a single file of the same name. Fetching such a
zip file from an HTTP address is of course supported.
- *inline*
The data is found after the end of the parsed commands. Any number of
empty lines between the end of the commands and the beginning of the
data is accepted.
- *stdin*
Reads the data from the standard input stream.
- *FILENAMES MATCHING*
The whole *matching* clause must follow the following rule::
[ ALL FILENAMES | [ FIRST ] FILENAME ]
MATCHING regexp
[ IN DIRECTORY '...' ]
The *matching* clause applies the given *regular expression* (see above for
the exact syntax; several options can be used here) to filenames. It's then
possible to load data from only the first match, or from all of them.
The optional *IN DIRECTORY* clause allows specifying which directory to
walk to find the data files; it can be either relative to where the
command file is read from, or absolute. The given directory must exist.
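As a sketch of the rule above, assuming the data files sit in a ``data``
subdirectory and end in ``.copy`` (both names are illustrative), the clause
could read::

    FROM ALL FILENAMES MATCHING ~/\.copy$/ IN DIRECTORY 'data'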
COPY Formatted File Options: WITH
---------------------------------
When loading from a `COPY` file, the following options are supported:
- *delimiter*
Takes a single character as argument, which must be found inside single
quotes, and might be given as the printable character itself, the
special value \t to denote a tabulation character, or `0x` then an
hexadecimal value read as the ASCII code for the character.
This character is used as the *delimiter* when reading the data, in a
similar way to the PostgreSQL `COPY` option.
- *null*
Takes a quoted string as an argument (quotes can be either double quotes
or single quotes) and uses that string as the `NULL` representation in
the data.
This is similar to the *null* `COPY` option in PostgreSQL.
- *truncate*
When this option is listed, pgloader issues a `TRUNCATE` command against
the PostgreSQL target table before reading the data file.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
TRIGGER ALL` command against the PostgreSQL target table before copying
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
`COPY` is done.
This option allows loading data into a pre-existing table ignoring the
*foreign key constraints* and user defined triggers and may result in
invalid *foreign key constraints* once the data is loaded. Use with
care.
- *skip header*
Takes a numeric value as argument and instructs pgloader to skip that many
lines at the beginning of the input file.
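Putting these options together, a hedged sketch of a WITH clause for a
tab-separated file using ``\N`` as the NULL marker and one header line might
read (option spellings follow the descriptions above; adjust to your data)::

    WITH truncate,
         skip header = 1,
         delimiter '\t',
         null '\N'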

docs/ref/csv.rst Normal file
@ -0,0 +1,262 @@
CSV
===
This command instructs pgloader to load data from a `CSV` file. Because of
the complexity of guessing the parameters of a CSV file, it's simpler to
instruct pgloader how to parse the data, using the full pgloader command
syntax and CSV specifications as in the following example.
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader csv.load
And the contents of the ``csv.load`` file could be inspired by the following:
::
LOAD CSV
FROM 'GeoLiteCity-Blocks.csv' WITH ENCODING iso-646-us
HAVING FIELDS
(
startIpNum, endIpNum, locId
)
INTO postgresql://user@localhost:54393/dbname
TARGET TABLE geolite.blocks
TARGET COLUMNS
(
iprange ip4r using (ip-range startIpNum endIpNum),
locId
)
WITH truncate,
skip header = 2,
fields optionally enclosed by '"',
fields escaped by backslash-quote,
fields terminated by '\t'
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
CSV Source Specification: FROM
------------------------------
The filename to load the data from. Accepts an *ENCODING* option. Use the
`--list-encodings` option to know which encoding names are supported.
The filename may be enclosed by single quotes, and could be one of the
following special values:
- *inline*
The data is found after the end of the parsed commands. Any number
of empty lines between the end of the commands and the beginning of
the data is accepted.
- *stdin*
Reads the data from the standard input stream.
- *FILENAME MATCHING*
The whole *matching* clause must follow the following rule::
[ ALL FILENAMES | [ FIRST ] FILENAME ]
MATCHING regexp
[ IN DIRECTORY '...' ]
The *matching* clause applies the given *regular expression* (see above
for the exact syntax; several options can be used here) to filenames.
It's then possible to load data from only the first match, or from all
of them.
The optional *IN DIRECTORY* clause allows specifying which directory
to walk to find the data files; it can be either relative to
where the command file is read from, or absolute. The given
directory must exist.
Fields Specifications
---------------------
The *FROM* option also supports an optional comma separated list of *field*
names describing what is expected in the `CSV` data file, optionally
introduced by the clause `HAVING FIELDS`.
Each field can be either just a name, or a name followed by specific
reader options for that field, enclosed in square brackets and
comma-separated. Supported per-field reader options are:
- *terminated by*
See the description of *field terminated by* below.
The processing of this option is not currently implemented.
- *date format*
When the field is expected to be of the date type, this option allows
specifying the date format used in the file.
Date format strings are template strings modeled on the
PostgreSQL `to_char` template string support, limited to the
following patterns:
- YYYY, YYY, YY for the year part
- MM for the numeric month part
- DD for the numeric day part
- HH, HH12, HH24 for the hour part
- am, AM, a.m., A.M.
- pm, PM, p.m., P.M.
- MI for the minutes part
- SS for the seconds part
- MS for the milliseconds part (4 digits)
- US for the microseconds part (6 digits)
- unparsed punctuation signs: - . * # @ T / \ and space
Here's an example of a *date format* specification::
column-name [date format 'YYYY-MM-DD HH24-MI-SS.US']
- *null if*
This option takes an argument which is either the keyword *blanks*
or a double-quoted string.
When *blanks* is used and the field value that is read contains
only space characters, then it's automatically converted to an SQL
`NULL` value.
When a double-quoted string is used and that string is read as the
field value, then the field value is automatically converted to an
SQL `NULL` value.
- *trim both whitespace*, *trim left whitespace*, *trim right whitespace*
This option allows trimming whitespace in the read data: either from
both sides of the data, only the whitespace characters found on
the left of the string, or only those on the right of the string.
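Combining the per-field reader options above, a hedged sketch of a field
list (the field names are illustrative) could look like::

    HAVING FIELDS
    (
       id,
       created [date format 'YYYY-MM-DD'],
       comment [null if blanks, trim both whitespace]
    )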
CSV Loading Options: WITH
-------------------------
When loading from a `CSV` file, the following options are supported:
- *truncate*
When this option is listed, pgloader issues a `TRUNCATE` command
against the PostgreSQL target table before reading the data file.
- *drop indexes*
When this option is listed, pgloader issues `DROP INDEX` commands
against all the indexes defined on the target table before copying
the data, then `CREATE INDEX` commands once the `COPY` is done.
In order to get the best performance possible, all the indexes are
created in parallel, and once that's done the primary keys are built
again from the unique indexes just created. This two-step process allows
creating the primary key index in parallel with the other indexes,
as only the `ALTER TABLE` command needs an *access exclusive lock*
on the target table.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ...
DISABLE TRIGGER ALL` command against the PostgreSQL target table
before copying the data, then the command `ALTER TABLE ... ENABLE
TRIGGER ALL` once the `COPY` is done.
This option allows loading data into a pre-existing table ignoring
the *foreign key constraints* and user defined triggers and may
result in invalid *foreign key constraints* once the data is loaded.
Use with care.
- *skip header*
Takes a numeric value as argument and instructs pgloader to skip that
many lines at the beginning of the input file.
- *csv header*
Use the first line read after *skip header* as the list of csv field
names to be found in the CSV file, using the same CSV parameters as
for the CSV data.
- *trim unquoted blanks*
When reading unquoted values in the `CSV` file, remove the blanks
found in between the separator and the value. That behaviour is the
default.
- *keep unquoted blanks*
When reading unquoted values in the `CSV` file, keep blanks found in
between the separator and the value.
- *fields optionally enclosed by*
Takes a single character as argument, which must be enclosed in single
quotes, and might be given as the printable character itself, the
special value \t to denote a tabulation character, the special value \'
to denote a single-quote, or `0x` followed by a hexadecimal value read as
the ASCII code for the character.
The following options specify the same enclosing character, a single quote::
fields optionally enclosed by '\''
fields optionally enclosed by '0x27'
This character is used as the quoting character in the `CSV` file,
and defaults to double-quote.
- *fields not enclosed*
By default, pgloader will use the double-quote character as the
enclosing character. If you have a CSV file where fields are not
enclosed and are using double-quote as an expected ordinary
character, then use the option *fields not enclosed* for the CSV
parser to accept those values.
- *fields escaped by*
Takes either the special value *backslash-quote* or *double-quote*,
or any value supported by the *fields terminated by* option (see
below). This value is used to recognize escaped field separators
when they are to be found within the data fields themselves.
Defaults to *double-quote*.
- *csv escape mode*
Takes either the special value *quote* (the default) or *following*,
and allows the CSV parser to parse either only escaped field
separators, or, when using the *following* value, any escaped
character (including CSV data).
- *fields terminated by*
Takes a single character as argument, which must be enclosed in
single quotes, and might be given as the printable character itself,
the special value \t to denote a tabulation character, or `0x`
followed by a hexadecimal value read as the ASCII code for the character.
This character is used as the *field separator* when reading the
`CSV` data.
- *lines terminated by*
Takes a single character as argument, which must be enclosed in
single quotes, and might be given as the printable character itself,
the special value \t to denote a tabulation character, or `0x`
followed by a hexadecimal value read as the ASCII code for the character.
This character is used to recognize *end-of-line* condition when
reading the `CSV` data.
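As a sketch combining some of these options, a semicolon-separated file
whose first line names the columns, with no quoting of fields, might use::

    WITH csv header,
         fields not enclosed,
         fields terminated by ';'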

docs/ref/dbf.rst Normal file
@ -0,0 +1,88 @@
DBF
===
This command instructs pgloader to load data from a `DBF` file. A default
set of casting rules is provided and may be overridden and appended to by
the command.
Using advanced options and a load command file
----------------------------------------------
Here's an example with a remote HTTP source and some user defined casting
rules. The command then would be:
::
$ pgloader dbf.load
And the contents of the ``dbf.load`` file could be inspired by the following:
::
LOAD DBF
FROM http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/reg2013.dbf
INTO postgresql://user@localhost/dbname
WITH truncate, create table
CAST column reg2013.region to integer,
column reg2013.tncc to smallint;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
DBF Source Specification: FROM
------------------------------
The filename to load the data from. This supports local files, HTTP URLs
and zip files containing a single dbf file of the same name. Fetching such a
zip file from an HTTP address is of course supported.
DBF Loading Options: WITH
-------------------------
When loading from a `DBF` file, the following options are supported:
- *truncate*
When this option is listed, pgloader issues a `TRUNCATE` command against
the PostgreSQL target table before reading the data file.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
TRIGGER ALL` command against the PostgreSQL target table before copying
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
`COPY` is done.
This option allows loading data into a pre-existing table ignoring the
*foreign key constraints* and user defined triggers and may result in
invalid *foreign key constraints* once the data is loaded. Use with
care.
- *create table*
When this option is listed, pgloader creates the table using the meta
data found in the `DBF` file, which must contain a list of fields with
their data type. A standard data type conversion from DBF to PostgreSQL
is done.
- *table name*
This option expects as its value the possibly qualified name of the
table to create.
Default DB3 Casting Rules
-------------------------
When migrating from DB3 the following Casting Rules are provided::
type C to text using db3-trim-string
type M to text using db3-trim-string
type N to numeric using db3-numeric-to-pgsql-integer
type I to numeric using db3-numeric-to-pgsql-numeric
type L to boolean using logical-to-boolean
type D to date using db3-date-to-pgsql-date

docs/ref/fixed.rst Normal file
@ -0,0 +1,204 @@
Fixed Columns
=============
This command instructs pgloader to load data from a text file containing
columns arranged in a *fixed size* manner.
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader fixed.load
And the contents of the ``fixed.load`` file could be inspired by the following:
::
LOAD FIXED
FROM inline
(
a from 0 for 10,
b from 10 for 8,
c from 18 for 8,
d from 26 for 17 [null if blanks, trim right whitespace]
)
INTO postgresql:///pgloader
TARGET TABLE fixed
(
a, b,
c time using (time-with-no-separator c),
d
)
WITH truncate
SET work_mem to '14MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists fixed; $$,
$$ create table fixed (
a integer,
b date,
c time,
d text
);
$$;
01234567892008052011431250firstline
01234562008052115182300left blank-padded
12345678902008052208231560another line
2345609872014092914371500
2345678902014092914371520
Note that the example comes from the test suite of pgloader, where we use
the advanced feature ``FROM inline`` that allows embedding the source data
within the command file. In most cases a more classic FROM clause loading
the data from a separate file would be used.
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
Fixed File Format Source Specification: FROM
--------------------------------------------
The filename to load the data from. Accepts an *ENCODING* option. Use the
`--list-encodings` option to know which encoding names are supported.
The filename may be enclosed by single quotes, and could be one of the
following special values:
- *inline*
The data is found after the end of the parsed commands. Any number
of empty lines between the end of the commands and the beginning of
the data is accepted.
- *stdin*
Reads the data from the standard input stream.
- *FILENAMES MATCHING*
The whole *matching* clause must follow the following rule::
[ ALL FILENAMES | [ FIRST ] FILENAME ]
MATCHING regexp
[ IN DIRECTORY '...' ]
The *matching* clause applies the given *regular expression* (see above
for the exact syntax; several options can be used here) to filenames.
It's then possible to load data from only the first match, or from all
of them.
The optional *IN DIRECTORY* clause allows specifying which directory
to walk to find the data files; it can be either relative to
where the command file is read from, or absolute. The given
directory must exist.
Fields Specifications
---------------------
The *FROM* option also supports an optional comma separated list of *field*
names describing what is expected in the `FIXED` data file.
Each field specification is composed of the field name followed by specific
reader options for that field. The supported per-field reader options are
the following; only *start* and *length* are required.
- *start*
The position in the line at which to start reading that field's value.
Can be entered with decimal digits, or `0x` followed by hexadecimal digits.
- *length*
How many bytes to read from the *start* position for that
field's value. Same format as *start*.
Those optional parameters must be enclosed in square brackets and
comma-separated:
- *terminated by*
See the description of *field terminated by* below.
The processing of this option is not currently implemented.
- *date format*
When the field is expected to be of the date type, this option allows
specifying the date format used in the file.
Date format strings are template strings modeled on the
PostgreSQL `to_char` template string support, limited to the
following patterns:
- YYYY, YYY, YY for the year part
- MM for the numeric month part
- DD for the numeric day part
- HH, HH12, HH24 for the hour part
- am, AM, a.m., A.M.
- pm, PM, p.m., P.M.
- MI for the minutes part
- SS for the seconds part
- MS for the milliseconds part (4 digits)
- US for the microseconds part (6 digits)
- unparsed punctuation signs: - . * # @ T / \ and space
Here's an example of a *date format* specification::
column-name [date format 'YYYY-MM-DD HH24-MI-SS.US']
- *null if*
This option takes an argument which is either the keyword *blanks*
or a double-quoted string.
When *blanks* is used and the field value that is read contains only
space characters, then it's automatically converted to an SQL `NULL`
value.
When a double-quoted string is used and that string is read as the
field value, then the field value is automatically converted to an
SQL `NULL` value.
- *trim both whitespace*, *trim left whitespace*, *trim right whitespace*
This option allows trimming whitespace in the read data: either from
both sides of the data, only the whitespace characters found on
the left of the string, or only those on the right of the string.
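As a sketch, *start* and *length* can also be written in hexadecimal; the
following field list (names are illustrative) is equivalent to
``b from 10 for 8`` for the second field::

    (
       a from 0 for 0x0a,
       b from 0x0a for 8 [null if blanks, trim right whitespace]
    )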
Fixed File Format Loading Options: WITH
---------------------------------------
When loading from a `FIXED` file, the following options are supported:
- *truncate*
When this option is listed, pgloader issues a `TRUNCATE` command
against the PostgreSQL target table before reading the data file.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ...
DISABLE TRIGGER ALL` command against the PostgreSQL target table
before copying the data, then the command `ALTER TABLE ... ENABLE
TRIGGER ALL` once the `COPY` is done.
This option allows loading data into a pre-existing table ignoring
the *foreign key constraints* and user defined triggers and may
result in invalid *foreign key constraints* once the data is loaded.
Use with care.
- *skip header*
Takes a numeric value as argument and instructs pgloader to skip that
many lines at the beginning of the input file.

docs/ref/ixf.rst Normal file
@ -0,0 +1,83 @@
IXF
===
This command instructs pgloader to load data from an IBM `IXF` file.
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader ixf.load
And the contents of the ``ixf.load`` file could be inspired by the following:
::
LOAD IXF
FROM data/nsitra.test1.ixf
INTO postgresql:///pgloader
TARGET TABLE nsitra.test1
WITH truncate, create table, timezone UTC
BEFORE LOAD DO
$$ create schema if not exists nsitra; $$,
$$ drop table if exists nsitra.test1; $$;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
IXF Source Specification: FROM
------------------------------
The filename to load the data from. This supports local files, HTTP URLs
and zip files containing a single ixf file of the same name. Fetching such a
zip file from an HTTP address is of course supported.
IXF Loading Options: WITH
-------------------------
When loading from an `IXF` file, the following options are supported:
- *truncate*
When this option is listed, pgloader issues a `TRUNCATE` command against
the PostgreSQL target table before reading the data file.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
TRIGGER ALL` command against the PostgreSQL target table before copying
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
`COPY` is done.
This option allows loading data into a pre-existing table ignoring the
*foreign key constraints* and user defined triggers and may result in
invalid *foreign key constraints* once the data is loaded. Use with
care.
- *create table*
When this option is listed, pgloader creates the table using the meta
data found in the `IXF` file, which must contain a list of fields with
their data type. A standard data type conversion from IXF to PostgreSQL
is done.
- *table name*
This option expects as its value the possibly qualified name of the
table to create.
- *timezone*
This option allows specifying which timezone is used when parsing
timestamps from an IXF file; it defaults to *UTC*. Expected values are
either `UTC`, `GMT` or a single-quoted location name such as
`'Universal'` or `'Europe/Paris'`.
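Putting the options together, a hedged sketch of a WITH clause parsing IXF
timestamps in Paris local time would be::

    WITH truncate, create table, timezone 'Europe/Paris'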

docs/ref/mssql.rst Normal file
@ -0,0 +1,242 @@
MS SQL to Postgres
==================
This command instructs pgloader to load data from a MS SQL database.
Automatic discovery of the schema is supported, including building the
indexes and the primary and foreign key constraints.
Using default settings
----------------------
Here is the simplest command line example, which might be all you need:
::
$ pgloader mssql://user@mshost/dbname pgsql://pguser@pghost/dbname
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader ms.load
And the contents of the command file ``ms.load`` could be inspired by the
following:
::
load database
from mssql://user@host/dbname
into postgresql:///dbname
including only table names like 'GlobalAccount' in schema 'dbo'
set work_mem to '16MB', maintenance_work_mem to '512 MB'
before load do $$ drop schema if exists dbo cascade; $$;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
MS SQL Database Source Specification: FROM
------------------------------------------
The connection string to an existing MS SQL database server that listens
for and welcomes external TCP/IP connections. As pgloader currently
piggybacks on the FreeTDS driver, export the `TDSPORT` environment
variable to change the port of the server.
MS SQL Database Migration Options: WITH
---------------------------------------
When loading from a `MS SQL` database, the same options as when loading a
`MYSQL` database are supported. Please refer to the MYSQL section. The
following options are added:
- *create schemas*
When this option is listed, pgloader creates the same schemas as found
on the MS SQL instance. This is the default.
- *create no schemas*
When this option is listed, pgloader refrains from creating any schemas
at all; you must then ensure that the target schemas do exist.
MS SQL Database Casting Rules
-----------------------------
CAST
^^^^
The cast clause allows specifying custom casting rules, either to override
the default casting rules or to amend them with special cases.
Please refer to the MySQL CAST clause for details.
MS SQL Views Support
--------------------
MS SQL views support allows pgloader to migrate views as if they were base
tables. This feature then allows for on-the-fly transformation from MS SQL
to PostgreSQL, as the view definition is used rather than the base data.
MATERIALIZE VIEWS
^^^^^^^^^^^^^^^^^
This clause allows you to implement custom data processing at the data
source by providing a *view definition* against which pgloader will query
the data. It's not possible to just allow for plain `SQL` because we want to
know a lot about the exact data types of each column involved in the query
output.
This clause expects a comma-separated list of view definitions, each one
being either the name of an existing view in your database or the following
expression::
*name* `AS` `$$` *sql query* `$$`
The *name* and the *sql query* will be used in a `CREATE VIEW` statement at
the beginning of the data loading, and the resulting view will then be
dropped at the end of the data loading.
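As a sketch of the expression above (the view name, columns and query are
illustrative, reusing the ``dbo.GlobalAccount`` table from the earlier
example)::

    MATERIALIZE VIEWS active_accounts AS
    $$ select id, name from dbo.GlobalAccount where active = 1 $$

The ``active_accounts`` view is created before the load and dropped once it
is done.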
MATERIALIZE ALL VIEWS
^^^^^^^^^^^^^^^^^^^^^
Same behaviour as *MATERIALIZE VIEWS* using the dynamic list of views as
returned by MS SQL rather than asking the user to specify the list.
MS SQL Partial Migration
------------------------
INCLUDING ONLY TABLE NAMES LIKE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table name patterns used to limit the
tables to migrate to a sublist. More than one such clause may be used, they
will be accumulated together.
Example::
including only table names like 'GlobalAccount' in schema 'dbo'
EXCLUDING TABLE NAMES LIKE
^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table name patterns used to exclude
table names from the migration. This filter only applies to the result of
the *INCLUDING* filter.
::
excluding table names matching 'LocalAccount' in schema 'dbo'
MS SQL Schema Transformations
-----------------------------
ALTER SCHEMA '...' RENAME TO '...'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Allows renaming a schema on the fly, so that for instance the tables
found in the schema 'dbo' in your source database get migrated into the
schema 'public' in the target database with this command::
alter schema 'dbo' rename to 'public'
ALTER TABLE NAMES MATCHING ... IN SCHEMA '...'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
that you want to target in the pgloader *ALTER TABLE* command. Available
actions are *SET SCHEMA*, *RENAME TO*, and *SET*::
ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
IN SCHEMA 'dbo'
SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING 'film' IN SCHEMA 'dbo' RENAME TO 'films'
ALTER TABLE NAMES MATCHING ~/./ IN SCHEMA 'dbo' SET (fillfactor='40')
ALTER TABLE NAMES MATCHING ~/./ IN SCHEMA 'dbo' SET TABLESPACE 'tlbspc'
You can use as many such rules as you need. The list of tables to be
migrated is searched in pgloader's memory against the *ALTER TABLE* matching
rules, and for each command pgloader stops at the first matching criterion
(regexp or string).
No *ALTER TABLE* command is sent to PostgreSQL, the modification happens at
the level of the pgloader in-memory representation of your source database
schema. In case of a name change, the mapping is kept and reused in the
*foreign key* and *index* support.
The *SET ()* action takes effect as a *WITH* clause for the `CREATE TABLE`
command that pgloader will run when it has to create a table.
The *SET TABLESPACE* action takes effect as a *TABLESPACE* clause for the
`CREATE TABLE` command that pgloader will run when it has to create a table.
The matching is done in pgloader itself, with a Common Lisp regular
expression library, so it depends neither on the *LIKE* implementation of
MS SQL nor on the engine's lack of support for regular expressions.
MS SQL Driver setup and encoding
--------------------------------
pgloader uses the `FreeTDS` driver and internally expects the data to be
sent in UTF-8. To achieve that, configure the FreeTDS driver with these
defaults in the file `~/.freetds.conf`::
[global]
tds version = 7.4
client charset = UTF-8
Default MS SQL Casting Rules
----------------------------
When migrating from MS SQL the following Casting Rules are provided:
Numbers::
type tinyint to smallint
type float to float using float-to-string
type real to real using float-to-string
type double to double precision using float-to-string
type numeric to numeric using float-to-string
type decimal to numeric using float-to-string
type money to numeric using float-to-string
type smallmoney to numeric using float-to-string
Texts::
type char to text drop typemod
type nchar to text drop typemod
type varchar to text drop typemod
type nvarchar to text drop typemod
type xml to text drop typemod
Binary::
type binary to bytea using byte-vector-to-bytea
type varbinary to bytea using byte-vector-to-bytea
Date::
type datetime to timestamptz
type datetime2 to timestamptz
Others::
type bit to boolean
type hierarchyid to bytea
type geography to bytea
type uniqueidentifier to uuid using sql-server-uniqueidentifier-to-uuid

docs/ref/mysql.rst Normal file
@ -0,0 +1,687 @@
MySQL to Postgres
=================
This command instructs pgloader to load data from a database connection.
pgloader supports dynamically converting the schema of the source database
and building the indexes.
A default set of casting rules is provided and may be overridden and
appended to by the command.
Using default settings
----------------------
Here is the simplest command line example, which might be all you need:
::
$ pgloader mysql://myuser@myhost/dbname pgsql://pguser@pghost/dbname
Using advanced options and a load command file
----------------------------------------------
You might want more flexibility than that and need to set advanced
options. The next example uses as many options as possible, some of them
even being the defaults. Chances are you don't need that complex a setup;
don't copy and paste it, use it only as a reference!
The command then would be:
::
$ pgloader my.load
And the contents of the command file ``my.load`` could be inspired by the
following:
::
LOAD DATABASE
FROM mysql://root@localhost/sakila
INTO postgresql://localhost:54393/sakila
WITH include drop, create tables, create indexes, reset sequences,
workers = 8, concurrency = 1,
multiple readers per thread, rows per range = 50000
SET PostgreSQL PARAMETERS
maintenance_work_mem to '128MB',
work_mem to '12MB',
search_path to 'sakila, public, "$user"'
SET MySQL PARAMETERS
net_read_timeout = '120',
net_write_timeout = '120'
CAST type bigint when (= precision 20) to bigserial drop typemod,
type date drop not null drop default using zero-dates-to-null,
-- type tinyint to boolean using tinyint-to-boolean,
type year to integer
MATERIALIZE VIEWS film_list, staff_list
-- INCLUDING ONLY TABLE NAMES MATCHING ~/film/, 'actor'
-- EXCLUDING TABLE NAMES MATCHING ~<ory>
-- DECODING TABLE NAMES MATCHING ~/messed/, ~/encoding/ AS utf8
-- ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
-- ALTER TABLE NAMES MATCHING ~/_list$/ SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
ALTER TABLE NAMES MATCHING ~/./ SET (fillfactor='40')
ALTER SCHEMA 'sakila' RENAME TO 'pagila'
BEFORE LOAD DO
$$ create schema if not exists pagila; $$,
$$ create schema if not exists mv; $$,
$$ alter database sakila set search_path to pagila, mv, public; $$;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
MySQL Database Source Specification: FROM
-----------------------------------------
Must be a connection URL pointing to a MySQL database.
If the connection URI contains a table name, then only this table is
migrated from MySQL to PostgreSQL.
See the `SOURCE CONNECTION STRING` section above for details on how to write
the connection string. The MySQL connection string accepts the same
parameter *sslmode* as the PostgreSQL connection string, but the *verify*
mode is not implemented (yet).
::
mysql://[user[:password]@][netloc][:port][/dbname][?option=value&...]
MySQL connection strings support a specific option:
- ``useSSL``
The same notation rules as found in the *Connection String* parts of the
documentation apply.
The value for ``useSSL`` can be either ``false`` or ``true``.
If both ``sslmode`` and ``useSSL`` are used in the same connection
string, pgloader behavior is undefined.
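For instance, a connection string enabling SSL might look like the
following sketch, where the user, host, and database names are
placeholders::
mysql://myuser:mypass@myhost:3306/dbname?useSSL=true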
Environment variables described in
<http://dev.mysql.com/doc/refman/5.0/en/environment-variables.html> can be
used as default values too. If the user is not provided, then it defaults to
`USER` environment variable value. The password can be provided with the
environment variable `MYSQL_PWD`. The host can be provided with the
environment variable `MYSQL_HOST` and otherwise defaults to `localhost`. The
port can be provided with the environment variable `MYSQL_TCP_PORT` and
otherwise defaults to `3306`.
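As an illustration, the same connection parameters can be passed through
the environment; the host, password, and database names below are
placeholders::
$ MYSQL_PWD=secret MYSQL_HOST=db.example.com MYSQL_TCP_PORT=3306 \
pgloader mysql://myuser/dbname pgsql:///dbname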
MySQL Database Migration Options: WITH
--------------------------------------
When loading from a `MySQL` database, the following options are supported,
and the default *WITH* clause is: *no truncate*, *create
tables*, *include drop*, *create indexes*, *reset sequences*, *foreign
keys*, *downcase identifiers*, *uniquify index names*.
- *include drop*
When this option is listed, pgloader drops all the tables in the target
PostgreSQL database whose names appear in the MySQL database. This
option allows for using the same command several times in a row until
you figure out all the options, starting automatically from a clean
environment. Please note that `CASCADE` is used to ensure that tables
are dropped even if there are foreign keys pointing to them. This is
precisely what `include drop` is intended to do: drop all target tables
and recreate them.
Great care needs to be taken when using `include drop`, as it will
cascade to *all* objects referencing the target tables, possibly
including other tables that are not being loaded from the source DB.
- *include no drop*
When this option is listed, pgloader will not include any `DROP`
statement when loading the data.
- *truncate*
When this option is listed, pgloader issues the `TRUNCATE` command
against each PostgreSQL table just before loading data into it.
- *no truncate*
When this option is listed, pgloader issues no `TRUNCATE` command.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
TRIGGER ALL` command against the PostgreSQL target table before copying
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
`COPY` is done.
This option allows loading data into a pre-existing table ignoring the
*foreign key constraints* and user defined triggers and may result in
invalid *foreign key constraints* once the data is loaded. Use with
care.
- *create tables*
When this option is listed, pgloader creates the tables using the meta
data found in the `MySQL` database, which must contain a list of fields
with their data type. A standard data type conversion from MySQL to
PostgreSQL is done.
- *create no tables*
When this option is listed, pgloader skips creating tables before
loading data; target tables must then already exist.
Also, when using *create no tables*, pgloader fetches the metadata from
the current target database and checks type casting; it then removes
constraints and indexes prior to loading the data and installs them back
again once the loading is done.
- *create indexes*
When this option is listed, pgloader gets the definitions of all the
indexes found in the MySQL database and creates the same set of index
definitions against the PostgreSQL database.
- *create no indexes*
When this option is listed, pgloader skips creating indexes.
- *drop indexes*
When this option is listed, pgloader drops the indexes in the target
database before loading the data, and creates them again at the end
of the data copy.
- *uniquify index names*, *preserve index names*
MySQL index names are unique per-table whereas in PostgreSQL index names
have to be unique per-schema. The default for pgloader is to change the
index name by prefixing it with `idx_OID` where `OID` is the internal
numeric identifier of the table the index is built against.
In some cases, such as when the DDL is entirely left to a framework, it
might be sensible for pgloader to refrain from handling unique index
names; that is achieved by using the *preserve index names* option.
The default is to *uniquify index names*.
Even when using the option *preserve index names*, MySQL primary key
indexes named "PRIMARY" will get their names uniquified. Failing to do
so would prevent the primary keys from being created again in PostgreSQL,
where index names must be unique per schema.
- *drop schema*
When this option is listed, pgloader drops the target schema in the
target PostgreSQL database before creating it again and all the objects
it contains. The default behavior doesn't drop the target schemas.
- *foreign keys*
When this option is listed, pgloader gets the definitions of all the
foreign keys found in the MySQL database and creates the same set of
foreign key definitions against the PostgreSQL database.
- *no foreign keys*
When this option is listed, pgloader skips creating foreign keys.
- *reset sequences*
When this option is listed, at the end of the data loading and after the
indexes have all been created, pgloader resets all the PostgreSQL
sequences created to the current maximum value of the column they are
attached to.
The options *schema only* and *data only* have no effect on this
option.
- *reset no sequences*
When this option is listed, pgloader skips resetting sequences after the
load.
The options *schema only* and *data only* have no effect on this
option.
- *downcase identifiers*
When this option is listed, pgloader converts all MySQL identifiers
(table names, index names, column names) to *downcase*, except for
PostgreSQL *reserved* keywords.
The PostgreSQL *reserved* keywords are determined dynamically by using
the system function `pg_get_keywords()`.
- *quote identifiers*
When this option is listed, pgloader quotes all MySQL identifiers so
that their case is respected. Note that you will then have to do the
same thing in your application code queries.
- *schema only*
When this option is listed pgloader refrains from migrating the data
over. Note that the schema in this context includes the indexes when the
option *create indexes* has been listed.
- *data only*
When this option is listed pgloader only issues the `COPY` statements,
without doing any other processing.
- *single reader per thread*, *multiple readers per thread*
The default is *single reader per thread* and it means that each
MySQL table is read by a single thread as a whole, with a single
`SELECT` statement using no `WHERE` clause.
When using *multiple readers per thread* pgloader may be able to
divide the reading work among several threads, as many as the
*concurrency* setting, which needs to be greater than 1 for this
option to be activated.
For each source table, pgloader searches for a primary key over a
single numeric column, or a multiple-column primary key index for
which the first column is of a numeric data type (one of `integer`
or `bigint`). When such an index exists, pgloader runs a query to
find the *min* and *max* values on this column, and then splits that
range into many ranges containing at most *rows per range* rows.
When the resulting range list contains at least as many ranges as the
*concurrency* setting, those ranges are distributed to the reader
threads.
So when all the conditions are met, pgloader starts as many reader
threads as the *concurrency* setting, and each reader thread issues
several queries with a `WHERE id >= x AND id < y` clause, where `y - x =
rows per range` or less (for the last range, depending on the max value
just obtained).
- *rows per range*
How many rows are fetched per `SELECT` query when using *multiple
readers per thread*, see above for details.
- *SET MySQL PARAMETERS*
The *SET MySQL PARAMETERS* allows setting MySQL parameters using the
MySQL `SET` command each time pgloader connects to it.
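As an illustration, several of the options above can be combined in a
single *WITH* clause. The following sketch, in which the connection
strings and numeric values are only placeholders, loads data only into
pre-existing tables using multiple concurrent readers::
LOAD DATABASE
FROM mysql://root@localhost/dbname
INTO pgsql://localhost/dbname
WITH data only, truncate, disable triggers,
multiple readers per thread,
workers = 8, concurrency = 4,
rows per range = 25000;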
MySQL Database Casting Rules
----------------------------
The command *CAST* introduces user-defined casting rules.
The cast clause allows specifying custom casting rules, either to
overload the default casting rules or to amend them with special cases.
A casting rule is expected to follow one of the forms::
type <mysql-type-name> [ <guard> ... ] to <pgsql-type-name> [ <option> ... ]
column <table-name>.<column-name> [ <guards> ] to ...
It's possible for a *casting rule* to either match against a MySQL data
type or against a given *column name* in a given *table name*. That
flexibility makes it possible to cope with cases where the type `tinyint`
might have been used as a `boolean` in some cases but as a `smallint` in
others.
The *casting rules* are applied in order: the first match prevents the
following rules from being applied, and user defined rules are evaluated
first.
The supported guards are:
- *when unsigned*
The casting rule is only applied against MySQL columns of the source
type that have the keyword *unsigned* in their data type definition.
Example of a casting rule using an *unsigned* guard::
type smallint when unsigned to integer drop typemod
- *when default 'value'*
The casting rule is only applied against MySQL columns of the source
type that have the given default *value*, which must be a single-quoted
or a double-quoted string.
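Example of a casting rule using a *default* guard, here matching MySQL
zero-dates (this rule is also part of the default casting rules listed
below)::
type datetime when default "0000-00-00 00:00:00"
to timestamptz drop default using zero-dates-to-null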
- *when typemod expression*
The casting rule is only applied against MySQL columns of the source
type that have a *typemod* value matching the given *typemod
expression*. The *typemod* is separated into its *precision* and *scale*
components.
Example of a cast rule using a *typemod* guard::
type char when (= precision 1) to char keep typemod
This expression casts MySQL `char(1)` columns to PostgreSQL columns of
type `char(1)`, while in the general case `char(N)` will be converted by
the default cast rule into the PostgreSQL type `varchar(N)`.
- *with extra auto_increment*
The casting rule is only applied against MySQL columns having the
*extra* column `auto_increment` option set, so that it's possible to
target e.g. `serial` rather than `integer`.
The default matching behavior, when this option isn't set, is to match
both columns with the extra definition and without.
This means that if you want to implement a casting rule that targets
either `serial` or `integer` from a `smallint` definition depending on
the *auto_increment* extra bit of information from MySQL, then you need
to spell out two casting rules as follows::
type smallint with extra auto_increment
to serial drop typemod keep default keep not null,
type smallint
to integer drop typemod keep default keep not null
The supported casting options are:
- *drop default*, *keep default*
When the option *drop default* is listed, pgloader drops any
existing default expression in the MySQL database for columns of the
source type from the `CREATE TABLE` statement it generates.
The spelling *keep default* explicitly prevents that behaviour and
can be used to overload the default casting rules.
- *drop not null*, *keep not null*, *set not null*
When the option *drop not null* is listed, pgloader drops any
existing `NOT NULL` constraint associated with the given source
MySQL datatype when it creates the tables in the PostgreSQL
database.
The spelling *keep not null* explicitly prevents that behaviour and
can be used to overload the default casting rules.
When the option *set not null* is listed, pgloader sets a `NOT NULL`
constraint on the target column regardless whether it has been set
in the source MySQL column.
- *drop typemod*, *keep typemod*
When the option *drop typemod* is listed, pgloader drops any
existing *typemod* definition (e.g. *precision* and *scale*) from
the datatype definition found in the MySQL columns of the source
type when it created the tables in the PostgreSQL database.
The spelling *keep typemod* explicitly prevents that behaviour and
can be used to overload the default casting rules.
- *using*
This option takes as its single argument the name of a function to
be found in the `pgloader.transforms` Common Lisp package. See above
for details.
It's possible to augment a default cast rule (such as one that
applies against `ENUM` data type for example) with a *transformation
function* by omitting entirely the `type` parts of the casting rule,
as in the following example::
column enumerate.foo using empty-string-to-null
MySQL Views Support
-------------------
MySQL views support allows pgloader to migrate views as if they were base
tables. This feature allows for on-the-fly transformation from MySQL to
PostgreSQL, as the view definition is used rather than the base data.
MATERIALIZE VIEWS
^^^^^^^^^^^^^^^^^
This clause allows you to implement custom data processing at the data
source by providing a *view definition* against which pgloader will query
the data. It's not possible to just allow for plain `SQL` because we want to
know a lot about the exact data types of each column involved in the query
output.
This clause expects a comma separated list of view definitions, each one
being either the name of an existing view in your database or the
following expression::
*name* `AS` `$$` *sql query* `$$`
The *name* and the *sql query* will be used in a `CREATE VIEW` statement at
the beginning of the data loading, and the resulting view will then be
dropped at the end of the data loading.
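For instance, an inline view definition might look like the following
sketch, where the view name and the query are only placeholders::
MATERIALIZE VIEWS mv_rentals AS
$$ select rental_id, rental_date from rental $$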
MATERIALIZE ALL VIEWS
^^^^^^^^^^^^^^^^^^^^^
Same behaviour as *MATERIALIZE VIEWS* using the dynamic list of views as
returned by MySQL rather than asking the user to specify the list.
MySQL Partial Migration
-----------------------
INCLUDING ONLY TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
used to limit the tables to migrate to a sublist.
Example::
including only table names matching ~/film/, 'actor'
EXCLUDING TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
used to exclude table names from the migration. This filter only applies
to the result of the *INCLUDING* filter.
::
excluding table names matching ~<ory>
MySQL Encoding Support
----------------------
DECODING TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
used to force the encoding to use when processing data from MySQL. If the
data encoding known to you is different from MySQL's idea about it, this is
the option to use.
::
decoding table names matching ~/messed/, ~/encoding/ AS utf8
You can use as many such rules as you need, all with possibly different
encodings.
MySQL Schema Transformations
----------------------------
ALTER TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
that you want to target in the pgloader *ALTER TABLE* command. Available
actions are *SET SCHEMA*, *RENAME TO*, and *SET*::
ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
ALTER TABLE NAMES MATCHING ~/./ SET (fillfactor='40')
ALTER TABLE NAMES MATCHING ~/./ SET TABLESPACE 'pg_default'
You can use as many such rules as you need. The list of tables to be
migrated is searched in pgloader memory against the *ALTER TABLE* matching
rules, and for each command pgloader stops at the first matching criteria
(regexp or string).
No *ALTER TABLE* command is sent to PostgreSQL, the modification happens at
the level of the pgloader in-memory representation of your source database
schema. In case of a name change, the mapping is kept and reused in the
*foreign key* and *index* support.
The *SET ()* action takes effect as a *WITH* clause for the `CREATE TABLE`
command that pgloader will run when it has to create a table.
The *SET TABLESPACE* action takes effect as a *TABLESPACE* clause for the
`CREATE TABLE` command that pgloader will run when it has to create a table.
MySQL Migration: limitations
----------------------------
The `database` command currently only supports a MySQL source database
and has the following limitations:
- Views are not migrated,
Supporting views might require implementing a full SQL parser for the
MySQL dialect with a porting engine to rewrite the SQL against
PostgreSQL, including renaming functions and changing some constructs.
While it's not theoretically impossible, don't hold your breath.
- Triggers are not migrated
The difficulty of doing so is not yet assessed.
- Of the geometric datatypes, only the `POINT` type has been covered.
The other ones should be easy enough to implement now, it's just not
done yet.
Default MySQL Casting Rules
---------------------------
When migrating from MySQL the following Casting Rules are provided:
Numbers::
type int with extra auto_increment to serial when (< precision 10)
type int with extra auto_increment to bigserial when (<= 10 precision)
type int to int when (< precision 10)
type int to bigint when (<= 10 precision)
type tinyint with extra auto_increment to serial
type smallint with extra auto_increment to serial
type mediumint with extra auto_increment to serial
type bigint with extra auto_increment to bigserial
type tinyint to boolean when (= 1 precision) using tinyint-to-boolean
type bit when (= 1 precision) to boolean drop typemod using bits-to-boolean
type bit to bit drop typemod using bits-to-hex-bitstring
type bigint when signed to bigint drop typemod
type bigint when (< 19 precision) to numeric drop typemod
type tinyint when unsigned to smallint drop typemod
type smallint when unsigned to integer drop typemod
type mediumint when unsigned to integer drop typemod
type integer when unsigned to bigint drop typemod
type tinyint to smallint drop typemod
type smallint to smallint drop typemod
type mediumint to integer drop typemod
type integer to integer drop typemod
type bigint to bigint drop typemod
type float to float drop typemod
type double to double precision drop typemod
type numeric to numeric keep typemod
type decimal to decimal keep typemod
Texts::
type char to char keep typemod using remove-null-characters
type varchar to varchar keep typemod using remove-null-characters
type tinytext to text using remove-null-characters
type text to text using remove-null-characters
type mediumtext to text using remove-null-characters
type longtext to text using remove-null-characters
Binary::
type binary to bytea using byte-vector-to-bytea
type varbinary to bytea using byte-vector-to-bytea
type tinyblob to bytea using byte-vector-to-bytea
type blob to bytea using byte-vector-to-bytea
type mediumblob to bytea using byte-vector-to-bytea
type longblob to bytea using byte-vector-to-bytea
Date::
type datetime when default "0000-00-00 00:00:00" and not null
to timestamptz drop not null drop default
using zero-dates-to-null
type datetime when default "0000-00-00 00:00:00"
to timestamptz drop default
using zero-dates-to-null
type datetime with extra on update current timestamp when not null
to timestamptz drop not null drop default
using zero-dates-to-null
type datetime with extra on update current timestamp
to timestamptz drop default
using zero-dates-to-null
type timestamp when default "0000-00-00 00:00:00" and not null
to timestamptz drop not null drop default
using zero-dates-to-null
type timestamp when default "0000-00-00 00:00:00"
to timestamptz drop default
using zero-dates-to-null
type date when default "0000-00-00" to date drop default
using zero-dates-to-null
type date to date
type datetime to timestamptz
type timestamp to timestamptz
type year to integer drop typemod
Geometric::
type geometry to point using convert-mysql-point
type point to point using convert-mysql-point
type linestring to path using convert-mysql-linestring
Enum types are declared inline in MySQL and separately with a `CREATE TYPE`
command in PostgreSQL, so each column of Enum Type is converted to a type
named after the table and column names defined with the same labels in the
same order.
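As a sketch, a hypothetical MySQL column ``t.color`` of type
``ENUM('red','green','blue')`` would then be migrated along the following
lines, where the exact type name is derived from the table and column
names::
CREATE TYPE "t_color" AS ENUM ('red', 'green', 'blue');
CREATE TABLE "t" ("color" "t_color");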
When the source type definition is not matched in the default casting rules
nor in the casting rules provided in the command, then the type name with
the typemod is used.

PostgreSQL to Citus
===================
This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the
indexes and the primary and foreign key constraints. A default set of
casting rules is provided and may be overridden and extended by the
command.
Automatic distribution column backfilling is supported, either from
commands that specify the distribution column in every table, or only in
the main table, relying on foreign key constraints to discover the other
distribution keys.
Here's a short example of migrating a database from a PostgreSQL server to
another:
::
load database
from pgsql:///hackathon
into pgsql://localhost:9700/dim
with include drop, reset no sequences
cast column impressions.seen_at to "timestamp with time zone"
distribute companies using id
-- distribute campaigns using company_id
-- distribute ads using company_id from campaigns
-- distribute clicks using company_id from ads, campaigns
-- distribute impressions using company_id from ads, campaigns
;
Everything works exactly the same way as when doing a PostgreSQL to
PostgreSQL migration, with the added functionality of this new
`distribute` command.
Distribute Command
^^^^^^^^^^^^^^^^^^
The distribute command syntax is as follows::
distribute <table name> using <column name>
distribute <table name> using <column name> from <table> [, <table>, ...]
distribute <table name> as reference table
When using the distribute command, the following steps are added to pgloader
operations when migrating the schema:
- if the distribution column does not exist in the table, it is added as
the first column of the table
- if the distribution column does not exist in the primary key of the
table, it is added as the first column of the primary key of the table
- all the foreign keys that point to the table have the distribution
key added automatically too, including in the source tables of the
foreign key constraints
- once the schema has been created on the target database, pgloader then
issues Citus specific command `create_reference_table()
<http://docs.citusdata.com/en/v8.0/develop/api_udf.html?highlight=create_reference_table#create-reference-table>`_
and `create_distributed_table()
<http://docs.citusdata.com/en/v8.0/develop/api_udf.html?highlight=create_reference_table#create-distributed-table>`_
to make the tables distributed
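As an illustration of the third form above, a small lookup table can be
declared as a reference table, replicated to every node rather than
distributed; the table name here is only a placeholder::
distribute countries as reference table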
Those operations are done in the schema section of pgloader, before the data
is loaded. When the data is loaded, the newly added columns need to be
backfilled from referenced data. pgloader knows how to do that by generating
a query like the following and importing the result set of such a query
rather than the raw data from the source table.
Citus Migration Example
^^^^^^^^^^^^^^^^^^^^^^^
With the migration command as above, pgloader adds the column ``company_id``
to the tables that have a direct or indirect foreign key reference to the
``companies`` table.
We run pgloader using the following command, where the file
`./test/citus/company.load
<https://github.com/dimitri/pgloader/blob/master/test/citus/company.load>`_
contains the pgloader command as shown above.
::
$ pgloader --client-min-messages sql ./test/citus/company.load
The following SQL statements are all extracted from the log messages that
the pgloader command outputs. We are going to have a look at the
`impressions` table. It gets created with a new column `company_id` in the
first position, as follows:
::
CREATE TABLE "public"."impressions"
(
company_id bigint,
"id" bigserial,
"ad_id" bigint default NULL,
"seen_at" timestamp with time zone default NULL,
"site_url" text default NULL,
"cost_per_impression_usd" numeric(20,10) default NULL,
"user_ip" inet default NULL,
"user_data" jsonb default NULL
);
The original schema for this table does not have the `company_id` column,
which means pgloader now needs to change the primary key definition, the
foreign keys constraints definitions from and to this table, and also to
*backfill* the `company_id` data to this table when doing the COPY phase of
the migration.
Then once the tables have been created, pgloader executes the following SQL
statements::
SELECT create_distributed_table('"public"."companies"', 'id');
SELECT create_distributed_table('"public"."campaigns"', 'company_id');
SELECT create_distributed_table('"public"."ads"', 'company_id');
SELECT create_distributed_table('"public"."clicks"', 'company_id');
SELECT create_distributed_table('"public"."impressions"', 'company_id');
Then when copying the data from the source PostgreSQL database to the new
Citus tables, the new column (here ``company_id``) needs to be backfilled
from the source tables. Here's the SQL query that pgloader uses as a data
source for the ``ads`` table in our example:
::
SELECT "campaigns".company_id::text, "ads".id::text, "ads".campaign_id::text,
"ads".name::text, "ads".image_url::text, "ads".target_url::text,
"ads".impressions_count::text, "ads".clicks_count::text,
"ads".created_at::text, "ads".updated_at::text
FROM "public"."ads"
JOIN "public"."campaigns"
ON ads.campaign_id = campaigns.id
The ``impressions`` table has an indirect foreign key reference to the
``company`` table, which is the table where the distribution key is
specified. pgloader discovers that itself by walking the PostgreSQL
catalogs, and you may also use the following specification in the
pgloader command to explicitly add the indirect dependency:
::
distribute impressions using company_id from ads, campaigns
Given this schema, the SQL query used by pgloader to fetch the data for the
`impressions` table is the following, implementing online backfilling of the
data:
::
SELECT "campaigns".company_id::text, "impressions".id::text,
"impressions".ad_id::text, "impressions".seen_at::text,
"impressions".site_url::text,
"impressions".cost_per_impression_usd::text,
"impressions".user_ip::text,
"impressions".user_data::text
FROM "public"."impressions"
JOIN "public"."ads"
ON impressions.ad_id = ads.id
JOIN "public"."campaigns"
ON ads.campaign_id = campaigns.id
When the data copying is done, pgloader also has to install the indexes
supporting the primary keys, and add the foreign key definitions to the
schema. Those definitions are not the same as in the source schema,
because of the addition of the distribution column to the table: we also
need to add the column to the primary key and the foreign key
constraints.
Here are the commands issued by pgloader for the ``impressions`` table:
::
CREATE UNIQUE INDEX "impressions_pkey"
ON "public"."impressions" (company_id, id);
ALTER TABLE "public"."impressions"
ADD CONSTRAINT "impressions_ad_id_fkey"
FOREIGN KEY(company_id,ad_id)
REFERENCES "public"."ads"(company_id,id)
Given a single line of specification ``distribute companies using id`` then
pgloader implements all the necessary schema changes on the fly when
migrating to Citus, and also dynamically backfills the data.
Citus Migration: Limitations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The way pgloader implements *reset sequences* does not currently work
with Citus, so sequences need to be taken care of separately.

Redshift to Postgres
====================
The command and behavior are the same as when migrating from a PostgreSQL
database source, see :ref:`migrating_to_pgsql`. pgloader automatically
discovers that it's talking to a Redshift database by parsing the output of
the ``SELECT version()`` SQL query.
Redshift as a data source
^^^^^^^^^^^^^^^^^^^^^^^^^
Redshift is a variant of PostgreSQL version 8.0.2, which allows pgloader to
work with only a very small amount of adaptation in the catalog queries
used. In other words, migrating from Redshift to PostgreSQL works just the
same as when migrating from a PostgreSQL data source, including the
connection string specification.
Redshift as a data destination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Redshift variant of PostgreSQL 8.0.2 does not have support for the
``COPY FROM STDIN`` feature that pgloader normally relies upon. To use COPY
with Redshift, the data must first be made available in an S3 bucket.
First, pgloader must authenticate to Amazon S3. pgloader uses the following
setup for that:
- ``~/.aws/config``
This INI formatted file contains sections with your default region and
other global values relevant to using the S3 API. pgloader parses it to
get the region when it's set up in the ``default`` INI section.
The environment variable ``AWS_DEFAULT_REGION`` can be used to override
the configuration file value.
- ``~/.aws/credentials``
The INI formatted file contains your authentication setup to Amazon,
with the properties ``aws_access_key_id`` and ``aws_secret_access_key``
in the section ``default``. pgloader parses this file for those keys,
and uses their values when communicating with Amazon S3.
The environment variables ``AWS_ACCESS_KEY_ID`` and
``AWS_SECRET_ACCESS_KEY`` can be used to override the configuration file
values.
- ``AWS_S3_BUCKET_NAME``
Finally, the value of the environment variable ``AWS_S3_BUCKET_NAME`` is
used by pgloader as the name of the S3 bucket where to upload the files
that are then COPYed into the Redshift database. The bucket name defaults
to ``pgloader``.
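As a sketch, the environment for a Redshift target might then be set up
as follows, where every value, host, and database name is a placeholder::
$ export AWS_DEFAULT_REGION=us-east-1
$ export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
$ export AWS_SECRET_ACCESS_KEY=example-secret-key
$ export AWS_S3_BUCKET_NAME=my-pgloader-bucket
$ pgloader pgsql://user@source/dbname pgsql://user@redshift.example.com:5439/dbname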
Then pgloader works as usual, see the other sections of the documentation
for the details, depending on the data source (files, other databases, etc).
When preparing the data for PostgreSQL, pgloader now uploads each batch
into a single CSV file, and then issues a command such as the following
for each batch:
::
COPY <target_table_name>
FROM 's3://<s3 bucket>/<s3-filename-just-uploaded>'
FORMAT CSV
TIMEFORMAT 'auto'
REGION '<aws-region>'
ACCESS_KEY_ID '<aws-access-key-id>'
SECRET_ACCESS_KEY '<aws-secret-access-key>';
This is the only difference from a PostgreSQL core version, where pgloader
can rely on the classic ``COPY FROM STDIN`` command, which allows sending
data through the already established connection to PostgreSQL.

docs/ref/pgsql.rst Normal file
.. _migrating_to_pgsql:
Postgres to Postgres
====================
This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the
indexes and the primary and foreign key constraints. A default set of
casting rules is provided and may be overridden and extended by the
command.
For a complete Postgres to Postgres solution including Change Data Capture
support with Logical Decoding, see `pgcopydb`__.
__ https://pgcopydb.readthedocs.io/
Using default settings
----------------------
Here is the simplest command line example, which might be all you need:
::
$ pgloader pgsql://user@source/dbname pgsql://user@target/dbname
Using advanced options and a load command file
----------------------------------------------
Here's a short example of migrating a database from a PostgreSQL server to
another. The command would then be:
::
$ pgloader pg.load
And the contents of the command file ``pg.load`` could be inspired by the
following:
::
load database
from pgsql://localhost/pgloader
into pgsql://localhost/copy
including only table names matching 'bits', ~/utilisateur/ in schema 'mysql'
including only table names matching ~/geolocations/ in schema 'public'
;
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
PostgreSQL Database Source Specification: FROM
----------------------------------------------
Must be a connection URL pointing to a PostgreSQL database.
See the `SOURCE CONNECTION STRING` section above for details on how to write
the connection string.
::
pgsql://[user[:password]@][netloc][:port][/dbname][?option=value&...]
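The URL decomposes into the usual parts; here is a small Python sketch of such parsing (illustrative only — pgloader has its own parser, and the defaults below are assumptions):

```python
from urllib.parse import urlsplit, parse_qs


def parse_pgsql_url(url):
    # Split pgsql://[user[:password]@][netloc][:port][/dbname][?option=value&...]
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname or "localhost",   # assumed default
        "port": parts.port or 5432,              # assumed default
        "dbname": parts.path.lstrip("/") or None,
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }
```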
PostgreSQL Database Migration Options: WITH
-------------------------------------------
When loading from a `PostgreSQL` database, the following options are
supported, and the default *WITH* clause is: *no truncate*, *create schema*,
*create tables*, *include drop*, *create indexes*, *reset sequences*,
*foreign keys*, *downcase identifiers*, *uniquify index names*, *reindex*.
- *include drop*
When this option is listed, pgloader drops all the tables in the target
PostgreSQL database whose names appear in the source PostgreSQL database.
This option allows for using the same command several times in a row until
you figure out all the options, starting automatically from a clean
environment. Please note that `CASCADE` is used to ensure that tables
are dropped even if there are foreign keys pointing to them. This is
precisely what `include drop` is intended to do: drop all target tables
and recreate them.
Great care needs to be taken when using `include drop`, as it will
cascade to *all* objects referencing the target tables, possibly
including other tables that are not being loaded from the source DB.
- *include no drop*
When this option is listed, pgloader will not include any `DROP`
statement when loading the data.
- *truncate*
When this option is listed, pgloader issues the `TRUNCATE` command
against each PostgreSQL table just before loading data into it.
- *no truncate*
When this option is listed, pgloader issues no `TRUNCATE` command.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
TRIGGER ALL` command against the PostgreSQL target table before copying
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
`COPY` is done.
This option allows loading data into a pre-existing table ignoring the
*foreign key constraints* and user defined triggers and may result in
invalid *foreign key constraints* once the data is loaded. Use with
care.
- *create tables*
When this option is listed, pgloader creates the tables using the meta
data found in the source `PostgreSQL` database. A standard data type
conversion from the source to the target PostgreSQL types is applied.
- *create no tables*
When this option is listed, pgloader skips the creation of tables before
loading data; the target tables must then already exist.
Also, when using *create no tables* pgloader fetches the metadata from
the current target database and checks type casting, then will remove
constraints and indexes prior to loading the data and install them back
again once the loading is done.
- *create indexes*
When this option is listed, pgloader gets the definitions of all the
indexes found in the source database and creates the same set of index
definitions against the PostgreSQL database.
- *create no indexes*
When this option is listed, pgloader skips creating the indexes.
- *drop indexes*
When this option is listed, pgloader drops the indexes in the target
database before loading the data, and creates them again at the end
of the data copy.
- *reindex*
When this option is used, pgloader does both *drop indexes* before
loading the data and *create indexes* once data is loaded.
- *drop schema*
When this option is listed, pgloader drops the target schema in the
target PostgreSQL database before creating it again and all the objects
it contains. The default behavior doesn't drop the target schemas.
- *foreign keys*
When this option is listed, pgloader gets the definitions of all the
foreign keys found in the source database and creates the same set of
foreign key definitions against the PostgreSQL database.
- *no foreign keys*
When this option is listed, pgloader skips creating foreign keys.
- *reset sequences*
When this option is listed, at the end of the data loading and after the
indexes have all been created, pgloader resets all the PostgreSQL
sequences created to the current maximum value of the column they are
attached to.
The options *schema only* and *data only* have no effects on this
option.
- *reset no sequences*
When this option is listed, pgloader skips resetting sequences after the
load.
The options *schema only* and *data only* have no effects on this
option.
- *downcase identifiers*
When this option is listed, pgloader converts all the source identifiers
(table names, index names, column names) to *downcase*, except for
PostgreSQL *reserved* keywords.
The PostgreSQL *reserved* keywords are determined dynamically by using
the system function `pg_get_keywords()`.
- *quote identifiers*
When this option is listed, pgloader quotes all the source identifiers so
that their case is respected. Note that you will then have to do the
same thing in your application code queries.
- *schema only*
When this option is listed pgloader refrains from migrating the data
over. Note that the schema in this context includes the indexes when the
option *create indexes* has been listed.
- *data only*
When this option is listed pgloader only issues the `COPY` statements,
without doing any other processing.
- *rows per range*
How many rows are fetched per `SELECT` query when using *multiple
readers per thread*, see above for details.
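The *reset sequences* step described above boils down to a ``setval()`` query per sequence; here is a hedged sketch of the kind of SQL involved (the ``reset_sequence_sql`` helper is hypothetical, and pgloader's actual statements may differ):

```python
def reset_sequence_sql(table, column):
    # Move the sequence to the current maximum of the column it feeds,
    # so the next INSERT does not collide with the migrated rows.
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"(SELECT coalesce(max({column}), 1) FROM {table}))"
    )
```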
PostgreSQL Database Casting Rules
---------------------------------
The command *CAST* introduces user-defined casting rules.
The cast clause allows specifying custom casting rules, either to overload
the default casting rules or to amend them with special cases.
A casting rule is expected to follow one of the forms::
type <type-name> [ <guard> ... ] to <pgsql-type-name> [ <option> ... ]
column <table-name>.<column-name> [ <guards> ] to ...
It's possible for a *casting rule* to either match against a PostgreSQL data
type or against a given *column name* in a given *table name*. So it's
possible to migrate a table from a PostgreSQL database while changing an
`int` column to a `bigint` one, automatically.
The *casting rules* are applied in order, the first match prevents the
following rules from being applied, and user defined rules are evaluated
first.
The supported guards are:
- *when default 'value'*
The casting rule is only applied against source columns of the given
type that have the given *value*, which must be a single-quoted or a
double-quoted string.
- *when typemod expression*
The casting rule is only applied against source columns of the given
type that have a *typemod* value matching the given *typemod
expression*. The *typemod* is separated into its *precision* and *scale*
components.
Example of a cast rule using a *typemod* guard::
type char when (= precision 1) to char keep typemod
This expression casts source `char(1)` columns to PostgreSQL columns of
type `char(1)`, while in the general case `char(N)` will be converted by
the default cast rule into the PostgreSQL type `varchar(N)`.
- *with extra auto_increment*
The casting rule is only applied against PostgreSQL columns attached to
a sequence. This can be the result of doing that manually, of using a
`serial` or a `bigserial` data type, or of an `identity` column.
The supported casting options are:
- *drop default*, *keep default*
When the option *drop default* is listed, pgloader drops any
existing default expression in the source database for columns of the
source type from the `CREATE TABLE` statement it generates.
The spelling *keep default* explicitly prevents that behaviour and
can be used to overload the default casting rules.
- *drop not null*, *keep not null*, *set not null*
When the option *drop not null* is listed, pgloader drops any
existing `NOT NULL` constraint associated with the given source
data type when it creates the tables in the PostgreSQL database.
The spelling *keep not null* explicitly prevents that behaviour and
can be used to overload the default casting rules.
When the option *set not null* is listed, pgloader sets a `NOT NULL`
constraint on the target column regardless of whether it has been set
in the source column.
- *drop typemod*, *keep typemod*
When the option *drop typemod* is listed, pgloader drops any
existing *typemod* definition (e.g. *precision* and *scale*) from
the data type definition found in the source columns of the given
type when it creates the tables in the PostgreSQL database.
The spelling *keep typemod* explicitly prevents that behaviour and
can be used to overload the default casting rules.
- *using*
This option takes as its single argument the name of a function to
be found in the `pgloader.transforms` Common Lisp package. See above
for details.
It's possible to augment a default cast rule (such as one that
applies against `ENUM` data type for example) with a *transformation
function* by omitting entirely the `type` parts of the casting rule,
as in the following example::
column enumerate.foo using empty-string-to-null
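The matching behaviour described above — rules tried in order, first match wins, guards narrowing a rule — can be sketched as follows (illustrative Python, not pgloader's Common Lisp implementation; the rule encoding is an assumption):

```python
def first_matching_cast(rules, column):
    # Rules are tried in order; the first one whose source type and
    # guard both match wins, so user-defined rules placed before the
    # defaults take precedence.
    for rule in rules:
        guard = rule.get("guard", lambda c: True)
        if rule["source"] == column["type"] and guard(column):
            return rule["target"]
    return column["type"]  # no rule matched: keep the type


rules = [
    # type char when (= precision 1) to char keep typemod
    {"source": "char", "guard": lambda c: c["precision"] == 1, "target": "char(1)"},
    # default rule: char(N) becomes varchar
    {"source": "char", "target": "varchar"},
]
```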
PostgreSQL Views Support
------------------------
PostgreSQL views support allows pgloader to migrate views as if they were
base tables. This feature then allows for on-the-fly transformation of the
source schema, as the view definition is used rather than the base data.
MATERIALIZE VIEWS
^^^^^^^^^^^^^^^^^
This clause allows you to implement custom data processing at the data
source by providing a *view definition* against which pgloader will query
the data. It's not possible to just allow for plain `SQL` because we want to
know a lot about the exact data types of each column involved in the query
output.
This clause expects a comma separated list of view definitions, each one
being either the name of an existing view in your database or the following
expression::
*name* `AS` `$$` *sql query* `$$`
The *name* and the *sql query* will be used in a `CREATE VIEW` statement at
the beginning of the data loading, and the resulting view will then be
dropped at the end of the data loading.
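In other words, each ``name AS $$ query $$`` entry produces a ``CREATE VIEW`` before the load and a matching ``DROP VIEW`` after it; here is a minimal sketch of that bookkeeping (hypothetical helper, not pgloader's code):

```python
def view_statements(views):
    # An entry is either the name of an existing view (nothing to do)
    # or a (name, query) pair that must be created before the load and
    # dropped once it is done.
    before, after = [], []
    for view in views:
        if isinstance(view, tuple):
            name, query = view
            before.append(f"CREATE VIEW {name} AS {query}")
            after.append(f"DROP VIEW {name}")
    return before, after
```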
MATERIALIZE ALL VIEWS
^^^^^^^^^^^^^^^^^^^^^
Same behaviour as *MATERIALIZE VIEWS* using the dynamic list of views as
returned by PostgreSQL rather than asking the user to specify the list.
PostgreSQL Partial Migration
----------------------------
INCLUDING ONLY TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
used to limit the tables to migrate to a sublist.
Example::
including only table names matching ~/film/, 'actor' in schema 'public'
EXCLUDING TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
used to exclude table names from the migration. This filter only applies
to the result of the *INCLUDING* filter.
::
excluding table names matching ~<ory> in schema 'public'
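The two filters compose as described: *INCLUDING* selects first, *EXCLUDING* then prunes that selection. A short Python sketch of the pipeline (simple ``re.search`` semantics assumed; the helper is hypothetical):

```python
import re


def select_tables(tables, including, excluding):
    # INCLUDING keeps names matching any pattern; EXCLUDING then
    # removes matches from that result only.
    kept = [t for t in tables if any(re.search(p, t) for p in including)]
    return [t for t in kept if not any(re.search(p, t) for p in excluding)]
```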
PostgreSQL Schema Transformations
---------------------------------
ALTER TABLE NAMES MATCHING
^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table names or *regular expressions*
that you want to target in the pgloader *ALTER TABLE* command. Available
actions are *SET SCHEMA*, *RENAME TO*, and *SET*::
ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
IN SCHEMA 'public'
SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING 'film' IN SCHEMA 'public' RENAME TO 'films'
ALTER TABLE NAMES MATCHING ~/./ IN SCHEMA 'public' SET (fillfactor='40')
ALTER TABLE NAMES MATCHING ~/./ IN SCHEMA 'public' SET TABLESPACE 'pg_default'
You can use as many such rules as you need. The list of tables to be
migrated is matched in pgloader memory against the *ALTER TABLE* rules,
and for each command pgloader stops at the first matching criterion
(regexp or string).
No *ALTER TABLE* command is sent to PostgreSQL, the modification happens at
the level of the pgloader in-memory representation of your source database
schema. In case of a name change, the mapping is kept and reused in the
*foreign key* and *index* support.
The *SET ()* action takes effect as a *WITH* clause for the `CREATE TABLE`
command that pgloader will run when it has to create a table.
The *SET TABLESPACE* action takes effect as a *TABLESPACE* clause for the
`CREATE TABLE` command that pgloader will run when it has to create a table.
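The first-match semantics can be pictured with a small sketch — purely illustrative, with a hypothetical rule encoding; the real matching happens on pgloader's in-memory catalog:

```python
import re


def apply_alter_rules(table, rules):
    # For each ALTER TABLE command, pgloader stops at the first
    # matching criterion (regexp or string); later rules are ignored.
    for pattern, action in rules:
        if re.search(pattern, table):
            return action(table)
    return table  # untouched when nothing matches


rules = [
    (r"_list$", lambda t: ("mv", t)),            # SET SCHEMA 'mv'
    (r"^film$", lambda t: ("public", "films")),  # RENAME TO 'films'
]
```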
PostgreSQL Migration: limitations
---------------------------------
The only PostgreSQL objects supported at this time in pgloader are
extensions, schemas, tables, indexes and constraints. Anything else is
ignored.
- Views are not migrated,
Supporting views might require implementing a full SQL parser for the
source SQL dialect with a porting engine to rewrite the SQL against
PostgreSQL, including renaming functions and changing some constructs.
While it's not theoretically impossible, don't hold your breath.
- Triggers are not migrated
The difficulty of doing so is not yet assessed.
- Stored Procedures and Functions are not migrated.
Default PostgreSQL Casting Rules
--------------------------------
When migrating from PostgreSQL the following Casting Rules are provided::
type int with extra auto_increment to serial
type bigint with extra auto_increment to bigserial
type "character varying" to text drop typemod

docs/ref/sqlite.rst Normal file

@ -0,0 +1,230 @@
SQLite to Postgres
==================
This command instructs pgloader to load data from a SQLite file. Automatic
discovery of the schema is supported, including build of the indexes.
Using default settings
----------------------
Here is the simplest command line example, which might be all you need:
::
$ pgloader sqlite:///path/to/file.db pgsql://pguser@pghost/dbname
Using advanced options and a load command file
----------------------------------------------
The command then would be:
::
$ pgloader db.load
Here's an example of the ``db.load`` contents then::
load database
from sqlite:///Users/dim/Downloads/lastfm_tags.db
into postgresql:///tags
with include drop, create tables, create indexes, reset sequences
set work_mem to '16MB', maintenance_work_mem to '512 MB';
Common Clauses
--------------
Please refer to :ref:`common_clauses` for documentation about common
clauses.
SQLite Database Source Specification: FROM
------------------------------------------
Path or HTTP URL to a SQLite file, might be a `.zip` file.
SQLite Database Migration Options: WITH
---------------------------------------
When loading from a `SQLite` database, the following options are
supported, and the default *WITH* clause is: *no truncate*, *create
tables*, *include drop*, *create indexes*, *reset sequences*, *downcase
identifiers*, *encoding 'utf-8'*.
- *include drop*
When this option is listed, pgloader drops all the tables in the target
PostgreSQL database whose names appear in the SQLite database. This
option allows for using the same command several times in a row until
you figure out all the options, starting automatically from a clean
environment. Please note that `CASCADE` is used to ensure that tables
are dropped even if there are foreign keys pointing to them. This is
precisely what `include drop` is intended to do: drop all target tables
and recreate them.
Great care needs to be taken when using `include drop`, as it will
cascade to *all* objects referencing the target tables, possibly
including other tables that are not being loaded from the source DB.
- *include no drop*
When this option is listed, pgloader will not include any `DROP`
statement when loading the data.
- *truncate*
When this option is listed, pgloader issues the `TRUNCATE` command
against each PostgreSQL table just before loading data into it.
- *no truncate*
When this option is listed, pgloader issues no `TRUNCATE` command.
- *disable triggers*
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
TRIGGER ALL` command against the PostgreSQL target table before copying
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
`COPY` is done.
This option allows loading data into a pre-existing table ignoring
the *foreign key constraints* and user defined triggers and may
result in invalid *foreign key constraints* once the data is loaded.
Use with care.
- *create tables*
When this option is listed, pgloader creates the tables using the meta
data found in the `SQLite` file, which must contain a list of fields
with their data type. A standard data type conversion from SQLite to
PostgreSQL is done.
- *create no tables*
When this option is listed, pgloader skips the creation of tables before
loading data; the target tables must then already exist.
Also, when using *create no tables* pgloader fetches the metadata
from the current target database and checks type casting, then will
remove constraints and indexes prior to loading the data and install
them back again once the loading is done.
- *create indexes*
When this option is listed, pgloader gets the definitions of all the
indexes found in the SQLite database and creates the same set of index
definitions against the PostgreSQL database.
- *create no indexes*
When this option is listed, pgloader skips creating the indexes.
- *drop indexes*
When this option is listed, pgloader drops the indexes in the target
database before loading the data, and creates them again at the end
of the data copy.
- *reset sequences*
When this option is listed, at the end of the data loading and after
the indexes have all been created, pgloader resets all the
PostgreSQL sequences created to the current maximum value of the
column they are attached to.
- *reset no sequences*
When this option is listed, pgloader skips resetting sequences after the
load.
The options *schema only* and *data only* have no effects on this
option.
- *schema only*
When this option is listed pgloader will refrain from migrating the data
over. Note that the schema in this context includes the indexes when the
option *create indexes* has been listed.
- *data only*
When this option is listed pgloader only issues the `COPY` statements,
without doing any other processing.
- *encoding*
This option controls which encoding is used to parse the SQLite text
data. It defaults to UTF-8.
SQLite Database Casting Rules
-----------------------------
The command *CAST* introduces user-defined casting rules.
The cast clause allows specifying custom casting rules, either to overload
the default casting rules or to amend them with special cases.
SQLite Database Partial Migrations
----------------------------------
INCLUDING ONLY TABLE NAMES LIKE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table name patterns used to limit the
tables to migrate to a sublist.
Example::
including only table names like 'Invoice%'
EXCLUDING TABLE NAMES LIKE
^^^^^^^^^^^^^^^^^^^^^^^^^^
Introduce a comma separated list of table name patterns used to exclude
table names from the migration. This filter only applies to the result of
the *INCLUDING* filter.
::
excluding table names like 'appointments'
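These patterns follow SQL ``LIKE`` semantics, where ``%`` matches any run of characters and ``_`` a single one. Here is a hedged sketch of the translation into a regular expression (hypothetical helper, not pgloader's matcher):

```python
import re


def like_to_regex(pattern):
    # % matches any run of characters, _ exactly one; everything else
    # is taken literally (re.escape leaves % and _ untouched).
    escaped = re.escape(pattern).replace("%", ".*").replace("_", ".")
    return re.compile(f"^{escaped}$")
```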
Default SQLite Casting Rules
----------------------------
When migrating from SQLite the following Casting Rules are provided:
Numbers::
type tinyint to smallint using integer-to-string
type integer to bigint using integer-to-string
type float to float using float-to-string
type real to real using float-to-string
type double to double precision using float-to-string
type numeric to numeric using float-to-string
type decimal to numeric using float-to-string
Texts::
type character to text drop typemod
type varchar to text drop typemod
type nvarchar to text drop typemod
type char to text drop typemod
type nchar to text drop typemod
type nvarchar to text drop typemod
type clob to text drop typemod
Binary::
type blob to bytea
Date::
type datetime to timestamptz using sqlite-timestamp-to-timestamp
type timestamp to timestamptz using sqlite-timestamp-to-timestamp
type timestamptz to timestamptz using sqlite-timestamp-to-timestamp
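The *sqlite-timestamp-to-timestamp* normalizations can be sketched as follows — an illustrative Python approximation of the documented rules, not the actual Common Lisp transform:

```python
def sqlite_timestamp_to_timestamp(value):
    # Zero dates become NULL, a bare 4-digit year is padded to a full
    # date, and anything else is passed through for PostgreSQL to parse.
    if value is None or value.startswith("0000-00-00"):
        return None
    if len(value) == 4 and value.isdigit():
        return f"{value}-01-01"
    return value
```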

docs/ref/transforms.rst Normal file

@ -0,0 +1,142 @@
Transformation Functions
========================
Some data types are implemented in a different enough way that a
transformation function is necessary. This function must be written in
`Common Lisp` and is searched for in the `pgloader.transforms` package.
Some default transformation functions are provided with pgloader, and you can
use the `--load` command line option to load and compile your own lisp file
into pgloader at runtime. For your functions to be found, remember to begin
your lisp file with the following form::
(in-package #:pgloader.transforms)
The provided transformation functions are:
- *zero-dates-to-null*
When the input date is all zeroes, return `nil`, which gets loaded as a
PostgreSQL `NULL` value.
- *date-with-no-separator*
Applies *zero-dates-to-null* then transforms the given date into a format
that PostgreSQL will actually process::
In: "20041002152952"
Out: "2004-10-02 15:29:52"
- *time-with-no-separator*
Transforms the given time into a format that PostgreSQL will actually
process::
In: "08231560"
Out: "08:23:15.60"
- *tinyint-to-boolean*
As MySQL lacks a proper boolean type, *tinyint* is often used to
implement that. This function transforms `0` to `'false'` and anything
else to `'true'`.
- *bits-to-boolean*
As MySQL lacks a proper boolean type, *BIT* is often used to implement
that. This function transforms 1-bit bit vectors from `0` to `f` and any
other value to `t`.
- *int-to-ip*
Converts an integer into the dotted representation of an IPv4 address. ::
In: 18435761
Out: "1.25.78.177"
- *ip-range*
Converts a pair of integers given as strings into an IPv4 range. ::
In: "16825344" "16825599"
Out: "1.0.188.0-1.0.188.255"
- *convert-mysql-point*
Converts from the `astext` representation of points in MySQL to the
PostgreSQL representation. ::
In: "POINT(48.5513589 7.6926827)"
Out: "(48.5513589,7.6926827)"
- *integer-to-string*
Converts an integer string or a Common Lisp integer into a string
suitable for a PostgreSQL integer. Takes care of quoted integers. ::
In: "\"0\""
Out: "0"
- *float-to-string*
Converts a Common Lisp float into a string suitable for a PostgreSQL float::
In: 100.0d0
Out: "100.0"
- *hex-to-dec*
Converts a string containing a hexadecimal representation of a number
into its decimal representation::
In: "deadbeef"
Out: "3735928559"
- *set-to-enum-array*
Converts a string representing a MySQL SET into a PostgreSQL Array of
Enum values from the set. ::
In: "foo,bar"
Out: "{foo,bar}"
- *empty-string-to-null*
Converts an empty string to null.
- *right-trim*
Removes whitespace at the end of a string.
- *remove-null-characters*
Removes `NUL` characters (`0x0`) from the given strings.
- *byte-vector-to-bytea*
Transforms a simple array of unsigned bytes to the PostgreSQL bytea Hex
Format representation as documented at
http://www.postgresql.org/docs/9.3/interactive/datatype-binary.html
- *sqlite-timestamp-to-timestamp*
The SQLite type system is quite flexible, so this function copes with it
to produce timestamp literals as expected by PostgreSQL. That covers years
given on 4 digits only, zero dates converted to null, and proper date
strings.
- *sql-server-uniqueidentifier-to-uuid*
The SQL Server driver receives data of type `uniqueidentifier` as a byte
vector that then needs to be converted to a UUID string for the PostgreSQL
COPY input format to process.
- *unix-timestamp-to-timestamptz*
Converts a unix timestamp (number of seconds elapsed since the beginning
of 1970) into a proper PostgreSQL timestamp format.
- *varbinary-to-string*
Converts a binary encoded string (such as a MySQL `varbinary` entry) to a
decoded text, using the table's encoding that may be overloaded with the
*DECODING TABLE NAMES MATCHING* clause.
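To make a couple of the transforms above concrete, here are Python equivalents of *int-to-ip* and *hex-to-dec* (sketches of the documented behaviour, not the Common Lisp sources):

```python
def int_to_ip(n):
    # Render a 32-bit integer as a dotted quad, as *int-to-ip* does.
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def hex_to_dec(s):
    # *hex-to-dec*: hexadecimal text to its decimal representation.
    return str(int(s, 16))
```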

docs/requirements.txt Normal file

@ -0,0 +1,4 @@
Sphinx==4.2.0
sphinx_rtd_theme==1.0.0
docutils==0.16
readthedocs-sphinx-search==0.1.0


@ -1,13 +1,14 @@
# Loading CSV Data with pgloader
Loading CSV Data with pgloader
------------------------------
CSV means *comma separated values* and is often found with quite varying
specifications. pgloader allows you to describe those specs in its command.
## The Command
The Command
^^^^^^^^^^^
To load data with [pgloader](http://pgloader.io/) you need to define in a
*command* the operations in some details. Here's our example for loading CSV
data:
To load data with pgloader you need to define in a *command* the operations in
some detail. Here's our example for loading CSV data::
LOAD CSV
FROM 'path/to/file.csv' (x, y, a, b, c, d)
@ -33,13 +34,10 @@ data:
);
$$;
You can see the full list of options in the
[pgloader reference manual](pgloader.1.html), with a complete description
of the options you see here.
The Data
^^^^^^^^
## The Data
This command allows loading the following CSV file content:
This command allows loading the following CSV file content::
Header, with a © sign
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
@ -49,10 +47,11 @@ This command allows loading the following CSV file content:
"4.17.143.0","4.17.143.15","68259584","68259599","CA","Canada"
"4.17.143.16","4.18.32.71","68259600","68296775","US","United States"
## Loading the data
Loading the data
^^^^^^^^^^^^^^^^
Here's how to start loading the data. Note that the output here has been
edited so as to facilitate its browsing online.
edited so as to facilitate its browsing online::
$ pgloader csv.load
... LOG Starting pgloader, log system is ready.
@ -66,11 +65,12 @@ edited so as to facilitate its browsing online.
----------------- --------- --------- --------- --------------
Total import time 6 6 0 0.058s
## The result
The result
^^^^^^^^^^
As you can see, the command described above is filtering the input and only
importing some of the columns from the example data file. Here's what gets
loaded in the PostgreSQL database:
loaded in the PostgreSQL database::
pgloader# table csv;
a | b | c | d


@ -1,20 +1,22 @@
# Loading dBase files with pgloader
Loading dBase files with pgloader
---------------------------------
The dBase format is still in use in some places as modern tools such as
*Filemaker* and *Excel* offer some level of support for it. Speaking of
support in modern tools, pgloader is right there on the list too!
## The Command
The Command
^^^^^^^^^^^
To load data with [pgloader](http://pgloader.io/) you need to define in a
*command* the operations in some details. Here's our example for loading a
dBase file, using a file provided by the french administration.
To load data with pgloader you need to define in a *command* the operations in
some detail. Here's our example for loading a dBase file, using a file
provided by the French administration.
You can find more files from them at the
[Insee](http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement.asp)
You can find more files from them at the `Insee
<http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement.asp>`_
website.
Here's our command:
Here's our command::
LOAD DBF
FROM http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip
@ -22,17 +24,14 @@ Here's our command:
WITH truncate, create table
SET client_encoding TO 'latin1';
You can see the full list of options in the
[pgloader reference manual](pgloader.1.html), with a complete description
of the options you see here.
Note that here pgloader will benefit from the meta-data information found in
the dBase file to create a PostgreSQL table capable of hosting the data as
described, then load the data.
## Loading the data
Loading the data
^^^^^^^^^^^^^^^^
Let's start the `pgloader` command with our `dbf-zip.load` command file:
Let's start the `pgloader` command with our `dbf-zip.load` command file::
$ pgloader dbf-zip.load
... LOG Starting pgloader, log system is ready.
@ -50,7 +49,7 @@ Let's start the `pgloader` command with our `dbf-zip.load` command file:
----------------- --------- --------- --------- --------------
Total import time 9181 9181 0 1.906s
We can see that [http://pgloader.io](pgloader) did download the file from
We can see that `pgloader <http://pgloader.io>`_ did download the file from
its HTTP URL location then *unzipped* it before the loading itself.
Note that the output of the command has been edited to facilitate its


@ -1,22 +1,24 @@
# Loading Fixed Width Data File with pgloader
Loading Fixed Width Data File with pgloader
-------------------------------------------
Some data providers still use a format where each column is specified with a
starting index position and a given length. Usually the columns are
blank-padded when the data is shorter than the full reserved range.
## The Command
The Command
^^^^^^^^^^^
To load data with [pgloader](http://pgloader.io/) you need to define in a
*command* the operations in some details. Here's our example for loading
Fixed Width Data, using a file provided by the US census.
To load data with pgloader you need to define in a *command* the operations in
some detail. Here's our example for loading Fixed Width Data, using a file
provided by the US census.
You can find more files from them at the
[Census 2000 Gazetteer Files](http://www.census.gov/geo/maps-data/data/gazetteer2000.html).
Here's our command:
Here's our command::
LOAD ARCHIVE
FROM http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip
FROM http://www2.census.gov/geo/docs/maps-data/data/gazetteer/places2k.zip
INTO postgresql:///pgloader
BEFORE LOAD DO
@ -52,14 +54,11 @@ Here's our command:
usps, fips, fips_code, "LocationName"
);
You can see the full list of options in the
[pgloader reference manual](pgloader.1.html), with a complete description
of the options you see here.
## The Data
The Data
^^^^^^^^
This command allows loading the following file content, where we are only
showing the first couple of lines:
showing the first couple of lines::
AL0100124Abbeville city 2987 1353 40301945 120383 15.560669 0.046480 31.566367 -85.251300
AL0100460Adamsville city 4965 2042 50779330 14126 19.606010 0.005454 33.590411 -86.949166
@ -69,14 +68,15 @@ showing the first couple of lines:
AL0100988Albertville city 17247 7090 67212867 258738 25.951034 0.099899 34.265362 -86.211261
AL0101132Alexander City city 15008 6855 100534344 433413 38.816529 0.167342 32.933157 -85.936008
## Loading the data
Loading the data
^^^^^^^^^^^^^^^^
Let's start the `pgloader` command with our `census-places.load` command file:
Let's start the `pgloader` command with our `census-places.load` command file::
$ pgloader census-places.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/census-places.load"
... LOG Fetching 'http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip'
... LOG Fetching 'http://www2.census.gov/geo/docs/maps-data/data/gazetteer/places2k.zip'
... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//places2k.zip'
table name read imported errors time
@ -89,8 +89,8 @@ Let's start the `pgloader` command with our `census-places.load` command file:
----------------- --------- --------- --------- --------------
Total import time 25375 25375 0 3.019s
We can see that [http://pgloader.io](pgloader) did download the file from
its HTTP URL location then *unziped* it before the loading itself.
We can see that pgloader did download the file from its HTTP URL location
then *unzipped* it before the loading itself.
Note that the output of the command has been edited to facilitate its
browsing online.


@@ -1,15 +1,16 @@
# Loading MaxMind Geolite Data with pgloader
Loading MaxMind Geolite Data with pgloader
------------------------------------------
The [MaxMind](http://www.maxmind.com/) provides a free dataset for
`MaxMind <http://www.maxmind.com/>`_ provides a free dataset for
geolocation, which is quite popular. Using pgloader you can download the
latest version of it, extract the CSV files from the archive and load their
content into your database directly.
## The Command
The Command
^^^^^^^^^^^
To load data with [pgloader](http://pgloader.io/) you need to define in a
*command* the operations in some details. Here's our example for loading the
Geolite data:
To load data with pgloader you need to define in a *command* the operations
in some details. Here's our example for loading the Geolite data::
/*
* Loading from a ZIP archive containing CSV files. The full test can be
@@ -92,27 +93,24 @@ Geolite data:
FINALLY DO
$$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;
You can see the full list of options in the
[pgloader reference manual](pgloader.1.html), with a complete description
of the options you see here.
Note that while the *Geolite* data is using a pair of integers (*start*,
*end*) to represent *ipv4* data, we use the very poweful
[ip4r](https://github.com/RhodiumToad/ip4r) PostgreSQL Extension instead.
*end*) to represent *ipv4* data, we use the very powerful `ip4r
<https://github.com/RhodiumToad/ip4r>`_ PostgreSQL Extension instead.
The transformation from a pair of integers into an IP is done dynamically by
the pgloader process.
Also, the location is given as a pair of *float* columns for the *longitude*
and the *latitude* where PostgreSQL offers the
[point](http://www.postgresql.org/docs/9.3/interactive/functions-geometry.html)
`point <http://www.postgresql.org/docs/9.3/interactive/functions-geometry.html>`_
datatype, so the pgloader command here will actually transform the data on
the fly to use the appropriate data type and its input representation.
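The two on-the-fly transforms just described can be sketched in Python. pgloader implements them in Common Lisp; this is only an illustrative equivalent, and the exact literal formats accepted by `ip4r` and `point` should be checked against their documentation:

```python
import ipaddress

def int_pair_to_ip4r(start: int, end: int) -> str:
    """Turn a Geolite (start, end) integer pair into an ip4r range literal."""
    return f"{ipaddress.IPv4Address(start)}-{ipaddress.IPv4Address(end)}"

def lon_lat_to_point(longitude: float, latitude: float) -> str:
    """Format a (longitude, latitude) pair as a PostgreSQL point literal."""
    return f"({longitude},{latitude})"

print(int_pair_to_ip4r(16777216, 16777471))  # 1.0.0.0-1.0.0.255
print(lon_lat_to_point(-85.2513, 31.566367))
```

In the actual load command these conversions are declared as cast/transform rules, so the CSV values never hit the database in their raw integer or float-pair form.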
## Loading the data
Loading the data
^^^^^^^^^^^^^^^^
Here's how to start loading the data. Note that the ouput here has been
edited so as to facilitate its browsing online.
edited so as to facilitate its browsing online::
$ pgloader archive.load
... LOG Starting pgloader, log system is ready.
@@ -135,12 +133,12 @@ edited so as to facilitate its browsing online.
The timing of course includes the transformation of the *1.9 million* pairs
of integer into a single *ipv4 range* each. The *finally* step consists of
creating the *GiST* specialized index as given in the main command:
creating the *GiST* specialized index as given in the main command::
CREATE INDEX blocks_ip4r_idx ON geolite.blocks USING gist(iprange);
That index will then be used to speed up queries wanting to find which
recorded geolocation contains a specific IP address:
recorded geolocation contains a specific IP address::
ip4r> select *
from geolite.location l


@@ -1,21 +1,92 @@
# Migrating from MySQL with pgloader
Migrating from MySQL to PostgreSQL
----------------------------------
If you want to migrate your data over to
[PostgreSQL](http://www.postgresql.org) from MySQL then pgloader is the tool
of choice!
If you want to migrate your data over to `PostgreSQL
<http://www.postgresql.org>`_ from MySQL then pgloader is the tool of
choice!
Most tools around are skipping the main problem with migrating from MySQL,
which is to do with the type casting and data sanitizing that needs to be
done. pgloader will not leave you alone on those topics.
## The Command
In a Single Command Line
^^^^^^^^^^^^^^^^^^^^^^^^
To load data with [pgloader](http://pgloader.tapoueh.org/) you need to
define in a *command* the operations in some details. Here's our example for
loading the
[MySQL Sakila Sample Database](http://dev.mysql.com/doc/sakila/en/):
As an example, we will use the f1db database from <http://ergast.com/mrd/>
which provides a historical record of motor racing data for
non-commercial purposes. You can either use their API or download the whole
database at `http://ergast.com/downloads/f1db.sql.gz
<http://ergast.com/downloads/f1db.sql.gz>`_. Once you've done that load the
database in MySQL::
Here's our command:
$ mysql -u root
> create database f1db;
> source f1db.sql
Now let's migrate this database into PostgreSQL in a single command line::
$ createdb f1db
$ pgloader mysql://root@localhost/f1db pgsql:///f1db
Done! All with schema, table definitions, constraints, indexes, primary
keys, *auto_increment* columns turned into *bigserial* , foreign keys,
comments, and if you had some MySQL default values such as *ON UPDATE
CURRENT_TIMESTAMP* they would have been translated to a `PostgreSQL before
update trigger
<https://www.postgresql.org/docs/current/static/plpgsql-trigger.html>`_
automatically.
::
$ pgloader mysql://root@localhost/f1db pgsql:///f1db
2017-06-16T08:56:14.064000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-06-16T08:56:14.068000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-06-16T08:56:19.542000+02:00 LOG report summary reset
table name read imported errors total time
------------------------- --------- --------- --------- --------------
fetch meta data 33 33 0 0.365s
Create Schemas 0 0 0 0.007s
Create SQL Types 0 0 0 0.006s
Create tables 26 26 0 0.068s
Set Table OIDs 13 13 0 0.012s
------------------------- --------- --------- --------- --------------
f1db.constructorresults 11011 11011 0 0.205s
f1db.circuits 73 73 0 0.150s
f1db.constructors 208 208 0 0.059s
f1db.constructorstandings 11766 11766 0 0.365s
f1db.drivers 841 841 0 0.268s
f1db.laptimes 413578 413578 0 2.892s
f1db.driverstandings 31420 31420 0 0.583s
f1db.pitstops 5796 5796 0 2.154s
f1db.races 976 976 0 0.227s
f1db.qualifying 7257 7257 0 0.228s
f1db.seasons 68 68 0 0.527s
f1db.results 23514 23514 0 0.658s
f1db.status 133 133 0 0.130s
------------------------- --------- --------- --------- --------------
COPY Threads Completion 39 39 0 4.303s
Create Indexes 20 20 0 1.497s
Index Build Completion 20 20 0 0.214s
Reset Sequences 0 10 0 0.058s
Primary Keys 13 13 0 0.012s
Create Foreign Keys 0 0 0 0.000s
Create Triggers 0 0 0 0.001s
Install Comments 0 0 0 0.000s
------------------------- --------- --------- --------- --------------
Total import time 506641 506641 0 5.547s
You may need to have special cases to take care of, though, or views that you
want to materialize while doing the migration. In advanced cases you can use
the pgloader command.
The Command
^^^^^^^^^^^
To load data with pgloader you need to define in a *command* the operations
in some details. Here's our example for loading the `MySQL Sakila Sample
Database <http://dev.mysql.com/doc/sakila/en/>`_.
Here's our command::
load database
from mysql://root@localhost/sakila
@@ -38,10 +109,6 @@ Here's our command:
BEFORE LOAD DO
$$ create schema if not exists sakila; $$;
You can see the full list of options in the
[pgloader reference manual](pgloader.1.html), with a complete description
of the options you see here.
Note that here pgloader will benefit from the meta-data information found in
the MySQL database to create a PostgreSQL database capable of hosting the
data as described, then load the data.
@@ -60,9 +127,10 @@ It's possible to use the *MATERIALIZE VIEWS* clause and give both the name
and the SQL (in MySQL dialect) definition of view, then pgloader creates the
view before loading the data, then drops it again at the end.
## Loading the data
Loading the data
^^^^^^^^^^^^^^^^
Let's start the `pgloader` command with our `sakila.load` command file:
Let's start the `pgloader` command with our `sakila.load` command file::
$ pgloader sakila.load
... LOG Starting pgloader, log system is ready.

docs/tutorial/sqlite.rst Normal file

@@ -0,0 +1,131 @@
Loading SQLite files with pgloader
----------------------------------
The SQLite database is a respected solution to manage your data with. Its
embedded nature makes it a source of migrations when a project now needs to
handle more concurrency, which `PostgreSQL`__ is very good at. pgloader can help
you there.
__ http://www.postgresql.org/
In a Single Command Line
^^^^^^^^^^^^^^^^^^^^^^^^
You can ::
$ createdb chinook
$ pgloader https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite pgsql:///chinook
Done! All with the schema, data, constraints, primary keys and foreign keys,
etc. We also see an error with the Chinook schema that contains several
primary key definitions against the same table, which is not accepted by
PostgreSQL::
2017-06-20T16:18:59.019000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-06-20T16:18:59.236000+02:00 LOG Fetching 'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite'
2017-06-20T16:19:00.664000+02:00 ERROR Database error 42P16: multiple primary keys for table "playlisttrack" are not allowed
QUERY: ALTER TABLE playlisttrack ADD PRIMARY KEY USING INDEX idx_66873_sqlite_autoindex_playlisttrack_1;
2017-06-20T16:19:00.665000+02:00 LOG report summary reset
table name read imported errors total time
----------------------- --------- --------- --------- --------------
fetch 0 0 0 0.877s
fetch meta data 33 33 0 0.033s
Create Schemas 0 0 0 0.003s
Create SQL Types 0 0 0 0.006s
Create tables 22 22 0 0.043s
Set Table OIDs 11 11 0 0.012s
----------------------- --------- --------- --------- --------------
album 347 347 0 0.023s
artist 275 275 0 0.023s
customer 59 59 0 0.021s
employee 8 8 0 0.018s
invoice 412 412 0 0.031s
genre 25 25 0 0.021s
invoiceline 2240 2240 0 0.034s
mediatype 5 5 0 0.025s
playlisttrack 8715 8715 0 0.040s
playlist 18 18 0 0.016s
track 3503 3503 0 0.111s
----------------------- --------- --------- --------- --------------
COPY Threads Completion 33 33 0 0.313s
Create Indexes 22 22 0 0.160s
Index Build Completion 22 22 0 0.027s
Reset Sequences 0 0 0 0.017s
Primary Keys 12 0 1 0.013s
Create Foreign Keys 11 11 0 0.040s
Create Triggers 0 0 0 0.000s
Install Comments 0 0 0 0.000s
----------------------- --------- --------- --------- --------------
Total import time 15607 15607 0 1.669s
You may need to have special cases to take care of, though. In advanced cases
you can use the pgloader command.
The Command
^^^^^^^^^^^
To load data with pgloader you need to define in a *command* the operations in
some details. Here's our command::
load database
from 'sqlite/Chinook_Sqlite_AutoIncrementPKs.sqlite'
into postgresql:///pgloader
with include drop, create tables, create indexes, reset sequences
set work_mem to '16MB', maintenance_work_mem to '512 MB';
Note that here pgloader will benefit from the meta-data information found in
the SQLite file to create a PostgreSQL database capable of hosting the data
as described, then load the data.
Loading the data
^^^^^^^^^^^^^^^^
Let's start the `pgloader` command with our `sqlite.load` command file::
$ pgloader sqlite.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/sqlite.load"
... WARNING Postgres warning: table "album" does not exist, skipping
... WARNING Postgres warning: table "artist" does not exist, skipping
... WARNING Postgres warning: table "customer" does not exist, skipping
... WARNING Postgres warning: table "employee" does not exist, skipping
... WARNING Postgres warning: table "genre" does not exist, skipping
... WARNING Postgres warning: table "invoice" does not exist, skipping
... WARNING Postgres warning: table "invoiceline" does not exist, skipping
... WARNING Postgres warning: table "mediatype" does not exist, skipping
... WARNING Postgres warning: table "playlist" does not exist, skipping
... WARNING Postgres warning: table "playlisttrack" does not exist, skipping
... WARNING Postgres warning: table "track" does not exist, skipping
table name read imported errors time
---------------------- --------- --------- --------- --------------
create, truncate 0 0 0 0.052s
Album 347 347 0 0.070s
Artist 275 275 0 0.014s
Customer 59 59 0 0.014s
Employee 8 8 0 0.012s
Genre 25 25 0 0.018s
Invoice 412 412 0 0.032s
InvoiceLine 2240 2240 0 0.077s
MediaType 5 5 0 0.012s
Playlist 18 18 0 0.008s
PlaylistTrack 8715 8715 0 0.071s
Track 3503 3503 0 0.105s
index build completion 0 0 0 0.000s
---------------------- --------- --------- --------- --------------
Create Indexes 20 20 0 0.279s
reset sequences 0 0 0 0.043s
---------------------- --------- --------- --------- --------------
Total streaming time 15607 15607 0 0.476s
We can see that `pgloader <http://pgloader.io>`_ did download the file from
its HTTP URL location then *unzipped* it before loading it.
Also, the *WARNING* messages we see here are expected as the PostgreSQL
database is empty when running the command, and pgloader is using the SQL
commands `DROP TABLE IF EXISTS` when the given command uses the `include
drop` option.
Note that the output of the command has been edited to facilitate its
browsing online.


@@ -0,0 +1,9 @@
Pgloader Tutorial
=================
.. include:: csv.rst
.. include:: fixed.rst
.. include:: geolite.rst
.. include:: dBase.rst
.. include:: sqlite.rst
.. include:: mysql.rst

pgloader.1

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,201 +1,287 @@
;;;; pgloader.asd
(asdf:defsystem #:pgloader
:serial t
:description "Load data into PostgreSQL"
:author "Dimitri Fontaine <dimitri@2ndQuadrant.fr>"
:license "The PostgreSQL Licence"
:depends-on (#:uiop ; host system integration
#:cl-log ; logging
#:postmodern ; PostgreSQL protocol implementation
#:cl-postgres ; low level bits for COPY streaming
#:simple-date ; FIXME: recheck dependency
#:qmynd ; MySQL protocol implemenation
#:split-sequence ; some parsing is made easy
#:cl-csv ; full CSV reader
#:cl-fad ; file and directories
#:lparallel ; threads, workers, queues
#:esrap ; parser generator
#:alexandria ; utils
#:drakma ; http client, download archives
#:flexi-streams ; streams
#:usocket ; UDP / syslog
#:local-time ; UDP date parsing
#:command-line-arguments ; for the main function
#:abnf ; ABNF parser generator (for syslog)
#:db3 ; DBF version 3 file reader
#:ixf ; IBM IXF file format reader
#:py-configparser ; Read old-style INI config files
#:sqlite ; Query a SQLite file
#:cl-base64 ; Decode base64 data
#:trivial-backtrace ; For --debug cli usage
#:cl-markdown ; To produce the website
#:metabang-bind ; the bind macro
#:mssql ; M$ SQL connectivity
#:uuid ; Transforming MS SQL unique identifiers
#:quri ; decode URI parameters
)
:components
((:module "src"
:components
((:file "params")
(:file "package" :depends-on ("params"))
(:file "queue" :depends-on ("params" "package"))
:serial t
:description "Load data into PostgreSQL"
:author "Dimitri Fontaine <dim@tapoueh.org>"
:license "The PostgreSQL Licence"
:depends-on (#:uiop ; host system integration
#:cl-log ; logging
#:postmodern ; PostgreSQL protocol implementation
#:cl-postgres ; low level bits for COPY streaming
#:simple-date ; FIXME: recheck dependency
#:qmynd ; MySQL protocol implemenation
#:split-sequence ; some parsing is made easy
#:cl-csv ; full CSV reader
#:cl-fad ; file and directories
#:lparallel ; threads, workers, queues
#:esrap ; parser generator
#:alexandria ; utils
#:drakma ; http client, download archives
#:flexi-streams ; streams
#:usocket ; UDP / syslog
#:local-time ; UDP date parsing
#:command-line-arguments ; for the main function
#:db3 ; DBF version 3 file reader
#:ixf ; IBM IXF file format reader
#:py-configparser ; Read old-style INI config files
#:sqlite ; Query a SQLite file
#:cl-base64 ; Decode base64 data
#:trivial-backtrace ; For --debug cli usage
#:cl-markdown ; To produce the website
#:metabang-bind ; the bind macro
#:mssql ; M$ SQL connectivity
#:uuid ; Transforming MS SQL unique identifiers
#:quri ; decode URI parameters
#:cl-ppcre ; Perl Compatible Regular Expressions
#:cl-mustache ; Logic-less templates
#:yason ; JSON routines
#:closer-mop ; introspection
#:zs3 ; integration with AWS S3 for Redshift
)
:components
((:module "src"
:components
((:file "params")
(:file "package" :depends-on ("params"))
(:module "monkey"
:components
((:file "bind")
(:file "mssql")))
(:module "monkey"
:components
((:file "bind")
(:file "mssql")))
(:module "utils"
:depends-on ("package" "params")
:components
((:file "charsets")
(:file "threads")
(:file "logs")
(:file "monitor" :depends-on ("logs"))
(:file "state")
(:file "report" :depends-on ("state"))
(:file "utils" :depends-on ("charsets" "monitor"))
(:file "archive" :depends-on ("logs"))
(:module "utils"
:depends-on ("package" "params")
:components
((:file "charsets")
(:file "logs")
(:file "utils")
(:file "state")
;; those are one-package-per-file
(:file "transforms")
(:file "read-sql-files")))
;; user defined transforms package and pgloader
;; provided ones
(:file "transforms")
;; generic connection api
(:file "connection" :depends-on ("utils"))
;; PostgreSQL related utils
(:file "read-sql-files")
(:file "queries")
(:file "quoting" :depends-on ("utils"))
(:file "catalog" :depends-on ("quoting"))
(:file "alter-table" :depends-on ("catalog"))
(:file "citus" :depends-on ("catalog"))
;; some table name and schema facilities
(:file "schema" :depends-on ("package"))
;; State, monitoring, reporting
(:file "reject" :depends-on ("state"))
(:file "pretty-print-state" :depends-on ("state"))
(:file "report" :depends-on ("state"
"pretty-print-state"
"utils"
"catalog"))
(:file "monitor" :depends-on ("logs"
"state"
"reject"
"report"))
(:file "threads" :depends-on ("monitor"))
(:file "archive" :depends-on ("monitor"))
;; package pgloader.pgsql
(:module pgsql
:depends-on ("package" "params" "utils" "connection")
:components
((:file "copy-format")
(:file "queries")
(:file "schema")
(:file "pgsql"
:depends-on ("copy-format"
"queries"
"schema"))))
;; generic connection api
(:file "connection" :depends-on ("monitor"
"archive"))))
(:module "parsers"
:depends-on ("params" "package" "utils"
"pgsql" "monkey" "connection")
:serial t
:components
((:file "parse-ini")
(:file "command-utils")
(:file "command-keywords")
(:file "command-regexp")
(:file "command-db-uri")
(:file "command-source")
(:file "command-options")
(:file "command-sql-block")
(:file "command-csv")
(:file "command-ixf")
(:file "command-fixed")
(:file "command-copy")
(:file "command-dbf")
(:file "command-cast-rules")
(:file "command-mysql")
(:file "command-mssql")
(:file "command-sqlite")
(:file "command-archive")
(:file "command-parser")
(:file "date-format")))
;; package pgloader.pgsql
(:module pgsql
:depends-on ("package" "params" "utils")
:serial t
:components
((:file "connection")
(:file "pgsql-ddl")
(:file "pgsql-ddl-citus")
(:file "pgsql-schema")
(:file "merge-catalogs" :depends-on ("pgsql-schema"))
(:file "pgsql-trigger")
(:file "pgsql-index-filter")
(:file "pgsql-finalize-catalogs")
(:file "pgsql-create-schema"
:depends-on ("pgsql-trigger"))))
;; Source format specific implementations
(:module sources
:depends-on ("monkey" ; mssql driver patches
"params"
"package"
"connection"
"pgsql"
"utils"
"parsers"
"queue")
:components
((:module "common"
:components
((:file "api")
(:file "casting-rules")
(:file "files-and-pathnames")
(:file "project-fields")))
;; Source format specific implementations
(:module sources
:depends-on ("monkey" ; mssql driver patches
"params"
"package"
"pgsql"
"utils")
:components
((:module "common"
:serial t
:components
((:file "api")
(:file "methods")
(:file "md-methods")
(:file "matviews")
(:file "casting-rules")
(:file "files-and-pathnames")
(:file "project-fields")))
(:module "csv"
:depends-on ("common")
:components
((:file "csv-guess")
(:file "csv-database")
(:file "csv")))
(:module "csv"
:depends-on ("common")
:components
((:file "csv-guess")
;; (:file "csv-database")
(:file "csv")))
(:file "fixed"
:depends-on ("common" "csv"))
(:module "fixed"
:depends-on ("common")
:serial t
:components
((:file "fixed-guess")
(:file "fixed")))
(:file "copy"
:depends-on ("common" "csv"))
(:file "copy"
:depends-on ("common" "csv"))
(:module "db3"
:depends-on ("common" "csv")
:components
((:file "db3-schema")
(:file "db3" :depends-on ("db3-schema"))))
(:module "db3"
:serial t
:depends-on ("common" "csv")
:components
((:file "db3-cast-rules")
(:file "db3-connection")
(:file "db3-schema")
(:file "db3")))
(:module "ixf"
:depends-on ("common")
:components
((:file "ixf-schema")
(:file "ixf" :depends-on ("ixf-schema"))))
(:module "ixf"
:serial t
:depends-on ("common")
:components
((:file "ixf-cast-rules")
(:file "ixf-connection")
(:file "ixf-schema")
(:file "ixf" :depends-on ("ixf-schema"))))
;(:file "syslog") ; experimental...
;(:file "syslog") ; experimental...
(:module "sqlite"
:depends-on ("common")
:components
((:file "sqlite-cast-rules")
(:file "sqlite-schema"
:depends-on ("sqlite-cast-rules"))
(:file "sqlite"
:depends-on ("sqlite-cast-rules"
"sqlite-schema"))))
(:module "sqlite"
:serial t
:depends-on ("common")
:components
((:file "sqlite-cast-rules")
(:file "sqlite-connection")
(:file "sqlite-schema")
(:file "sqlite")))
(:module "mssql"
:depends-on ("common")
:components
((:file "mssql-cast-rules")
(:file "mssql-schema"
:depends-on ("mssql-cast-rules"))
(:file "mssql"
:depends-on ("mssql-cast-rules"
"mssql-schema"))))
(:module "mssql"
:serial t
:depends-on ("common")
:components
((:file "mssql-cast-rules")
(:file "mssql-connection")
(:file "mssql-schema")
(:file "mssql")
(:file "mssql-index-filters")))
(:module "mysql"
:depends-on ("common")
:components
((:file "mysql-cast-rules")
(:file "mysql-schema"
:depends-on ("mysql-cast-rules"))
(:file "mysql-csv"
:depends-on ("mysql-schema"))
(:file "mysql"
:depends-on ("mysql-cast-rules"
"mysql-schema"))))))
(:module "mysql"
:serial t
:depends-on ("common")
:components
((:file "mysql-cast-rules")
(:file "mysql-connection")
(:file "mysql-schema")
(:file "mysql")))
;; the main entry file, used when building a stand-alone
;; executable image
(:file "main" :depends-on ("params"
"package"
"utils"
"parsers"
"sources"))))
(:module "pgsql"
:serial t
:depends-on ("common")
:components ((:file "pgsql-cast-rules")
(:file "pgsql")))))
;; to produce the website
(:module "web"
:components
((:module src
:components
((:file "docs")))))))
;; package pgloader.copy
(:module "pg-copy"
:depends-on ("params"
"package"
"utils"
"pgsql"
"sources")
:serial t
:components
((:file "copy-batch")
(:file "copy-format")
(:file "copy-db-write")
(:file "copy-rows-in-stream")
(:file "copy-rows-in-batch")
(:file "copy-rows-in-batch-through-s3")
(:file "copy-retry-batch")
(:file "copy-from-queue")))
(:module "load"
:depends-on ("params"
"package"
"utils"
"pgsql"
"sources")
:serial t
:components
((:file "api")
(:file "copy-data")
(:file "load-file")
(:file "migrate-database")))
(:module "parsers"
:depends-on ("params"
"package"
"utils"
"pgsql"
"sources"
"monkey")
:serial t
:components
((:file "parse-ini")
(:file "template")
(:file "command-utils")
(:file "command-keywords")
(:file "command-regexp")
(:file "parse-pgpass")
(:file "command-db-uri")
(:file "command-source")
(:file "command-options")
(:file "command-sql-block")
(:file "command-sexp")
(:file "command-csv")
(:file "command-ixf")
(:file "command-fixed")
(:file "command-copy")
(:file "command-dbf")
(:file "command-cast-rules")
(:file "command-materialize-views")
(:file "command-alter-table")
(:file "command-distribute")
(:file "command-mysql")
(:file "command-including-like")
(:file "command-mssql")
(:file "command-sqlite")
(:file "command-pgsql")
(:file "command-archive")
(:file "command-parser")
(:file "parse-sqlite-type-name")
(:file "date-format")))
;; the main entry file, used when building a stand-alone
;; executable image
(:file "api" :depends-on ("params"
"package"
"utils"
"parsers"
"sources"))
(:module "regress"
:depends-on ("params" "package" "utils" "pgsql" "api")
:components ((:file "regress")))
(:file "main" :depends-on ("params"
"package"
"utils"
"parsers"
"sources"
"api"
"regress"))))))


@ -1,24 +0,0 @@
#!/bin/sh
#|
exec sbcl --script "$0" $@
|#
;;; load the necessary components then parse the command line
;;; and launch the work
#-quicklisp
(let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp"
(user-homedir-pathname))))
(when (probe-file quicklisp-init)
(load quicklisp-init)))
;; now is the time to load our Quicklisp project
(format t "Loading quicklisp and the pgloader project and its dependencies...")
(terpri)
(with-output-to-string (*standard-output*)
(ql:quickload '(:pgloader)))
(in-package #:pgloader)
;;; actually call the main function, too
(main SB-EXT:*POSIX-ARGV*)


@@ -1,11 +1,22 @@
Summary: extract, transform and load data into PostgreSQL
Name: pgloader
Version: 3.2.1.preview
Version: 3.6.10
Release: 22%{?dist}
License: The PostgreSQL Licence
Group: System Environment/Base
Source: %{name}-%{version}.tar.gz
URL: https://github.com/dimitri/pgloader
Source0: %{url}/archive/v%{version}.tar.gz
BuildRequires: sbcl
BuildRequires: freetds-devel
BuildRequires: openssl-devel
BuildRequires: sqlite-devel
BuildRequires: zlib-devel
Requires: freetds
Requires: openssl-devel
Requires: sbcl
Requires: zlib
Requires: sqlite
%description
pgloader imports data from different kind of sources and COPY it into
@@ -22,7 +33,7 @@ PostgreSQL. In the MySQL case it's possible to edit CASTing rules from the
pgloader command directly.
%prep
%setup -q -n %{name}
%setup -q -n %{name}-%{version}
%build
%define debug_package %{nil}
@@ -35,11 +46,20 @@ mkdir -p $RPM_BUILD_ROOT/etc/prelink.conf.d
echo '-b /usr/bin/pgloader' > $RPM_BUILD_ROOT/etc/prelink.conf.d/%{name}.conf
%files
%doc README.md pgloader.1.md
%doc README.md
%{_bindir}/*
/etc/prelink.conf.d/%{name}.conf
%changelog
* Sun Mar 22 2020 Michał "phoe" Herda <phoe@disroot.org> - 3.6.2
- Release 3.6.2
* Tue Sep 24 2019 Phil Ingram <pingram.au@gmail.com> - 3.6.1
- Release 3.6.1
- Use Requires and BuildRequires
- Variablise Source0
- Fix Files
* Thu Jan 22 2015 Dimitri Fontaine <dimitri@2ndQuadrant.fr> - 3.2.1.preview-22
- Release 3.2.1.preview

src/api.lisp Normal file

@@ -0,0 +1,278 @@
;;;
;;; The main API, or an attempt at providing pgloader as a lisp usable API
;;; rather than only an end-user program.
;;;
(in-package #:pgloader)
(define-condition source-definition-error (error)
((mesg :initarg :mesg :reader source-definition-error-mesg))
(:report (lambda (err stream)
(format stream "~a" (source-definition-error-mesg err)))))
(define-condition cli-parsing-error (error) ()
(:report (lambda (err stream)
(declare (ignore err))
(format stream "Could not parse the command line: see above."))))
(define-condition load-files-not-found-error (error)
((filename-list :initarg :filename-list))
(:report (lambda (err stream)
(format stream
;; start lines with 3 spaces because of trivial-backtrace
"~{No such file or directory: ~s~^~% ~}"
(slot-value err 'filename-list)))))
;;;
;;; Helper functions to actually do things
;;;
(defun process-command-file (filename-list &key (flush-summary t))
"Process each FILENAME in FILENAME-LIST as a pgloader command
file (.load)."
(loop :for filename :in filename-list
:for truename := (probe-file filename)
:unless truename :collect filename :into not-found-list
:do (if truename
(run-commands truename
:start-logger nil
:flush-summary flush-summary)
(log-message :error "Can not find file: ~s" filename))
:finally (when not-found-list
(error 'load-files-not-found-error :filename-list not-found-list))))
(defun process-source-and-target (source-string target-string
&optional
type encoding set with field cast
before after)
"Given exactly 2 CLI arguments, process them as source and target URIs.
Parameters here are meant to be already parsed, see parse-cli-optargs."
(let* ((type (handler-case
(parse-cli-type type)
(condition (e)
(log-message :warning
"Could not parse --type ~s: ~a"
type e))))
(source-uri (handler-case
(if type
(parse-source-string-for-type type source-string)
(parse-source-string source-string))
(condition (e)
(log-message :warning
"Could not parse source string ~s: ~a"
source-string e))))
(type (when (and source-string
(typep source-uri 'connection))
(parse-cli-type (conn-type source-uri))))
(target-uri (handler-case
(parse-target-string target-string)
(condition (e)
(log-message :error
"Could not parse target string ~s: ~a"
target-string e)))))
;; some verbosity about the parsing "magic"
(log-message :info " SOURCE: ~s" source-string)
(log-message :info "SOURCE URI: ~s" source-uri)
(log-message :info " TARGET: ~s" target-string)
(log-message :info "TARGET URI: ~s" target-uri)
(cond ((and (null source-uri) (null target-uri))
(process-command-file (list source-string target-string)))
((or (null source-string) (null source-uri))
(log-message :fatal
"Failed to parse ~s as a source URI." source-string)
(log-message :log "You might need to use --type."))
((or (null target-string) (null target-uri))
(log-message :fatal
"Failed to parse ~s as a PostgreSQL database URI."
target-string)))
(let* ((nb-errors 0)
(options (handler-case
(parse-cli-options type with)
(condition (e)
(incf nb-errors)
(log-message :error "Could not parse --with ~s:" with)
(log-message :error "~a" e))))
(fields (handler-case
(parse-cli-fields type field)
(condition (e)
(incf nb-errors)
(log-message :error "Could not parse --fields ~s:" field)
(log-message :error "~a" e)))))
(destructuring-bind (&key encoding gucs casts before after)
(loop :for (keyword option user-string parse-fn)
:in `((:encoding "--encoding" ,encoding ,#'parse-cli-encoding)
(:gucs "--set" ,set ,#'parse-cli-gucs)
(:casts "--cast" ,cast ,#'parse-cli-casts)
(:before "--before" ,before ,#'parse-sql-file)
(:after "--after" ,after ,#'parse-sql-file))
:append (list keyword
(handler-case
(funcall parse-fn user-string)
(condition (e)
(incf nb-errors)
(log-message :error "Could not parse ~a ~s: ~a"
option user-string e)))))
(unless (= 0 nb-errors)
(error 'cli-parsing-error))
;; so, we actually have all the specs for the
;; job on the command line now.
(when (and source-uri target-uri (= 0 nb-errors))
(load-data :from source-uri
:into target-uri
:encoding encoding
:options options
:gucs gucs
:fields fields
:casts casts
:before before
:after after
:start-logger nil))))))
;;;
;;; Helper function to run a given command
;;;
(defun run-commands (source
&key
(start-logger t)
(flush-summary t)
((:summary *summary-pathname*) *summary-pathname*)
((:log-filename *log-filename*) *log-filename*)
((:log-min-messages *log-min-messages*) *log-min-messages*)
((:client-min-messages *client-min-messages*) *client-min-messages*))
"SOURCE can be a function, which is run, a list, which is compiled as CL
code then run, a pathname containing one or more commands that are parsed
then run, or a commands string that is then parsed and each command run."
(with-monitor (:start-logger start-logger)
(let* ((*print-circle* nil)
(funcs
(typecase source
(function (list source))
(list (list (compile-lisp-command source)))
(pathname (mapcar #'compile-lisp-command
(parse-commands-from-file source)))
(t (mapcar #'compile-lisp-command
(if (probe-file source)
(parse-commands-from-file source)
(parse-commands source)))))))
(loop :for func :in funcs
:do (funcall func)
:do (when flush-summary
(flush-summary :reset t))))))
(defun compile-lisp-command (source)
"SOURCE must be lisp source code, a list form."
(let (function warnings-p failure-p notes)
;; capture the compiler notes and warnings
(setf notes
(with-output-to-string (stream)
(let ((*standard-output* stream)
(*error-output* stream)
(*trace-output* stream))
(with-compilation-unit (:override t)
(setf (values function warnings-p failure-p)
(compile nil source))))))
;; log the captured compiler output at the DEBUG level
(when (and notes (string/= notes ""))
(let ((pp-source (with-output-to-string (s) (pprint source s))))
(log-message :debug "While compiling:~%~a~%~a" pp-source notes)))
;; and signal an error if we failed to compile our lisp code
(cond
(failure-p (error "Failed to compile code: ~a~%~a" source notes))
(warnings-p function)
(t function))))
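compile-lisp-command leans on the three return values of COMPILE; a minimal stand-alone sketch of that protocol:

```lisp
;; COMPILE returns (values function warnings-p failure-p); the function
;; above treats failure-p as fatal and warnings-p as merely worth logging.
(multiple-value-bind (fn warnings-p failure-p)
    (compile nil '(lambda (x) (* x 2)))
  (list (funcall fn 21) warnings-p failure-p))
;; → (42 NIL NIL) on a clean compile
```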
;;;
;;; Main API to use from outside of pgloader.
;;;
(defun load-data (&key ((:from source)) ((:into target))
encoding fields target-table-name
options gucs casts before after
(start-logger t) (flush-summary t))
"Load data from SOURCE into TARGET."
(declare (type connection source)
(type pgsql-connection target))
(when (and (typep source (or 'csv-connection
'copy-connection
'fixed-connection))
(null target-table-name)
(null (pgconn-table-name target)))
(error 'source-definition-error
:mesg (format nil
"~a data sources require a target table name."
(conn-type source))))
(with-monitor (:start-logger start-logger)
(when (and casts (not (member (type-of source)
'(sqlite-connection
mysql-connection
mssql-connection))))
(log-message :log "Cast rules are ignored for this source."))
;; now generates the code for the command
(log-message :debug "LOAD DATA FROM ~s" source)
(let* ((target-table-name (or target-table-name
(pgconn-table-name target)))
(code (lisp-code-for-loading :from source
:into target
:encoding encoding
:fields fields
:target-table-name target-table-name
:options options
:gucs gucs
:casts casts
:before before
:after after)))
(run-commands (process-relative-pathnames (uiop:getcwd) code)
:start-logger nil
:flush-summary flush-summary))))
(defvar *get-code-for-source*
(list (cons 'copy-connection #'lisp-code-for-loading-from-copy)
(cons 'fixed-connection #'lisp-code-for-loading-from-fixed)
(cons 'csv-connection #'lisp-code-for-loading-from-csv)
(cons 'dbf-connection #'lisp-code-for-loading-from-dbf)
(cons 'ixf-connection #'lisp-code-for-loading-from-ixf)
(cons 'sqlite-connection #'lisp-code-for-loading-from-sqlite)
(cons 'mysql-connection #'lisp-code-for-loading-from-mysql)
(cons 'mssql-connection #'lisp-code-for-loading-from-mssql)
(cons 'pgsql-connection #'lisp-code-for-loading-from-pgsql))
"Each source type might require a different set of options.")
(defun lisp-code-for-loading (&key
((:from source)) ((:into target))
encoding fields target-table-name
options gucs casts before after)
(let ((func (cdr (assoc (type-of source) *get-code-for-source*))))
;; not all functions support the same set of &key parameters,
;; they all have &allow-other-keys in their signatures, though.
(assert (not (null func)))
(if func
(funcall func
source
target
:target-table-name target-table-name
:fields fields
:encoding (or encoding :default)
:gucs gucs
:casts casts
:options options
:before before
:after after
:allow-other-keys t))))
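The `:allow-other-keys t` argument in the call above lets one generic call site pass the full keyword set even to loader functions that only accept a subset; a minimal sketch of that mechanism (the function below is made up):

```lisp
;; With :allow-other-keys t at the call site, keywords unknown to the
;; callee are silently ignored instead of signalling an error.
(defun load-csv-sketch (&key encoding fields)   ; hypothetical loader
  (list encoding fields))

(load-csv-sketch :encoding :utf-8 :fields nil
                 :casts nil :gucs nil           ; unknown to this callee
                 :allow-other-keys t)
;; → (:UTF-8 NIL)
```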


@ -8,9 +8,11 @@
(in-package :cl-user)
;;
;; ccl provides an implementation of getenv already.
;;
#+sbcl
(defun getenv (name &optional default)
"Return the current value for the environment variable NAME, or DEFAULT
when unset."
(or #+sbcl (sb-ext:posix-getenv name)
#+ccl (ccl:getenv name)
default))
(or (sb-ext:posix-getenv name) default))


@ -11,10 +11,20 @@
(in-package #:cl-user)
;; So that we can #+pgloader-image some code away, see main.lisp
(push :pgloader-image *features*)
;;;
;;; We need to support *print-circle* for the debug traces of the catalogs,
;;; and while at it let's enforce *print-pretty* too.
;;;
(setf *print-circle* t *print-pretty* t)
(defun close-foreign-libs ()
"Close Foreign libs in use by pgloader at application save time."
(let (#+sbcl (sb-ext:*muffled-warnings* 'style-warning))
(mapc #'cffi:close-foreign-library '(cl+ssl::libssl
cl+ssl::libcrypto
mssql::sybdb))))
(defun open-foreign-libs ()
@ -22,7 +32,10 @@
(let (#+sbcl (sb-ext:*muffled-warnings* 'style-warning))
;; we specifically don't load mssql::sybdb eagerly, it's getting loaded
;; in only when the data source is a MS SQL database.
(cffi:load-foreign-library 'cl+ssl::libssl)))
;;
;; and for CL+SSL, we need to call the specific reload function that
;; handles some context and things around loading with CFFI.
(cl+ssl:reload)))
#+ccl (push #'open-foreign-libs *lisp-startup-functions*)
#+sbcl (push #'open-foreign-libs sb-ext:*init-hooks*)
@ -34,6 +47,10 @@
;;; Register all loaded systems in the image, so that ASDF don't search for
;;; them again when doing --self-upgrade
;;;
;;; FIXME: this idea kept failing.
#|
(defun register-preloaded-system (system)
(unless (string= "pgloader" (asdf::coerce-name system))
(let ((version (slot-value system 'asdf::version)))
@ -43,3 +60,12 @@
(setf pgloader::*self-upgrade-immutable-systems*
(remove "pgloader" (asdf:already-loaded-systems) :test #'string=))
(defun list-files-to-load-for-system (system-name)
(loop for (o . c) in (asdf/plan:plan-actions
(asdf/plan:make-plan 'asdf/plan:sequential-plan
'asdf:load-source-op
(asdf:find-system system-name)))
when (typep o 'asdf:load-source-op)
append (asdf:input-files o c)))
|#

src/load/api.lisp (new file, 71 lines)
@ -0,0 +1,71 @@
;;;
;;; Generic API for pgloader data loading and database migrations.
;;;
(in-package :pgloader.load)
(defgeneric copy-from (source &key)
(:documentation
"Load data from SOURCE into its target as defined by the SOURCE object."))
;; That one is more an export than a load. It always exports to a single,
;; well-defined format; the importing utility is defined in
;; src/pgsql-copy-format.lisp
(defgeneric copy-to (source filename)
(:documentation
"Load data from SOURCE and serialize it into FILENAME, using PostgreSQL
COPY TEXT format."))
;; The next generic function is only instantiated for sources that
;; actually contain more than a single source item (tables, collections,
;; etc.)
(defgeneric copy-database (source
&key
worker-count
concurrency
max-parallel-create-index
truncate
data-only
schema-only
create-tables
include-drop
foreign-keys
create-indexes
reset-sequences
disable-triggers
materialize-views
set-table-oids
including
excluding)
(:documentation
"Auto-discover source schema, convert it to PostgreSQL, migrate the data
from the source definition to PostgreSQL for all the discovered
items (tables, collections, etc), then reset the PostgreSQL sequences
created by SERIAL columns in the first step.
The target tables are automatically discovered; the only-tables
parameter allows filtering them out."))
(defgeneric prepare-pgsql-database (db-copy catalog
&key
truncate
create-tables
create-schemas
drop-indexes
set-table-oids
materialize-views
foreign-keys
include-drop)
(:documentation "Prepare the target PostgreSQL database."))
(defgeneric complete-pgsql-database (db-copy catalog pkeys
&key
foreign-keys
create-indexes
create-triggers
reset-sequences)
(:documentation "Alter load duties for database sources copy support."))

src/load/copy-data.lisp (new file, 156 lines)
@ -0,0 +1,156 @@
;;;
;;; Generic API for pgloader sources
;;;
(in-package :pgloader.load)
;;;
;;; Common API implementation
;;;
(defmethod queue-raw-data ((copy copy) rawq concurrency)
"Stream data as read by the map-rows method on the COPY argument into RAWQ,
as given."
(log-message :notice "COPY ~a ~@[with ~d rows estimated~] [~a/~a]"
(format-table-name (target copy))
(table-row-count-estimate (target copy))
(lp:kernel-worker-index)
(lp:kernel-worker-count))
(log-message :debug "Reader started for ~a" (format-table-name (target copy)))
(let* ((start-time (get-internal-real-time))
(row-count 0)
(process-row
(if (or (eq :data *log-min-messages*)
(eq :data *client-min-messages*))
;; when debugging, use a lambda with debug traces
(lambda (row)
(log-message :data "< ~s" row)
(lq:push-queue row rawq)
(incf row-count))
;; usual non-debug case
(lambda (row)
(lq:push-queue row rawq)
(incf row-count)))))
;; signal we are starting
(update-stats :data (target copy) :start start-time)
;; call the source-specific method for reading input data
(map-rows copy :process-row-fn process-row)
;; process last batches and send them to queues
;; and mark end of stream
(loop :repeat concurrency :do (lq:push-queue :end-of-data rawq))
(let ((seconds (elapsed-time-since start-time)))
(log-message :debug "Reader for ~a is done in ~6$s"
(format-table-name (target copy)) seconds)
(update-stats :data (target copy) :read row-count :rs seconds)
(list :reader (target copy) seconds))))
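The reader above hands rows to the writers through a bounded queue and signals completion with one :end-of-data sentinel per consumer. A sketch of the same pattern, assuming lparallel is loaded and `lq` abbreviates the lparallel.queue package as it does in pgloader:

```lisp
;; One :end-of-data token per consumer guarantees that every writer
;; task eventually sees end-of-stream and can stop cleanly.
(let ((rawq (lq:make-queue :fixed-capacity 4))
      (concurrency 2))
  (lq:push-queue '(1 "a") rawq)
  (lq:push-queue '(2 "b") rawq)
  (loop :repeat concurrency :do (lq:push-queue :end-of-data rawq))
  ;; each consumer loops like this until it hits the sentinel
  (loop :for row := (lq:pop-queue rawq)
        :until (eq row :end-of-data)
        :collect row))
;; → ((1 "a") (2 "b"))
```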
(defmethod copy-to ((copy copy) pgsql-copy-filename)
"Extract data from the COPY source into a PostgreSQL COPY TEXT formatted file"
(with-open-file (text-file pgsql-copy-filename
:direction :output
:if-exists :supersede
:external-format :utf-8)
(let ((row-fn (lambda (row)
(format-vector-row text-file row (transforms copy)))))
(map-rows copy :process-row-fn row-fn))))
(defmethod copy-from ((copy copy)
&key
(kernel nil k-s-p)
(channel nil c-s-p)
(worker-count 8)
(concurrency 2)
(multiple-readers nil)
(on-error-stop *on-error-stop*)
disable-triggers)
"Copy data from COPY source into PostgreSQL."
(let* ((table-name (format-table-name (target copy)))
(lp:*kernel* (or kernel (make-kernel worker-count)))
(channel (or channel (lp:make-channel)))
(readers nil)
(task-count 0))
(flet ((submit-task (channel function &rest args)
(apply #'lp:submit-task channel function args)
(incf task-count)))
(lp:task-handler-bind
(#+pgloader-image
(copy-init-error
#'(lambda (condition)
;; stop the other tasks and then transfer the control
(log-message :log "COPY INIT ERROR")
(lp:invoke-transfer-error condition)))
(on-error-stop
#'(lambda (condition)
(log-message :log "ON ERROR STOP")
(lp:kill-tasks :default)
(lp:invoke-transfer-error condition)))
#+pgloader-image
(error
#'(lambda (condition)
(log-message :error "A thread failed with error: ~a" condition)
(log-message :error "~a"
(trivial-backtrace:print-backtrace condition
:output nil))
(lp::invoke-transfer-error condition))))
;; Check for Read Concurrency Support from our source
(when (and multiple-readers (< 1 concurrency))
(let ((label "Check Concurrency Support"))
(with-stats-collection (label :section :pre)
(setf readers (concurrency-support copy concurrency))
(update-stats :pre label :read 1 :rows (if readers 1 0))
(when readers
(log-message :notice "Multiple Readers Enabled for ~a"
(format-table-name (target copy)))))))
;; when readers is non-nil, we have read concurrency support!
(if readers
;; here we have detected Concurrency Support: we create as many
;; readers as writers and create associated couples, each couple
;; shares its own queue
(let ((rawqs
(loop :repeat concurrency :collect
(lq:make-queue :fixed-capacity *prefetch-rows*))))
(log-message :info "Read Concurrency Enabled for ~s"
(format-table-name (target copy)))
(loop :for rawq :in rawqs :for reader :in readers :do
;; each reader pretends to be alone, pass 1 as concurrency
(submit-task channel #'queue-raw-data reader rawq 1)
(submit-task channel #'copy-rows-from-queue
copy rawq
:on-error-stop on-error-stop
:disable-triggers disable-triggers)))
;; no Read Concurrency Support detected, start a single reader
;; task, using a single data queue that is read by multiple
;; writers.
(let ((rawq
(lq:make-queue :fixed-capacity *prefetch-rows*)))
(submit-task channel #'queue-raw-data copy rawq concurrency)
;; start a task to transform the raw data in the copy format
;; and send that data down to PostgreSQL
(loop :repeat concurrency :do
(submit-task channel #'copy-rows-from-queue
copy rawq
:on-error-stop on-error-stop
:disable-triggers disable-triggers))))
;; now wait until both the tasks are over, and kill the kernel
(unless c-s-p
(log-message :debug "waiting for ~d tasks" task-count)
(loop :repeat task-count :do (lp:receive-result channel))
(log-message :notice "COPY ~s done." table-name)
(unless k-s-p (lp:end-kernel :wait t)))
;; return task-count, which is how many tasks we submitted to our
;; lparallel kernel.
task-count))))

src/load/load-file.lisp (new file, 133 lines)
@ -0,0 +1,133 @@
;;;
;;; Generic API for pgloader sources
;;; Methods for source types with multiple files input
;;;
(in-package :pgloader.load)
(defmethod copy-database ((copy md-copy)
&key
(on-error-stop *on-error-stop*)
truncate
disable-triggers
drop-indexes
max-parallel-create-index
;; generic API, but ignored here
(worker-count 4)
(concurrency 1)
data-only
schema-only
create-tables
include-drop
foreign-keys
create-indexes
reset-sequences
materialize-views
set-table-oids
including
excluding)
"Copy the contents of the COPY formatted file to PostgreSQL."
(declare (ignore data-only schema-only
create-tables include-drop foreign-keys
create-indexes reset-sequences materialize-views
set-table-oids including excluding))
(let* ((*on-error-stop* on-error-stop)
(pgconn (target-db copy))
pgsql-catalog)
(handler-case
(with-pgsql-connection (pgconn)
(setf pgsql-catalog
(fetch-pgsql-catalog (db-name pgconn)
:table (target copy)
:variant (pgconn-variant pgconn)
:pgversion (pgconn-major-version pgconn)))
;; if the user didn't tell us the column list of the table, now is
;; a proper time to set it in the copy object
(unless (and (slot-boundp copy 'columns)
(slot-value copy 'columns))
(setf (columns copy)
(mapcar (lambda (col)
;; we need to handle the md-copy format for the
;; column list, which allows for user-given
;; options: each column is a list whose car is
;; the column name.
(list (column-name col)))
(table-field-list (first (table-list pgsql-catalog))))))
(log-message :data "CATALOG: ~s" pgsql-catalog)
;; this sets (table-index-list (target copy))
(maybe-drop-indexes pgsql-catalog :drop-indexes drop-indexes)
;; now is the proper time to truncate, before parallel operations
(when truncate
(truncate-tables pgsql-catalog)))
(cl-postgres:database-error (e)
(log-message :fatal "Failed to prepare target PostgreSQL table.")
(log-message :fatal "~a" e)
(return-from copy-database)))
;; Keep the PostgreSQL table target around in the copy instance,
;; with the following subtleties to deal with:
;; 1. the catalog fetching did fill-in PostgreSQL columns as fields
;; 2. we might target fewer pg columns than the table actually has
(let ((table (first (table-list pgsql-catalog))))
(setf (table-column-list table)
(loop :for column-name :in (mapcar #'first (columns copy))
:collect (find column-name (table-field-list table)
:key #'column-name
:test #'string=)))
(setf (target copy) table))
;; expand the specs of our source, we might have to care about several
;; files actually.
(let* ((lp:*kernel* (make-kernel worker-count))
(channel (lp:make-channel))
(path-list (expand-spec (source copy)))
(task-count 0))
(with-stats-collection ("Files Processed" :section :post
:use-result-as-read t
:use-result-as-rows t)
(loop :for path-spec :in path-list
:count t
:do (let ((table-source (clone-copy-for copy path-spec)))
(when (and (header table-source) (null (fields table-source)))
(parse-header table-source))
(incf task-count
(copy-from table-source
:concurrency concurrency
:kernel lp:*kernel*
:channel channel
:on-error-stop on-error-stop
:disable-triggers disable-triggers)))))
;; end kernel
(with-stats-collection ("COPY Threads Completion" :section :post
:use-result-as-read t
:use-result-as-rows t)
(loop :repeat task-count
:do (handler-case
(destructuring-bind (task table seconds)
(lp:receive-result channel)
(log-message :debug
"Finished processing ~a for ~s ~50T~6$s"
task (format-table-name table) seconds))
(condition (e)
(log-message :fatal "~a" e)))
:finally (progn
(lp:end-kernel :wait nil)
(return task-count))))
(lp:end-kernel :wait t))
;; re-create the indexes from the target table entry
(create-indexes-again (target-db copy)
pgsql-catalog
:max-parallel-create-index max-parallel-create-index
:drop-indexes drop-indexes)))


@ -0,0 +1,548 @@
;;;
;;; Generic API for pgloader sources
;;; Methods for database source types (with introspection)
;;;
(in-package :pgloader.load)
;;;
;;; Prepare the PostgreSQL database before streaming the data into it.
;;;
(defmethod prepare-pgsql-database ((copy db-copy)
(catalog catalog)
&key
truncate
create-tables
create-schemas
drop-schema
drop-indexes
set-table-oids
materialize-views
foreign-keys
include-drop)
"Prepare the target PostgreSQL database: create tables casting datatypes
from the MySQL definitions, prepare index definitions and create target
tables for materialized views.
That function mutates index definitions in ALL-INDEXES."
(log-message :notice "Prepare PostgreSQL database.")
(with-pgsql-transaction (:pgconn (target-db copy))
(finalize-catalogs catalog (pgconn-variant (target-db copy)))
(if create-tables
(progn
(when create-schemas
(with-stats-collection ("Create Schemas" :section :pre
:use-result-as-read t
:use-result-as-rows t)
(create-schemas catalog
:include-drop drop-schema
:client-min-messages :error)))
;; create new SQL types (ENUMs, SETs) if needed and before we
;; get to the table definitions that will use them
(with-stats-collection ("Create SQL Types" :section :pre
:use-result-as-read t
:use-result-as-rows t)
;; some SQL types come from extensions (ip4r, hstore, etc)
(create-extensions catalog
:include-drop include-drop
:if-not-exists t
:client-min-messages :error)
(create-sqltypes catalog
:include-drop include-drop
:client-min-messages :error))
;; now the tables
(with-stats-collection ("Create tables" :section :pre
:use-result-as-read t
:use-result-as-rows t)
(create-tables catalog
:include-drop include-drop
:client-min-messages :error)))
(progn
;; if we're not going to create the tables, now is the time to
;; remove the constraints: indexes, primary keys, foreign keys
;;
;; to be able to do that properly, get the constraints from
;; the pre-existing target database catalog
(let* ((pgversion (pgconn-major-version (target-db copy)))
(pgsql-catalog
(fetch-pgsql-catalog (db-name (target-db copy))
:source-catalog catalog
:pgversion pgversion)))
(merge-catalogs catalog pgsql-catalog))
;; now the foreign keys and only then the indexes, because a
;; drop constraint on a primary key cascades to the drop of
;; any foreign key that targets the primary key
(when foreign-keys
(with-stats-collection ("Drop Foreign Keys" :section :pre
:use-result-as-read t
:use-result-as-rows t)
(drop-pgsql-fkeys catalog :log-level :notice)))
(when drop-indexes
(with-stats-collection ("Drop Indexes" :section :pre
:use-result-as-read t
:use-result-as-rows t)
;; we want to error out early in case we can't DROP the
;; index, don't CASCADE
(drop-indexes catalog :cascade nil :log-level :notice)))
(when truncate
(with-stats-collection ("Truncate" :section :pre
:use-result-as-read t
:use-result-as-rows t)
(truncate-tables catalog)))))
;; Some database sources allow the same index name to be used
;; against several tables, so we add the PostgreSQL table OID to the
;; index name to differentiate them. Set the table oids now.
(when (and create-tables set-table-oids)
(with-stats-collection ("Set Table OIDs" :section :pre
:use-result-as-read t
:use-result-as-rows t)
(set-table-oids catalog :variant (pgconn-variant (target-db copy)))))
;; We might have to MATERIALIZE VIEWS
(when (and create-tables materialize-views)
(with-stats-collection ("Create MatViews Tables" :section :pre
:use-result-as-read t
:use-result-as-rows t)
(create-views catalog
:include-drop include-drop
:client-min-messages :error))))
;; Citus Support
;;
;; We need a separate transaction here in some cases, because of the
;; distributed DDL support from Citus, to avoid the following error:
;;
;; ERROR Database error 25001: cannot establish a new connection for
;; placement 2299, since DDL has been executed on a connection that is in
;; use
;;
(when (catalog-distribution-rules catalog)
(with-pgsql-transaction (:pgconn (target-db copy))
(with-stats-collection ("Citus Distribute Tables" :section :pre)
(create-distributed-table (catalog-distribution-rules catalog)))))
;; log the catalog we just fetched and (maybe) merged
(log-message :data "CATALOG: ~s" catalog))
(defmethod complete-pgsql-database ((copy db-copy)
(catalog catalog)
pkeys
&key
foreign-keys
create-indexes
create-triggers
reset-sequences)
"After loading the data into PostgreSQL, we can now reset the sequences
and declare foreign keys."
;;
;; Now Reset Sequences, the good time to do that is once the whole data
;; has been imported and once we have the indexes in place, as max() is
;; able to benefit from the indexes. In particular avoid doing that step
;; while CREATE INDEX statements are in flight (avoid locking).
;;
(log-message :notice "Completing PostgreSQL database.")
(when reset-sequences
(reset-sequences (clone-connection (target-db copy)) catalog))
(handler-case
(with-pgsql-transaction (:pgconn (clone-connection (target-db copy)))
;;
;; Turn UNIQUE indexes into PRIMARY KEYS now
;;
(when create-indexes
(pgsql-execute-with-timing :post "Primary Keys" pkeys
:log-level :notice))
;;
;; Foreign Key Constraints
;;
;; We need to have finished loading both the reference and the
;; referring tables to be able to build the foreign keys, so wait
;; until all tables and indexes are imported before doing that.
;;
(when foreign-keys
(create-pgsql-fkeys catalog
:section :post
:label "Create Foreign Keys"
:log-level :notice))
;;
;; Triggers and stored procedures -- includes special default values
;;
(when create-triggers
(create-triggers catalog
:section :post
:label "Create Triggers"))
;;
;; Add schemas that need to be in the search_path to the database
;; search_path, when using PostgreSQL. Redshift doesn't know how to
;; do that, unfortunately.
;;
(unless (eq :redshift (pgconn-variant (target-db copy)))
(add-to-search-path catalog
:section :post
:label "Set Search Path"))
;;
;; And now, comments on tables and columns.
;;
(comment-on-tables-and-columns catalog
:section :post
:label "Install Comments"))
(postgresql-unavailable (condition)
(log-message :error "~a" condition)
(log-message :error
"Reconnecting to PostgreSQL to complete the target database.")
;; in order to avoid Socket error in "connect": ECONNREFUSED if we
;; try just too soon, wait a little
(sleep 2)
;;
;; Reset Sequence can be done several times safely, and the rest of the
;; operations run in a single transaction, so if the connection was lost,
;; nothing has been done. Retry.
;;
(complete-pgsql-database copy
catalog
pkeys
:foreign-keys foreign-keys
:create-indexes create-indexes
:create-triggers create-triggers
:reset-sequences reset-sequences))))
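The reconnect-and-retry above is safe only because each completion step is idempotent or runs inside a single transaction. A generic sketch of that pattern, with hypothetical names throughout:

```lisp
;; Retry sketch: safe only when THUNK either commits everything or
;; leaves nothing behind. with-retry-on-disconnect is a made-up helper,
;; not part of pgloader.
(defun with-retry-on-disconnect (thunk &key (delay 2))
  (handler-case (funcall thunk)
    (error (condition)
      (format t "lost connection: ~a; retrying~%" condition)
      ;; wait a little to avoid an immediate ECONNREFUSED
      (sleep delay)
      (with-retry-on-disconnect thunk :delay delay))))
```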
(defun process-catalog (copy catalog &key alter-table alter-schema distribute)
"Do all the PostgreSQL catalog tweaking here: casts, index WHERE clause
rewriting, pgloader level alter schema and alter table commands."
(log-message :info "Processing source catalogs")
;; cast the catalog into something PostgreSQL can work on
(cast catalog)
;; support code for index filters (where clauses)
(process-index-definitions catalog :sql-dialect (class-name (class-of copy)))
;; we may have to alter schemas
(when alter-schema
(alter-schema catalog alter-schema))
;; if asked, now alter the catalog with given rules: the alter-table
;; keyword parameter actually contains a set of alter table rules.
(when alter-table
(alter-table catalog alter-table))
;; we also support schema changes necessary for Citus distribution
(when distribute
(log-message :info "Applying distribution rules")
(setf (catalog-distribution-rules catalog)
(citus-distribute-schema catalog distribute))))
(defun optimize-table-copy-ordering (catalog)
"Return a list of tables to copy over in optimized order"
(let ((table-list (copy-list (table-list catalog)))
(view-list (copy-list (view-list catalog))))
;; when materialized views are not supported, view-list is empty here
(cond
((notevery #'zerop (mapcar #'table-row-count-estimate table-list))
(let ((sorted-table-list
(sort table-list #'> :key #'table-row-count-estimate)))
(log-message :notice
"Processing tables in this order: ~{~a: ~d rows~^, ~}"
(loop :for table :in (append sorted-table-list view-list)
:collect (format-table-name table)
:collect (table-row-count-estimate table)))
(nconc sorted-table-list view-list)))
(t
(nconc table-list view-list)))))
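The ordering heuristic is a plain descending sort on the row-count estimate; note that CL's SORT is destructive, which is why the function works on COPY-LIST copies. A self-contained sketch:

```lisp
;; Biggest tables first, so the longest COPYs start as early as
;; possible. Pairs stand in for (table . row-count-estimate).
(sort (copy-list '((small . 10) (big . 500) (mid . 42)))
      #'> :key #'cdr)
;; → ((BIG . 500) (MID . 42) (SMALL . 10))
```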
;;;
;;; Generic enough implementation of the copy-database method.
;;;
(defmethod copy-database ((copy db-copy)
&key
(on-error-stop *on-error-stop*)
(worker-count 4)
(concurrency 1)
(multiple-readers nil)
max-parallel-create-index
(truncate nil)
(disable-triggers nil)
(data-only nil)
(schema-only nil)
(create-schemas t)
(create-tables t)
(include-drop t)
(drop-schema nil)
(create-indexes t)
(index-names :uniquify)
(reset-sequences t)
(foreign-keys t)
(reindex nil)
(after-schema nil)
distribute
including
excluding
set-table-oids
alter-table
alter-schema
materialize-views)
"Export the source database data and import it into PostgreSQL."
(log-message :log "Migrating from ~a" (source-db copy))
(log-message :log "Migrating into ~a" (target-db copy))
(let* ((*on-error-stop* on-error-stop)
(copy-data (or data-only (not schema-only)))
(create-ddl (or schema-only (not data-only)))
(create-tables (and create-tables create-ddl))
(create-schemas (and create-schemas create-ddl))
;; foreign keys has a special meaning in data-only mode
(foreign-keys (if (eq :redshift (pgconn-variant (target-db copy)))
nil
foreign-keys))
(drop-indexes (if (eq :redshift (pgconn-variant (target-db copy)))
nil
(or reindex
(and include-drop create-ddl))))
(create-indexes (if (eq :redshift (pgconn-variant (target-db copy)))
nil
(or reindex
(and create-indexes drop-indexes create-ddl))))
(reset-sequences (if (eq :redshift (pgconn-variant (target-db copy)))
nil
reset-sequences))
(*preserve-index-names*
(or (eq :preserve index-names)
;; if we didn't create the tables, we are re-installing the
;; pre-existing indexes
(not create-tables)))
(copy-kernel (make-kernel worker-count))
(copy-channel (let ((lp:*kernel* copy-kernel)) (lp:make-channel)))
(catalog (handler-case
(fetch-metadata
copy
(make-catalog
:name (typecase (source-db copy)
(db-connection
(db-name (source-db copy)))
(fd-connection
(pathname-name
(fd-path (source-db copy))))))
:materialize-views materialize-views
:create-indexes create-indexes
:foreign-keys foreign-keys
:including including
:excluding excluding)
(mssql::mssql-error (e)
(log-message :error "MSSQL ERROR: ~a" e)
(log-message :log "You might need to review the FreeTDS protocol version in your freetds.conf file, see http://www.freetds.org/userguide/choosingtdsprotocol.htm")
(return-from copy-database))
#+pgloader-image
(condition (e)
(log-message :error
"~a: ~a"
(conn-type (source-db copy))
e)
(return-from copy-database))))
pkeys
(writers-count (make-hash-table :size (count-tables catalog)))
(max-indexes (when create-indexes
(max-indexes-per-table catalog)))
(idx-kernel (when (and max-indexes (< 0 max-indexes))
(make-kernel (or max-parallel-create-index
max-indexes))))
(idx-channel (when idx-kernel
(let ((lp:*kernel* idx-kernel))
(lp:make-channel))))
(task-count 0))
;; apply catalog level transformations to support the database migration
;; that's CAST rules, index WHERE clause rewriting and ALTER commands
(handler-case
(process-catalog copy catalog
:alter-table alter-table
:alter-schema alter-schema
:distribute distribute)
#+pgloader-image
((or citus-rule-table-not-found citus-rule-is-missing-from-list) (e)
(log-message :fatal "~a" e)
(return-from copy-database))
#+pgloader-image
(condition (e)
(log-message :fatal "Failed to process catalogs: ~a" e)
(return-from copy-database)))
;; if asked, first drop/create the tables on the PostgreSQL side
(handler-case
(progn
(prepare-pgsql-database copy
catalog
:truncate truncate
:create-tables create-tables
:create-schemas create-schemas
:drop-indexes drop-indexes
:drop-schema drop-schema
:include-drop include-drop
:foreign-keys foreign-keys
:set-table-oids set-table-oids
:materialize-views materialize-views)
;; if there's an AFTER SCHEMA DO/EXECUTE command, now is the time
;; to run it.
(when after-schema
(pgloader.parser::execute-sql-code-block (target-db copy)
:pre
after-schema
"after schema")))
;;
;; In case some error happens in the preparatory transaction, we
;; need to stop now and refrain from trying to load the data into
;; an incomplete schema.
;;
(cl-postgres:database-error (e)
(declare (ignore e)) ; a log has already been printed
(log-message :fatal "Failed to create the schema, see above.")
;; we might have some cleanup to do...
(cleanup copy catalog :materialize-views materialize-views)
(return-from copy-database)))
(loop
:for table :in (optimize-table-copy-ordering catalog)
:do (let ((table-source (instanciate-table-copy-object copy table)))
;; first COPY the data from source to PostgreSQL, using copy-kernel
(if (not copy-data)
;; start indexing straight away then
(when create-indexes
(alexandria:appendf
pkeys
(create-indexes-in-kernel (target-db copy)
table
idx-kernel
idx-channel)))
;; prepare the writers-count hash-table, as we start
;; copy-from, we have concurrency tasks writing.
(progn ; when copy-data
(setf (gethash table writers-count) concurrency)
(incf task-count
(copy-from table-source
:concurrency concurrency
:multiple-readers multiple-readers
:kernel copy-kernel
:channel copy-channel
:on-error-stop on-error-stop
:disable-triggers disable-triggers))))))
;; now end the kernels
;; and each time a table is done, launch its indexing
(when copy-data
(let ((lp:*kernel* copy-kernel))
(with-stats-collection ("COPY Threads Completion" :section :post
:use-result-as-read t
:use-result-as-rows t)
(loop :repeat task-count
:do (destructuring-bind (task table seconds)
(lp:receive-result copy-channel)
(log-message :debug
"Finished processing ~a for ~s ~50T~6$s"
task (format-table-name table) seconds)
(when (eq :writer task)
;;
;; Start the CREATE INDEX parallel tasks only when
;; the data has been fully copied over to the
;; corresponding table, that's when the writers
;; count is down to zero.
;;
(decf (gethash table writers-count))
(log-message :debug "writers-counts[~a] = ~a"
(format-table-name table)
(gethash table writers-count))
(when (and create-indexes
(zerop (gethash table writers-count)))
(let* ((stats pgloader.monitor::*sections*)
(section (get-state-section stats :data))
(table-stats (pgstate-get-label section table))
(pprint-secs
(pgloader.state::format-interval seconds nil)))
;; in CCL we have access to the *sections* dynamic
;; binding from another thread, in SBCL we access
;; an empty copy.
(log-message :notice
"DONE copying ~a in ~a~@[ for ~d rows~]"
(format-table-name table)
pprint-secs
(when table-stats
(pgtable-rows table-stats))))
(alexandria:appendf
pkeys
(create-indexes-in-kernel (target-db copy)
table
idx-kernel
idx-channel)))))
:finally (progn
(lp:end-kernel :wait nil)
(return worker-count))))))
(log-message :info "Done with COPYing data, waiting for indexes")
(when create-indexes
(let ((lp:*kernel* idx-kernel))
;; wait until the indexes are done being built...
;; don't forget accounting for that waiting time.
(with-stats-collection ("Index Build Completion" :section :post
:use-result-as-read t
:use-result-as-rows t)
(loop :for count :below (count-indexes catalog)
:do (lp:receive-result idx-channel))
(lp:end-kernel :wait t)
(log-message :info "Done waiting for indexes")
(count-indexes catalog))))
;;
;; Complete the PostgreSQL database before handing over.
;;
(complete-pgsql-database copy
catalog
pkeys
:foreign-keys foreign-keys
:create-indexes create-indexes
;; only create triggers (for default values)
;; when we've been responsible for creating the
;; tables -- otherwise assume the schema is
;; good as it is
:create-triggers create-tables
:reset-sequences reset-sequences)
;;
;; Time to cleanup!
;;
(cleanup copy catalog :materialize-views materialize-views)))


@ -1,13 +1,5 @@
(in-package #:pgloader)
;;;
;;; Some command line constants for OS errors codes
;;;
(defparameter +os-code-success+ 0)
(defparameter +os-code-error+ 1)
(defparameter +os-code-error-usage+ 2)
(defparameter +os-code-error-bad-source+ 4)
;;;
;;; Now some tooling
;;;
@@ -54,7 +46,16 @@
:documentation "Read user code from files")
("dry-run" :type boolean
:documentation "Only check database connections, don't load anything.")
:documentation "Only check database connections, don't load anything.")
("on-error-stop" :type boolean
:documentation "Refrain from handling errors properly.")
("no-ssl-cert-verification"
:type boolean
:documentation "Instruct OpenSSL to bypass verifying certificates.")
(("context" #\C) :type string :documentation "Command Context Variables")
(("with") :type string :list t :optional t
:documentation "Load options")
@@ -81,14 +82,17 @@
:documentation "SQL script to run after loading the data")
("self-upgrade" :type string :optional t
:documentation "Path to pgloader newer sources")))
:documentation "Path to pgloader newer sources")
(defun print-backtrace (condition debug stream)
("regress" :type boolean :optional t
:documentation "Drive regression testing")))
(defun print-backtrace (condition debug)
"Depending on DEBUG, print out the full backtrace or just a shorter
message on STREAM for given CONDITION."
(if debug
(trivial-backtrace:print-backtrace condition :output stream :verbose t)
(trivial-backtrace:print-condition condition stream)))
(trivial-backtrace:print-backtrace condition :output nil)
(trivial-backtrace:print-condition condition nil)))
(defun mkdir-or-die (path debug &optional (stream *standard-output*))
"Create a directory at given PATH and exit with an error message when
@@ -101,7 +105,7 @@
(condition (e)
;; any error here is a panic
(if debug
(print-backtrace e debug stream)
(format stream "PANIC: ~a~%" (print-backtrace e debug))
(format stream "PANIC: ~a.~%" e))
(uiop:quit))))
@@ -167,7 +171,7 @@
(defvar *--load-list-file-extension-whitelist* '("lisp" "lsp" "cl" "asd")
"White list of file extensions allowed with the --load option.")
(defun load-extra-transformation-functions (filename)
(defun load-extra-transformation-functions (filename &optional verbose)
"Load an extra filename to tweak pgloader's behavior."
(let ((pathname (uiop:parse-native-namestring filename)))
(unless (member (pathname-type pathname)
@@ -175,8 +179,8 @@
:test #'string=)
(error "Unknown lisp file extension: ~s" (pathname-type pathname)))
(log-message :info "Loading code from ~s" pathname)
(load (compile-file pathname :verbose nil :print nil))))
(format t "Loading code from ~s~%" pathname)
(load (compile-file pathname :verbose verbose :print verbose))))
(defun main (argv)
"Entry point when building an executable image with buildapp"
@@ -191,11 +195,14 @@
(usage argv :quit t)))
(destructuring-bind (&key help version quiet verbose debug logfile
list-encodings upgrade-config dry-run
list-encodings upgrade-config
dry-run on-error-stop context
((:load-lisp-file load))
client-min-messages log-min-messages summary
root-dir self-upgrade
with set field cast type encoding before after)
with set field cast type encoding before after
no-ssl-cert-verification
regress)
options
;; parse the log thresholds
@@ -218,6 +225,11 @@
(let ((*self-upgraded-already* t))
(main argv))))
;; --list-encodings, -E
(when list-encodings
(show-encodings)
(uiop:quit +os-code-success+))
;; First care about the root directory where pgloader is supposed to
;; output its data logs and reject files
(let ((root-dir-truename (or (probe-file root-dir)
@@ -227,8 +239,20 @@
;; Set parameters that come from the environment
(init-params-from-environment)
;; Read the context file (if given) and the environment
(handler-case
(initialize-context context)
(condition (e)
(format t "Couldn't read ini file ~s: ~a~%" context e)
(usage argv)))
;; Then process options
(when debug
(format t "pgloader version ~a~%" *version-string*)
#+pgloader-image
(format t "compiled with ~a ~a~%"
(lisp-implementation-type)
(lisp-implementation-version))
#+sbcl
(format t "sb-impl::*default-external-format* ~s~%"
sb-impl::*default-external-format*)
@@ -240,14 +264,14 @@
(lisp-implementation-type)
(lisp-implementation-version)))
(when help
(when (or help)
(usage argv))
(when (or help version) (uiop:quit +os-code-success+))
(when list-encodings
(show-encodings)
(uiop:quit +os-code-success+))
(when (null arguments)
(usage argv)
(uiop:quit +os-code-error-usage+))
(when upgrade-config
(loop for filename in arguments
@@ -263,256 +287,109 @@
;; Should we run in dry-run mode?
(setf *dry-run* dry-run)
;; Should we stop at first error?
(setf *on-error-stop* on-error-stop)
;; load extra lisp code provided for by the user
(when load
(loop :for filename :in load :do
(handler-case
(load-extra-transformation-functions filename debug)
((or simple-condition serious-condition) (e)
(format *error-output*
"Failed to load lisp source file ~s~%" filename)
(format *error-output* "~a~%~%" e)
(uiop:quit +os-code-error+)))))
;; Now process the arguments
(when arguments
;; Start the logs system
(let* ((*log-filename* (log-file-name logfile))
(*summary-pathname* (parse-summary-filename summary debug)))
(with-monitor ()
;; tell the user where to look for interesting things
(log-message :log "Main logs in '~a'" (probe-file *log-filename*))
(log-message :log "Data errors in '~a'~%" *root-dir*)
(handler-case
;; The handler-case is to catch unhandled exceptions at the
;; top level.
;;
;; The handler-bind below is to be able to offer a
;; meaningful backtrace to the user in case of unexpected
;; conditions being signaled.
(handler-bind
(((and serious-condition (not (or monitor-error
cli-parsing-error
source-definition-error
regression-test-error)))
#'(lambda (condition)
(format *error-output* "KABOOM!~%")
(format *error-output* "~a: ~a~%~a~%~%"
(class-name (class-of condition))
condition
(print-backtrace condition debug)))))
;; load extra lisp code provided for by the user
(when load
(loop for filename in load do
(handler-case
(load-extra-transformation-functions filename)
(condition (e)
(log-message :fatal
"Failed to load lisp source file ~s~%"
filename)
(log-message :error "~a" e)
(uiop:quit +os-code-error+)))))
(with-monitor ()
;; tell the user where to look for interesting things
(log-message :log "Main logs in '~a'"
(uiop:native-namestring *log-filename*))
(log-message :log "Data errors in '~a'~%" *root-dir*)
(handler-case
;; The handler-case is to catch unhandled exceptions at the
;; top level.
;;
;; The handler-bind is to be able to offer a meaningful
;; backtrace to the user in case of unexpected conditions
;; being signaled.
(handler-bind
((condition
#'(lambda (condition)
(log-message :fatal "We have a situation here.")
(print-backtrace condition debug *standard-output*))))
(when no-ssl-cert-verification
(setf cl+ssl:*make-ssl-client-stream-verify-default* nil))
;; if there are exactly two arguments in the command
;; line, try and process them as source and target
;; arguments
(if (= 2 (length arguments))
(let* ((type (parse-cli-type type))
(source (first arguments))
(source (if type
(parse-source-string-for-type type source)
(parse-source-string source)))
(type (when source
(parse-cli-type (conn-type source))))
(target (parse-target-string (second arguments))))
(cond
((and regress (= 1 (length arguments)))
(process-regression-test (first arguments)))
;; some verbosity about the parsing "magic"
(log-message :info "SOURCE: ~s" source)
(log-message :info "TARGET: ~s" target)
(regress
(log-message :fatal "Regression testing requires a single .load file as input."))
(cond ((and (null source) (null target)
(probe-file
(uiop:parse-unix-namestring
(first arguments)))
(probe-file
(uiop:parse-unix-namestring
(second arguments))))
(mapcar #'process-command-file arguments))
((= 2 (length arguments))
;; if there are exactly two arguments in the command
;; line, try and process them as source and target
;; arguments
(process-source-and-target (first arguments)
(second arguments)
type encoding
set with field cast
before after))
(t
;; process the files
;; other options are not going to be used here
(let ((cli-options `(("--type" ,type)
("--encoding" ,encoding)
("--set" ,set)
("--with" ,with)
("--field" ,field)
("--cast" ,cast)
("--before" ,before)
("--after" ,after))))
(loop :for (cli-option-name cli-option-value)
:in cli-options
:when cli-option-value
:do (log-message
:fatal
"Option ~s is ignored when using a load file"
cli-option-name))
((null source)
(log-message :fatal
"Failed to parse ~s as a source URI."
(first arguments))
(log-message :log "You might need to use --type."))
;; when we issued a single error previously, do nothing
(unless (remove-if #'null (mapcar #'second cli-options))
(process-command-file arguments)))))))
((null target)
(log-message :fatal
"Failed to parse ~s as a PostgreSQL database URI."
(second arguments))))
((or cli-parsing-error source-definition-error) (c)
(format *error-output* "~%~a~%~%" c)
(uiop:quit +os-code-error-bad-source+))
;; so, we actually have all the specs for the
;; job on the command line now.
(when (and source target)
(load-data :from source
:into target
:encoding (parse-cli-encoding encoding)
:options (parse-cli-options type with)
:gucs (parse-cli-gucs set)
:fields (parse-cli-fields type field)
:casts (parse-cli-casts cast)
:before (parse-sql-file before)
:after (parse-sql-file after)
:start-logger nil)))
(regression-test-error (c)
(format *error-output* "~%~a~%~%" c)
(uiop:quit +os-code-error-regress+))
;; process the files
(mapcar #'process-command-file arguments)))
(monitor-error (c)
(format *error-output* "~a~%" c)
(uiop:quit +os-code-error+))
(source-definition-error (c)
(log-message :fatal "~a" c)
(uiop:quit +os-code-error-bad-source+))
(condition (c)
(when debug (invoke-debugger c))
(uiop:quit +os-code-error+))))))
(serious-condition (c)
(format *error-output* "~%What am I doing here?~%~%")
(format *error-output* "~a~%~%" c)
(uiop:quit +os-code-error+)))))
;; done.
(uiop:quit +os-code-success+)))))
(defun process-command-file (filename)
"Process FILENAME as a pgloader command file (.load)."
(let ((truename (probe-file filename)))
(if truename
(run-commands truename :start-logger nil)
(log-message :error "Cannot find file: ~s" filename)))
(format t "~&"))
(defun run-commands (source
&key
(start-logger t)
((:summary *summary-pathname*) *summary-pathname*)
((:log-filename *log-filename*) *log-filename*)
((:log-min-messages *log-min-messages*) *log-min-messages*)
((:client-min-messages *client-min-messages*) *client-min-messages*))
"SOURCE can be a function, which is run; a list, which is compiled as CL
code then run; a pathname containing one or more commands that are parsed
then run; or a command string that is then parsed and each command run."
(with-monitor (:start-logger start-logger)
(let* ((funcs
(typecase source
(function (list source))
(list (list (compile nil source)))
(pathname (mapcar (lambda (expr) (compile nil expr))
(parse-commands-from-file source)))
(t (mapcar (lambda (expr) (compile nil expr))
(if (probe-file source)
(parse-commands-from-file source)
(parse-commands source)))))))
;; maybe duplicate the summary to a file
(let* ((summary-stream (when *summary-pathname*
(open *summary-pathname*
:direction :output
:if-exists :rename
:if-does-not-exist :create)))
(*report-stream* (or summary-stream *standard-output*)))
(unwind-protect
;; run the commands
(loop for func in funcs do (funcall func))
;; cleanup
(when summary-stream (close summary-stream)))))))
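The docstring above lists the four shapes SOURCE may take; a minimal usage sketch (the file path and command text below are invented, and this assumes a built pgloader image with the `pgloader` package loaded):

```lisp
;; Sketch only: paths and the command string are hypothetical.
(in-package #:pgloader)

;; a pathname: commands are parsed from the file, compiled, then run
(run-commands #P"/tmp/migrate.load")

;; a string: parsed exactly like a file's contents
(run-commands "LOAD DATABASE FROM mysql://root@localhost/sakila
               INTO pgsql://localhost/sakila;")

;; a function: simply funcall'ed inside the monitor
(run-commands (lambda () (log-message :notice "hello from run-commands")))
```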
;;;
;;; Main API to use from outside of pgloader.
;;;
(define-condition source-definition-error (error)
((mesg :initarg :mesg :reader source-definition-error-mesg))
(:report (lambda (err stream)
(format stream "~a" (source-definition-error-mesg err)))))
(defun load-data (&key ((:from source)) ((:into target))
encoding fields options gucs casts before after
(start-logger t))
"Load data from SOURCE into TARGET."
(declare (type connection source)
(type pgsql-connection target))
;; some preliminary checks
(when (and (typep source 'csv-connection)
(not (typep source 'copy-connection))
(null fields))
(error 'source-definition-error
:mesg "This data source requires field definitions."))
(when (and (typep source 'csv-connection) (null (pgconn-table-name target)))
(error 'source-definition-error
:mesg "This data source requires a table name target."))
(when (and (typep source 'fixed-connection) (null (pgconn-table-name target)))
(error 'source-definition-error
:mesg "Fixed-width data sources require a table name target."))
(with-monitor (:start-logger start-logger)
(when (and casts (not (member (type-of source)
'(sqlite-connection
mysql-connection
mssql-connection))))
(log-message :log "Cast rules are ignored for this source."))
;; now generate the code for the command
(log-message :debug "LOAD DATA FROM ~s" source)
(run-commands
(process-relative-pathnames
(uiop:getcwd)
(typecase source
(copy-connection
(lisp-code-for-loading-from-copy source fields target
:encoding (or encoding :default)
:gucs gucs
:copy-options options
:before before
:after after))
(fixed-connection
(lisp-code-for-loading-from-fixed source fields target
:encoding encoding
:gucs gucs
:fixed-options options
:before before
:after after))
(csv-connection
(lisp-code-for-loading-from-csv source fields target
:encoding encoding
:gucs gucs
:csv-options options
:before before
:after after))
(dbf-connection
(lisp-code-for-loading-from-dbf source target
:gucs gucs
:dbf-options options
:before before
:after after))
(ixf-connection
(lisp-code-for-loading-from-ixf source target
:gucs gucs
:ixf-options options
:before before
:after after))
(sqlite-connection
(lisp-code-for-loading-from-sqlite source target
:gucs gucs
:casts casts
:sqlite-options options))
(mysql-connection
(lisp-code-for-loading-from-mysql source target
:gucs gucs
:casts casts
:mysql-options options
:before before
:after after))
(mssql-connection
(lisp-code-for-loading-from-mssql source target
:gucs gucs
:casts casts
:mssql-options options
:before before
:after after))))
:start-logger start-logger)))
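As a rough illustration only, this API can be driven with the `parse-source-string` and `parse-target-string` helpers used by `main` above (both connection URIs here are made up):

```lisp
;; Hypothetical sketch: the database URIs are invented for the example.
(in-package #:pgloader)

(load-data :from (parse-source-string "mysql://user@localhost/sakila")
           :into (parse-target-string "pgsql://user@localhost/pagila"))
```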

@@ -78,32 +78,58 @@
(logior byte (- (mask-field (byte 1 (1- (* n 8))) byte))))
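For context, the `unsigned-to-signed` helper above sign-extends an N-byte unsigned integer by OR-ing in the (negated) sign bit; a standalone sketch with hand-checked values:

```lisp
;; Same sign-extension trick as above, reproduced standalone.
(defun unsigned-to-signed (byte n)
  ;; mask-field keeps only the sign bit (bit 8n-1) in place; negating it
  ;; and OR-ing it back in sets all higher bits when the value is negative.
  (logior byte (- (mask-field (byte 1 (1- (* n 8))) byte))))

(unsigned-to-signed 255 1)  ; => -1    (#xFF as a signed octet)
(unsigned-to-signed 130 1)  ; => -126
(unsigned-to-signed 100 1)  ; => 100   (sign bit clear, value unchanged)
```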
(defun sysdb-data-to-lisp (%dbproc data type len)
(if (> len 0)
(case (foreign-enum-keyword '%syb-value-type type)
((:syb-varchar :syb-text) (foreign-string-to-lisp data :count len))
(:syb-char (string-trim #(#\Space) (foreign-string-to-lisp data :count len)))
((:syb-bit :syb-bitn) (mem-ref data :int))
(:syb-int1 (unsigned-to-signed (mem-ref data :unsigned-int) 1))
(:syb-int2 (unsigned-to-signed (mem-ref data :unsigned-int) 2))
(:syb-int4 (unsigned-to-signed (mem-ref data :unsigned-int) 4))
(:syb-int8 (mem-ref data :int8))
(:syb-flt8 (mem-ref data :double))
(:syb-datetime
(with-foreign-pointer (%buf +numeric-buf-sz+)
(foreign-string-to-lisp %buf
:count (%dbconvert %dbproc type data -1 :syb-char %buf +numeric-buf-sz+))))
((:syb-money :syb-money4 :syb-decimal :syb-numeric)
(with-foreign-pointer (%buf +numeric-buf-sz+)
(parse-number:parse-number
(foreign-string-to-lisp %buf
:count (%dbconvert %dbproc type data -1 :syb-char %buf +numeric-buf-sz+)))))
((:syb-image :syb-binary :syb-varbinary :syb-blob)
(let ((vector (make-array len :element-type '(unsigned-byte 8))))
(dotimes (i len)
(setf (aref vector i) (mem-ref data :uchar i)))
vector))
(otherwise (error "not supported type ~A"
(foreign-enum-keyword '%syb-value-type type))))))
(let ((syb-type (foreign-enum-keyword '%syb-value-type type)))
(case syb-type
;; we accept empty string (len is 0)
((:syb-char :syb-varchar :syb-text :syb-msxml)
(foreign-string-to-lisp data :count len))
(otherwise
;; other types must have a non-zero len now, or we just return nil.
(if (> len 0)
(case syb-type
((:syb-bit :syb-bitn) (mem-ref data :int))
(:syb-int1 (unsigned-to-signed (mem-ref data :unsigned-int) 1))
(:syb-int2 (unsigned-to-signed (mem-ref data :unsigned-int) 2))
(:syb-int4 (unsigned-to-signed (mem-ref data :unsigned-int) 4))
(:syb-int8 (mem-ref data :int8))
(:syb-real (mem-ref data :float))
(:syb-flt8 (mem-ref data :double))
((:syb-datetime
:syb-datetime4
:syb-msdate
:syb-mstime
:syb-msdatetime2
:syb-msdatetimeoffset)
(with-foreign-pointer (%buf +numeric-buf-sz+)
(let ((count
(%dbconvert %dbproc
type
data
-1
:syb-char
%buf
+numeric-buf-sz+)))
(foreign-string-to-lisp %buf :count count))))
((:syb-money :syb-money4 :syb-decimal :syb-numeric)
(with-foreign-pointer (%buf +numeric-buf-sz+)
(let ((count
(%dbconvert %dbproc
type
data
-1
:syb-char
%buf
+numeric-buf-sz+)))
(parse-number:parse-number
(foreign-string-to-lisp %buf :count count )))))
((:syb-image :syb-binary :syb-varbinary :syb-blob)
(let ((vector (make-array len :element-type '(unsigned-byte 8))))
(dotimes (i len)
(setf (aref vector i) (mem-ref data :uchar i)))
vector))
(otherwise (error "not supported type ~A"
(foreign-enum-keyword '%syb-value-type type)))))))))
;; (defconstant +dbbuffer+ 14)

File diff suppressed because it is too large

@@ -7,6 +7,8 @@
(:use #:cl)
(:export #:*version-string*
#:*dry-run*
#:*on-error-stop*
#:on-error-stop
#:*self-upgrade-immutable-systems*
#:*fd-path-root*
#:*root-dir*
@@ -15,34 +17,44 @@
#:*client-min-messages*
#:*log-min-messages*
#:*report-stream*
#:*pgsql-reserved-keywords*
#:*identifier-case*
#:*preserve-index-names*
#:*copy-batch-rows*
#:*copy-batch-size*
#:*concurrent-batches*
#:*rows-per-range*
#:*prefetch-rows*
#:*pg-settings*
#:*state*
#:*mysql-settings*
#:*mssql-settings*
#:*default-tmpdir*
#:init-params-from-environment
#:getenv-default))
#:getenv-default
#:*context*
#:+os-code-success+
#:+os-code-error+
#:+os-code-error-usage+
#:+os-code-error-bad-source+
#:+os-code-error-regress+))
(in-package :pgloader.params)
(defparameter *release* nil
"non-nil when this build is a release build.")
(defparameter *major-version* "3.2")
(defparameter *minor-version* "1")
(defparameter *major-version* "3.6")
(defparameter *minor-version* "10")
(defun git-hash ()
"Return the current abbreviated git hash of the development tree."
(handler-case
(let ((git-hash `("git" "--no-pager" "log" "-n1" "--format=format:%h")))
(uiop:with-current-directory ((asdf:system-source-directory :pgloader))
(multiple-value-bind (stdout stderr code)
(uiop:run-program git-hash :output :string)
(declare (ignore code stderr))
stdout)))
(multiple-value-bind (stdout stderr code)
(uiop:run-program git-hash :output :string
:directory (asdf:system-source-directory :pgloader))
(declare (ignore code stderr))
stdout))
(condition (e)
;; in case anything happens, just return X.Y.Z~devel
(declare (ignore e))
@@ -68,18 +80,24 @@
(defparameter *dry-run* nil
"Set to non-nil to only run checks about the load setup.")
;; we can't use pgloader.utils:make-pgstate yet because params is compiled
;; first in the asd definition; we just make the symbol a special variable.
(defparameter *state* nil
"State of the current loading.")
(defparameter *on-error-stop* nil
"Set to non-nil to make pgloader refrain from handling errors and quit instead.")
(define-condition on-error-stop ()
((on-condition :initarg :on-condition :reader on-error-condition
:documentation "Condition that triggered on-error-stop"))
(:report (lambda (condition stream)
(format stream
"On Error Stop: ~a"
(on-error-condition condition)))))
(defparameter *fd-path-root* nil
"Where to load files from, when loading from an archive or expanding regexps.")
(defparameter *root-dir*
#+unix (make-pathname :directory "/tmp/pgloader/")
#+unix (uiop:parse-native-namestring "/tmp/pgloader/")
#-unix (uiop:merge-pathnames*
"pgloader/"
(uiop:make-pathname* :directory '(:relative "pgloader"))
(uiop:ensure-directory-pathname (getenv-default "Temp")))
"Top directory where to store all data logs and reject files.")
@@ -100,6 +118,9 @@
;;;
;;; When converting from other databases, how to deal with case sensitivity?
;;;
(defvar *pgsql-reserved-keywords* nil
"We always need to quote PostgreSQL reserved keywords.")
(defparameter *identifier-case* :downcase
"Dealing with source databases casing rules.")
@@ -115,10 +136,15 @@
(defparameter *copy-batch-size* (* 20 1024 1024)
"Maximum memory size allowed for a single batch.")
(defparameter *concurrent-batches* 10
"How many batches do we stack in the queue in advance.")
(defparameter *prefetch-rows* 100000
"How many rows do we read in advance into the reader queue.")
(defparameter *rows-per-range* 10000
"How many rows to read in each reader's thread, per SQL query.")
(defparameter *pg-settings* nil "An alist of GUC names and values.")
(defparameter *mysql-settings* nil "An alist of GUC names and values.")
(defparameter *mssql-settings* nil "An alist of GUC names and values.")
;;;
;;; Archive processing: downloads and unzip.
@@ -144,3 +170,21 @@
(setf *default-tmpdir*
(fad:pathname-as-directory
(getenv-default "TMPDIR" *default-tmpdir*))))
;;;
;;; Run time context to fill-in variable parts of the commands.
;;;
(defvar *context* nil
"Alist of (names . values) initialized from the environment at run-time,
and from a --context command line argument, then used in the commands when
they are using the Mustache templating feature.")
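As the docstring notes, these variables feed the Mustache templating of load commands: `--context` (documented above as "Command Context Variables") points at an ini file that `initialize-context` reads in `main`. The ini layout and the variable name below are assumptions for illustration only:

```
; context.ini -- key/value pairs become template variables (layout assumed)
dburi = pgsql://localhost/target

-- in the .load command file, the key is referenced with a Mustache tag
-- (remaining clauses of the load command omitted):
LOAD CSV FROM '/tmp/data.csv' INTO {{dburi}}
```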
;;;
;;; Some command line constants for OS errors codes
;;;
(defparameter +os-code-success+ 0)
(defparameter +os-code-error+ 1)
(defparameter +os-code-error-usage+ 2)
(defparameter +os-code-error-bad-source+ 4)
(defparameter +os-code-error-regress+ 5)

@@ -0,0 +1,95 @@
;;;
;;; ALTER TABLE allows changing some table properties while migrating
;;; from a source to PostgreSQL; currently it only takes care of the schema.
;;;
(in-package #:pgloader.parser)
(defrule match-rule-target-regex quoted-regex
(:lambda (re) (make-regex-match-rule :target (second re))))
(defrule match-rule-target-string quoted-namestring
(:lambda (s) (make-string-match-rule :target s)))
(defrule match-rule-target (or match-rule-target-string
match-rule-target-regex))
(defrule another-match-rule-target (and comma match-rule-target)
(:lambda (x)
(bind (((_ target) x)) target)))
(defrule filter-list-matching
(and match-rule-target (* another-match-rule-target))
(:lambda (source)
(destructuring-bind (filter1 filters) source
(list* filter1 filters))))
(defrule alter-table-names-matching (and kw-alter kw-table kw-names kw-matching
filter-list-matching)
(:lambda (alter-table)
(bind (((_ _ _ _ match-rule-target-list) alter-table))
match-rule-target-list)))
(defrule in-schema (and kw-in kw-schema quoted-namestring)
(:function third))
(defrule rename-to (and kw-rename kw-to quoted-namestring)
(:lambda (stmt)
(bind (((_ _ new-name) stmt))
(list #'pgloader.catalog::alter-table-rename new-name))))
(defrule set-schema (and kw-set kw-schema quoted-namestring)
(:lambda (stmt)
(bind (((_ _ schema) stmt))
(list #'pgloader.catalog::alter-table-set-schema schema))))
(defrule set-storage-parameters (and kw-set #\( generic-option-list #\))
(:lambda (stmt)
(bind (((_ _ parameters _) stmt))
(list #'pgloader.catalog::alter-table-set-storage-parameters parameters))))
(defrule set-tablespace (and kw-set kw-tablespace quoted-namestring)
(:lambda (stmt)
(list #'pgloader.catalog::alter-table-set-tablespace (third stmt))))
(defrule alter-table-action (or rename-to
set-schema
set-storage-parameters
set-tablespace))
(defrule alter-table-command (and alter-table-names-matching
(? in-schema)
alter-table-action)
(:lambda (alter-table-command)
(destructuring-bind (rule-list schema action)
alter-table-command
(loop :for rule :in rule-list
:collect (make-match-rule
:rule rule
:schema schema
:action (first action)
:args (rest action))))))
(defrule alter-table (+ (and alter-table-command ignore-whitespace))
(:lambda (alter-table-command-list)
(cons :alter-table
(loop :for (command ws) :in alter-table-command-list
:collect command))))
;;;
;;; ALTER SCHEMA ... RENAME TO ...
;;;
;;; Useful mainly for MS SQL at the moment
;;;
(defrule alter-schema-rename-to (and kw-alter kw-schema quoted-namestring
kw-rename kw-to quoted-namestring)
(:lambda (alter-schema-command)
(bind (((_ _ current-name _ _ new-name) alter-schema-command))
(pgloader.catalog::make-match-rule
:rule (make-string-match-rule :target current-name)
:action #'pgloader.catalog::alter-schema-rename
:args (list new-name)))))
;;; currently we only support a single ALTER SCHEMA variant
(defrule alter-schema alter-schema-rename-to
(:lambda (alter-schema-rename-to)
(cons :alter-schema (list (list alter-schema-rename-to)))))
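In a load command, the rules above accept clauses of roughly the following shape (an illustrative sketch; per `match-rule-target`, targets may be quoted names or regex patterns, comma-separated):

```
ALTER SCHEMA 'dbo' RENAME TO 'public'

ALTER TABLE NAMES MATCHING ~/sessions/, 'users' IN SCHEMA 'public'
      SET SCHEMA 'archive'
```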

@@ -42,32 +42,22 @@
(when (and (or before finally) (null pg-db-conn))
(error "When using a BEFORE LOAD DO or a FINALLY block, you must provide an archive level target database connection."))
`(lambda ()
(let* ((start-irt (get-internal-real-time))
(state-before (pgloader.utils:make-pgstate))
(*state* (pgloader.utils:make-pgstate))
,@(pgsql-connection-bindings pg-db-conn nil)
(state-finally ,(when finally `(pgloader.utils:make-pgstate)))
(let* (,@(pgsql-connection-bindings pg-db-conn nil)
(archive-file
,(destructuring-bind (kind url) source
(ecase kind
(:http `(with-stats-collection
("download" :state state-before)
(pgloader.archive:http-fetch-file ,url)))
(:filename url))))
(*fd-path-root*
(with-stats-collection ("extract" :state state-before)
(pgloader.archive:expand-archive archive-file))))
, (destructuring-bind (kind url) source
(ecase kind
(:http `(with-stats-collection
("download" :section :pre)
(pgloader.archive:http-fetch-file ,url)))
(:filename url))))
(*fd-path-root*
(with-stats-collection ("extract" :section :pre)
(pgloader.archive:expand-archive archive-file))))
(progn
,(sql-code-block pg-db-conn 'state-before before "before load")
,(sql-code-block pg-db-conn :pre before "before load")
;; import from files block
,@(loop for command in commands
collect `(funcall ,command))
,(sql-code-block pg-db-conn 'state-finally finally "finally")
;; reporting
(report-full-summary "Total import time" *state*
:start-time start-irt
:before state-before
:finally state-finally)))))))
,(sql-code-block pg-db-conn :post finally "finally")))))))

View File

@@ -10,19 +10,33 @@
(defrule cast-default-guard (and kw-when kw-default quoted-string)
(:destructure (w d value) (declare (ignore w d)) (cons :default value)))
(defrule cast-source-guards (* (or cast-default-guard
cast-typemod-guard))
(:lambda (guards)
(alexandria:alist-plist guards)))
(defrule cast-unsigned-guard (and kw-when kw-unsigned)
(:constant (cons :unsigned t)))
(defrule cast-signed-guard (and kw-when kw-signed)
(:constant (cons :signed t)))
;; at the moment we only know about extra auto_increment
(defrule cast-source-extra (and kw-with kw-extra kw-auto-increment)
(:constant (list :auto-increment t)))
(defrule cast-source-extra (and kw-with kw-extra
(or kw-auto-increment
kw-on-update-current-timestamp))
(:lambda (extra)
(cons (third extra) t)))
(defrule cast-source-type (and kw-type trimmed-name)
;; type names may be "double quoted"
(defrule cast-type-name (or double-quoted-namestring
(and (alpha-char-p character)
(* (or (alpha-char-p character)
(digit-char-p character)
#\_))))
(:text t))
(defrule cast-source-type (and kw-type cast-type-name)
(:destructure (kw name) (declare (ignore kw)) (list :type name)))
(defrule table-column-name (and namestring "." namestring)
(defrule table-column-name (and maybe-quoted-namestring
"."
maybe-quoted-namestring)
(:destructure (table-name dot column-name)
(declare (ignore dot))
(list :column (cons (text table-name) (text column-name)))))
@@ -31,26 +45,33 @@
;; well, we want namestring . namestring
(:destructure (kw name) (declare (ignore kw)) name))
(defrule cast-source-extra-or-guard (* (or cast-unsigned-guard
cast-signed-guard
cast-default-guard
cast-typemod-guard
cast-source-extra))
(:function alexandria:alist-plist))
(defrule cast-source (and (or cast-source-type cast-source-column)
(? cast-source-extra)
(? cast-source-guards)
ignore-whitespace)
cast-source-extra-or-guard)
(:lambda (source)
(bind (((name-and-type opts guards _) source)
(bind (((name-and-type extra-and-guards) source)
((&key (default nil d-s-p)
(typemod nil t-s-p)
&allow-other-keys) guards)
((&key (auto-increment nil ai-s-p)
&allow-other-keys) opts))
(signed nil s-s-p)
(unsigned nil u-s-p)
(auto-increment nil ai-s-p)
(on-update-current-timestamp nil ouct-s-p)
&allow-other-keys)
extra-and-guards))
`(,@name-and-type
,@(when t-s-p (list :typemod typemod))
,@(when d-s-p (list :default default))
,@(when ai-s-p (list :auto-increment auto-increment))))))
(defrule cast-type-name (and (alpha-char-p character)
(* (or (alpha-char-p character)
(digit-char-p character))))
(:text t))
,@(when s-s-p (list :signed signed))
,@(when u-s-p (list :unsigned unsigned))
,@(when ai-s-p (list :auto-increment auto-increment))
,@(when ouct-s-p (list :on-update-current-timestamp
on-update-current-timestamp))))))
(defrule cast-to-type (and kw-to cast-type-name ignore-whitespace)
(:lambda (source)
@@ -75,33 +96,66 @@
(defrule cast-drop-not-null (and kw-drop kw-not kw-null)
(:constant (list :drop-not-null t)))
(defrule cast-set-not-null (and kw-set kw-not kw-null)
(:constant (list :set-not-null t)))
(defrule cast-keep-extra (and kw-keep kw-extra)
(:constant (list :keep-extra t)))
(defrule cast-drop-extra (and kw-drop kw-extra)
(:constant (list :drop-extra t)))
(defrule cast-def (+ (or cast-to-type
cast-keep-default
cast-drop-default
cast-keep-extra
cast-drop-extra
cast-keep-typemod
cast-drop-typemod
cast-keep-not-null
cast-drop-not-null))
cast-drop-not-null
cast-set-not-null))
(:lambda (source)
(destructuring-bind
(&key type drop-default drop-typemod drop-not-null &allow-other-keys)
(&key type drop-default drop-extra drop-typemod
drop-not-null set-not-null &allow-other-keys)
(apply #'append source)
(list :type type
:drop-extra drop-extra
:drop-default drop-default
:drop-typemod drop-typemod
:drop-not-null drop-not-null))))
:drop-not-null drop-not-null
:set-not-null set-not-null))))
(defun function-name-character-p (char)
(or (member char #.(quote (coerce "/:.-%" 'list)))
(or (member char #.(quote (coerce "/.-%" 'list)))
(alphanumericp char)))
(defrule function-name (* (function-name-character-p character))
(:text t))
(defrule function-name (+ (function-name-character-p character))
(:lambda (fname)
(text fname)))
(defrule cast-function (and kw-using function-name)
(:lambda (function)
(bind (((_ fname) function))
(intern (string-upcase fname) :pgloader.transforms))))
(defrule package-and-function-names (and function-name
(or ":" "::")
function-name)
(:lambda (pfn)
(bind (((pname _ fname) pfn))
(intern (string-upcase fname) (find-package (string-upcase pname))))))
(defrule maybe-qualified-function-name (or package-and-function-names
function-name)
(:lambda (fname)
(typecase fname
(string (intern (string-upcase fname) :pgloader.transforms))
(symbol fname))))
(defrule transform-expression sexp
(:lambda (sexp)
(eval sexp)))
(defrule cast-function (and kw-using (or maybe-qualified-function-name
transform-expression))
(:destructure (using symbol) (declare (ignore using)) symbol))
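Taken together, these rules parse cast clauses of roughly this shape (a representative sketch; the target types and the transform function shown are illustrative, not defaults):

```
CAST type int with extra auto_increment to serial,
     type datetime when default '0000-00-00 00:00:00'
          to timestamptz drop default using zero-dates-to-null,
     column mytable.created to date
```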
(defun fix-target-type (source target)
"When target has :type nil, steal the source :type definition."

@@ -33,29 +33,25 @@
(defrule option-null (and kw-null quoted-string)
(:destructure (kw null) (declare (ignore kw)) (cons :null-as null)))
(defrule copy-option (or option-batch-rows
(defrule copy-option (or option-on-error-stop
option-on-error-resume-next
option-workers
option-concurrency
option-batch-rows
option-batch-size
option-batch-concurrency
option-prefetch-rows
option-max-parallel-create-index
option-truncate
option-drop-indexes
option-disable-triggers
option-identifiers-case
option-skip-header
option-delimiter
option-null))
(defrule another-copy-option (and comma copy-option)
(:lambda (source)
(bind (((_ option) source)) option)))
(defrule copy-option-list (and copy-option (* another-copy-option))
(:lambda (source)
(destructuring-bind (opt1 opts) source
(alexandria:alist-plist `(,opt1 ,@opts)))))
(defrule copy-options (and kw-with copy-option-list)
(:lambda (source)
(bind (((_ opts) source))
(cons :copy-options opts))))
(defrule copy-options (and kw-with
(and copy-option (* (and comma copy-option))))
(:function flatten-option-list))
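A WITH clause matching this rule might look like the following (option spellings inferred from the `option-*` alternatives listed above; treat as a sketch):

```
WITH on error stop,
     truncate,
     disable triggers,
     batch rows = 25000
```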
(defrule copy-uri (and "copy://" filename)
(:lambda (source)
@@ -78,17 +74,7 @@
(:regex (make-instance 'copy-connection :spec src))
(:http (make-instance 'copy-connection :uri (first specs))))))))
(defrule get-copy-file-source-from-environment-variable (and kw-getenv name)
(:lambda (p-e-v)
(bind (((_ varname) p-e-v)
(connstring (getenv-default varname)))
(unless connstring
(error "Environment variable ~s is unset." varname))
(parse 'copy-file-source connstring))))
(defrule copy-source (and kw-load kw-copy kw-from
(or get-copy-file-source-from-environment-variable
copy-file-source))
(defrule copy-source (and kw-load kw-copy kw-from copy-file-source)
(:lambda (src)
(bind (((_ _ _ source) src)) source)))
@@ -102,77 +88,86 @@
(defrule load-copy-file-command (and copy-source (? file-encoding)
(? copy-source-field-list)
target
(? csv-target-table)
(? csv-target-column-list)
load-copy-file-optional-clauses)
(:lambda (command)
(destructuring-bind (source encoding fields target columns clauses) command
`(,source ,encoding ,fields ,target ,columns ,@clauses))))
(destructuring-bind (source encoding fields pguri table-name columns clauses)
command
(list* source
encoding
fields
pguri
(or table-name (pgconn-table-name pguri))
columns
clauses))))
(defun lisp-code-for-loading-from-copy (copy-conn fields pg-db-conn
(defun lisp-code-for-loading-from-copy (copy-conn pg-db-conn
&key
(encoding :utf-8)
fields
target-table-name
columns
gucs before after
((:copy-options options)))
gucs before after options
&aux
(worker-count (getf options :worker-count))
(concurrency (getf options :concurrency)))
`(lambda ()
(let* ((state-before (pgloader.utils:make-pgstate))
(summary (null *state*))
(*state* (or *state* (pgloader.utils:make-pgstate)))
(state-idx ,(when (getf options :drop-indexes)
`(pgloader.utils:make-pgstate)))
(state-after ,(when (or after (getf options :drop-indexes))
`(pgloader.utils:make-pgstate)))
,@(pgsql-connection-bindings pg-db-conn gucs)
(let* (,@(pgsql-connection-bindings pg-db-conn gucs)
,@(batch-control-bindings options)
(source-db (with-stats-collection ("fetch" :state state-before)
(expand (fetch-file ,copy-conn)))))
,@(identifier-case-binding options)
(source-db (with-stats-collection ("fetch" :section :pre)
(expand (fetch-file ,copy-conn)))))
(progn
,(sql-code-block pg-db-conn 'state-before before "before load")
,(sql-code-block pg-db-conn :pre before "before load")
(let ((truncate ,(getf options :truncate))
(disable-triggers (getf ',options :disable-triggers))
(drop-indexes (getf ',options :drop-indexes))
(let ((on-error-stop (getf ',options :on-error-stop))
(truncate (getf ',options :truncate))
(disable-triggers (getf ',options :disable-triggers))
(drop-indexes (getf ',options :drop-indexes))
(max-parallel-create-index (getf ',options :max-parallel-create-index))
(source
(make-instance 'pgloader.copy:copy-copy
(make-instance 'copy-copy
:target-db ,pg-db-conn
:source source-db
:target ',(pgconn-table-name pg-db-conn)
:encoding ,encoding
:fields ',fields
:columns ',columns
:source source-db
:target (create-table ',target-table-name)
:encoding ,encoding
:fields ',fields
:columns ',columns
,@(remove-batch-control-option
options :extras '(:truncate
options :extras '(:worker-count
:concurrency
:truncate
:drop-indexes
:disable-triggers)))))
(pgloader.sources:copy-from source
:state-before state-before
:state-after state-after
:state-indexes state-idx
:truncate truncate
:drop-indexes drop-indexes
:disable-triggers disable-triggers))
:disable-triggers
:max-parallel-create-index)))))
(copy-database source
,@ (when worker-count
(list :worker-count worker-count))
,@ (when concurrency
(list :concurrency concurrency))
:on-error-stop on-error-stop
:truncate truncate
:drop-indexes drop-indexes
:disable-triggers disable-triggers
:max-parallel-create-index max-parallel-create-index))
,(sql-code-block pg-db-conn 'state-after after "after load")
;; reporting
(when summary
(report-full-summary "Total import time" *state*
:before state-before
:finally state-after
:parallel state-idx))))))
,(sql-code-block pg-db-conn :post after "after load")))))
(defrule load-copy-file load-copy-file-command
(:lambda (command)
(bind (((source encoding fields pg-db-uri columns
&key ((:copy-options options)) gucs before after) command))
(bind (((source encoding fields pg-db-uri table-name columns
&key options gucs before after) command))
(cond (*dry-run*
(lisp-code-for-csv-dry-run pg-db-uri))
(t
(lisp-code-for-loading-from-copy source fields pg-db-uri
(lisp-code-for-loading-from-copy source pg-db-uri
:encoding encoding
:fields fields
:target-table-name table-name
:columns columns
:gucs gucs
:before before
:after after
:copy-options options))))))
:options options))))))


@@ -34,12 +34,22 @@
(bind (((_ digits) hex))
(code-char (parse-integer (text digits) :radix 16)))))
(defrule tab (and #\\ #\t) (:constant #\Tab))
(defrule tab-separator (and #\' #\\ #\t #\') (:constant #\Tab))
(defrule backslash-separator (and #\' #\\ #\') (:constant #\\))
(defrule separator (and #\' (or hex-char-code tab character ) #\')
(defrule single-quote-separator (or (and #\' #\' #\' #\')
(and #\' #\\ #\' #\'))
(:constant #\'))
(defrule other-char-separator (and #\' (or hex-char-code character) #\')
(:lambda (sep)
(bind (((_ char _) sep)) char)))
(defrule separator (or single-quote-separator
backslash-separator
tab-separator
other-char-separator))
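The separator rules above accept a single-quoted character, the escapes `'\t'` (tab) and `'\'` (backslash), the two spellings `''''` and `'\''` for a literal single quote, and a hex char code via `hex-char-code`. A rough Python equivalent of that little grammar, assuming the hex spelling is a `0xNN` body inside the quotes (an assumption on my part; the `hex-char-code` rule itself is defined earlier in the file):

```python
def parse_separator(text):
    """Parse a quoted separator literal as accepted by the rules above.
    Illustrative sketch only, not pgloader's actual esrap rules."""
    if text in ("''''", "'\\''"):          # both spellings of a literal quote
        return "'"
    if text == "'\\'":                     # '\'  -> backslash
        return "\\"
    if text == "'\\t'":                    # '\t' -> tab
        return "\t"
    if not (len(text) >= 3 and text[0] == "'" and text[-1] == "'"):
        raise ValueError(f"bad separator literal: {text!r}")
    body = text[1:-1]
    if body.lower().startswith("0x"):      # assumed hex-char-code spelling
        return chr(int(body[2:], 16))
    if len(body) == 1:                     # any other single quoted character
        return body
    raise ValueError(f"bad separator literal: {text!r}")
```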
;;
;; Main CSV options (WITH ... in the command grammar)
;;
@@ -50,7 +60,7 @@
(cons :skip-lines (parse-integer (text digits))))))
(defrule option-csv-header (and kw-csv kw-header)
(:constant (cons :csv-header t)))
(:constant (cons :header t)))
(defrule option-fields-enclosed-by
(and kw-fields (? kw-optionally) kw-enclosed kw-by separator)
@@ -61,8 +71,8 @@
(defrule option-fields-not-enclosed (and kw-fields kw-not kw-enclosed)
(:constant (cons :quote nil)))
(defrule quote-quote "double-quote" (:constant "\"\""))
(defrule backslash-quote "backslash-quote" (:constant "\\\""))
(defrule quote-quote "double-quote" (:constant #(#\" #\")))
(defrule backslash-quote "backslash-quote" (:constant #(#\\ #\")))
(defrule escaped-quote-name (or quote-quote backslash-quote))
(defrule escaped-quote-literal (or (and #\" #\") (and #\\ #\")) (:text t))
(defrule escaped-quote (or escaped-quote-literal
@@ -103,11 +113,17 @@
(bind (((_ _ _ escape-mode) term))
(cons :escape-mode escape-mode))))
(defrule csv-option (or option-batch-rows
(defrule csv-option (or option-on-error-stop
option-on-error-resume-next
option-workers
option-concurrency
option-batch-rows
option-batch-size
option-batch-concurrency
option-prefetch-rows
option-max-parallel-create-index
option-truncate
option-disable-triggers
option-identifiers-case
option-drop-indexes
option-skip-header
option-csv-header
@@ -118,21 +134,12 @@
option-fields-terminated-by
option-trim-unquoted-blanks
option-keep-unquoted-blanks
option-csv-escape-mode))
option-csv-escape-mode
option-null-if))
(defrule another-csv-option (and comma csv-option)
(:lambda (source)
(bind (((_ option) source)) option)))
(defrule csv-option-list (and csv-option (* another-csv-option))
(:lambda (source)
(destructuring-bind (opt1 opts) source
(alexandria:alist-plist `(,opt1 ,@opts)))))
(defrule csv-options (and kw-with csv-option-list)
(:lambda (source)
(bind (((_ opts) source))
(cons :csv-options opts))))
(defrule csv-options (and kw-with
(and csv-option (* (and comma csv-option))))
(:function flatten-option-list))
;;
;; CSV per-field reading options
@@ -196,15 +203,6 @@
(defrule csv-field-options (? csv-field-option-list))
(defrule csv-raw-field-name (and (or #\_ (alpha-char-p character))
(* (or (alpha-char-p character)
(digit-char-p character)
#\Space
#\.
#\$
#\_)))
(:text t))
(defrule csv-bare-field-name (and (or #\_ (alpha-char-p character))
(* (or (alpha-char-p character)
(digit-char-p character)
@@ -214,9 +212,10 @@
(:lambda (name)
(string-downcase (text name))))
(defrule csv-quoted-field-name (and #\" csv-raw-field-name #\")
(defrule csv-quoted-field-name (or (and #\' (* (not #\')) #\')
(and #\" (* (not #\")) #\"))
(:lambda (csv-field-name)
(bind (((_ name _) csv-field-name)) name)))
(bind (((_ name _) csv-field-name)) (text name))))
(defrule csv-field-name (or csv-quoted-field-name csv-bare-field-name))
@@ -233,11 +232,6 @@
(destructuring-bind (field1 fields) source
(list* field1 fields))))
(defrule open-paren (and ignore-whitespace #\( ignore-whitespace)
(:constant :open-paren))
(defrule close-paren (and ignore-whitespace #\) ignore-whitespace)
(:constant :close-paren))
(defrule having-fields (and kw-having kw-fields) (:constant nil))
(defrule csv-source-field-list (and (? having-fields)
@@ -253,44 +247,6 @@
(defrule column-name csv-field-name) ; same rules here
(defrule column-type csv-field-name) ; again, same rules, names only
(defun not-doublequote (char)
(not (eql #\" char)))
(defun symbol-character-p (character)
(not (member character '(#\Space #\( #\)))))
(defun symbol-first-character-p (character)
(and (symbol-character-p character)
(not (member character '(#\+ #\-)))))
(defrule sexp-symbol (and (symbol-first-character-p character)
(* (symbol-character-p character)))
(:lambda (schars)
(pgloader.transforms:intern-symbol (text schars))))
(defrule sexp-string-char (or (not-doublequote character) (and #\\ #\")))
(defrule sexp-string (and #\" (* sexp-string-char) #\")
(:destructure (q1 string q2)
(declare (ignore q1 q2))
(text string)))
(defrule sexp-integer (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
(:lambda (list)
(parse-integer (text list) :radix 10)))
(defrule sexp-list (and open-paren sexp (* sexp) close-paren)
(:destructure (open car cdr close)
(declare (ignore open close))
(cons car cdr)))
(defrule sexp-atom (and ignore-whitespace
(or sexp-string sexp-integer sexp-symbol))
(:lambda (atom)
(bind (((_ a) atom)) a)))
(defrule sexp (or sexp-atom sexp-list))
(defrule column-expression (and kw-using sexp)
(:lambda (expr)
(bind (((_ sexp) expr)) sexp)))
@@ -319,6 +275,12 @@
open-paren csv-target-columns close-paren)
(:lambda (source)
(bind (((_ _ columns _) source)) columns)))
(defrule csv-target-table (and kw-target kw-table dsn-table-name)
(:lambda (c-t-t)
;; dsn-table-name: (:table-name "schema" . "table")
(cdr (third c-t-t))))
;;
;; The main command parsing
;;
@@ -401,17 +363,7 @@
(:regex (make-instance 'csv-connection :spec src))
(:http (make-instance 'csv-connection :uri (first specs))))))))
(defrule get-csv-file-source-from-environment-variable (and kw-getenv name)
(:lambda (p-e-v)
(bind (((_ varname) p-e-v)
(connstring (getenv-default varname)))
(unless connstring
(error "Environment variable ~s is unset." varname))
(parse 'csv-file-source connstring))))
(defrule csv-source (and kw-load kw-csv kw-from
(or get-csv-file-source-from-environment-variable
csv-file-source))
(defrule csv-source (and kw-load kw-csv kw-from csv-file-source)
(:lambda (src)
(bind (((_ _ _ source) src)) source)))
@@ -434,11 +386,20 @@
(defrule load-csv-file-command (and csv-source
(? file-encoding) (? csv-source-field-list)
target (? csv-target-column-list)
target
(? csv-target-table)
(? csv-target-column-list)
load-csv-file-optional-clauses)
(:lambda (command)
(destructuring-bind (source encoding fields target columns clauses) command
`(,source ,encoding ,fields ,target ,columns ,@clauses))))
(destructuring-bind (source encoding fields pguri table-name columns clauses)
command
(list* source
encoding
fields
pguri
(or table-name (pgconn-table-name pguri))
columns
clauses))))
(defun lisp-code-for-csv-dry-run (pg-db-conn)
`(lambda ()
@@ -448,71 +409,82 @@
(log-message :log "DRY RUN, only checking PostgreSQL connection.")
(check-connection ,pg-db-conn)))
(defun lisp-code-for-loading-from-csv (csv-conn fields pg-db-conn
(defun lisp-code-for-loading-from-csv (csv-conn pg-db-conn
&key
(encoding :utf-8)
fields
target-table-name
columns
gucs before after
((:csv-options options)))
gucs before after options
&allow-other-keys
&aux
(worker-count (getf options :worker-count))
(concurrency (getf options :concurrency)))
`(lambda ()
(let* ((state-before (pgloader.utils:make-pgstate))
(summary (null *state*))
(*state* (or *state* (pgloader.utils:make-pgstate)))
(state-idx ,(when (getf options :drop-indexes)
`(pgloader.utils:make-pgstate)))
(state-after ,(when (or after (getf options :drop-indexes))
`(pgloader.utils:make-pgstate)))
,@(pgsql-connection-bindings pg-db-conn gucs)
(let* (,@(pgsql-connection-bindings pg-db-conn gucs)
,@(batch-control-bindings options)
(source-db (with-stats-collection ("fetch" :state state-before)
(expand (fetch-file ,csv-conn)))))
,@(identifier-case-binding options)
(source-db (with-stats-collection ("fetch" :section :pre)
(expand (fetch-file ,csv-conn)))))
(progn
,(sql-code-block pg-db-conn 'state-before before "before load")
,(sql-code-block pg-db-conn :pre before "before load")
(let ((truncate (getf ',options :truncate))
(disable-triggers (getf ',options :disable-triggers))
(drop-indexes (getf ',options :drop-indexes))
(source
(make-instance 'pgloader.csv:copy-csv
:target-db ,pg-db-conn
:source source-db
:target ',(pgconn-table-name pg-db-conn)
:encoding ,encoding
:fields ',fields
:columns ',columns
,@(remove-batch-control-option
options :extras '(:truncate
:drop-indexes
:disable-triggers)))))
(pgloader.sources:copy-from source
:state-before state-before
:state-after state-after
:state-indexes state-idx
:truncate truncate
:drop-indexes drop-indexes
:disable-triggers disable-triggers))
(let* ((on-error-stop (getf ',options :on-error-stop))
(truncate (getf ',options :truncate))
(disable-triggers (getf ',options :disable-triggers))
(drop-indexes (getf ',options :drop-indexes))
(max-parallel-create-index (getf ',options :max-parallel-create-index))
(fields
',(let ((null-as (getf options :null-as)))
(if null-as
(mapcar (lambda (field)
(if (member :null-as field) field
(append field (list :null-as null-as))))
fields)
fields)))
(source
(make-instance 'copy-csv
:target-db ,pg-db-conn
:source source-db
:target (create-table ',target-table-name)
:encoding ,encoding
:fields fields
:columns ',columns
,@(remove-batch-control-option
options :extras '(:null-as
:worker-count
:concurrency
:truncate
:drop-indexes
:disable-triggers
:max-parallel-create-index)))))
(copy-database source
,@ (when worker-count
(list :worker-count worker-count))
,@ (when concurrency
(list :concurrency concurrency))
:on-error-stop on-error-stop
:truncate truncate
:drop-indexes drop-indexes
:disable-triggers disable-triggers
:max-parallel-create-index max-parallel-create-index))
,(sql-code-block pg-db-conn 'state-after after "after load")
;; reporting
(when summary
(report-full-summary "Total import time" *state*
:before state-before
:finally state-after
:parallel state-idx))))))
,(sql-code-block pg-db-conn :post after "after load")))))
(defrule load-csv-file load-csv-file-command
(:lambda (command)
(bind (((source encoding fields pg-db-uri columns
&key ((:csv-options options)) gucs before after) command))
(bind (((source encoding fields pg-db-uri table-name columns
&key options gucs before after) command))
(cond (*dry-run*
(lisp-code-for-csv-dry-run pg-db-uri))
(t
(lisp-code-for-loading-from-csv source fields pg-db-uri
(lisp-code-for-loading-from-csv source pg-db-uri
:encoding encoding
:fields fields
:target-table-name table-name
:columns columns
:gucs gucs
:before before
:after after
:csv-options options))))))
:options options))))))
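In the generated CSV loader above, a command-level `null as` option is pushed down onto every field that does not already carry its own `:null-as`. That rewriting step amounts to the following (illustrative Python, with fields modeled as dicts rather than Lisp plists):

```python
def apply_default_null(fields, null_as=None):
    """Give each field the command-level null-as value unless the field
    already specifies its own, mirroring the fields rewrite above."""
    if null_as is None:
        return fields
    return [field if "null-as" in field else {**field, "null-as": null_as}
            for field in fields]
```

A field-level setting always wins over the command-level default.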


@@ -25,7 +25,7 @@
(defrule doubled-at-sign (and "@@") (:constant "@"))
(defrule doubled-colon (and "::") (:constant ":"))
(defrule password (+ (or (not "@") doubled-at-sign)) (:text t))
(defrule username (and (or #\_ (alpha-char-p character))
(defrule username (and (or #\_ (alpha-char-p character) (digit-char-p character))
(* (or (alpha-char-p character)
(digit-char-p character)
#\.
@@ -44,9 +44,6 @@
;; password looks like '(":" "password")
(list :user username :password (cadr password)))))
(defun hexdigit-char-p (character)
(member character #. (quote (coerce "0123456789abcdefABCDEF" 'list))))
(defrule ipv4-part (and (digit-char-p character)
(? (digit-char-p character))
(? (digit-char-p character))))
@@ -55,22 +52,56 @@
(:lambda (ipv4)
(list :ipv4 (text ipv4))))
;;; socket directory is unix only, so we can forbid ":" on the parsing
(defrule ipv6 (and #\[ (+ (or (hexdigit-char-p character) ":")) #\])
(:lambda (ipv6)
(list :ipv6 (text ipv6))))
;; socket directory is unix only, so we can forbid ":" on the parsing
(defun socket-directory-character-p (char)
(or (member char #.(quote (coerce "/.-_" 'list)))
(or (find char "/.-_")
(alphanumericp char)))
(defrule socket-directory (and "unix:" (* (socket-directory-character-p character)))
(defrule socket-directory (and "unix:"
(* (or (not ":") doubled-colon)))
(:destructure (unix socket-directory)
(declare (ignore unix))
(list :unix (when socket-directory (text socket-directory)))))
(defrule network-name (and namestring (* (and "." namestring)))
;;;
;;; See https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames
;;;
;;; The characters allowed in labels are a subset of the ASCII character
;;; set, consisting of characters a through z, A through Z, digits 0 through
;;; 9, and hyphen.
;;;
;;; This rule is known as the LDH rule (letters, digits, hyphen).
;;;
;;; - Domain names are interpreted in case-independent manner.
;;; - Labels may not start or end with a hyphen.
;;; - An additional rule requires that top-level domain names should not be
;;; all-numeric.
;;;
(defrule network-label-letters-digit (or (alpha-char-p character)
(digit-char-p character)))
(defrule network-label-with-hyphen
(and network-label-letters-digit
(+ (or (and #\- network-label-letters-digit)
network-label-letters-digit)))
(:text t))
(defrule network-label-no-hyphen (+ network-label-letters-digit)
(:text t))
(defrule network-label (or network-label-with-hyphen network-label-no-hyphen)
(:identity t))
(defrule network-hostname (and network-label (* (and "." network-label)))
(:lambda (name)
(let ((host (text name)))
(list :host (unless (string= "" host) host)))))
(defrule hostname (or ipv4 socket-directory network-name)
(defrule hostname (or ipv4 ipv6 socket-directory network-hostname)
(:identity t))
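The LDH restrictions described in the comment block above (labels of letters, digits, and interior hyphens; no leading or trailing hyphen; no all-numeric top-level label) can be sketched as a standalone validator. Python is used here for illustration only; pgloader itself implements this as the esrap grammar rules shown above:

```python
import re

# One LDH label: starts and ends with a letter or digit,
# may contain hyphens in between.
LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_ldh_hostname(name):
    """Check a hostname against the LDH rule described above."""
    labels = name.split(".")
    if not labels or any(label == "" for label in labels):
        return False
    if not all(LABEL.match(label) for label in labels):
        return False
    # Top-level label should not be all-numeric.
    return not labels[-1].isdigit()
```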
(defun process-hostname (hostname)
@@ -78,6 +109,7 @@
(ecase type
(:unix (if name (cons :unix name) :unix))
(:ipv4 name)
(:ipv6 name)
(:host name))))
(defrule dsn-hostname (and (? hostname) (? dsn-port))
@@ -86,10 +118,13 @@
(append (list :host (when host (process-hostname host)))
port))))
(defrule dsn-dbname (and "/" (? namestring))
(:destructure (slash dbname)
(declare (ignore slash))
(list :dbname dbname)))
(defrule dsn-dbname (and "/" (? (or single-quoted-string
(* (or (alpha-char-p character)
(digit-char-p character)
#\.
punct)))))
(:lambda (dbn)
(list :dbname (text (second dbn)))))
(defrule dsn-option-ssl-disable "disable" (:constant :no))
(defrule dsn-option-ssl-allow "allow" (:constant :try))
@@ -105,9 +140,11 @@
(declare (ignore key e))
(cons :use-ssl val))))
(defrule maybe-quoted-namestring (or double-quoted-namestring
quoted-namestring
namestring))
(defun get-pgsslmode (&optional (env-var-name "PGSSLMODE") default)
"Get PGSSLMODE from the environment."
(let ((pgsslmode (getenv-default env-var-name default)))
(when pgsslmode
(cdr (parse 'dsn-option-ssl (format nil "sslmode=~a" pgsslmode))))))
(defrule qualified-table-name (and maybe-quoted-namestring
"."
@@ -118,6 +155,10 @@
(defrule dsn-table-name (or qualified-table-name maybe-quoted-namestring)
(:lambda (name)
;; we can't make a table instance yet here, because for that we need to
;; apply-identifier-case on it, and that requires to have initialized
;; the *pgsql-reserved-keywords*, and we can't do that before parsing
;; the target database connection string, can we?
(cons :table-name name)))
(defrule dsn-option-table-name (and (? (and "tablename" "="))
@@ -194,32 +235,35 @@
;; Default to environment variables as described in
;; http://www.postgresql.org/docs/9.3/static/app-psql.html
(declare (ignore type))
(make-instance 'pgsql-connection
:user (or user
(getenv-default "PGUSER"
#+unix (getenv-default "USER")
#-unix (getenv-default "UserName")))
:pass (or password (getenv-default "PGPASSWORD"))
:host (or host (getenv-default "PGHOST"
#+unix :unix
#-unix "localhost"))
:port (or port (parse-integer
(getenv-default "PGPORT" "5432")))
:name (or dbname (getenv-default "PGDATABASE" user))
(let ((pgconn
(make-instance 'pgsql-connection
:user (or user
(getenv-default "PGUSER"
#+unix
(getenv-default "USER")
#-unix
(getenv-default "UserName")))
:host (or host (getenv-default "PGHOST"
#+unix :unix
#-unix "localhost"))
:port (or port (parse-integer
(getenv-default "PGPORT" "5432")))
:name (or dbname (getenv-default "PGDATABASE" user))
:use-ssl use-ssl
:table-name table-name))))
:use-ssl (or use-ssl (get-pgsslmode "PGSSLMODE"))
:table-name table-name)))
;; Now set the password, maybe from ~/.pgpass
(setf (db-pass pgconn)
(or password
(getenv-default "PGPASSWORD")
(match-pgpass-file (db-host pgconn)
(princ-to-string (db-port pgconn))
(db-name pgconn)
(db-user pgconn))))
;; And return our pgconn instance
pgconn))))
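The password fallback above tries `PGPASSWORD` first and then `match-pgpass-file`. PostgreSQL's `~/.pgpass` format is one `hostname:port:database:username:password` entry per line, with `*` acting as a wildcard in any of the first four fields and `#` starting a comment. A simplified matcher in Python for illustration (it skips the backslash-escaping of `:` and `\` inside fields that the real format allows):

```python
import os

def match_pgpass(host, port, dbname, user,
                 path=os.path.expanduser("~/.pgpass")):
    """Return the first matching password from a .pgpass-style file,
    or None.  '*' in any of the first four fields matches anything."""
    if not os.path.exists(path):
        return None
    with open(path) as pgpass:
        for line in pgpass:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split(":", 4)
            if len(fields) != 5:
                continue  # malformed line, skip it
            if all(pat in ("*", val)
                   for pat, val in zip(fields, (host, port, dbname, user))):
                return fields[4]
    return None
```

The real implementation also has to normalize the host (e.g. a Unix socket matches the `localhost` entry), which this sketch leaves out.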
(defrule get-pgsql-uri-from-environment-variable (and kw-getenv name)
(:lambda (p-e-v)
(bind (((_ varname) p-e-v))
(let ((connstring (getenv-default varname)))
(unless connstring
(error "Environment variable ~s is unset." varname))
(parse 'pgsql-uri connstring)))))
(defrule target (and kw-into (or pgsql-uri
get-pgsql-uri-from-environment-variable))
(defrule target (and kw-into pgsql-uri)
(:destructure (into target)
(declare (ignore into))
target))
@@ -227,7 +271,7 @@
(defun pgsql-connection-bindings (pg-db-uri gucs)
"Generate the code needed to set PostgreSQL connection bindings."
`((*pg-settings* ',gucs)
(pgloader.pgsql::*pgsql-reserved-keywords*
`((*pg-settings* (pgloader.pgsql:sanitize-user-gucs ',gucs))
(*pgsql-reserved-keywords*
(pgloader.pgsql:list-reserved-keywords ,pg-db-uri))))


@@ -18,9 +18,13 @@
(bind (((_ _ _ table-name) tn))
(cons :table-name (text table-name)))))
(defrule dbf-option (or option-batch-rows
(defrule dbf-option (or option-on-error-stop
option-on-error-resume-next
option-workers
option-concurrency
option-batch-rows
option-batch-size
option-batch-concurrency
option-prefetch-rows
option-truncate
option-disable-triggers
option-data-only
@@ -28,21 +32,11 @@
option-include-drop
option-create-table
option-create-tables
option-table-name))
option-table-name
option-identifiers-case))
(defrule another-dbf-option (and comma dbf-option)
(:lambda (source)
(bind (((_ option) source)) option)))
(defrule dbf-option-list (and dbf-option (* another-dbf-option))
(:lambda (source)
(destructuring-bind (opt1 opts) source
(alexandria:alist-plist `(,opt1 ,@opts)))))
(defrule dbf-options (and kw-with dbf-option-list)
(:lambda (source)
(bind (((_ opts) source))
(cons :dbf-options opts))))
(defrule dbf-options (and kw-with (and dbf-option (* (and comma dbf-option))))
(:function flatten-option-list))
(defrule dbf-uri (and "dbf://" filename)
(:lambda (source)
@@ -63,7 +57,9 @@
(defrule load-dbf-optional-clauses (* (or dbf-options
gucs
casts
before-load
after-schema
after-load))
(:lambda (clauses-list)
(alexandria:alist-plist clauses-list)))
@@ -71,15 +67,22 @@
;;; dbf defaults to ascii rather than utf-8
(defrule dbf-file-encoding (? (and kw-with kw-encoding encoding))
(:lambda (enc)
(if enc
(bind (((_ _ encoding) enc)) encoding)
:ascii)))
(when enc
(bind (((_ _ encoding) enc)) encoding))))
(defrule load-dbf-command (and dbf-source (? dbf-file-encoding)
target load-dbf-optional-clauses)
(defrule load-dbf-command (and dbf-source
(? dbf-file-encoding)
target
(? csv-target-table)
load-dbf-optional-clauses)
(:lambda (command)
(destructuring-bind (source encoding target clauses) command
`(,source ,encoding ,target ,@clauses))))
(destructuring-bind (source encoding pguri table-name clauses)
command
(list* source
encoding
pguri
(or table-name (pgconn-table-name pguri))
clauses))))
(defun lisp-code-for-dbf-dry-run (dbf-db-conn pg-db-conn)
`(lambda ()
@@ -89,51 +92,54 @@
(defun lisp-code-for-loading-from-dbf (dbf-db-conn pg-db-conn
&key
(encoding :ascii)
gucs before after
((:dbf-options options)))
target-table-name
encoding
gucs casts options
before after-schema after
&allow-other-keys)
`(lambda ()
(let* ((state-before (pgloader.utils:make-pgstate))
(summary (null *state*))
(*state* (or *state* (pgloader.utils:make-pgstate)))
(state-after ,(when after `(pgloader.utils:make-pgstate)))
(let* ((*default-cast-rules* ',*db3-default-cast-rules*)
(*cast-rules* ',casts)
,@(pgsql-connection-bindings pg-db-conn gucs)
,@(batch-control-bindings options)
,@(identifier-case-binding options)
(table-name ',(pgconn-table-name pg-db-conn))
(source-db (with-stats-collection ("fetch" :state state-before)
(expand (fetch-file ,dbf-db-conn))))
(source
(make-instance 'pgloader.db3:copy-db3
:target-db ,pg-db-conn
:encoding ,encoding
:source-db source-db
:target table-name)))
,@(identifier-case-binding options)
(on-error-stop (getf ',options :on-error-stop))
(source-db (with-stats-collection ("fetch" :section :pre)
(expand (fetch-file ,dbf-db-conn))))
(source
(make-instance 'copy-db3
:target-db ,pg-db-conn
:encoding ,encoding
:source-db source-db
:target ,(when target-table-name
(create-table target-table-name)))))
,(sql-code-block pg-db-conn 'state-before before "before load")
,(sql-code-block pg-db-conn :pre before "before load")
(pgloader.sources:copy-database source
:state-before state-before
,@(remove-batch-control-option options))
(copy-database source
,@(remove-batch-control-option options)
:after-schema ',after-schema
:on-error-stop on-error-stop
:create-indexes nil
:foreign-keys nil
:reset-sequences nil)
,(sql-code-block pg-db-conn 'state-after after "after load")
;; reporting
(when summary
(report-full-summary "Total import time" *state*
:before state-before
:finally state-after)))))
,(sql-code-block pg-db-conn :post after "after load"))))
(defrule load-dbf-file load-dbf-command
(:lambda (command)
(bind (((source encoding pg-db-uri
&key ((:dbf-options options)) gucs before after) command))
(bind (((source encoding pg-db-uri table-name
&key options gucs casts before after-schema after)
command))
(cond (*dry-run*
(lisp-code-for-dbf-dry-run source pg-db-uri))
(t
(lisp-code-for-loading-from-dbf source pg-db-uri
:target-table-name table-name
:encoding encoding
:gucs gucs
:casts casts
:before before
:after-schema after-schema
:after after
:dbf-options options))))))
:options options))))))

Some files were not shown because too many files have changed in this diff.