# PGLoader

[![Build Status](https://travis-ci.org/dimitri/pgloader.svg?branch=master)](https://travis-ci.org/dimitri/pgloader)

[![Join the chat at https://gitter.im/dimitri/pgloader](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/dimitri/pgloader?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

pgloader is a data loading tool for PostgreSQL, using the `COPY` command.

Its main advantage over just using `COPY` or `\copy`, and over using a
*Foreign Data Wrapper*, is its transaction behaviour, where *pgloader*
will keep a separate file of rejected data, but continue trying to
`copy` good data in your database.

The default PostgreSQL behaviour is transactional, which means that
*any* erroneous line in the input data (file or remote database) will
stop the entire bulk load for the table.

pgloader also implements data reformatting, a typical example of that
being the transformation of MySQL datestamps `0000-00-00` and
`0000-00-00 00:00:00` to PostgreSQL `NULL` value (because our calendar
never had a *year zero*).

## Versioning

pgloader version 1.x is quite old and was developed in `TCL`.
When faced with maintaining that code, the new emerging development
team (hi!) picked `python` instead because that made sense at the
time. So pgloader version 2.x was written in python.

The current version of pgloader is the 3.x series, which is written in
[Common Lisp](http://cliki.net/) for better development flexibility,
runtime performance, and support of real threading.

The versioning is now following the Emacs model, where any X.0 release
number means you're using a development version (alpha, beta, or release
candidate). The next stable versions are going to be `3.1` then `3.2` etc.

When using a development snapshot rather than a released version the version
number includes the git hash (in its abbreviated form):

  - `pgloader version "3.0.99"`
  
     Release candidate 9 for pgloader version 3.1, with a *git tag* named
     `v3.0.99` so that it's easy to checkout the same sources as the
     released code.
     
  - `pgloader version "3.0.fecae2c"`
  
     Development snapshot again *git hash* `fecae2c`. It's possible to have
     the same sources on another setup with using the git command `git
     checkout fecae2c`.

  - `pgloader version "3.1.0"`
  
     Stable release.

## LICENCE

pgloader is available under [The PostgreSQL Licence](http://www.postgresql.org/about/licence/).

## INSTALL

You can install pgloader directly from
[apt.postgresql.org](https://wiki.postgresql.org/wiki/Apt) and from official
debian repositories, see
[packages.debian.org/pgloader](https://packages.debian.org/search?keywords=pgloader).

    $ apt-get install pgloader

You can also use a **docker** image for pgloader at
<https://hub.docker.com/r/dimitri/pgloader/>:

    $ docker pull dimitri/pgloader
    $ docker run --rm --name pgloader dimitri/pgloader:latest pgloader --version
    $ docker run --rm --name pgloader dimitri/pgloader:latest pgloader --help

## Build from sources

pgloader is now a Common Lisp program, tested using the
[SBCL](http://sbcl.org/) (>= 1.1.14) and
[Clozure CL](http://ccl.clozure.com/) implementations with
[Quicklisp](http://www.quicklisp.org/beta/).

    $ apt-get install sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev
    $ cd /path/to/pgloader
	$ make pgloader
	$ ./build/bin/pgloader --help

When building from sources, you should always build from the current git
HEAD as it's basically the only source that is managed in a way to ensure it
builds aginst current set of dependencies versions.

## More options when building from source

The `Makefile` target `pgloader` knows how to produce a Self Contained
Binary file for pgloader, found at `./build/bin/pgloader`:

    $ make pgloader

By default, the `Makefile` uses [SBCL](http://sbcl.org/) to compile your
binary image, though it's possible to build using
[CCL](http://ccl.clozure.com/).

    $ make CL=ccl pgloader

If using `SBCL` and it supports core compression, the make process will
use it to generate a smaller binary.  To force disabling core
compression, you may use:

    $ make COMPRESS_CORE=no pgloader

The `--compress-core` is unique to SBCL, so not used when `CC` is different
from the `sbcl` value.

You can also tweak the default amount of memory that the `pgloader` image
will allow itself using when running through your data (don't ask for more
than your current RAM tho):

    $ make DYNSIZE=8192 pgloader

The `make pgloader` command when successful outputs a `./build/bin/pgloader`
file for you to use.

## Usage

You can either give a command file to pgloader or run it all from the
command line, see the
[pgloader quick start](http://pgloader.io/howto/quickstart.html) on
<http://pgloader.io> for more details.

    $ ./build/bin/pgloader --help
    $ ./build/bin/pgloader <file.load>

For example, for a full migration from SQLite:

    $ createdb newdb
    $ pgloader ./test/sqlite/sqlite.db postgresql:///newdb

Or for a full migration from MySQL, including schema definition (tables,
indexes, foreign keys, comments) and parallel loading of the corrected data:

    $ createdb pagila
    $ pgloader mysql://user@localhost/sakila postgresql:///pagila

See the documentation file `pgloader.1.md` for details. You can compile that
file into a manual page or an HTML page thanks to the `ronn` application:

    $ apt-get install ruby-ronn
	$ make docs