Review the README.

Dimitri Fontaine 2013-11-12 11:19:06 +01:00
parent 59d8188ad0
commit c706813465

README.md

@ -15,31 +15,61 @@ pgloader also implements data reformating, the main example of that being a
transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to
PostgreSQL `NULL` value (because our calendar never had a *year zero*).
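As a rough illustration of that zero-date rule (not pgloader's own code, which is written in Common Lisp), the mapping can be sketched in shell. The function name `zero_date_to_null` is hypothetical, and the `\N` marker is an assumption based on the default `NULL` spelling of PostgreSQL's `COPY` text format:

```shell
# Hypothetical helper, for illustration only: map MySQL "zero dates" to a
# NULL marker and pass every other value through unchanged.
# Assumption: \N is used as the marker (default NULL text in COPY).
null_marker='\N'

zero_date_to_null() {
    case "$1" in
        "0000-00-00"|"0000-00-00 00:00:00")
            printf '%s\n' "$null_marker" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

zero_date_to_null "0000-00-00"
zero_date_to_null "2013-11-12"
```

The first call prints the marker and the second prints the date as given.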
## Versioning
pgloader version 1.x, from a long time ago, was developed in `TCL`.
When faced with maintaining that code, the new emerging development team
(hi!) picked `python` instead because that made sense at the time. So
pgloader version 2.x was written in python.
The current version of pgloader is the 3.x series, which is written in
[Common Lisp](http://cliki.net/) for better development flexibility, runtime
performance, and real threading.
The versioning is now following the Emacs model, where any X.0 release
number means you're using a development version (alpha, beta, or release
candidate). The next stable versions are going to be `3.1` then `3.2` etc.
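To make the rule concrete, here is a minimal shell sketch that tells development versions from stable ones by looking only at the minor number. The function name `is_development_version` and the sample version strings are assumptions made for illustration:

```shell
# Hypothetical helper: succeed when the minor number of a version string
# is 0, that is, an X.0 development version under the rule above.
is_development_version() (
    minor=${1#*.}       # drop the major number: "3.0.50" -> "0.50"
    minor=${minor%%.*}  # keep only the minor:   "0.50"   -> "0"
    [ "$minor" = "0" ]
)

for v in 3.0 3.0.50 3.1 3.2; do
    if is_development_version "$v"; then
        echo "$v: development"
    else
        echo "$v: stable"
    fi
done
```

The function body runs in a subshell so that the `minor` variable does not leak into the calling shell.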
## INSTALL
pgloader is now a Common Lisp program, tested using the
[SBCL](http://sbcl.org/) and [CCL](http://ccl.clozure.com/) implementation
with [Quicklisp](http://www.quicklisp.org/beta/).
[SBCL](http://sbcl.org/) implementation with
[Quicklisp](http://www.quicklisp.org/beta/).
apt-get install sbcl
apt-get install libmysqlclient-dev libsqlite3-dev
wget http://beta.quicklisp.org/quicklisp.lisp
sbcl --load quicklisp.lisp
* (quicklisp-quickstart:install)
* (ql:add-to-init-file)
make pgloader
The current version of the code depends on a recent version of Postmodern
not found in Quicklisp yet at the time of this writing:
### Patches
cd ~/quicklisp/local-projects/
git clone https://github.com/marijnh/Postmodern.git
git clone -b empty-strings-and-nil https://github.com/dimitri/cl-csv.git
git clone http://git.tapoueh.org/git/pgloader.git
Several dependencies needed some patching for pgloader to run correctly;
the given `Makefile` will handle that for you. The goal is for those patches
to get included in the mainline version of the dependencies so that this
whole section and the assorted `Makefile` business become irrelevant and
disappear.
#### Postmodern
The current version of the code depends on a recent version of
[Postmodern](http://marijnhaverbeke.nl/postmodern/postmodern.html) not found
in Quicklisp yet at the time of this writing. Currently the pgloader source
tree contains a patch to apply against postmodern sources, and the
`Makefile` will do the following for you:
Read https://github.com/marijnh/Postmodern/issues/39 for details.
#### cl-csv
The handling of `NULL` values in `CSV` files requires pgloader to have more
smarts than the default `cl-csv` code, so the `Makefile` will fetch my
branch including a fix for that.
Read https://github.com/AccelerationNet/cl-csv/pull/12 for details.
## The pgloader.lisp script
Now you can use the `#!` script or build a self-contained binary executable
file, as shown below. You might have to modify the `pgloader.lisp` script
because it is currently hard-coded to use `/usr/local/bin/sbcl`, and you
probably want to change that part:
file, as shown below.
./pgloader.lisp --help
@ -49,53 +79,31 @@ them from the internet and prepare them (thanks to *Quicklisp*). So please
be patient while that happens and make sure we can actually connect and
download the dependencies.
## Compile into a self-contained binary file
## Build Self-Contained binary file
First, make sure you have downloaded all the required Common Lisp
dependencies that pgloader uses, and install the
[buildapp](http://www.xach.com/lisp/buildapp/) application:
The `Makefile` target `pgloader` knows how to produce a Self Contained
Binary file for pgloader, named `pgloader.exe`:
$ sbcl
* (ql:quickload "pgloader")
* (ql:quickload "buildapp")
* (buildapp:build-buildapp "./buildapp")
$ make pgloader
If you just installed *SBCL* and *Quicklisp* to use pgloader, that command
should do it:
Note that the `Makefile` uses the `--compress-core` option, which must be
enabled in your local copy of `SBCL`. If that's not the case, it's probably
because you compiled and installed `SBCL` yourself in order to have a
decently recent version to use. In that case you need to compile it with the
`--with-sb-core-compression` option.
./buildapp --logfile /tmp/build.log \
--asdf-tree ~/quicklisp/dists \
--load-system pgloader \
--entry pgloader:main \
--dynamic-space-size 4096 \
--output pgloader.exe
You can also remove the `--compress-core` option by editing the `Makefile`
and removing the line where it appears.
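If you prefer not to edit the file by hand, the removal can be sketched with `sed`. The helper name `drop_compress_core` is hypothetical, and the sketch assumes the option sits on a line of its own; check the result, since deleting the last line of a backslash-continued command would leave a dangling `\` on the line before it:

```shell
# Hypothetical helper: delete every line mentioning --compress-core from
# the given Makefile, keeping the original as <file>.bak.
drop_compress_core() {
    sed -i.bak '/--compress-core/d' "$1"
}
```

Run it as `drop_compress_core Makefile` from the source tree, then compare `Makefile` with `Makefile.bak` before building.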
You can also use the option `--compress-core` if your platform supports it,
so as to reduce the size of the generated binary.
If you're a Common Lisp developer or otherwise already using Quicklisp
with some *local-projects* and a local source registry setup for *asdf*, use
a command line like this:
./buildapp --logfile /tmp/build.log \
--asdf-tree ~/quicklisp/local-projects \
--manifest-file ./manifest.ql \
--asdf-tree ~/quicklisp/dists \
--load-system pgloader \
--entry pgloader:main \
--dynamic-space-size 4096 \
--output pgloader.exe
That command requires a `manifest.ql` file that you can obtain with the lisp
command:
(ql:write-asdf-manifest-file "path/to/manifest.ql")
The `make pgloader` command when successful outputs a `./build/pgloader.exe`
file for you to use.
## Usage
Give pgloader as many command files as you need:
./pgloader.lisp <file.load>
$ ./build/pgloader.exe --help
$ ./build/pgloader.exe <file.load>
See the documentation file `pgloader.1.md` for details. You can compile that
file into a manual page or an HTML page thanks to the `pandoc` application:
@ -108,23 +116,15 @@ file into a manual page or an HTML page thanks to the `pandoc` application:
Some notes about what I intend to be working on next.
### tests
- add needed pre-requisites in bootstrap.sh to run the MySQL and SQLite
tests from the `make test` target without errors
### binary distribution
- prepare an all-included binary for several platforms
### internals & refactoring
- review pgloader.pgsql:reformat-row date-columns arguments
- review connection string handling for both PostgreSQL and MySQL
- provide a better toplevel API
- implement tests
### command & control
- commands: `LOAD` and `INI` formats
- compat with `SQL*Loader` format
- see pgloader.1.md for details.
### docs
- host a proper website for the tool, with use cases and a tutorial
@ -135,18 +135,8 @@ Some notes about what I intend to be working on next.
- error reporting (done)
- add input line number to log file?
#### data input
- import directly from MySQL, file based export/import (done)
- import directly from MySQL streaming (done)
- general CSV and Flexible Text source formats
- fixed cols input data format
- compressed input (gzip, other algos)
- fetch data from S3
### transformation and casts
- experiment with perfs and inlining the transformation functions
- add typemod expression to cast rules in the command language
- add per-column support for cast rules in the system
@ -156,7 +146,7 @@ Some notes about what I intend to be working on next.
#### convenience
- automatic creation of schema (from MySQL schema, or from CSV header)
- automatic creation of schema even when loading from text files
- pre-fetch some rows to guesstimate data types?
#### performances
@ -170,11 +160,6 @@ Data reformating is now going to have to happen in Common Lisp mostly, maybe
offer some other languages (cl-awk etc).
- raw reformating, before rows are split
- per column reformating
- date (zero dates)
- integer and "" that should be NULL
- user-defined columns (constants, functions of other rows)
- column re-ordering
Have a try at something approaching:
@ -190,6 +175,11 @@ A part of that needs to happen client-side, another part server-side, and
the grammar has to make it clear what happens where. Maybe add a WHERE
clause to the `COPY` or `LOAD` grammar for the client.
#### filtering
Add commands to pick different target tables depending on the data found
when reading from the source.
#### UI
- add a web controller with pretty monitoring