Review the README.

Dimitri Fontaine 2013-11-12 11:19:06 +01:00
parent 59d8188ad0
commit c706813465

README.md

@@ -15,31 +15,61 @@ pgloader also implements data reformatting, the main example of that being a
transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to
PostgreSQL `NULL` value (because our calendar never had a *year zero*).
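As a rough illustration of the zero-date rule above, here is a minimal shell sketch. The function name `zero_date_to_null` is hypothetical and not part of pgloader, which implements its transformations in Common Lisp; the sketch only shows the mapping itself, assuming the value arrives as a plain string.

```shell
# Hypothetical helper (not part of pgloader): map MySQL "zero" dates to NULL.
# Any other value is passed through unchanged.
zero_date_to_null() {
    case "$1" in
        "0000-00-00"|"0000-00-00 00:00:00") printf '%s\n' "NULL" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

zero_date_to_null "0000-00-00"
zero_date_to_null "2013-11-12 11:19:06"
```

The first call stands for a value MySQL accepts but PostgreSQL rejects; the second shows that valid timestamps are left alone.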
## Versioning
pgloader version 1.x, from a long time ago, was developed in `TCL`.
When faced with maintaining that code, the new emerging development team
(hi!) picked `python` instead because that made sense at the time. So
the pgloader 2.x versions were written in Python.
The current version of pgloader is the 3.x series, which is written in
[Common Lisp](http://cliki.net/) for better development flexibility, run-time
performance, and real threading.
Versioning now follows the Emacs model, where any X.0 release
number means you're using a development version (alpha, beta, or release
candidate). The next stable versions are going to be `3.1`, then `3.2`, etc.
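The versioning rule above can be sketched in a few lines of shell. The function name `release_kind` is made up for this illustration, and the sketch assumes two-component version strings such as `3.0` or `3.1`; it is not something pgloader ships.

```shell
# Hypothetical helper: classify a version string under the "Emacs model"
# described above. Assumes two-component versions like 3.0 or 3.1.
release_kind() {
    case "$1" in
        *.0) echo "development" ;;   # X.0 means alpha, beta or release candidate
        *)   echo "stable" ;;
    esac
}

release_kind "3.0"
release_kind "3.1"
```

Note that a pattern match on the trailing `.0` keeps versions such as `3.10` in the stable group, since they end in `10` rather than `.0`.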
## INSTALL
pgloader is now a Common Lisp program, tested using the
[SBCL](http://sbcl.org/) and [CCL](http://ccl.clozure.com/) implementation [SBCL](http://sbcl.org/) implementation with
with [Quicklisp](http://www.quicklisp.org/beta/). [Quicklisp](http://www.quicklisp.org/beta/).
apt-get install sbcl
apt-get install libmysqlclient-dev libsqlite3-dev
wget http://beta.quicklisp.org/quicklisp.lisp make pgloader
sbcl --load quicklisp.lisp
* (quicklisp-quickstart:install)
* (ql:add-to-init-file)
The current version of the code depends on a recent version of Postmodern ### Patches
not found in Quicklisp yet at the time of this writing:
cd ~/quicklisp/local-projects/ Several dependencies needed some patching for pgloader to be running fine,
git clone https://github.com/marijnh/Postmodern.git the given `Makefile` will handle that for you. The goal is for those patches
git clone -b empty-strings-and-nil https://github.com/dimitri/cl-csv.git to get included in the mainline version of the dependencies so that this
git clone http://git.tapoueh.org/git/pgloader.git whole section and assorted `Makefile` business disappear for being
irrelevant.
#### Postmodern
The current version of the code depends on a recent version of
[Postmodern](http://marijnhaverbeke.nl/postmodern/postmodern.html) not found
in Quicklisp yet at the time of this writing. Currently the pgloader source
tree contains a patch to apply against postmodern sources, and the
`Makefile` will do the following for you:
Read https://github.com/marijnh/Postmodern/issues/39 for details.
#### cl-csv
The handling of `NULL` values in `CSV` files requires pgloader to have more
smarts than the default `cl-csv` code, so the `Makefile` will fetch my
branch including a fix for that.
Read https://github.com/AccelerationNet/cl-csv/pull/12 for details.
## The pgloader.lisp script
Now you can use the `#!` script or build a self-contained binary executable
file, as shown below. You might have to modify it the `pgloader.lisp` script file, as shown below.
because it's now hard coded to use `/usr/local/bin/sbcl` and you probably
want to change that part then:
./pgloader.lisp --help
@@ -49,53 +79,31 @@ them from the internet and prepare them (thanks to *Quicklisp*). So please
be patient while that happens and make sure we can actually connect and
download the dependencies.
## Compile into a self-contained binary file ## Build Self-Contained binary file
First, make sure you have downloaded all the required Common Lisp The `Makefile` target `pgloader` knows how to produce a Self Contained
dependencies that pgloader uses, and install the Binary file for pgloader, named `pgloader.exe`:
[buildapp](http://www.xach.com/lisp/buildapp/) application:
$ sbcl $ make pgloader
* (ql:quickload "pgloader")
* (ql:quickload "buildapp")
* (buildapp:build-buildapp "./buildapp")
If you just installed *SBCL* and *Quicklisp* to use pgloader, that command Note that the `Makefile` uses the `--compress-core` option, that should be
should do it: enabled in your local copy of `SBCL`. If that's not the case, it's probably
because you did compile and install `SBCL` yourself, so that you have a
decently recent version to use. Then you need to compile it with the
`--with-sb-core-compression` option.
./buildapp --logfile /tmp/build.log \ You can also remove the `--compress-core` option by editing the `Makefile`
--asdf-tree ~/quicklisp/dists \ and removing the line where it appears.
--load-system pgloader \
--entry pgloader:main \
--dynamic-space-size 4096 \
--output pgloader.exe
You can also use the option `--compress-core` if your platform supports it, The `make pgloader` command when successful outputs a `./build/pgloader.exe`
so as to reduce the size of the generated binary. file for you to use.
When you're a Common Lisp developer or otherwise already using Quicklisp
with some *local-projects* and a local source registry setup for *asdf*, use
a command line like this:
./buildapp --logfile /tmp/build.log \
--asdf-tree ~/quicklisp/local-projects \
--manifest-file ./manifest.ql \
--asdf-tree ~/quicklisp/dists \
--load-system pgloader \
--entry pgloader:main \
--dynamic-space-size 4096 \
--output pgloader.exe
That command requires a `manifest.ql` file that you can obtain with the lisp
command:
(ql:write-asdf-manifest-file "path/to/manifest.ql")
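For the `--compress-core` note in this section, removing the option by hand could look roughly like the sketch below. The helper name `strip_compress_core` is hypothetical, and the sketch assumes the option sits on its own line in the `Makefile`, as the text suggests; if that line is the last one of a backslash-continued command, the line before it would need its trailing backslash removed as well.

```shell
# Hypothetical helper: delete every line mentioning --compress-core from a
# Makefile, keeping a backup copy next to it (<file>.bak).
strip_compress_core() {
    makefile="${1:-Makefile}"
    if grep -q -- '--compress-core' "$makefile"; then
        # -i.bak edits in place and works with both GNU and BSD sed
        sed -i.bak '/--compress-core/d' "$makefile"
        echo "removed --compress-core from $makefile (backup: $makefile.bak)"
    else
        echo "no --compress-core line found in $makefile"
    fi
}
```

Calling `strip_compress_core Makefile` from the source tree and then running `make pgloader` again would be the intended use; the backup file makes it easy to restore the original.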
## Usage
Give pgloader as many command files as you need:
./pgloader.lisp <file.load> $ ./build/pgloader.exe --help
$ ./build/pgloader.exe <file.load>
See the documentation file `pgloader.1.md` for details. You can compile that
file into a manual page or an HTML page thanks to the `pandoc` application:
@@ -108,23 +116,15 @@ file into a manual page or an HTML page thanks to the `pandoc` application:
Some notes about what I intend to be working on next.
### tests
- add needed pre-requisites in bootstrap.sh to run the MySQL and SQLite
tests from the `make test` target without errors
### binary distribution
- prepare an all-included binary for several platforms
### internals & refactoring
- review pgloader.pgsql:reformat-row date-columns arguments
- review connection string handling for both PostgreSQL and MySQL
- provide a better toplevel API
- implement tests
### command & control
- commands: `LOAD` and `INI` formats
- compat with `SQL*Loader` format
- see pgloader.1.md for details.
### docs
- host a proper website for the tool, with use cases and a tutorial
@@ -135,18 +135,8 @@ Some notes about what I intend to be working on next.
- error reporting (done)
- add input line number to log file?
#### data input
- import directly from MySQL, file based export/import (done)
- import directly from MySQL streaming (done)
- general CSV and Flexible Text source formats
- fixed cols input data format
- compressed input (gzip, other algos)
- fetch data from S3
### transformation and casts
- experiment with perfs and inlining the transformation functions
- add typemod expression to cast rules in the command language
- add per-column support for cast rules in the system
@@ -156,7 +146,7 @@ Some notes about what I intend to be working on next.
#### convenience
- automatic creation of schema (from MySQL schema, or from CSV header) - automatic creation of schema even when loading from text files
- pre-fetch some rows to guesstimate data types?
#### performances
@@ -170,11 +160,6 @@ Data reformatting is now going to have to happen in Common Lisp mostly, maybe
offer some other languages (cl-awk etc).
- raw reformatting, before rows are split
- per column reformating
- date (zero dates)
- integer and "" that should be NULL
- user-defined columns (constants, functions of other rows)
- column re-ordering
Have a try at something approaching:
@@ -190,6 +175,11 @@ A part of that needs to happen client-side, another part server-side, and
the grammar has to make it clear what happens where. Maybe add a WHERE
clause to the `COPY` or `LOAD` grammar for the client.
#### filtering
Add commands to pick different target tables depending on the data found
when reading from the source.
#### UI
- add a web controller with pretty monitoring