Mirror of https://github.com/dimitri/pgloader.git, synced 2025-08-08 15:27:00 +02:00.

Commit c706813465 (parent 59d8188ad0): "Review the README."

README.md | 152

…

pgloader also implements data reformatting, the main example of that being a
transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to a
PostgreSQL `NULL` value (because our calendar never had a *year zero*).

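Here is a minimal shell sketch of the same idea. It assumes a tab-separated export headed for PostgreSQL `COPY` in text format, where `\N` is the default `NULL` marker. It only illustrates the rule and is not how pgloader implements the transformation. The `zero_dates_to_null` name is hypothetical. It uses `awk` rather than a plain `sed` substitution so that only whole fields are rewritten.

```shell
# Hypothetical helper, not part of pgloader: rewrite MySQL zero dates as the
# COPY text-format NULL marker \N. Only whole fields are compared, so a text
# column that merely contains "0000-00-00" is left alone.
zero_dates_to_null() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         for (i = 1; i <= NF; i++)
           if ($i == "0000-00-00" || $i == "0000-00-00 00:00:00")
             $i = "\\N"
         print
       }'
}

# prints: 1<TAB>\N<TAB>2013-01-02
printf '1\t0000-00-00\t2013-01-02\n' | zero_dates_to_null
```
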
## Versioning

pgloader version 1.x, from a long time ago, was developed in `TCL`. When
faced with maintaining that code, the newly emerging development team (hi!)
picked `python` instead, because that made sense at the time. So pgloader
2.x versions were written in `python`.

The current version of pgloader is the 3.x series, which is written in
[Common Lisp](http://cliki.net/) for better development flexibility, runtime
performance, and real threading.

Versioning now follows the Emacs model, where any X.0 release number means
you're using a development version (alpha, beta, or release candidate). The
next stable versions are going to be `3.1`, then `3.2`, etc.

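The Emacs-style rule can be expressed as a tiny shell check. `release_kind` is a hypothetical helper, not shipped with pgloader. It assumes versions are written as `MAJOR.MINOR[.PATCH]` and looks only at the minor number.

```shell
# Hypothetical helper: under the Emacs model, a minor number of 0 (3.0, 3.0.x)
# marks a development version; anything else (3.1, 3.2, ...) is stable.
release_kind() {
  case "$1" in
    *.*) minor=$(printf '%s\n' "$1" | cut -d. -f2) ;;
    *)   minor="" ;;  # no minor component at all: not an X.0 release
  esac
  if [ "$minor" = "0" ]; then
    echo "development"
  else
    echo "stable"
  fi
}

release_kind 3.0   # prints: development
release_kind 3.1   # prints: stable
```

Checking the second field, rather than matching a trailing `.0`, keeps a patch release such as `3.2.0` on the stable side.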
## INSTALL

pgloader is now a Common Lisp program, tested using the
[SBCL](http://sbcl.org/) and [CCL](http://ccl.clozure.com/) implementations
with [Quicklisp](http://www.quicklisp.org/beta/).

    apt-get install sbcl
    apt-get install libmysqlclient-dev libsqlite3-dev
    wget http://beta.quicklisp.org/quicklisp.lisp
    sbcl --load quicklisp.lisp
    * (quicklisp-quickstart:install)
    * (ql:add-to-init-file)
    make pgloader

### Patches

Several dependencies need some patching for pgloader to run correctly; the
provided `Makefile` handles that for you. The goal is for those patches to be
included in the mainline versions of the dependencies, so that this whole
section and the associated `Makefile` business can disappear as irrelevant.

#### Postmodern

The current version of the code depends on a recent version of
[Postmodern](http://marijnhaverbeke.nl/postmodern/postmodern.html) that was not
yet available in Quicklisp at the time of this writing. The pgloader source
tree currently contains a patch to apply against the Postmodern sources, and
the `Makefile` applies it for you.

Read https://github.com/marijnh/Postmodern/issues/39 for details.

#### cl-csv

The handling of `NULL` values in `CSV` files requires pgloader to have more
smarts than the default `cl-csv` code provides, so the `Makefile` fetches my
branch, which includes a fix for that.

Read https://github.com/AccelerationNet/cl-csv/pull/12 for details.

## The pgloader.lisp script

Now you can use the `#!` script or build a self-contained binary executable
file, as shown below.

    ./pgloader.lisp --help

… them from the internet and prepare them (thanks to *Quicklisp*). So please
be patient while that happens and make sure we can actually connect and
download the dependencies.

## Build Self-Contained binary file

The `Makefile` target `pgloader` knows how to produce a self-contained binary
file for pgloader, named `pgloader.exe`:

    $ make pgloader

Note that the `Makefile` uses the `--compress-core` option, which requires
core compression support in your local copy of `SBCL`. If that support is
missing, it's probably because you compiled and installed `SBCL` yourself in
order to get a decently recent version. In that case, recompile it with the
`--with-sb-core-compression` option.

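You can check whether your `SBCL` supports core compression before running `make pgloader`. This sketch assumes that an `SBCL` built with `--with-sb-core-compression` lists `:sb-core-compression` in `*features*`. It also relies on `sb-ext:exit` and the `--non-interactive` toplevel option being available. The `sbcl_has_core_compression` name is hypothetical.

```shell
# Hypothetical helper, not part of pgloader: exit status 0 when the sbcl on
# PATH was built with core compression, non-zero otherwise (including when
# sbcl is not installed). Assumes such a build lists :sb-core-compression
# in *features*.
sbcl_has_core_compression() {
  sbcl --noinform --non-interactive \
       --eval '(sb-ext:exit :code (if (member :sb-core-compression *features*) 0 1))' \
       >/dev/null 2>&1
}

if sbcl_has_core_compression; then
  echo "SBCL can build compressed cores: keep --compress-core"
else
  echo "no core compression: rebuild SBCL or drop --compress-core"
fi
```
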
You can also remove the `--compress-core` option by editing the `Makefile`
and removing the line where it appears.

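Here is a minimal way to do that from the shell. It assumes, as the sentence above says, that `--compress-core` sits on a line of its own. If that line were the last one of a backslash-continued command, the line before it would keep a dangling `\` and would need fixing by hand. The `drop_compress_core` name is hypothetical.

```shell
# Hypothetical helper: delete every line mentioning --compress-core from the
# given Makefile (default: ./Makefile), keeping the original as <file>.bak.
# Attaching the suffix directly to -i (-i.bak) works with GNU and BSD sed.
drop_compress_core() {
  sed -i.bak '/--compress-core/d' "${1:-Makefile}"
}

# Usage, from the top of the pgloader source tree:
#   drop_compress_core Makefile
```
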
The `make pgloader` command, when successful, outputs a `./build/pgloader.exe`
file for you to use.

## Usage

Give pgloader as many command files as you need:

    $ ./build/pgloader.exe --help
    $ ./build/pgloader.exe <file.load>

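pgloader accepts several command files in one invocation. A small wrapper can check that each file is readable before handing them all over at once. This is a hypothetical convenience sketch: `run_pgloader` and the file names in the usage line are made up. The binary path is the `./build/pgloader.exe` produced by `make pgloader`.

```shell
# Hypothetical wrapper: refuse to start unless every command file is
# readable, then pass all of them to a single pgloader run.
run_pgloader() {
  [ "$#" -gt 0 ] || { echo "usage: run_pgloader FILE.load..." >&2; return 2; }
  for f in "$@"; do
    [ -r "$f" ] || { echo "cannot read command file: $f" >&2; return 1; }
  done
  ./build/pgloader.exe "$@"
}

# Usage (hypothetical file names):
#   run_pgloader csv.load mysql.load
```
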
See the documentation file `pgloader.1.md` for details. You can compile that
file into a manual page or an HTML page with the `pandoc` application:

…

Some notes about what I intend to be working on next.

### tests

- add the needed prerequisites in `bootstrap.sh` to run the MySQL and SQLite
  tests from the `make test` target without errors

### binary distribution

- prepare an all-included binary for several platforms

### internals & refactoring

- review `pgloader.pgsql:reformat-row` date-columns arguments
- review connection string handling for both PostgreSQL and MySQL
- provide a better toplevel API
- implement tests

### command & control

- commands: `LOAD` and `INI` formats
- compat with `SQL*Loader` format
- see `pgloader.1.md` for details

### docs

- host a proper website for the tool, with use cases and a tutorial

…

- error reporting (done)
- add input line number to log file?

#### data input

- import directly from MySQL, file based export/import (done)
- import directly from MySQL streaming (done)
- general CSV and Flexible Text source formats
- fixed cols input data format
- compressed input (gzip, other algos)
- fetch data from S3

### transformation and casts

- experiment with performance and with inlining the transformation functions
- add typemod expression to cast rules in the command language
- add per-column support for cast rules in the system

…

#### convenience

- automatic creation of schema (from MySQL schema, or from CSV header)
- automatic creation of schema even when loading from text files
- pre-fetch some rows to guesstimate data types?

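The "pre-fetch some rows" idea could look like the following sketch. It reads the first few rows of a tab-separated file and reports a type per column: `bigint` when every sampled non-empty value is an integer, `text` otherwise. The sample size, the two candidate types and the `guess_column_types` name are assumptions for illustration. None of this is pgloader behavior.

```shell
# Hypothetical sketch: guess a PostgreSQL type per column from the first
# $2 rows (default 100) of the tab-separated file $1. Empty fields are
# ignored; integer range (bigint overflow) is not checked.
guess_column_types() {
  head -n "${2:-100}" "$1" | awk '
    BEGIN { FS = "\t" }
    {
      if (NF > ncols) ncols = NF
      for (i = 1; i <= NF; i++)
        if ($i != "") {
          seen[i] = 1
          if ($i !~ /^-?[0-9]+$/) nonint[i] = 1
        }
    }
    END {
      for (i = 1; i <= ncols; i++) {
        type = ((i in seen) && !(i in nonint)) ? "bigint" : "text"
        sep = (i < ncols) ? "," : "\n"
        printf "%s%s", type, sep
      }
    }'
}
```

The sample size matters: a column that looks numeric in the first rows may still hold text further down. This is the "guesstimate" trade-off the TODO item hints at.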
#### performance

…

Data reformatting is now mostly going to have to happen in Common Lisp; maybe
offer some other languages (cl-awk etc).

- raw reformatting, before rows are split
- per column reformatting
- date (zero dates)
- integer and "" that should be NULL
- user-defined columns (constants, functions of other rows)
- column re-ordering

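The "per column reformatting" and `integer and "" that should be NULL` items can be pictured with the sketch below. It turns empty fields into `\N` only in the columns you list, so an empty string in a text column stays an empty string. The `empty_to_null` name, the comma-separated column list and the tab-separated input are assumptions for illustration, not pgloader syntax.

```shell
# Hypothetical sketch: in a tab-separated stream, replace empty fields with
# the COPY text-format NULL marker \N, but only in the listed (1-based)
# columns, e.g. the integer ones:  empty_to_null 2,4 < input.tsv
empty_to_null() {
  awk -v cols="$1" '
    BEGIN { FS = OFS = "\t"; n = split(cols, c, ",") }
    {
      for (k = 1; k <= n; k++)
        if ($(c[k]) == "") $(c[k]) = "\\N"
      print
    }'
}
```
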
Have a try at something approaching:

…

A part of that needs to happen client-side, another part server-side, and
the grammar has to make it clear what happens where. Maybe add a `WHERE`
clause to the `COPY` or `LOAD` grammar for the client.

#### filtering

Add commands to pick different target tables depending on the data found
when reading from the source.

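A client-side version of that idea can be sketched as a router. It splits a tab-separated stream into one output file per target table, chosen from the value found in a given column. Everything here is an assumption for illustration: the `route_rows` name, using the column value as the table name, and the `.tsv` output files. Since this is a TODO item, nothing here reflects actual pgloader syntax.

```shell
# Hypothetical sketch: write each tab-separated row to <dir>/<value>.tsv,
# where <value> is the content of column <col> (1-based) standing in for the
# target table name. Rows with an empty key go to _unrouted.tsv; characters
# outside [A-Za-z0-9_] are replaced so the value is a safe file name.
route_rows() {
  awk -v dir="$1" -v col="$2" '
    BEGIN { FS = "\t" }
    {
      table = ($col == "") ? "_unrouted" : $col
      gsub(/[^A-Za-z0-9_]/, "_", table)
      file = dir "/" table ".tsv"
      print > file
    }'
}

# Usage:  route_rows /tmp/split 1 < rows.tsv
```

Each output file is truncated when first opened in a run, then receives every matching row of that run. With many distinct keys you may hit the per-process open-file limit, which a real implementation would handle by closing files.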
#### UI

- add a web controller with pretty monitoring