mirror of
https://github.com/dimitri/pgloader.git
synced 2025-08-08 15:27:00 +02:00
Review the README.
parent
59d8188ad0
commit
c706813465
152
README.md
@@ -15,31 +15,61 @@ pgloader also implements data reformatting, the main example of that being a
|
|||||||
transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to
|
transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to
|
||||||
PostgreSQL `NULL` value (because our calendar never had a *year zero*).
|
PostgreSQL `NULL` value (because our calendar never had a *year zero*).
|
||||||
|
|
||||||
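To make the zero-date rule above concrete, here is a minimal Python sketch. It is not pgloader's code (pgloader 3.x is written in Common Lisp); the function name `mysql_zero_date_to_null` is hypothetical, and `None` stands in for a PostgreSQL `NULL`.

```python
from typing import Optional

# The two MySQL "zero date" literals named in the README text above.
ZERO_DATES = {"0000-00-00", "0000-00-00 00:00:00"}


def mysql_zero_date_to_null(value: Optional[str]) -> Optional[str]:
    """Return None (standing in for NULL) for a MySQL zero date.

    Any other value, including an already-missing one, is passed through
    unchanged. This is an illustration only, not pgloader's implementation.
    """
    if value in ZERO_DATES:
        return None
    return value


if __name__ == "__main__":
    for raw in ("0000-00-00", "0000-00-00 00:00:00", "2013-11-05"):
        print(repr(raw), "->", repr(mysql_zero_date_to_null(raw)))
```

Only exact matches are rewritten here; whether surrounding whitespace or partial zero dates should also map to `NULL` is left open, since the text does not say.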
|
## Versioning
|
||||||
|
|
||||||
|
The pgloader version 1.x from a long time ago was developed in `TCL`.
|
||||||
|
When faced with maintaining that code, the new emerging development team
|
||||||
|
(hi!) picked `python` instead because that made sense at the time. So
|
||||||
|
pgloader version 2.x was written in Python.
|
||||||
|
|
||||||
|
The current version of pgloader is the 3.x series, which is written in
|
||||||
|
[Common Lisp](http://cliki.net/) for better development flexibility,
|
||||||
|
runtime performance, and real threading.
|
||||||
|
|
||||||
|
The versioning is now following the Emacs model, where any X.0 release
|
||||||
|
number means you're using a development version (alpha, beta, or release
|
||||||
|
candidate). The next stable versions are going to be `3.1` then `3.2` etc.
|
||||||
|
|
||||||
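The numbering rule described in the Versioning section can be sketched in a few lines of Python. The helper name `is_development_version` is hypothetical, and accepting strings with more than two components (such as `3.0.99`) is an assumption; the text only states that any `X.0` release is a development version.

```python
def is_development_version(version: str) -> bool:
    """Return True when the minor component is 0, as in the Emacs model.

    "3.0" is a development version under that rule, "3.1" is a stable one.
    Components after MAJOR.MINOR are ignored (an assumption of this sketch).
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least MAJOR.MINOR, got {version!r}")
    return int(parts[1]) == 0


if __name__ == "__main__":
    for v in ("3.0", "3.0.99", "3.1", "3.2"):
        kind = "development" if is_development_version(v) else "stable"
        print(v, "is a", kind, "version")
```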
## INSTALL
|
## INSTALL
|
||||||
|
|
||||||
pgloader is now a Common Lisp program, tested using the
|
pgloader is now a Common Lisp program, tested using the
|
||||||
[SBCL](http://sbcl.org/) and [CCL](http://ccl.clozure.com/) implementation
|
[SBCL](http://sbcl.org/) implementation with
|
||||||
with [Quicklisp](http://www.quicklisp.org/beta/).
|
[Quicklisp](http://www.quicklisp.org/beta/).
|
||||||
|
|
||||||
apt-get install sbcl
|
apt-get install sbcl
|
||||||
apt-get install libmysqlclient-dev libsqlite3-dev
|
apt-get install libmysqlclient-dev libsqlite3-dev
|
||||||
wget http://beta.quicklisp.org/quicklisp.lisp
|
make pgloader
|
||||||
sbcl --load quicklisp.lisp
|
|
||||||
* (quicklisp-quickstart:install)
|
|
||||||
* (ql:add-to-init-file)
|
|
||||||
|
|
||||||
The current version of the code depends on a recent version of Postmodern
|
### Patches
|
||||||
not found in Quicklisp yet at the time of this writing:
|
|
||||||
|
|
||||||
cd ~/quicklisp/local-projects/
|
Several dependencies needed some patching for pgloader to run properly;
|
||||||
git clone https://github.com/marijnh/Postmodern.git
|
the given `Makefile` will handle that for you. The goal is for those patches
|
||||||
git clone -b empty-strings-and-nil https://github.com/dimitri/cl-csv.git
|
to get included in the mainline version of the dependencies so that this
|
||||||
git clone http://git.tapoueh.org/git/pgloader.git
|
whole section and assorted `Makefile` business disappear for being
|
||||||
|
irrelevant.
|
||||||
|
|
||||||
|
#### Postmodern
|
||||||
|
|
||||||
|
The current version of the code depends on a recent version of
|
||||||
|
[Postmodern](http://marijnhaverbeke.nl/postmodern/postmodern.html) not found
|
||||||
|
in Quicklisp yet at the time of this writing. Currently the pgloader source
|
||||||
|
tree contains a patch to apply against postmodern sources, and the
|
||||||
|
`Makefile` will do the following for you:
|
||||||
|
|
||||||
|
Read https://github.com/marijnh/Postmodern/issues/39 for details.
|
||||||
|
|
||||||
|
#### cl-csv
|
||||||
|
|
||||||
|
The handling of `NULL` values in `CSV` files requires pgloader to have more
|
||||||
|
smarts than the default `cl-csv` code, so the `Makefile` will fetch my
|
||||||
|
branch including a fix for that.
|
||||||
|
|
||||||
|
Read https://github.com/AccelerationNet/cl-csv/pull/12 for details.
|
||||||
|
|
||||||
|
## The pgloader.lisp script
|
||||||
|
|
||||||
Now you can use the `#!` script or build a self-contained binary executable
|
Now you can use the `#!` script or build a self-contained binary executable
|
||||||
file, as shown below. You might have to modify the `pgloader.lisp` script
|
file, as shown below.
|
||||||
because it's now hard coded to use `/usr/local/bin/sbcl` and you probably
|
|
||||||
want to change that part then:
|
|
||||||
|
|
||||||
./pgloader.lisp --help
|
./pgloader.lisp --help
|
||||||
|
|
||||||
@@ -49,53 +79,31 @@ them from the internet and prepare them (thanks to *Quicklisp*). So please
|
|||||||
be patient while that happens and make sure we can actually connect and
|
be patient while that happens and make sure we can actually connect and
|
||||||
download the dependencies.
|
download the dependencies.
|
||||||
|
|
||||||
## Compile into a self-contained binary file
|
## Build Self-Contained binary file
|
||||||
|
|
||||||
First, make sure you have downloaded all the required Common Lisp
|
The `Makefile` target `pgloader` knows how to produce a Self Contained
|
||||||
dependencies that pgloader uses, and install the
|
Binary file for pgloader, named `pgloader.exe`:
|
||||||
[buildapp](http://www.xach.com/lisp/buildapp/) application:
|
|
||||||
|
|
||||||
$ sbcl
|
$ make pgloader
|
||||||
* (ql:quickload "pgloader")
|
|
||||||
* (ql:quickload "buildapp")
|
|
||||||
* (buildapp:build-buildapp "./buildapp")
|
|
||||||
|
|
||||||
If you just installed *SBCL* and *Quicklisp* to use pgloader, that command
|
Note that the `Makefile` uses the `--compress-core` option, which must be
|
||||||
should do it:
|
enabled in your local copy of `SBCL`. If that's not the case, it's probably
|
||||||
|
because you did compile and install `SBCL` yourself, so that you have a
|
||||||
|
decently recent version to use. Then you need to compile it with the
|
||||||
|
`--with-sb-core-compression` option.
|
||||||
|
|
||||||
./buildapp --logfile /tmp/build.log \
|
You can also remove the `--compress-core` option by editing the `Makefile`
|
||||||
--asdf-tree ~/quicklisp/dists \
|
and removing the line where it appears.
|
||||||
--load-system pgloader \
|
|
||||||
--entry pgloader:main \
|
|
||||||
--dynamic-space-size 4096 \
|
|
||||||
--output pgloader.exe
|
|
||||||
|
|
||||||
You can also use the option `--compress-core` if your platform supports it,
|
When successful, the `make pgloader` command outputs a `./build/pgloader.exe`
|
||||||
so as to reduce the size of the generated binary.
|
file for you to use.
|
||||||
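The manual edit mentioned above (removing the `Makefile` line where `--compress-core` appears) could be scripted roughly as follows. This is a sketch in Python under stated assumptions: the helper name `drop_compress_core` is hypothetical, and the sample fragment is invented for illustration and is not the actual pgloader `Makefile`.

```python
def drop_compress_core(makefile_text: str) -> str:
    """Return the Makefile text without any line mentioning --compress-core.

    Lines are dropped whole, mirroring the "remove the line where it
    appears" advice. If the option sits on the last line of a
    backslash-continued command, the preceding line would keep a dangling
    backslash and need a manual fix; this sketch does not handle that.
    """
    kept = [
        line
        for line in makefile_text.splitlines(keepends=True)
        if "--compress-core" not in line
    ]
    return "".join(kept)


if __name__ == "__main__":
    # Invented fragment, used only to show the effect of the filter.
    sample = (
        "build:\n"
        "\tsome-build-tool \\\n"
        "\t    --compress-core \\\n"
        "\t    --output build/pgloader.exe\n"
    )
    print(drop_compress_core(sample), end="")
```

Reviewing the result with `git diff` before running `make pgloader` again is a sensible precaution.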
|
|
||||||
When you're a Common Lisp developer or otherwise already using Quicklisp
|
|
||||||
with some *local-projects* and a local source registry setup for *asdf*, use
|
|
||||||
a command line like this:
|
|
||||||
|
|
||||||
./buildapp --logfile /tmp/build.log \
|
|
||||||
--asdf-tree ~/quicklisp/local-projects \
|
|
||||||
--manifest-file ./manifest.ql \
|
|
||||||
--asdf-tree ~/quicklisp/dists \
|
|
||||||
--load-system pgloader \
|
|
||||||
--entry pgloader:main \
|
|
||||||
--dynamic-space-size 4096 \
|
|
||||||
--output pgloader.exe
|
|
||||||
|
|
||||||
That command requires a `manifest.ql` file that you can obtain with the lisp
|
|
||||||
command:
|
|
||||||
|
|
||||||
(ql:write-asdf-manifest-file "path/to/manifest.ql")
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
Give pgloader as many command files as you need:
|
Give pgloader as many command files as you need:
|
||||||
|
|
||||||
./pgloader.lisp <file.load>
|
$ ./build/pgloader.exe --help
|
||||||
|
$ ./build/pgloader.exe <file.load>
|
||||||
|
|
||||||
See the documentation file `pgloader.1.md` for details. You can compile that
|
See the documentation file `pgloader.1.md` for details. You can compile that
|
||||||
file into a manual page or an HTML page thanks to the `pandoc` application:
|
file into a manual page or an HTML page thanks to the `pandoc` application:
|
||||||
@@ -108,23 +116,15 @@ file into a manual page or an HTML page thanks to the `pandoc` application:
|
|||||||
|
|
||||||
Some notes about what I intend to be working on next.
|
Some notes about what I intend to be working on next.
|
||||||
|
|
||||||
|
### tests
|
||||||
|
|
||||||
|
- add needed pre-requisites in bootstrap.sh to run the MySQL and SQLite
|
||||||
|
tests from the `make test` target without errors
|
||||||
|
|
||||||
### binary distribution
|
### binary distribution
|
||||||
|
|
||||||
- prepare an all-included binary for several platforms
|
- prepare an all-included binary for several platforms
|
||||||
|
|
||||||
### internals & refactoring
|
|
||||||
|
|
||||||
- review pgloader.pgsql:reformat-row date-columns arguments
|
|
||||||
- review connection string handling for both PostgreSQL and MySQL
|
|
||||||
- provide a better toplevel API
|
|
||||||
- implement tests
|
|
||||||
|
|
||||||
### command & control
|
|
||||||
|
|
||||||
- commands: `LOAD` and `INI` formats
|
|
||||||
- compat with `SQL*Loader` format
|
|
||||||
- see pgloader.1.md for details.
|
|
||||||
|
|
||||||
### docs
|
### docs
|
||||||
|
|
||||||
- host a proper website for the tool, with use cases and a tutorial
|
- host a proper website for the tool, with use cases and a tutorial
|
||||||
@@ -135,18 +135,8 @@ Some notes about what I intend to be working on next.
|
|||||||
- error reporting (done)
|
- error reporting (done)
|
||||||
- add input line number to log file?
|
- add input line number to log file?
|
||||||
|
|
||||||
#### data input
|
|
||||||
|
|
||||||
- import directly from MySQL, file based export/import (done)
|
|
||||||
- import directly from MySQL streaming (done)
|
|
||||||
- general CSV and Flexible Text source formats
|
|
||||||
- fixed cols input data format
|
|
||||||
- compressed input (gzip, other algos)
|
|
||||||
- fetch data from S3
|
|
||||||
|
|
||||||
### transformation and casts
|
### transformation and casts
|
||||||
|
|
||||||
- experiment with perfs and inlining the transformation functions
|
|
||||||
- add typemod expression to cast rules in the command language
|
- add typemod expression to cast rules in the command language
|
||||||
- add per-column support for cast rules in the system
|
- add per-column support for cast rules in the system
|
||||||
|
|
||||||
@@ -156,7 +146,7 @@ Some notes about what I intend to be working on next.
|
|||||||
|
|
||||||
#### convenience
|
#### convenience
|
||||||
|
|
||||||
- automatic creation of schema (from MySQL schema, or from CSV header)
|
- automatic creation of schema even when loading from text files
|
||||||
- pre-fetch some rows to guesstimate data types?
|
- pre-fetch some rows to guesstimate data types?
|
||||||
|
|
||||||
#### performances
|
#### performances
|
||||||
@@ -170,11 +160,6 @@ Data reformatting is now going to have to happen in Common Lisp mostly, maybe
|
|||||||
offer some other languages (cl-awk etc).
|
offer some other languages (cl-awk etc).
|
||||||
|
|
||||||
- raw reformatting, before rows are split
|
- raw reformatting, before rows are split
|
||||||
- per column reformatting
|
|
||||||
- date (zero dates)
|
|
||||||
- integer and "" that should be NULL
|
|
||||||
- user-defined columns (constants, functions of other rows)
|
|
||||||
- column re-ordering
|
|
||||||
|
|
||||||
Have a try at something approaching:
|
Have a try at something approaching:
|
||||||
|
|
||||||
@@ -190,6 +175,11 @@ A part of that needs to happen client-side, another part server-side, and
|
|||||||
the grammar has to make it clear what happens where. Maybe add a WHERE
|
the grammar has to make it clear what happens where. Maybe add a WHERE
|
||||||
clause to the `COPY` or `LOAD` grammar for the client.
|
clause to the `COPY` or `LOAD` grammar for the client.
|
||||||
|
|
||||||
|
#### filtering
|
||||||
|
|
||||||
|
Add commands to pick different target tables depending on the data found
|
||||||
|
when reading from the source.
|
||||||
|
|
||||||
#### UI
|
#### UI
|
||||||
|
|
||||||
- add a web controller with pretty monitoring
|
- add a web controller with pretty monitoring
|
||||||
|