diff --git a/README.md b/README.md
index bcb5a92..bb739df 100644
--- a/README.md
+++ b/README.md
@@ -15,31 +15,61 @@ pgloader also implements data reformating, the main example of that
 being a transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00`
 to PostgreSQL `NULL` value (because our calendar never had a *year zero*).
 
+## Versioning
+
+The pgloader 1.x series, from a long time ago, was developed in `TCL`.
+When faced with maintaining that code, the new emerging development team
+(hi!) picked `python` instead because that made sense at the time, so the
+pgloader 2.x series was written in Python.
+
+The current version of pgloader is the 3.x series, which is written in
+[Common Lisp](http://cliki.net/) for better development flexibility,
+better run time performance, and real threading.
+
+The versioning now follows the Emacs model, where any X.0 release
+number means you're using a development version (alpha, beta, or release
+candidate). The next stable versions are going to be `3.1`, then `3.2`, etc.
+
 ## INSTALL
 
 pgloader is now a Common Lisp program, tested using the
-[SBCL](http://sbcl.org/) and [CCL](http://ccl.clozure.com/) implementation
-with [Quicklisp](http://www.quicklisp.org/beta/).
+[SBCL](http://sbcl.org/) implementation with
+[Quicklisp](http://www.quicklisp.org/beta/).
 
     apt-get install sbcl
     apt-get install libmysqlclient-dev libsqlite3-dev
-    wget http://beta.quicklisp.org/quicklisp.lisp
-    sbcl --load quicklisp.lisp
-    * (quicklisp-quickstart:install)
-    * (ql:add-to-init-file)
+    make pgloader
 
-The current version of the code depends on a recent version of Postmodern
-not found in Quicklisp yet at the time of this writing:
+### Patches
 
-    cd ~/quicklisp/local-projects/
-    git clone https://github.com/marijnh/Postmodern.git
-    git clone -b empty-strings-and-nil https://github.com/dimitri/cl-csv.git
-    git clone http://git.tapoueh.org/git/pgloader.git
+Several dependencies need some patching for pgloader to run correctly;
+the provided `Makefile` handles that for you. The goal is for those patches
+to be merged into the mainline version of each dependency, so that this
+whole section and the related `Makefile` business can disappear as
+irrelevant.
+
+#### Postmodern
+
+The current version of the code depends on a recent version of
+[Postmodern](http://marijnhaverbeke.nl/postmodern/postmodern.html) not found
+in Quicklisp yet at the time of this writing. Currently the pgloader source
+tree contains a patch to apply against the Postmodern sources, and the
+`Makefile` applies it for you.
+
+Read https://github.com/marijnh/Postmodern/issues/39 for details.
+
+#### cl-csv
+
+The handling of `NULL` values in `CSV` files requires pgloader to have more
+smarts than the default `cl-csv` code, so the `Makefile` will fetch my
+branch including a fix for that.
+
+Read https://github.com/AccelerationNet/cl-csv/pull/12 for details.
+
+## The pgloader.lisp script
 
 Now you can use the `#!` script or build a self-contained binary executable
-file, as shown below. You might have to modify it the `pgloader.lisp` script
-because it's now hard coded to use `/usr/local/bin/sbcl` and you probably
-want to change that part then:
+file, as shown below.
 
     ./pgloader.lisp --help
 
@@ -49,53 +79,31 @@ them from the internet and prepare them (thanks to *Quicklisp*). So please
 be patient while that happens and make sure we can actually connect and
 download the dependencies.
 
-## Compile into a self-contained binary file
+## Build a self-contained binary file
 
-First, make sure you have downloaded all the required Common Lisp
-dependencies that pgloader uses, and install the
-[buildapp](http://www.xach.com/lisp/buildapp/) application:
+The `Makefile` target `pgloader` knows how to produce a self-contained
+binary file for pgloader, named `pgloader.exe`:
 
-    $ sbcl
-    * (ql:quickload "pgloader")
-    * (ql:quickload "buildapp")
-    * (buildapp:build-buildapp "./buildapp")
+    $ make pgloader
 
-If you just installed *SBCL* and *Quicklisp* to use pgloader, that command
-should do it:
+Note that the `Makefile` uses the `--compress-core` option, which only works
+when core compression is enabled in your local copy of `SBCL`. If it's not,
+it's probably because you compiled and installed `SBCL` yourself to get a
+decently recent version; in that case, rebuild it with the
+`--with-sb-core-compression` option.
 
-    ./buildapp --logfile /tmp/build.log \
-               --asdf-tree ~/quicklisp/dists \
-               --load-system pgloader \
-               --entry pgloader:main \
-               --dynamic-space-size 4096 \
-               --output pgloader.exe
+Alternatively, drop the `--compress-core` option by deleting the line where
+it appears in the `Makefile`.
 
-You can also use the option `--compress-core` if your platform supports it,
-so has to reduce the size of the generated binary.
-
-When you're a Common Lisp developper or otherwise already using Quicklisp
-with some *local-projects* and a local source registry setup for *asdf*, use
-a command line like this:
-
-    ./buildapp --logfile /tmp/build.log \
-               --asdf-tree ~/quicklisp/local-projects \
-               --manifest-file ./manifest.ql \
-               --asdf-tree ~/quicklisp/dists \
-               --load-system pgloader \
-               --entry pgloader:main \
-               --dynamic-space-size 4096 \
-               --output pgloader.exe
-
-That command requires a `manifest.ql` file that you can obtain with the lisp
-command:
-
-    (ql:write-asdf-manifest-file "path/to/manifest.ql")
+When successful, the `make pgloader` command outputs a
+`./build/pgloader.exe` file for you to use.
 
 ## Usage
 
 Give as many command files that you need to pgloader:
 
-    ./pgloader.lisp
+    $ ./build/pgloader.exe --help
+    $ ./build/pgloader.exe
 
 See the documentation file `pgloader.1.md` for details. You can compile that
 file into a manual page or an HTML page thanks to the `pandoc` application:
@@ -108,23 +116,15 @@ file into a manual page or an HTML page thanks to the `pandoc` application:
 
 Some notes about what I intend to be working on next.
 
+### tests
+
+  - add the prerequisites needed in `bootstrap.sh` to run the MySQL and
+    SQLite tests from the `make test` target without errors
+
 ### binary distribution
 
   - prepare an all-included binary for several platforms
 
-### internals & refactoring
-
-  - review pgloader.pgsql:reformat-row date-columns arguments
-  - review connection string handling for both PostgreSQL and MySQL
-  - provide a better toplevel API
-  - implement tests
-
-### command & control
-
-  - commands: `LOAD` and `INI` formats
-  - compat with `SQL*Loader` format
-  - see pgloader.1.md for details.
-
 ### docs
 
   - host a proper website for the tool, with use cases and a tutorial
@@ -135,18 +135,8 @@ Some notes about what I intend to be working on next.
   - error reporting (done)
   - add input line number to log file?
 
-#### data input
-
-  - import directly from MySQL, file based export/import (done)
-  - import directly from MySQL streaming (done)
-  - general CSV and Flexible Text source formats
-  - fixed cols input data format
-  - compressed input (gzip, other algos)
-  - fetch data from S3
-
 ### transformation and casts
 
-  - experiment with perfs and inlining the transformation functions
   - add typemod expression to cast rules in the command language
   - add per-column support for cast rules in the system
 
@@ -156,7 +146,7 @@ Some notes about what I intend to be working on next.
 
 #### convenience
 
-  - automatic creation of schema (from MySQL schema, or from CSV header)
+  - automatic creation of schema even when loading from text files
   - pre-fetch some rows to guesstimate data types?
 
 #### performances
@@ -170,11 +160,6 @@ Data reformating is now going to have to happen in Common Lisp mostly, maybe
 offer some other languages (cl-awk etc).
 
   - raw reformating, before rows are split
-  - per column reformating
-      - date (zero dates)
-      - integer and "" that should be NULL
-  - user-defined columns (constants, functions of other rows)
-  - column re-ordering
 
 Have a try at something approaching:
 
@@ -190,6 +175,11 @@ A part of that needs to happen client-side, another part server-side, and
 the grammar has to make it clear what happens where. Maybe add a WHERE
 clause to the `COPY` or `LOAD` grammar for the client.
 
+#### filtering
+
+Add commands to pick different target tables depending on the data found
+when reading from the source.
+
 #### UI
 
   - add a web controler with pretty monitoring
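The Emacs-style versioning rule that this patch adds to the README (any X.0 release is a development version, while `3.1`, `3.2`, and so on are stable) can be sketched as a small POSIX shell function. This is only an illustration under stated assumptions: `release_kind` is a hypothetical helper, not part of pgloader or its `Makefile`, and it assumes version strings of the form `MAJOR.MINOR[.PATCH]`.

```shell
#!/bin/sh
# Hypothetical helper (not part of pgloader): classify a version string
# following the Emacs-style rule described in the README, where any X.0
# release (3.0, 3.0.99, ...) is a development version and 3.1, 3.2, ...
# are stable. Assumes a MAJOR.MINOR[.PATCH] version string.
release_kind() {
    # Extract the MINOR field, i.e. the second dot-separated component.
    minor=$(printf '%s\n' "$1" | cut -d. -f2)
    if [ "$minor" = "0" ]; then
        echo development
    else
        echo stable
    fi
}

release_kind 3.0.99   # prints "development"
release_kind 3.1      # prints "stable"
release_kind 3.2.1    # prints "stable"
```

Only the minor field is inspected, so a patch-level suffix such as `3.0.99` still counts as a development build, matching the "any X.0 release" wording.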