Improve pgloader docs (Table of Contents, titles, organisation). (#1440)

Make it easier to navigate our docs, which are dense enough to warrant
proper organisation and a guided Table of Contents.
Dimitri Fontaine 2022-10-18 17:28:34 +02:00 committed by GitHub
parent 6d73667685
commit 925996000b
17 changed files with 567 additions and 562 deletions

docs/batches.rst (new file)

@ -0,0 +1,123 @@
Batch Processing
================
To load data to PostgreSQL, pgloader uses the `COPY` streaming protocol.
While this is the fastest way to load data, `COPY` has an important drawback:
as soon as PostgreSQL emits an error with any bit of data sent to it,
whatever the problem is, the whole data set is rejected by PostgreSQL.
To work around that, pgloader cuts the data into *batches* of 25000 rows
each, so that when a problem occurs, it only impacts that many rows of
data. Each batch is kept in memory while the `COPY` streaming happens, in
order to be able to handle errors should any happen.
When PostgreSQL rejects the whole batch, pgloader logs the error message
then isolates the bad row(s) from the accepted ones by retrying the batched
rows in smaller batches. To do that, pgloader parses the *CONTEXT* error
message from the failed COPY, as the message contains the line number where
the error was found in the batch, as in the following example::
CONTEXT: COPY errors, line 3, column b: "2006-13-11"
Using that information, pgloader reloads all rows in the batch before
the erroneous one, logs the erroneous one as rejected, then tries loading
the remainder of the batch in a single attempt, which may or may not contain
other erroneous data.
At the end of a load containing rejected rows, you will find two files in
the *root-dir* location, under a directory named after the target
database of your setup. The files are named after the target table, and their
extensions are `.dat` for the rejected data and `.log` for the file
containing the full PostgreSQL client-side logs about the rejected data.
The `.dat` file is formatted in the PostgreSQL text COPY format as documented
in `http://www.postgresql.org/docs/9.2/static/sql-copy.html#AEN66609`.
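For instance, assuming the default *root-dir* of `/tmp/pgloader` and a
target table named `districts` in a database named `sales` (both names
hypothetical), you would find::

    /tmp/pgloader/sales/districts.dat
    /tmp/pgloader/sales/districts.log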
It is possible to use the following WITH options to control pgloader batch
behavior:
- *on error stop*, *on error resume next*
This option controls whether pgloader uses batches of data at
all. The batch implementation allows pgloader to recover from errors by
sending again the data that PostgreSQL accepts, and by keeping away the
data that PostgreSQL rejects.
To enable retrying the data and loading the good parts, use the option
*on error resume next*, which is the default for file-based data loads
(such as CSV, IXF or DBF).
When migrating from another RDBMS technology, it's best to have a
reproducible loading process. In that case it's possible to use *on
error stop* and fix either the casting rules, the data transformation
functions, or in some cases the input data, until your migration runs
through to completion. That's why *on error stop* is the default for the
SQLite, MySQL and MS SQL source kinds; see the sketch just below.
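As a sketch, a migration command can make that choice explicit in its
*WITH* clause; the connection strings here are placeholders::

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO postgresql:///sakila
         WITH on error stop;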
A Note About Performance
------------------------
pgloader has been developed with performance in mind, to be able to cope
with ever growing needs in loading large amounts of data into PostgreSQL.
The basic architecture it uses is the old Unix pipe model, where a thread is
responsible for loading the data (reading a CSV file, querying MySQL, etc.)
and fills pre-processed data into a queue. Another thread feeds from the
queue, applies some more *transformations* to the input data, and streams the
end result to PostgreSQL using the COPY protocol.
When given a file that the PostgreSQL `COPY` command knows how to parse, and
if the file contains no erroneous data, then pgloader will never be as fast
as just using the PostgreSQL `COPY` command.
Note that while the `COPY` command is restricted to reading either from its
standard input or from a local file on the server's file system, the command
line tool `psql` implements a `\copy` command that knows how to stream a
file local to the client over the network and into the PostgreSQL server,
using the same protocol as pgloader uses.
A Note About Parallelism
------------------------
pgloader uses several concurrent tasks to process the data being loaded:
- a reader task reads the data in and pushes it to a queue,
- at least one writer task feeds from the queue and formats the raw data into
  the PostgreSQL COPY format in batches (so that it's possible to retry a
  failed batch without reading the data from the source again), and then sends
  the data to PostgreSQL using the COPY protocol.
The *workers* parameter controls how many worker threads may be
active at any time (that's the parallelism level); the
*concurrency* parameter controls how many tasks are started to
handle the data (they may not all run at the same time, depending on the
*workers* setting).
We allow *workers* simultaneous workers to be active at the same time in the
context of a single table. A single unit of work consists of several kinds of
workers:
- a reader getting raw data from the source,
- N writers preparing and sending the data down to PostgreSQL.
N here is set by the *concurrency* parameter: with a *concurrency* of
2, we start (+ 1 2) = 3 concurrent tasks; with a *concurrency* of 4, we start
(+ 1 4) = 5 concurrent tasks, of which only *workers* may be active
simultaneously.
The defaults are `workers = 4, concurrency = 1` when loading from a database
source, and `workers = 8, concurrency = 2` when loading from something else
(currently, a file). Those defaults are arbitrary and waiting for feedback
from users, so please consider providing feedback if you play with the
settings.
As the `CREATE INDEX` threads started by pgloader are only waiting until
PostgreSQL is done with the real work, those threads are *NOT* counted into
the concurrency levels as detailed here.
By default, pgloader starts as many `CREATE INDEX` threads as the maximum
number of indexes per table found in your source schema. It is possible to
set the `max parallel create index` *WITH* option to another number in case
there are just too many of them to create.
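All three settings belong in the *WITH* clause. As an illustrative sketch,
with placeholder connection strings and arbitrary values::

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO postgresql:///sakila
         WITH workers = 8,
              concurrency = 2,
              max parallel create index = 4;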

docs/command.rst (new file)

@ -0,0 +1,380 @@
Command Syntax
==============
pgloader implements a Domain Specific Language allowing you to set up complex
data loading scripts that handle computed columns and on-the-fly sanitization
of the input data. For more complex data loading scenarios, you will need
to learn that DSL's syntax. It's meant to look familiar to DBAs, being
inspired by SQL where it makes sense, which is not that much after
all.
The pgloader commands follow the same global grammar rules. Each of them
might support only a subset of the general options and provide specific
options.
::
LOAD <source-type>
FROM <source-url>
[ HAVING FIELDS <source-level-options> ]
INTO <postgresql-url>
[ TARGET TABLE [ "<schema>" ]."<table name>" ]
[ TARGET COLUMNS <columns-and-options> ]
[ WITH <load-options> ]
[ SET <postgresql-settings> ]
[ BEFORE LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
[ AFTER LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
;
The main clauses are the `LOAD`, `FROM`, `INTO` and `WITH` clauses that each
command implements. Some commands also implement the `SET` clause, or some
specific clauses such as the `CAST` clause.
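As a minimal sketch of that grammar, here is a CSV load command; the file
name, field list and table definition are made up for illustration::

    LOAD CSV
         FROM 'path/to/file.csv'
              HAVING FIELDS (x, y, a, b, c, d)
         INTO postgresql:///pgloader
         TARGET TABLE csv
         TARGET COLUMNS (a, b, d, c)
         WITH truncate,
              skip header = 1,
              fields terminated by ','
          SET client_encoding to 'latin1'
       BEFORE LOAD DO
        $$ create table if not exists csv (a bigint, b bigint, c char(2), d text); $$;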
.. _common_clauses:
Command Clauses
---------------
The pgloader command syntax allows composing clauses together. Some clauses
are specific to the FROM source-type; most clauses are always available.
FROM
----
The *FROM* clause specifies where to read the data from, and each command
introduces its own variant of sources. For instance, the *CSV* source
supports `inline`, `stdin`, a filename, a quoted filename, and a *FILENAME
MATCHING* clause (see above); whereas the *MySQL* source only supports a
MySQL database URI specification.
INTO
----
The PostgreSQL connection URI must contain the name of the target table
into which to load the data. That table must have already been created in
PostgreSQL, and the name might be schema qualified.
The *INTO* option also supports an optional comma-separated list of target
columns, each of which is either the name of an input *field* or a
whitespace-separated list of the target column name, its PostgreSQL data
type and a *USING* expression.
The *USING* expression can be any valid Common Lisp form and will be read
with the current package set to `pgloader.transforms`, so that you can use
functions defined in that package, such as functions loaded dynamically with
the `--load` command line parameter.
Each *USING* expression is compiled at runtime to native code.
This feature allows pgloader to load any number of fields in a CSV file into
a possibly different number of columns in the database, using custom code
for that projection.
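For instance, adapting the GeoLite example shipped with pgloader, and
assuming input fields `startIpNum` and `endIpNum` plus an `ip-range`
function available in `pgloader.transforms`::

    INTO postgresql:///ip4r
         TARGET TABLE blocks
         TARGET COLUMNS
         (
            iprange ip4r using (ip-range startIpNum endIpNum),
            locId
         )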
WITH
----
Set of options to apply to the command, using a global syntax of either:
- *key = value*
- *use option*
- *do not use option*
See each specific command for details.
All data-source-specific commands support the following options:
- *on error stop*, *on error resume next*
- *batch rows = R*
- *batch size = ... MB*
- *prefetch rows = ...*
See the section BATCH BEHAVIOUR OPTIONS for more details.
In addition, the following settings are available:
- *workers = W*
- *concurrency = C*
- *max parallel create index = I*
See section A NOTE ABOUT PARALLELISM for more details.
SET
---
This clause allows specifying session parameters to be set for all the
sessions opened by pgloader. It expects a comma-separated list of parameter
name, equals sign, then single-quoted value.
The names and values of the parameters are not validated by pgloader, they
are given as-is to PostgreSQL.
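A sketch of the expected format, with illustrative parameter values::

    SET client_encoding to 'latin1',
        work_mem to '12MB',
        standard_conforming_strings to 'on'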
BEFORE LOAD DO
--------------
You can run SQL queries against the database before loading the data from
the `CSV` file. The most common such queries use `CREATE TABLE IF NOT
EXISTS` so that the data can be loaded.
Each command must be *dollar-quoted*: it must begin and end with a double
dollar sign, `$$`. Dollar-quoted queries are then comma separated. No extra
punctuation is expected after the last SQL query.
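For example, with a hypothetical target table named `csv`::

    BEFORE LOAD DO
     $$ drop table if exists csv; $$,
     $$ create table csv (a bigint, b bigint, c char(2), d text); $$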
BEFORE LOAD EXECUTE
-------------------
Same behaviour as in the *BEFORE LOAD DO* clause. Allows you to read the SQL
queries from a SQL file. Implements support for PostgreSQL dollar-quoting
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
they are the same thing).
AFTER LOAD DO
-------------
Same format as *BEFORE LOAD DO*; the dollar-quoted queries found in that
section are executed once the load is done. That's the right time to create
indexes and constraints, or to re-enable triggers.
AFTER LOAD EXECUTE
------------------
Same behaviour as in the *AFTER LOAD DO* clause. Allows you to read the SQL
queries from a SQL file. Implements support for PostgreSQL dollar-quoting
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
they are the same thing).
AFTER CREATE SCHEMA DO
----------------------
Same format as *BEFORE LOAD DO*; the dollar-quoted queries found in that
section are executed once the schema has been created by pgloader, and
before the data is loaded. It's the right time to ALTER TABLE or to do some
custom implementation on top of what pgloader does, like maybe partitioning.
AFTER CREATE SCHEMA EXECUTE
---------------------------
Same behaviour as in the *AFTER CREATE SCHEMA DO* clause. Allows you to read
the SQL queries from a SQL file. Implements support for PostgreSQL
dollar-quoting and the `\i` and `\ir` include facilities as in `psql` batch
mode (where they are the same thing).
Connection String
-----------------
The `<postgresql-url>` parameter is expected to be given as a *Connection URI*
as documented in the PostgreSQL documentation at
http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING.
::
postgresql://[user[:password]@][netloc][:port][/dbname][?option=value&...]
Where:
- *user*
Can contain any character, including colon (`:`) which must then be
doubled (`::`) and at-sign (`@`) which must then be doubled (`@@`).
When omitted, the *user* name defaults to the value of the `PGUSER`
environment variable, and if it is unset, the value of the `USER`
environment variable.
- *password*
Can contain any character, including the at sign (`@`), which must then
be doubled (`@@`). To leave the password empty when the *user* name
ends with an at sign, you have to use the syntax `user:@`.
When omitted, the *password* defaults to the value of the `PGPASSWORD`
environment variable if it is set, otherwise the password is left
unset.
When no *password* is found either in the connection URI or in the
environment, pgloader looks for a `.pgpass` file as documented at
https://www.postgresql.org/docs/current/static/libpq-pgpass.html. The
implementation is not that of `libpq`, though. As with `libpq` you can
set the environment variable `PGPASSFILE` to point to a `.pgpass` file,
and pgloader defaults to `~/.pgpass` on Unix-like systems and
`%APPDATA%\postgresql\pgpass.conf` on Windows. Matching rules and syntax
are the same as with `libpq`; refer to its documentation.
- *netloc*
Can be either a hostname in dotted notation, an IPv4 address, or a Unix
domain socket path. Empty is the default network location; on a
system providing *unix domain sockets* that method is preferred, otherwise
the *netloc* defaults to `localhost`.
It's possible to force the *unix domain socket* path by using the syntax
`unix:/path/to/where/the/socket/file/is`, so to force a non-default
socket path and a non-default port, you would have::
postgresql://unix:/tmp:54321/dbname
The *netloc* defaults to the value of the `PGHOST` environment
variable, and if it is unset, either to the default `unix` socket path
when running on a Unix system, or to `localhost` otherwise.
Socket paths containing colons are supported by doubling the colons
within the path, as in the following example::
postgresql://unix:/tmp/project::region::instance:5432/dbname
- *dbname*
Should be a proper identifier (a letter followed by a mix of letters,
digits and the punctuation signs comma (`,`), dash (`-`) and underscore
(`_`)).
When omitted, the *dbname* defaults to the value of the environment
variable `PGDATABASE`, and if that is unset, to the *user* value as
determined above.
- *options*
The optional parameters must be supplied in the form `name=value`, and
you may use several parameters by separating them with an
ampersand (`&`) character.
Only some options are supported here: *tablename* (which might be
qualified with a schema name), *sslmode*, *host*, *port*, *dbname*,
*user* and *password*.
The *sslmode* parameter values can be one of `disable`, `allow`,
`prefer` or `require`.
For backward compatibility reasons, it's possible to specify the
*tablename* option directly, without spelling out the `tablename=`
parts.
The options override the main URI components when both are given, and
using the percent-encoded option parameters allows using passwords
starting with a colon, and bypassing other URI component parsing
limitations.
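A few hypothetical URIs following these rules; the hosts, ports, names and
credentials are placeholders::

    postgresql://dim:secret@localhost:5432/pgloader
    postgresql:///pgloader?tablename=public.mytable
    postgresql://unix:/tmp:54321/pgloader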
Regular Expressions
-------------------
Several clauses listed in what follows accept *regular expressions*, with
the following input rules:
- A regular expression begins with a tilde sign (`~`),
- is then followed with an opening sign,
- then any character is allowed and considered part of the regular
expression, except for the closing sign,
- then a closing sign is expected.
The opening and closing signs are allowed in pairs; here's the complete list
of allowed delimiters::
~//
~[]
~{}
~()
~<>
~""
~''
~||
~##
Pick the set of delimiters that don't collide with the *regular expression*
you're trying to input. If your expression is such that none of the
solutions allow you to enter it, the places where such expressions are
allowed should allow for a list of expressions.
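For instance, the *FILENAME MATCHING* clause of the CSV source accepts such
an expression; the pattern and directory below are made up::

    FROM ALL FILENAMES MATCHING ~/data-.*[.]csv$/ IN DIRECTORY 'data/'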
Comments
--------
Any command may contain comments, following those input rules:
- the `--` delimiter begins a comment that ends with the end of the
current line,
- the delimiters `/*` and `*/` respectively start and end a comment, which
can be found in the middle of a command or span several lines.
Any place where you could enter a *whitespace* will accept a comment too.
Batch behaviour options
-----------------------
All pgloader commands have support for a *WITH* clause that allows for
specifying options. Some options are generic and accepted by all commands,
such as the *batch behaviour options*, and some options are specific to a
data source kind, such as the CSV *skip header* option.
The global batch behaviour options are:
- *batch rows*
Takes a numeric value as argument, used as the maximum number of rows
allowed in a batch. The default is `25 000` and can be changed in an
attempt to get better performance characteristics, or to control pgloader
memory usage;
- *batch size*
Takes a memory unit as argument, such as *20 MB*, its default value.
Accepted multipliers are *kB*, *MB*, *GB*, *TB* and *PB*. The case is
important so as not to confuse bits with bytes: we're only talking
bytes here.
- *prefetch rows*
Takes a numeric value as argument, defaults to `100000`. That's the
number of rows that pgloader is allowed to read in memory in each reader
thread. See the *workers* setting for how many reader threads are
allowed to run at the same time.
Other options are specific to each input source; please refer to the
relevant parts of the documentation for their listing and coverage.
A batch is then closed as soon as either the *batch rows* or the *batch
size* threshold is crossed, whichever comes first. In cases when a batch has
to be closed because of the *batch size* setting, a *debug* level log
message is printed stating how many rows fit in the *oversized* batch.
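As an illustrative sketch, the three options combine in a single *WITH*
clause; the values are arbitrary::

    WITH batch rows = 10000,
         batch size = 20 MB,
         prefetch rows = 50000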
Templating with Mustache
------------------------
pgloader implements the https://mustache.github.io/ templating system so
that you may have dynamic parts of your commands. See the documentation for
this template system online.
A specific feature of pgloader is the ability to fetch a variable from the
OS environment of the pgloader process, making it possible to run pgloader
as in the following example::
$ DBPATH=sqlite/sqlite.db pgloader ./test/sqlite-env.load
or in several steps::
$ export DBPATH=sqlite/sqlite.db
$ pgloader ./test/sqlite-env.load
The variable can then be used in a typical mustache fashion::
load database
from '{{DBPATH}}'
into postgresql:///pgloader;
It's also possible to prepare an INI file such as the following::
[pgloader]
DBPATH = sqlite/sqlite.db
And run the following command, feeding the INI values as a *context* for
pgloader's templating system::
$ pgloader --context ./test/sqlite.ini ./test/sqlite-ini.load
The mustache templates implementation with OS environment support replaces
the former `GETENV` implementation, which didn't work anyway.


@ -6,6 +6,14 @@
Welcome to pgloader's documentation!
====================================
The `pgloader`__ project is an Open Source Software project. The development
happens at `https://github.com/dimitri/pgloader`__ and is public: everyone
is welcome to participate by opening issues, pull requests, giving feedback,
etc.
__ https://github.com/dimitri/pgloader
__ https://github.com/dimitri/pgloader
pgloader loads data from various sources into PostgreSQL. It can transform
the data it reads on the fly and submit raw SQL before and after the
loading. It uses the `COPY` PostgreSQL protocol to stream the data into the
@ -238,28 +246,47 @@ In order to be able to follow this great methodology, you need tooling to
implement the third step in a fully automated way. That's pgloader.
.. toctree::
:maxdepth: 2
:caption: Table Of Contents:
:hidden:
:caption: Getting Started
intro
quickstart
tutorial/tutorial
install
bugreport
.. toctree::
:hidden:
:caption: Reference Manual
pgloader
command
batches
ref/transforms
.. toctree::
:hidden:
:caption: Manual for file formats
ref/csv
ref/fixed
ref/copy
ref/dbf
ref/ixf
ref/archive
.. toctree::
:maxdepth: 2
:hidden:
:caption: Manual for Database Servers
ref/mysql
ref/sqlite
ref/mssql
ref/pgsql
ref/pgsql-citus-target
ref/pgsql-redshift
ref/transforms
bugreport
Indices and tables
==================

View File

@ -13,7 +13,9 @@ pgloader knows how to read data from different kinds of sources:
* CSV
* Fixed Format
* Postgres COPY text format
* DBF
* IXF
* Databases


@ -1,5 +1,5 @@
PgLoader Reference Manual
=========================
Command Line
============
pgloader loads data from various sources into PostgreSQL. It can
transform the data it reads on the fly and submit raw SQL before and
@ -230,535 +230,3 @@ to saying `--client-min-messages data`. Then the log messages will show the
data being processed, in the cases where the code has explicit support for
it.
Source Formats
--------------
pgloader supports the following input formats:
- csv, which also includes tsv and other common variants where you can
  change the *separator* and the *quoting* rules and how to *escape* the
  *quotes* themselves;
- fixed columns file, where pgloader is flexible enough to accommodate
  source files missing columns (*ragged fixed length column files* do
  exist);
- PostgreSQL COPY formatted files, following the COPY TEXT documentation
  of PostgreSQL, such as the reject files prepared by pgloader;
- dBase files known as db3 or dbf files;
- IXF formatted files, IXF being a binary storage format from IBM;
- sqlite databases with fully automated discovery of the schema and
advanced cast rules;
- mysql databases with fully automated discovery of the schema and
advanced cast rules;
- MS SQL databases with fully automated discovery of the schema and
advanced cast rules.


@ -1,5 +1,5 @@
Loading From an Archive
=======================
Archive (http, zip)
===================
This command instructs pgloader to load data from one or more files contained
in an archive. Currently the only supported archive format is *ZIP*, and the


@ -1,5 +1,5 @@
Loading COPY Formatted Files
============================
COPY
====
This command instructs pgloader to load from a file containing COPY TEXT
data as described in the PostgreSQL documentation.


@ -1,5 +1,5 @@
Loading CSV data
================
CSV
===
This command instructs pgloader to load data from a `CSV` file. Because of
the complexity of guessing the parameters of a CSV file, it's simpler to


@ -1,5 +1,5 @@
Loading DBF data
=================
DBF
===
This command instructs pgloader to load data from a `DBF` file. A default
set of casting rules is provided and might be overloaded and appended to by


@ -1,5 +1,5 @@
Loading Fixed Cols File Formats
===============================
Fixed Columns
=============
This command instructs pgloader to load data from a text file containing
columns arranged in a *fixed size* manner.


@ -1,5 +1,5 @@
Loading IXF Data
================
IXF
===
This command instructs pgloader to load data from an IBM `IXF` file.


@ -1,5 +1,5 @@
Migrating a MS SQL Database to PostgreSQL
=========================================
MS SQL to Postgres
==================
This command instructs pgloader to load data from a MS SQL database.
Automatic discovery of the schema is supported, including build of the


@ -1,5 +1,5 @@
Migrating a MySQL Database to PostgreSQL
========================================
MySQL to Postgres
=================
This command instructs pgloader to load data from a database connection.
pgloader supports dynamically converting the schema of the source database


@ -1,5 +1,5 @@
Migrating a PostgreSQL Database to Citus
========================================
PostgreSQL to Citus
===================
This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the


@ -1,5 +1,5 @@
Support for Redshift in pgloader
================================
Redshift to Postgres
====================
The command and behavior are the same as when migrating from a PostgreSQL
database source, see :ref:`migrating_to_pgsql`. pgloader automatically


@ -1,13 +1,18 @@
.. _migrating_to_pgsql:
Migrating a PostgreSQL Database to PostgreSQL
=============================================
Postgres to Postgres
====================
This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the
indexes, primary and foreign key constraints. A default set of casting
rules is provided and might be overloaded and appended to by the command.
For a complete Postgres to Postgres solution including Change Data Capture
support with Logical Decoding, see `pgcopydb`__.
__ https://pgcopydb.readthedocs.io/
Using default settings
----------------------


@ -1,5 +1,5 @@
Migrating a SQLite database to PostgreSQL
=========================================
SQLite to Postgres
==================
This command instructs pgloader to load data from a SQLite file. Automatic
discovery of the schema is supported, including build of the indexes.