Improve pgloader docs (Table of Contents, titles, organisation). (#1440)
Make it easier to navigate our docs, which are dense enough to warrant proper organisation and a guided Table of Contents.
This commit is contained in: parent 6d73667685, commit 925996000b

docs/batches.rst (Normal file, 123 lines)

@@ -0,0 +1,123 @@
Batch Processing
================

To load data to PostgreSQL, pgloader uses the `COPY` streaming protocol.
While this is the fastest way to load data, `COPY` has an important
drawback: as soon as PostgreSQL emits an error with any bit of data sent to
it, whatever the problem is, the whole data set is rejected by PostgreSQL.

To work around that, pgloader cuts the data into *batches* of 25000 rows
each, so that when a problem occurs it only impacts that many rows of data.
Each batch is kept in memory while the `COPY` streaming happens, in order
to be able to handle errors should some happen.

When PostgreSQL rejects the whole batch, pgloader logs the error message
then isolates the bad row(s) from the accepted ones by retrying the batched
rows in smaller batches. To do that, pgloader parses the *CONTEXT* error
message from the failed COPY, as the message contains the line number where
the error was found in the batch, as in the following example::

    CONTEXT: COPY errors, line 3, column b: "2006-13-11"

Using that information, pgloader will reload all rows in the batch before
the erroneous one, log the erroneous one as rejected, then try loading the
remainder of the batch in a single attempt, which may or may not contain
other erroneous data.

At the end of a load containing rejected rows, you will find two files in
the *root-dir* location, under a directory named the same as the target
database of your setup. The filenames are the target table, and their
extensions are `.dat` for the rejected data and `.log` for the file
containing the full PostgreSQL client-side logs about the rejected data.

The `.dat` file is formatted in the PostgreSQL text COPY format as
documented in `http://www.postgresql.org/docs/9.2/static/sql-copy.html#AEN66609`.

It is possible to use the following WITH options to control pgloader batch
behaviour:

- *on error stop*, *on error resume next*

  This option controls whether pgloader builds batches of data at all. The
  batch implementation allows pgloader to recover from errors by sending
  again the data that PostgreSQL accepts, and by keeping away the data that
  PostgreSQL rejects.

  To enable retrying the data and loading the good parts, use the option
  *on error resume next*, which is the default for file based data loads
  (such as CSV, IXF or DBF).

  When migrating from another RDBMS technology, it's best to have a
  reproducible loading process. In that case it's possible to use *on
  error stop* and fix either the casting rules, the data transformation
  functions or in some cases the input data until your migration runs
  through to completion. That's why *on error stop* is the default for
  SQLite, MySQL and MS SQL source kinds.
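
A minimal sketch of a load command using that option together with the
batch settings described in this chapter; the file name, target database
and table name are made up for illustration::

    LOAD CSV
         FROM '/tmp/events.csv'
         INTO postgresql:///mydb?tablename=events

         WITH on error resume next,
              skip header = 1,
              fields terminated by ',',
              batch rows = 50000,
              batch size = 10 MB;

Replacing *on error resume next* with *on error stop* makes the same load
abort on the first error instead of isolating and rejecting the bad rows.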

A Note About Performance
------------------------

pgloader has been developed with performance in mind, to be able to cope
with ever growing needs in loading large amounts of data into PostgreSQL.

The basic architecture it uses is the old Unix pipe model, where a thread
is responsible for loading the data (reading a CSV file, querying MySQL,
etc) and fills pre-processed data into a queue. Another thread feeds from
the queue, applies some more *transformations* to the input data and
streams the end result to PostgreSQL using the COPY protocol.

When given a file that the PostgreSQL `COPY` command knows how to parse,
and if the file contains no erroneous data, then pgloader will never be as
fast as just using the PostgreSQL `COPY` command.

Note that while the `COPY` command is restricted to reading either from its
standard input or from a local file on the server's file system, the
command line tool `psql` implements a `\copy` command that knows how to
stream a file local to the client over the network and into the PostgreSQL
server, using the same protocol as pgloader uses.

A Note About Parallelism
------------------------

pgloader uses several concurrent tasks to process the data being loaded:

- a reader task reads the data in and pushes it to a queue,

- at least one writer task feeds from the queue, formats the raw data into
  the PostgreSQL COPY format in batches (so that it's possible to then
  retry a failed batch without reading the data from source again), and
  then sends the data to PostgreSQL using the COPY protocol.

The parameter *workers* controls how many worker threads are allowed to be
active at any time (that's the parallelism level); and the parameter
*concurrency* controls how many tasks are started to handle the data (they
may not all run at the same time, depending on the *workers* setting).

We allow *workers* simultaneous workers to be active at the same time in
the context of a single table. A single unit of work consists of several
kinds of workers:

- a reader getting raw data from the source,
- N writers preparing and sending the data down to PostgreSQL.

The N here is set by the *concurrency* parameter: with a *concurrency* of
2, we start (+ 1 2) = 3 concurrent tasks, with a *concurrency* of 4 we
start (+ 1 4) = 5 concurrent tasks, of which only *workers* may be active
simultaneously.

The defaults are `workers = 4, concurrency = 1` when loading from a
database source, and `workers = 8, concurrency = 2` when loading from
something else (currently, a file). Those defaults are arbitrary and
waiting for feedback from users, so please consider providing feedback if
you play with the settings.

As the `CREATE INDEX` threads started by pgloader are only waiting until
PostgreSQL is done with the real work, those threads are *NOT* counted into
the concurrency levels as detailed here.

By default, pgloader starts as many `CREATE INDEX` threads as the maximum
number of indexes per table found in your source schema. It is possible to
set the `max parallel create index` *WITH* option to another number in case
there are just too many of them to create.
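
As a sketch, the parallelism settings above are given in the WITH clause of
a load command; the values and the source database here are made up for
illustration::

    LOAD DATABASE
         FROM mysql://user@localhost/sourcedb
         INTO postgresql:///targetdb

         WITH workers = 8,
              concurrency = 2,
              max parallel create index = 4;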

docs/command.rst (Normal file, 380 lines)

@@ -0,0 +1,380 @@
Command Syntax
==============

pgloader implements a Domain Specific Language allowing to set up complex
data loading scripts handling computed columns and on-the-fly sanitization
of the input data. For more complex data loading scenarios, you will be
required to learn that DSL's syntax. It's meant to look familiar to DBAs by
being inspired by SQL where it makes sense, which is not that much after
all.

The pgloader commands follow the same global grammar rules. Each of them
might support only a subset of the general options and provide specific
options.

::

    LOAD <source-type>
         FROM <source-url>
           [ HAVING FIELDS <source-level-options> ]
         INTO <postgresql-url>
           [ TARGET TABLE [ "<schema>" ]."<table name>" ]
           [ TARGET COLUMNS <columns-and-options> ]

         [ WITH <load-options> ]

         [ SET <postgresql-settings> ]

         [ BEFORE LOAD [ DO <sql statements> | EXECUTE <sql file> ] ... ]
         [ AFTER LOAD  [ DO <sql statements> | EXECUTE <sql file> ] ... ]
    ;

The main clauses are the `LOAD`, `FROM`, `INTO` and `WITH` clauses that
each command implements. Some commands then implement the `SET` clause, or
some specific clauses such as the `CAST` clause.
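
To illustrate that grammar, here is a minimal hypothetical command; the
file name, database and table names are invented, and the CSV-specific
options are covered in the CSV chapter::

    LOAD CSV
         FROM 'path/to/data.csv'
         INTO postgresql:///mydb
              TARGET TABLE "public"."items"
         WITH skip header = 1,
              fields terminated by ',';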

.. _common_clauses:

Command Clauses
---------------

The pgloader command syntax allows composing clauses together. Some clauses
are specific to the FROM source-type, most clauses are always available.

FROM
----

The *FROM* clause specifies where to read the data from, and each command
introduces its own variant of sources. For instance, the *CSV* source
supports `inline`, `stdin`, a filename, a quoted filename, and a *FILENAME
MATCHING* clause (see above); whereas the *MySQL* source only supports a
MySQL database URI specification.

INTO
----

The PostgreSQL connection URI must contain the name of the target table
where to load the data into. That table must have already been created in
PostgreSQL, and the name might be schema qualified.

The *INTO* option also supports an optional comma separated list of target
columns, which are either the name of an input *field* or the white space
separated list of the target column name, its PostgreSQL data type and a
*USING* expression.

The *USING* expression can be any valid Common Lisp form and will be read
with the current package set to `pgloader.transforms`, so that you can use
functions defined in that package, such as functions loaded dynamically
with the `--load` command line parameter.

Each *USING* expression is compiled at runtime to native code.

This feature allows pgloader to load any number of fields in a CSV file
into a possibly different number of columns in the database, using custom
code for that projection.
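
A sketch of that projection, with invented field and column names; the
transformation function is assumed to be available in the
`pgloader.transforms` package, and the field list uses the CSV variant of
the source-level options::

    LOAD CSV
         FROM 'path/to/data.csv' (id, name, unix_ts)
         INTO postgresql:///mydb
              TARGET TABLE "public"."events"
              TARGET COLUMNS
              (
                 id,
                 name,
                 created_at timestamptz using (unix-timestamp-to-timestamptz unix_ts)
              )
         WITH skip header = 1,
              fields terminated by ',';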

WITH
----

Set of options to apply to the command, using a global syntax of either:

- *key = value*
- *use option*
- *do not use option*

See each specific command for details.

All data source specific commands support the following options:

- *on error stop*, *on error resume next*
- *batch rows = R*
- *batch size = ... MB*
- *prefetch rows = ...*

See the section BATCH BEHAVIOUR OPTIONS for more details.

In addition, the following settings are available:

- *workers = W*
- *concurrency = C*
- *max parallel create index = I*

See section A NOTE ABOUT PARALLELISM for more details.

SET
---

This clause allows specifying session parameters to be set for all the
sessions opened by pgloader. It expects a list of parameter name, the equal
sign, then the single-quoted value as a comma separated list.

The names and values of the parameters are not validated by pgloader, they
are given as-is to PostgreSQL.

BEFORE LOAD DO
--------------

You can run SQL queries against the database before loading the data from
the `CSV` file. The most common SQL queries are `CREATE TABLE IF NOT
EXISTS` statements, so that the data can be loaded.

Each command must be *dollar-quoted*: it must begin and end with a double
dollar sign, `$$`. Dollar-quoted queries are then comma separated. No extra
punctuation is expected after the last SQL query.
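
A hypothetical sketch of that clause, with invented table and column
names, showing two dollar-quoted queries separated by a comma::

    BEFORE LOAD DO
         $$ drop table if exists measures; $$,
         $$ create table if not exists measures
            (
              id    bigint primary key,
              value numeric
            );
         $$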

BEFORE LOAD EXECUTE
-------------------

Same behaviour as in the *BEFORE LOAD DO* clause. Allows you to read the
SQL queries from a SQL file. Implements support for PostgreSQL
dollar-quoting and the `\i` and `\ir` include facilities as in `psql` batch
mode (where they are the same thing).

AFTER LOAD DO
-------------

Same format as *BEFORE LOAD DO*, the dollar-quoted queries found in that
section are executed once the load is done. That's the right time to create
indexes and constraints, or re-enable triggers.

AFTER LOAD EXECUTE
------------------

Same behaviour as in the *AFTER LOAD DO* clause. Allows you to read the SQL
queries from a SQL file. Implements support for PostgreSQL dollar-quoting
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
they are the same thing).

AFTER CREATE SCHEMA DO
----------------------

Same format as *BEFORE LOAD DO*, the dollar-quoted queries found in that
section are executed once the schema has been created by pgloader, and
before the data is loaded. It's the right time to ALTER TABLE or do some
custom implementation on top of what pgloader does, like maybe
partitioning.

AFTER CREATE SCHEMA EXECUTE
---------------------------

Same behaviour as in the *AFTER CREATE SCHEMA DO* clause. Allows you to
read the SQL queries from a SQL file. Implements support for PostgreSQL
dollar-quoting and the `\i` and `\ir` include facilities as in `psql` batch
mode (where they are the same thing).

Connection String
-----------------

The `<postgresql-url>` parameter is expected to be given as a *Connection
URI* as documented in the PostgreSQL documentation at
http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING.

::

    postgresql://[user[:password]@][netloc][:port][/dbname][?option=value&...]

Where:

- *user*

  Can contain any character, including colon (`:`), which must then be
  doubled (`::`), and at-sign (`@`), which must then be doubled (`@@`).

  When omitted, the *user* name defaults to the value of the `PGUSER`
  environment variable, and if it is unset, the value of the `USER`
  environment variable.

- *password*

  Can contain any character, including the at sign (`@`), which must then
  be doubled (`@@`). To leave the password empty when the *user* name ends
  with an at sign, you then have to use the syntax user:@.

  When omitted, the *password* defaults to the value of the `PGPASSWORD`
  environment variable if it is set, otherwise the password is left unset.

  When no *password* is found either in the connection URI or in the
  environment, then pgloader looks for a `.pgpass` file as documented at
  https://www.postgresql.org/docs/current/static/libpq-pgpass.html. The
  implementation is not that of `libpq` though. As with `libpq` you can
  set the environment variable `PGPASSFILE` to point to a `.pgpass` file,
  and pgloader defaults to `~/.pgpass` on unix like systems and
  `%APPDATA%\postgresql\pgpass.conf` on windows. Matching rules and syntax
  are the same as with `libpq`, refer to its documentation.

- *netloc*

  Can be either a hostname in dotted notation, or an ipv4 address, or a
  Unix domain socket path. Empty is the default network location; on a
  system providing *unix domain sockets* that method is preferred,
  otherwise the *netloc* defaults to `localhost`.

  It's possible to force the *unix domain socket* path by using the syntax
  `unix:/path/to/where/the/socket/file/is`, so to force a non default
  socket path and a non default port, you would have::

      postgresql://unix:/tmp:54321/dbname

  The *netloc* defaults to the value of the `PGHOST` environment variable,
  and if it is unset, to either the default `unix` socket path when
  running on a Unix system, or `localhost` otherwise.

  Socket paths containing colons are supported by doubling the colons
  within the path, as in the following example::

      postgresql://unix:/tmp/project::region::instance:5432/dbname

- *dbname*

  Should be a proper identifier (letter followed by a mix of letters,
  digits and the punctuation signs comma (`,`), dash (`-`) and underscore
  (`_`)).

  When omitted, the *dbname* defaults to the value of the environment
  variable `PGDATABASE`, and if that is unset, to the *user* value as
  determined above.

- *options*

  The optional parameters must be supplied with the form `name=value`, and
  you may use several parameters by separating them with an ampersand
  (`&`) character.

  Only some options are supported here: *tablename* (which might be
  qualified with a schema name), *sslmode*, *host*, *port*, *dbname*,
  *user* and *password*.

  The *sslmode* parameter values can be one of `disable`, `allow`,
  `prefer` or `require`.

  For backward compatibility reasons, it's possible to specify the
  *tablename* option directly, without spelling out the `tablename=` part.

  The options override the main URI components when both are given, and
  using the percent-encoded option parameters allows using passwords
  starting with a colon and bypassing other URI components parsing
  limitations.
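
Putting those rules together, here are a few hypothetical connection
strings; the user, host and database names are made up::

    postgresql:///mydb
    postgresql://dbo@localhost:5432/mydb?sslmode=prefer
    postgresql://app@@example.com@db.local/mydb?tablename=staging.events

The first form relies on the default unix domain socket and the
environment defaults described above; the second uses TCP with an explicit
*sslmode* option; in the last one the doubled at-sign encodes the user name
`app@example.com`, and the *tablename* option targets a schema-qualified
table.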

Regular Expressions
-------------------

Several clauses listed in the following accept *regular expressions* with
the following input rules:

- a regular expression begins with a tilde sign (`~`),

- it is then followed by an opening sign,

- then any character is allowed and considered part of the regular
  expression, except for the closing sign,

- then a closing sign is expected.

The opening and closing signs are allowed by pair; here's the complete list
of allowed delimiters::

    ~//
    ~[]
    ~{}
    ~()
    ~<>
    ~""
    ~''
    ~||
    ~##

Pick the set of delimiters that doesn't collide with the *regular
expression* you're trying to input. If your expression is such that none of
the solutions allows you to enter it, the places where such expressions are
allowed should allow for a list of expressions.
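
For example, a pattern matching file names that end in `.csv` could be
written with either of these delimiter pairs; the surrounding *FILENAME
MATCHING* clause fragment is only sketched here for illustration::

    FROM FILENAME MATCHING ~/\.csv$/
    FROM FILENAME MATCHING ~<\.csv$>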

Comments
--------

Any command may contain comments, following those input rules:

- the `--` delimiter begins a comment that ends with the end of the
  current line,

- the delimiters `/*` and `*/` respectively start and end a comment, which
  can be found in the middle of a command or span several lines.

Any place where you could enter a *whitespace* will accept a comment too.

Batch behaviour options
-----------------------

All pgloader commands have support for a *WITH* clause that allows
specifying options. Some options are generic and accepted by all commands,
such as the *batch behaviour options*, and some options are specific to a
data source kind, such as the CSV *skip header* option.

The global batch behaviour options are:

- *batch rows*

  Takes a numeric value as argument, used as the maximum number of rows
  allowed in a batch. The default is `25 000` and can be changed to try
  having better performance characteristics or to control pgloader memory
  usage;

- *batch size*

  Takes a memory unit as argument, such as *20 MB*, its default value.
  Accepted multipliers are *kB*, *MB*, *GB*, *TB* and *PB*. The case is
  important so as not to be confused about bits versus bytes, we're only
  talking bytes here;

- *prefetch rows*

  Takes a numeric value as argument, defaults to `100000`. That's the
  number of rows that pgloader is allowed to read in memory in each reader
  thread. See the *workers* setting for how many reader threads are
  allowed to run at the same time.

Other options are specific to each input source, please refer to specific
parts of the documentation for their listing and coverage.

A batch is then closed as soon as either the *batch rows* or the *batch
size* threshold is crossed, whichever comes first. In cases when a batch
has to be closed because of the *batch size* setting, a *debug* level log
message is printed with how many rows did fit in the *oversized* batch.
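
As a sketch, these options combine in a single WITH clause; the values
below are arbitrary and the rest of the load command is left out::

    WITH batch rows = 10000,
         batch size = 5 MB,
         prefetch rows = 50000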

Templating with Mustache
------------------------

pgloader implements the https://mustache.github.io/ templating system so
that you may have dynamic parts of your commands. See the documentation for
this template system online.

A specific feature of pgloader is the ability to fetch a variable from the
OS environment of the pgloader process, making it possible to run pgloader
as in the following example::

    $ DBPATH=sqlite/sqlite.db pgloader ./test/sqlite-env.load

or in several steps::

    $ export DBPATH=sqlite/sqlite.db
    $ pgloader ./test/sqlite-env.load

The variable can then be used in a typical mustache fashion::

    load database
         from '{{DBPATH}}'
         into postgresql:///pgloader;

It's also possible to prepare an INI file such as the following::

    [pgloader]

    DBPATH = sqlite/sqlite.db

And run the following command, feeding the INI values as a *context* for
the pgloader templating system::

    $ pgloader --context ./test/sqlite.ini ./test/sqlite-ini.load

The mustache templates implementation with OS environment support replaces
the former `GETENV` implementation, which didn't work anyway.
@@ -6,6 +6,14 @@

Welcome to pgloader's documentation!
====================================

The `pgloader`__ project is an Open Source Software project. The development
happens at `https://github.com/dimitri/pgloader`__ and is public: everyone
is welcome to participate by opening issues, pull requests, giving feedback,
etc.

__ https://github.com/dimitri/pgloader
__ https://github.com/dimitri/pgloader

pgloader loads data from various sources into PostgreSQL. It can transform
the data it reads on the fly and submit raw SQL before and after the
loading. It uses the `COPY` PostgreSQL protocol to stream the data into the

@@ -238,28 +246,47 @@ In order to be able to follow this great methodology, you need tooling to
implement the third step in a fully automated way. That's pgloader.

.. toctree::
   :maxdepth: 2
   :caption: Table Of Contents:
   :hidden:
   :caption: Getting Started

   intro
   quickstart
   tutorial/tutorial
   install
   bugreport

.. toctree::
   :hidden:
   :caption: Reference Manual

   pgloader
   command
   batches
   ref/transforms

.. toctree::
   :hidden:
   :caption: Manual for file formats

   ref/csv
   ref/fixed
   ref/copy
   ref/dbf
   ref/ixf
   ref/archive

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Manual for Database Servers

   ref/mysql
   ref/sqlite
   ref/mssql
   ref/pgsql
   ref/pgsql-citus-target
   ref/pgsql-redshift
   ref/transforms
   bugreport


Indices and tables
==================

@@ -13,7 +13,9 @@ pgloader knows how to read data from different kind of sources:

* CSV
* Fixed Format
* Postgres COPY text format
* DBF
* IXF

* Databases

@@ -1,5 +1,5 @@

PgLoader Reference Manual
=========================
Command Line
============

pgloader loads data from various sources into PostgreSQL. It can
transform the data it reads on the fly and submit raw SQL before and

@@ -230,535 +230,3 @@ to saying `--client-min-messages data`. Then the log messages will show the
data being processed, in the cases where the code has explicit support for
it.
Source Formats
--------------

pgloader supports the following input formats:

- csv, which also includes tsv and other common variants where you can
  change the *separator* and the *quoting* rules and how to *escape* the
  *quotes* themselves;

- fixed columns file, where pgloader is flexible enough to accommodate
  source files missing columns (*ragged fixed length column files* do
  exist);

- PostgreSQL COPY formatted files, following the COPY TEXT documentation
  of PostgreSQL, such as the reject files prepared by pgloader;

- dbase files known as db3 or dbf files;

- ixf formatted files, ixf being a binary storage format from IBM;

- sqlite databases with fully automated discovery of the schema and
  advanced cast rules;

- mysql databases with fully automated discovery of the schema and
  advanced cast rules;

- MS SQL databases with fully automated discovery of the schema and
  advanced cast rules.

@@ -1,5 +1,5 @@

Loading From an Archive
=======================
Archive (http, zip)
===================

This command instructs pgloader to load data from one or more files contained
in an archive. Currently the only supported archive format is *ZIP*, and the

@@ -1,5 +1,5 @@

Loading COPY Formatted Files
============================
COPY
====

This command instructs pgloader to load from a file containing COPY TEXT
data as described in the PostgreSQL documentation.

@@ -1,5 +1,5 @@

Loading CSV data
================
CSV
===

This command instructs pgloader to load data from a `CSV` file. Because of
the complexity of guessing the parameters of a CSV file, it's simpler to

@@ -1,5 +1,5 @@

Loading DBF data
================
DBF
===

This command instructs pgloader to load data from a `DBF` file. A default
set of casting rules is provided and might be overloaded and appended to by

@@ -1,5 +1,5 @@

Loading Fixed Cols File Formats
===============================
Fixed Columns
=============

This command instructs pgloader to load data from a text file containing
columns arranged in a *fixed size* manner.

@@ -1,5 +1,5 @@

Loading IXF Data
================
IXF
===

This command instructs pgloader to load data from an IBM `IXF` file.

@@ -1,5 +1,5 @@

Migrating a MS SQL Database to PostgreSQL
=========================================
MS SQL to Postgres
==================

This command instructs pgloader to load data from a MS SQL database.
Automatic discovery of the schema is supported, including build of the

@@ -1,5 +1,5 @@

Migrating a MySQL Database to PostgreSQL
========================================
MySQL to Postgres
=================

This command instructs pgloader to load data from a database connection.
pgloader supports dynamically converting the schema of the source database

@@ -1,5 +1,5 @@

Migrating a PostgreSQL Database to Citus
========================================
PostgreSQL to Citus
===================

This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the

@@ -1,5 +1,5 @@

Support for Redshift in pgloader
================================
Redshift to Postgres
====================

The command and behavior are the same as when migrating from a PostgreSQL
database source, see :ref:`migrating_to_pgsql`. pgloader automatically

@@ -1,13 +1,18 @@

.. _migrating_to_pgsql:

Migrating a PostgreSQL Database to PostgreSQL
=============================================
Postgres to Postgres
====================

This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the
indexes, primary and foreign key constraints. A default set of casting
rules is provided and might be overloaded and appended to by the command.

For a complete Postgres to Postgres solution including Change Data Capture
support with Logical Decoding, see `pgcopydb`__.

__ https://pgcopydb.readthedocs.io/

Using default settings
----------------------

@@ -1,5 +1,5 @@

Migrating a SQLite database to PostgreSQL
=========================================
SQLite to Postgres
==================

This command instructs pgloader to load data from a SQLite file. Automatic
discovery of the schema is supported, including build of the indexes.