mirror of
https://github.com/dimitri/pgloader.git
synced 2025-08-10 08:17:00 +02:00
Update documentation with new features.
We have a lot of new features to document. This is a first patch about that, some more work is to be done. That said, it's better than nothing already.
This commit is contained in:
parent
af2995b918
commit
56d24de67a
@ -22,6 +22,10 @@ Welcome to pgloader's documentation!
|
||||
ref/mysql
|
||||
ref/sqlite
|
||||
ref/mssql
|
||||
ref/pgsql
|
||||
ref/pgsql-citus-target
|
||||
ref/pgsql-redshift-source
|
||||
ref/pgsql-redshift-target
|
||||
ref/transforms
|
||||
bugreport
|
||||
|
||||
|
@ -17,6 +17,14 @@ pgloader knows how to read data from different kind of sources:
|
||||
* SQLite
|
||||
* MySQL
|
||||
* MS SQL Server
|
||||
* PostgreSQL
|
||||
* Redshift
|
||||
|
||||
pgloader knows how to target different products using the PostgresQL Protocol:
|
||||
|
||||
* PostgreSQL
|
||||
* `Citus <https://www.citusdata.com>`_
|
||||
* Redshift
|
||||
|
||||
The level of automation provided by pgloader depends on the data source
|
||||
type. In the case of CSV and Fixed Format files, a full description of the
|
||||
|
@ -154,6 +154,18 @@ Those options are meant to tweak `pgloader` behavior when loading data.
|
||||
machine code) another version of itself, usually a newer one like a very
|
||||
recent git checkout.
|
||||
|
||||
* `--no-ssl-cert-verification`
|
||||
|
||||
Uses the OpenSSL option to accept a locally issued server-side
|
||||
certificate, avoiding the following error message::
|
||||
|
||||
SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
|
||||
|
||||
The right way to fix the SSL issue is to use a trusted certificate, of
|
||||
course. Sometimes though it's useful to make progress with the pgloader
|
||||
setup while the certificate chain of trust is being fixed, maybe by
|
||||
another team. That's when this option is useful.
|
||||
|
||||
Command Line Only Operations
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
@ -552,6 +564,22 @@ queries from a SQL file. Implements support for PostgreSQL dollar-quoting
|
||||
and the `\i` and `\ir` include facilities as in `psql` batch mode (where
|
||||
they are the same thing).
|
||||
|
||||
AFTER CREATE SCHEMA DO
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Same format as *BEFORE LOAD DO*, the dollar-quoted queries found in that
|
||||
section are executed once the schema has been craeted by pgloader, and
|
||||
before the data is loaded. It's the right time to ALTER TABLE or do some
|
||||
custom implementation on-top of what pgloader does, like maybe partitioning.
|
||||
|
||||
AFTER CREATE SCHEMA EXECUTE
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Same behaviour as in the *AFTER CREATE SCHEMA DO* clause. Allows you to read
|
||||
the SQL queries from a SQL file. Implements support for PostgreSQL
|
||||
dollar-quoting and the `\i` and `\ir` include facilities as in `psql` batch
|
||||
mode (where they are the same thing).
|
||||
|
||||
Connection String
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
@ -1,10 +1,9 @@
|
||||
Migrating a MySQL Database to PostgreSQL
|
||||
========================================
|
||||
|
||||
This command instructs pgloader to load data from a database connection. The
|
||||
only supported database source is currently *MySQL*, and pgloader supports
|
||||
dynamically converting the schema of the source database and the indexes
|
||||
building.
|
||||
This command instructs pgloader to load data from a database connection.
|
||||
pgloader supports dynamically converting the schema of the source database
|
||||
and the indexes building.
|
||||
|
||||
A default set of casting rules are provided and might be overloaded and
|
||||
appended to by the command.
|
||||
@ -609,6 +608,14 @@ Date::
|
||||
to timestamptz drop default
|
||||
using zero-dates-to-null
|
||||
|
||||
type datetime with extra on update current timestamp when not null
|
||||
to timestamptz drop not null drop default
|
||||
using zero-dates-to-null
|
||||
|
||||
type datetime with extra on update current timestamp
|
||||
to timestamptz drop default
|
||||
using zero-dates-to-null
|
||||
|
||||
type timestamp when default "0000-00-00 00:00:00" and not null
|
||||
to timestamptz drop not null drop default
|
||||
using zero-dates-to-null
|
||||
|
77
docs/ref/pgsql-citus-target.rst
Normal file
77
docs/ref/pgsql-citus-target.rst
Normal file
@ -0,0 +1,77 @@
|
||||
Migrating a PostgreSQL Database to Citus
|
||||
========================================
|
||||
|
||||
This command instructs pgloader to load data from a database connection.
|
||||
Automatic discovery of the schema is supported, including build of the
|
||||
indexes, primary and foreign keys constraints. A default set of casting
|
||||
rules are provided and might be overloaded and appended to by the command.
|
||||
|
||||
Automatic distribution column backfilling is supported, either from commands
|
||||
that specify what is the distribution column in every table, or only in the
|
||||
main table, then relying on foreign key constraints to discover the other
|
||||
distribution keys.
|
||||
|
||||
Here's a short example of migrating a database from a PostgreSQL server to
|
||||
another:
|
||||
|
||||
::
|
||||
|
||||
load database
|
||||
from pgsql:///hackathon
|
||||
into pgsql://localhost:9700/dim
|
||||
|
||||
with include drop, reset no sequences
|
||||
|
||||
cast column impressions.seen_at to "timestamp with time zone"
|
||||
|
||||
distribute companies using id
|
||||
-- distribute campaigns using company_id
|
||||
-- distribute ads using company_id from campaigns
|
||||
-- distribute clicks using company_id from ads, campaigns
|
||||
-- distribute impressions using company_id from ads, campaigns
|
||||
;
|
||||
|
||||
Everything works exactly the same way as when doing a PostgreSQL to
|
||||
PostgreSQL migration, with the added fonctionality of this new `distribute`
|
||||
command.
|
||||
|
||||
Distribute Command
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The distribute command syntax is as following::
|
||||
|
||||
distribute <table name> using <column name>
|
||||
distribute <table name> using <column name> from <table> [, <table>, ...]
|
||||
distribute <table name> as reference table
|
||||
|
||||
When using the distribute command, the following steps are added to pgloader
|
||||
operations when migrating the schema:
|
||||
|
||||
- if the distribution column does not exist in the table, it is added as
|
||||
the first column of the table
|
||||
|
||||
- if the distribution column does not exists in the primary key of the
|
||||
table, it is added as the first column of the primary of the table
|
||||
|
||||
- all the foreign keys that point to the table are added the distribution
|
||||
key automatically too, including the source tables of the foreign key
|
||||
constraints
|
||||
|
||||
- once the schema has been created on the target database, pgloader then
|
||||
issues Citus specific command `create_reference_table()
|
||||
<http://docs.citusdata.com/en/v8.0/develop/api_udf.html?highlight=create_reference_table#create-reference-table>`_
|
||||
and `create_distributed_table()
|
||||
<http://docs.citusdata.com/en/v8.0/develop/api_udf.html?highlight=create_reference_table#create-distributed-table>`_
|
||||
to make the tables distributed
|
||||
|
||||
Those operations are done in the schema section of pgloader, before the data
|
||||
is loaded. When the data is loaded, the newly added columns need to be
|
||||
backfilled from referenced data. pgloader knows how to do that by generating
|
||||
a query like the following and importing the result set of such a query
|
||||
rather than the raw data from the source table.
|
||||
|
||||
Citus Migration: Limitations
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The way pgloader implements *reset sequence* does not work with Citus at
|
||||
this point, so sequences need to be taken care of separately at this point.
|
12
docs/ref/pgsql-redshift-source.rst
Normal file
12
docs/ref/pgsql-redshift-source.rst
Normal file
@ -0,0 +1,12 @@
|
||||
Migrating a Redhift Database to PostgreSQL
|
||||
==========================================
|
||||
|
||||
This command instructs pgloader to load data from a database connection.
|
||||
Automatic discovery of the schema is supported, including build of the
|
||||
indexes, primary and foreign keys constraints. A default set of casting
|
||||
rules are provided and might be overloaded and appended to by the command.
|
||||
|
||||
The command and behavior are the same as when migration from a PostgreSQL
|
||||
database source. pgloader automatically discovers that it's talking to a
|
||||
Redshift database by parsing the output of the `SELECT version()` SQL query.
|
||||
|
10
docs/ref/pgsql-redshift-target.rst
Normal file
10
docs/ref/pgsql-redshift-target.rst
Normal file
@ -0,0 +1,10 @@
|
||||
Migrating a PostgreSQL Database to Redshift
|
||||
===========================================
|
||||
|
||||
This command instructs pgloader to load data from a database connection.
|
||||
Automatic discovery of the schema is supported, including build of the
|
||||
indexes, primary and foreign keys constraints. A default set of casting
|
||||
rules are provided and might be overloaded and appended to by the command.
|
||||
|
||||
|
||||
TODO: add details about S3 credentials and bucket configuration.
|
371
docs/ref/pgsql.rst
Normal file
371
docs/ref/pgsql.rst
Normal file
@ -0,0 +1,371 @@
|
||||
Migrating a PostgreSQL Database to PostgreSQL
|
||||
=============================================
|
||||
|
||||
This command instructs pgloader to load data from a database connection.
|
||||
Automatic discovery of the schema is supported, including build of the
|
||||
indexes, primary and foreign keys constraints. A default set of casting
|
||||
rules are provided and might be overloaded and appended to by the command.
|
||||
|
||||
Here's a short example of migrating a database from a PostgreSQL server to
|
||||
another:
|
||||
|
||||
::
|
||||
|
||||
load database
|
||||
from pgsql://localhost/pgloader
|
||||
into pgsql://localhost/copy
|
||||
|
||||
including only table names matching 'bits', ~/utilisateur/ in schema 'mysql'
|
||||
including only table names matching ~/geolocations/ in schema 'public'
|
||||
;
|
||||
|
||||
PostgreSQL Database Source Specification: FROM
|
||||
----------------------------------------------
|
||||
|
||||
Must be a connection URL pointing to a PostgreSQL database.
|
||||
|
||||
See the `SOURCE CONNECTION STRING` section above for details on how to write
|
||||
the connection string.
|
||||
|
||||
::
|
||||
|
||||
pgsql://[user[:password]@][netloc][:port][/dbname][?option=value&...]
|
||||
|
||||
|
||||
PostgreSQL Database Migration Options: WITH
|
||||
-------------------------------------------
|
||||
|
||||
When loading from a `PostgreSQL` database, the following options are
|
||||
supported, and the default *WITH* clause is: *no truncate*, *create schema*,
|
||||
*create tables*, *include drop*, *create indexes*, *reset sequences*,
|
||||
*foreign keys*, *downcase identifiers*, *uniquify index names*, *reindex*.
|
||||
|
||||
- *include drop*
|
||||
|
||||
When this option is listed, pgloader drops all the tables in the target
|
||||
PostgreSQL database whose names appear in the MySQL database. This
|
||||
option allows for using the same command several times in a row until
|
||||
you figure out all the options, starting automatically from a clean
|
||||
environment. Please note that `CASCADE` is used to ensure that tables
|
||||
are dropped even if there are foreign keys pointing to them. This is
|
||||
precisely what `include drop` is intended to do: drop all target tables
|
||||
and recreate them.
|
||||
|
||||
Great care needs to be taken when using `include drop`, as it will
|
||||
cascade to *all* objects referencing the target tables, possibly
|
||||
including other tables that are not being loaded from the source DB.
|
||||
|
||||
- *include no drop*
|
||||
|
||||
When this option is listed, pgloader will not include any `DROP`
|
||||
statement when loading the data.
|
||||
|
||||
- *truncate*
|
||||
|
||||
When this option is listed, pgloader issue the `TRUNCATE` command
|
||||
against each PostgreSQL table just before loading data into it.
|
||||
|
||||
- *no truncate*
|
||||
|
||||
When this option is listed, pgloader issues no `TRUNCATE` command.
|
||||
|
||||
- *disable triggers*
|
||||
|
||||
When this option is listed, pgloader issues an `ALTER TABLE ... DISABLE
|
||||
TRIGGER ALL` command against the PostgreSQL target table before copying
|
||||
the data, then the command `ALTER TABLE ... ENABLE TRIGGER ALL` once the
|
||||
`COPY` is done.
|
||||
|
||||
This option allows loading data into a pre-existing table ignoring the
|
||||
*foreign key constraints* and user defined triggers and may result in
|
||||
invalid *foreign key constraints* once the data is loaded. Use with
|
||||
care.
|
||||
|
||||
- *create tables*
|
||||
|
||||
When this option is listed, pgloader creates the table using the meta
|
||||
data found in the `MySQL` file, which must contain a list of fields with
|
||||
their data type. A standard data type conversion from DBF to PostgreSQL
|
||||
is done.
|
||||
|
||||
- *create no tables*
|
||||
|
||||
When this option is listed, pgloader skips the creation of table before
|
||||
loading data, target tables must then already exist.
|
||||
|
||||
Also, when using *create no tables* pgloader fetches the metadata from
|
||||
the current target database and checks type casting, then will remove
|
||||
constraints and indexes prior to loading the data and install them back
|
||||
again once the loading is done.
|
||||
|
||||
- *create indexes*
|
||||
|
||||
When this option is listed, pgloader gets the definitions of all the
|
||||
indexes found in the MySQL database and create the same set of index
|
||||
definitions against the PostgreSQL database.
|
||||
|
||||
- *create no indexes*
|
||||
|
||||
When this option is listed, pgloader skips the creating indexes.
|
||||
|
||||
- *drop indexes*
|
||||
|
||||
When this option is listed, pgloader drops the indexes in the target
|
||||
database before loading the data, and creates them again at the end
|
||||
of the data copy.
|
||||
|
||||
- *reindex*
|
||||
|
||||
When this option is used, pgloader does both *drop indexes* before
|
||||
loading the data and *create indexes* once data is loaded.
|
||||
|
||||
- *drop schema*
|
||||
|
||||
When this option is listed, pgloader drops the target schema in the
|
||||
target PostgreSQL database before creating it again and all the objects
|
||||
it contains. The default behavior doesn't drop the target schemas.
|
||||
|
||||
- *foreign keys*
|
||||
|
||||
When this option is listed, pgloader gets the definitions of all the
|
||||
foreign keys found in the MySQL database and create the same set of
|
||||
foreign key definitions against the PostgreSQL database.
|
||||
|
||||
- *no foreign keys*
|
||||
|
||||
When this option is listed, pgloader skips creating foreign keys.
|
||||
|
||||
- *reset sequences*
|
||||
|
||||
When this option is listed, at the end of the data loading and after the
|
||||
indexes have all been created, pgloader resets all the PostgreSQL
|
||||
sequences created to the current maximum value of the column they are
|
||||
attached to.
|
||||
|
||||
The options *schema only* and *data only* have no effects on this
|
||||
option.
|
||||
|
||||
- *reset no sequences*
|
||||
|
||||
When this option is listed, pgloader skips resetting sequences after the
|
||||
load.
|
||||
|
||||
The options *schema only* and *data only* have no effects on this
|
||||
option.
|
||||
|
||||
- *downcase identifiers*
|
||||
|
||||
When this option is listed, pgloader converts all MySQL identifiers
|
||||
(table names, index names, column names) to *downcase*, except for
|
||||
PostgreSQL *reserved* keywords.
|
||||
|
||||
The PostgreSQL *reserved* keywords are determined dynamically by using
|
||||
the system function `pg_get_keywords()`.
|
||||
|
||||
- *quote identifiers*
|
||||
|
||||
When this option is listed, pgloader quotes all MySQL identifiers so
|
||||
that their case is respected. Note that you will then have to do the
|
||||
same thing in your application code queries.
|
||||
|
||||
- *schema only*
|
||||
|
||||
When this option is listed pgloader refrains from migrating the data
|
||||
over. Note that the schema in this context includes the indexes when the
|
||||
option *create indexes* has been listed.
|
||||
|
||||
- *data only*
|
||||
|
||||
When this option is listed pgloader only issues the `COPY` statements,
|
||||
without doing any other processing.
|
||||
|
||||
- *rows per range*
|
||||
|
||||
How many rows are fetched per `SELECT` query when using *multiple
|
||||
readers per thread*, see above for details.
|
||||
|
||||
PostgreSQL Database Casting Rules
|
||||
---------------------------------
|
||||
|
||||
The command *CAST* introduces user-defined casting rules.
|
||||
|
||||
The cast clause allows to specify custom casting rules, either to overload
|
||||
the default casting rules or to amend them with special cases.
|
||||
|
||||
A casting rule is expected to follow one of the forms::
|
||||
|
||||
type <type-name> [ <guard> ... ] to <pgsql-type-name> [ <option> ... ]
|
||||
column <table-name>.<column-name> [ <guards> ] to ...
|
||||
|
||||
It's possible for a *casting rule* to either match against a PostgreSQL data
|
||||
type or against a given *column name* in a given *table name*. So it's
|
||||
possible to migrate a table from a PostgreSQL database while changing and
|
||||
`int` column to a `bigint` one, automatically.
|
||||
|
||||
The *casting rules* are applied in order, the first match prevents following
|
||||
rules to be applied, and user defined rules are evaluated first.
|
||||
|
||||
The supported guards are:
|
||||
|
||||
- *when default 'value'*
|
||||
|
||||
The casting rule is only applied against MySQL columns of the source
|
||||
type that have given *value*, which must be a single-quoted or a
|
||||
double-quoted string.
|
||||
|
||||
- *when typemod expression*
|
||||
|
||||
The casting rule is only applied against MySQL columns of the source
|
||||
type that have a *typemod* value matching the given *typemod
|
||||
expression*. The *typemod* is separated into its *precision* and *scale*
|
||||
components.
|
||||
|
||||
Example of a cast rule using a *typemod* guard::
|
||||
|
||||
type char when (= precision 1) to char keep typemod
|
||||
|
||||
This expression casts MySQL `char(1)` column to a PostgreSQL column of
|
||||
type `char(1)` while allowing for the general case `char(N)` will be
|
||||
converted by the default cast rule into a PostgreSQL type `varchar(N)`.
|
||||
|
||||
- *with extra auto_increment*
|
||||
|
||||
The casting rule is only applied against PostgreSQL attached to a
|
||||
sequence. This can be the result of doing that manually, using a
|
||||
`serial` or a `bigserial` data type, or an `identity` column.
|
||||
|
||||
|
||||
The supported casting options are:
|
||||
|
||||
- *drop default*, *keep default*
|
||||
|
||||
When the option *drop default* is listed, pgloader drops any
|
||||
existing default expression in the MySQL database for columns of the
|
||||
source type from the `CREATE TABLE` statement it generates.
|
||||
|
||||
The spelling *keep default* explicitly prevents that behaviour and
|
||||
can be used to overload the default casting rules.
|
||||
|
||||
- *drop not null*, *keep not null*, *set not null*
|
||||
|
||||
When the option *drop not null* is listed, pgloader drops any
|
||||
existing `NOT NULL` constraint associated with the given source
|
||||
MySQL datatype when it creates the tables in the PostgreSQL
|
||||
database.
|
||||
|
||||
The spelling *keep not null* explicitly prevents that behaviour and
|
||||
can be used to overload the default casting rules.
|
||||
|
||||
When the option *set not null* is listed, pgloader sets a `NOT NULL`
|
||||
constraint on the target column regardless whether it has been set
|
||||
in the source MySQL column.
|
||||
|
||||
- *drop typemod*, *keep typemod*
|
||||
|
||||
When the option *drop typemod* is listed, pgloader drops any
|
||||
existing *typemod* definition (e.g. *precision* and *scale*) from
|
||||
the datatype definition found in the MySQL columns of the source
|
||||
type when it created the tables in the PostgreSQL database.
|
||||
|
||||
The spelling *keep typemod* explicitly prevents that behaviour and
|
||||
can be used to overload the default casting rules.
|
||||
|
||||
- *using*
|
||||
|
||||
This option takes as its single argument the name of a function to
|
||||
be found in the `pgloader.transforms` Common Lisp package. See above
|
||||
for details.
|
||||
|
||||
It's possible to augment a default cast rule (such as one that
|
||||
applies against `ENUM` data type for example) with a *transformation
|
||||
function* by omitting entirely the `type` parts of the casting rule,
|
||||
as in the following example::
|
||||
|
||||
column enumerate.foo using empty-string-to-null
|
||||
|
||||
PostgreSQL Partial Migration
|
||||
----------------------------
|
||||
|
||||
INCLUDING ONLY TABLE NAMES MATCHING
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Introduce a comma separated list of table names or *regular expression* used
|
||||
to limit the tables to migrate to a sublist.
|
||||
|
||||
Example::
|
||||
|
||||
including only table names matching ~/film/, 'actor' in schema 'public'
|
||||
|
||||
EXCLUDING TABLE NAMES MATCHING
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Introduce a comma separated list of table names or *regular expression* used
|
||||
to exclude table names from the migration. This filter only applies to the
|
||||
result of the *INCLUDING* filter.
|
||||
|
||||
::
|
||||
|
||||
excluding table names matching ~<ory> in schema 'public'
|
||||
|
||||
PostgreSQL Schema Transformations
|
||||
---------------------------------
|
||||
|
||||
ALTER TABLE NAMES MATCHING
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Introduce a comma separated list of table names or *regular expressions*
|
||||
that you want to target in the pgloader *ALTER TABLE* command. The only two
|
||||
available actions are *SET SCHEMA* and *RENAME TO*, both take a quoted
|
||||
string as parameter::
|
||||
|
||||
ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
|
||||
SET SCHEMA 'mv'
|
||||
|
||||
ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
|
||||
|
||||
ALTER TABLE NAMES MATCHING ~/./ SET (fillfactor='40')
|
||||
|
||||
You can use as many such rules as you need. The list of tables to be
|
||||
migrated is searched in pgloader memory against the *ALTER TABLE* matching
|
||||
rules, and for each command pgloader stops at the first matching criteria
|
||||
(regexp or string).
|
||||
|
||||
No *ALTER TABLE* command is sent to PostgreSQL, the modification happens at
|
||||
the level of the pgloader in-memory representation of your source database
|
||||
schema. In case of a name change, the mapping is kept and reused in the
|
||||
*foreign key* and *index* support.
|
||||
|
||||
The *SET ()* action takes effect as a *WITH* clause for the `CREATE TABLE`
|
||||
command that pgloader will run when it has to create a table.
|
||||
|
||||
PostgreSQL Migration: limitations
|
||||
---------------------------------
|
||||
|
||||
The only PostgreSQL objects supported at this time in pgloader are
|
||||
extensions, schema, tables, indexes and constraints. Anything else is ignored.
|
||||
|
||||
- Views are not migrated,
|
||||
|
||||
Supporting views might require implementing a full SQL parser for the
|
||||
MySQL dialect with a porting engine to rewrite the SQL against
|
||||
PostgreSQL, including renaming functions and changing some constructs.
|
||||
|
||||
While it's not theoretically impossible, don't hold your breath.
|
||||
|
||||
- Triggers are not migrated
|
||||
|
||||
The difficulty of doing so is not yet assessed.
|
||||
|
||||
- Stored Procedures and Functions are not migrated.
|
||||
|
||||
|
||||
Default PostgreSQL Casting Rules
|
||||
--------------------------------
|
||||
|
||||
When migrating from PostgreSQL the following Casting Rules are provided::
|
||||
|
||||
type int with extra auto_increment to serial
|
||||
type bigint with extra auto_increment to bigserial
|
||||
type "character varying" to text drop typemod
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user