mirror of
https://github.com/dimitri/pgloader.git
synced 2026-05-04 18:36:12 +02:00
Improve Redshift support documentation.
parent f72afeeae7
commit 007003647d
@ -24,8 +24,7 @@ Welcome to pgloader's documentation!
ref/mssql
ref/pgsql
ref/pgsql-citus-target
ref/pgsql-redshift-source
ref/pgsql-redshift-target
ref/pgsql-redshift
ref/transforms
bugreport
@ -10,10 +10,13 @@ the data into the server, and manages errors by filling a pair of
pgloader knows how to read data from different kinds of sources:

  * Files

    * CSV
    * Fixed Format
    * DBF

  * Databases

    * SQLite
    * MySQL
    * MS SQL Server
@ -1,12 +0,0 @@
Migrating a Redshift Database to PostgreSQL
===========================================

This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the
indexes, primary and foreign key constraints. A default set of casting
rules is provided and may be overridden and extended by the command.

The command and behavior are the same as when migrating from a PostgreSQL
database source. pgloader automatically discovers that it's talking to a
Redshift database by parsing the output of the ``SELECT version()`` SQL query.
@ -1,10 +0,0 @@
Migrating a PostgreSQL Database to Redshift
===========================================

This command instructs pgloader to load data from a database connection.
Automatic discovery of the schema is supported, including build of the
indexes, primary and foreign key constraints. A default set of casting
rules is provided and may be overridden and extended by the command.

TODO: add details about S3 credentials and bucket configuration.
70
docs/ref/pgsql-redshift.rst
Normal file
@ -0,0 +1,70 @@
Support for Redshift in pgloader
================================

The command and behavior are the same as when migrating from a PostgreSQL
database source. pgloader automatically discovers that it's talking to a
Redshift database by parsing the output of the ``SELECT version()`` SQL query.

Redshift as a data source
^^^^^^^^^^^^^^^^^^^^^^^^^

Redshift is a variant of PostgreSQL 8.0.2, which allows pgloader to work
with only a very small amount of adaptation in the catalog queries it uses.
In other words, migrating from Redshift to PostgreSQL works just the same as
migrating from a PostgreSQL data source, including the connection string
specification.

Redshift as a data destination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Redshift variant of PostgreSQL 8.0.2 does not support the
``COPY FROM STDIN`` feature that pgloader normally relies upon. To use COPY
with Redshift, the data must first be made available in an S3 bucket.

First, pgloader must authenticate to Amazon S3. pgloader uses the following
setup for that:

- ``~/.aws/config``

  This INI formatted file contains sections with your default region and
  other global values relevant to using the S3 API. pgloader parses it to
  get the region when it is set in the ``default`` INI section.

  The environment variable ``AWS_DEFAULT_REGION`` can be used to override
  the configuration file value.

- ``~/.aws/credentials``

  This INI formatted file contains your authentication setup for Amazon,
  with the properties ``aws_access_key_id`` and ``aws_secret_access_key``
  in the ``default`` section. pgloader parses this file for those keys and
  uses their values when communicating with Amazon S3.

  The environment variables ``AWS_ACCESS_KEY_ID`` and
  ``AWS_SECRET_ACCESS_KEY`` can be used to override the configuration file
  values.

- ``AWS_S3_BUCKET_NAME``

  Finally, pgloader uses the value of the environment variable
  ``AWS_S3_BUCKET_NAME`` as the name of the S3 bucket into which it uploads
  the files to COPY into the Redshift database. The bucket name defaults to
  ``pgloader``.
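
The lookup order described in the list above can be sketched in Python as
follows. This is an illustration, not pgloader's Common Lisp code: it assumes
that environment variables always take precedence over the files and that
only the ``[default]`` section of each INI file is consulted. The function
names ``read_default_section`` and ``resolve_s3_settings`` are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_default_section(path: Path) -> Dict[str, str]:
    """Return the [default] section of an AWS INI file, or {} if it is missing."""
    # interpolation=None so that '%' characters in secrets are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing or unreadable file is silently skipped
    if not parser.has_section("default"):
        return {}
    return dict(parser.items("default"))


def resolve_s3_settings(env: Mapping[str, str] = os.environ,
                        aws_dir: Optional[Path] = None) -> Dict[str, Optional[str]]:
    """Environment variables win over the files; the bucket defaults to "pgloader"."""
    aws_dir = aws_dir if aws_dir is not None else Path.home() / ".aws"
    config = read_default_section(aws_dir / "config")
    creds = read_default_section(aws_dir / "credentials")
    return {
        "region": env.get("AWS_DEFAULT_REGION", config.get("region")),
        "access_key_id": env.get("AWS_ACCESS_KEY_ID",
                                 creds.get("aws_access_key_id")),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY",
                                     creds.get("aws_secret_access_key")),
        "bucket": env.get("AWS_S3_BUCKET_NAME", "pgloader"),
    }
```

Passing ``env`` and ``aws_dir`` explicitly keeps the function easy to test
without touching the real ``~/.aws`` directory or process environment.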

Then pgloader works as usual; see the other sections of the documentation
for the details, depending on the data source (files, other databases, etc.).
When loading the data into Redshift, pgloader uploads each batch as a single
CSV file, then issues a command such as the following for each batch:

::

    COPY <target_table_name>
    FROM 's3://<s3 bucket>/<s3-filename-just-uploaded>'
    FORMAT CSV
    TIMEFORMAT 'auto'
    REGION '<aws-region>'
    ACCESS_KEY_ID '<aws-access-key-id>'
    SECRET_ACCESS_KEY '<aws-secret-access-key>';

This is the only difference from loading into core PostgreSQL, where pgloader
can rely on the classic ``COPY FROM STDIN`` command to send data through the
already established connection to PostgreSQL.
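
As an illustration, the per-batch COPY template shown earlier in this section
can be assembled as in this Python sketch. It is not pgloader's code: it
assumes ``table`` is already a valid (quoted if necessary) identifier, the
function names ``quote_literal`` and ``redshift_copy_sql`` and the sample file
name are hypothetical, and the S3 upload itself is not shown.

```python
def quote_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def redshift_copy_sql(table: str, bucket: str, key: str, region: str,
                      access_key_id: str, secret_access_key: str) -> str:
    """Build one per-batch COPY statement following the documented template.

    ``key`` is the name of the CSV file just uploaded to ``bucket``.
    """
    source = f"s3://{bucket}/{key}"
    return "\n".join([
        f"COPY {table}",
        f"FROM {quote_literal(source)}",
        "FORMAT CSV",
        "TIMEFORMAT 'auto'",
        f"REGION {quote_literal(region)}",
        f"ACCESS_KEY_ID {quote_literal(access_key_id)}",
        f"SECRET_ACCESS_KEY {quote_literal(secret_access_key)};",
    ])


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials.
    print(redshift_copy_sql("public.users", "pgloader", "users-batch-1.csv",
                            "us-east-1", "<access-key-id>", "<secret-access-key>"))
```

Quoting every value as a literal keeps a stray quote in a secret key or file
name from breaking the generated statement.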