Improve Redshift support documentation.

2026-05-05 10:56:10 +02:00 · 2018-12-14 18:21:34 +09:00 · 2018-12-14 18:21:34 +09:00 · 007003647d
commit 007003647d
parent f72afeeae7
5 changed files with 74 additions and 24 deletions
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -24,8 +24,7 @@ Welcome to pgloader's documentation!
   ref/mssql
   ref/pgsql
   ref/pgsql-citus-target
-   ref/pgsql-redshift-source
-   ref/pgsql-redshift-target
+   ref/pgsql-redshift
   ref/transforms
   bugreport

--- a/docs/intro.rst
+++ b/docs/intro.rst
@@ -10,10 +10,13 @@ the data into the server, and manages errors by filling a pair of
 pgloader knows how to read data from different kind of sources:

  * Files
+
    * CSV
    * Fixed Format
    * DBF
+
  * Databases
+
    * SQLite
    * MySQL
    * MS SQL Server
--- a/docs/ref/pgsql-redshift-source.rst
+++ b/docs/ref/pgsql-redshift-source.rst
@@ -1,12 +0,0 @@
-Migrating a Redhift Database to PostgreSQL
-==========================================
-
-This command instructs pgloader to load data from a database connection.
-Automatic discovery of the schema is supported, including build of the
-indexes, primary and foreign keys constraints. A default set of casting
-rules are provided and might be overloaded and appended to by the command.
-
-The command and behavior are the same as when migration from a PostgreSQL
-database source. pgloader automatically discovers that it's talking to a
-Redshift database by parsing the output of the `SELECT version()` SQL query.
-
--- a/docs/ref/pgsql-redshift-target.rst
+++ b/docs/ref/pgsql-redshift-target.rst
@@ -1,10 +0,0 @@
-Migrating a PostgreSQL Database to Redshift
-===========================================
-
-This command instructs pgloader to load data from a database connection.
-Automatic discovery of the schema is supported, including build of the
-indexes, primary and foreign keys constraints. A default set of casting
-rules are provided and might be overloaded and appended to by the command.
-
-
-TODO: add details about S3 credentials and bucket configuration.
--- a/docs/ref/pgsql-redshift.rst
+++ b/docs/ref/pgsql-redshift.rst
@@ -0,0 +1,70 @@
+Support for Redshift in pgloader
+================================
+
+The command and behavior are the same as when migrating from a PostgreSQL
+database source. pgloader automatically discovers that it's talking to a
+Redshift database by parsing the output of the ``SELECT version()`` SQL query.
+
+Redshift as a data source
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Redshift is a variant of PostgreSQL version 8.0.2, which allows pgloader to
+work with only a very small amount of adaptation in the catalog queries
+used. In other words, migrating from Redshift to PostgreSQL works just the
+same as when migrating from a PostgreSQL data source, including the
+connection string specification.
+
+Redshift as a data destination
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Redshift variant of PostgreSQL 8.0.2 does not have support for the
+``COPY FROM STDIN`` feature that pgloader normally relies upon. To use COPY
+with Redshift, the data must first be made available in an S3 bucket.
+
+First, pgloader must authenticate to Amazon S3. pgloader uses the following
+setup for that:
+
+  - ``~/.aws/config``
+
+    This INI formatted file contains sections with your default region and
+    other global values relevant to using the S3 API. pgloader parses it to
+    get the region when it is set up in the ``default`` INI section.
+
+    The environment variable ``AWS_DEFAULT_REGION`` can be used to override
+    the configuration file value.
+    
+  - ``~/.aws/credentials``
+
+    The INI formatted file contains your authentication setup to Amazon,
+    with the properties ``aws_access_key_id`` and ``aws_secret_access_key``
+    in the section ``default``. pgloader parses this file for those keys,
+    and uses their values when communicating with Amazon S3.
+
+    The environment variables ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY``
+    can be used to override the configuration file values.
+    
+  - ``AWS_S3_BUCKET_NAME``
+    
+    Finally, the value of the environment variable ``AWS_S3_BUCKET_NAME`` is
+    used by pgloader as the name of the S3 bucket to which it uploads the
+    files to COPY into the Redshift database. The bucket name defaults to
+    ``pgloader``.
+
+Then pgloader works as usual; see the other sections of the documentation
+for the details, depending on the data source (files, other databases, etc.).
+When preparing the data for Redshift, pgloader uploads each batch to S3 as a
+single CSV file, and then issues a command such as the following for each batch:
+
+::
+
+  COPY <target_table_name>
+        FROM 's3://<s3-bucket>/<s3-filename-just-uploaded>'
+        FORMAT CSV
+        TIMEFORMAT 'auto'
+        REGION '<aws-region>'
+        ACCESS_KEY_ID '<aws-access-key-id>'
+        SECRET_ACCESS_KEY '<aws-secret-access-key>';
+
+This is the only difference from a core PostgreSQL target, where pgloader
+can rely on the classic ``COPY FROM STDIN`` command, which allows sending
+data through the already established connection to PostgreSQL.
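
The lookup order documented in the new ``pgsql-redshift.rst`` (environment variables override the ``default`` section of the INI files, and the bucket name falls back to ``pgloader``) can be illustrated with a small Python sketch. This is not pgloader's implementation, which is written in Common Lisp. The function names ``resolve_s3_settings`` and ``copy_statement`` and the ``aws_dir`` parameter are hypothetical, and the INI key name ``region`` is an assumption based on the usual AWS configuration layout.

```python
import configparser
import os
from pathlib import Path


def read_ini_value(path, section, key):
    """Return `key` from `section` of an INI file, or None when absent."""
    # interpolation=None so that '%' characters in secrets are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    return parser.get(section, key, fallback=None)


def resolve_s3_settings(env=None, aws_dir=None):
    """Hypothetical helper: environment variables win over the INI files."""
    env = os.environ if env is None else env
    aws_dir = Path.home() / ".aws" if aws_dir is None else Path(aws_dir)
    config = aws_dir / "config"
    credentials = aws_dir / "credentials"
    return {
        # "region" as the INI key name is an assumption.
        "region": env.get("AWS_DEFAULT_REGION")
        or read_ini_value(config, "default", "region"),
        "access_key_id": env.get("AWS_ACCESS_KEY_ID")
        or read_ini_value(credentials, "default", "aws_access_key_id"),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY")
        or read_ini_value(credentials, "default", "aws_secret_access_key"),
        # The documentation states that the bucket name defaults to "pgloader".
        "bucket": env.get("AWS_S3_BUCKET_NAME", "pgloader"),
    }


def sql_literal(value):
    """Quote a string as an SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def copy_statement(table, filename, settings):
    """Build a COPY command shaped like the one shown in the documentation.

    `table` is inserted as-is; quoting identifiers is out of scope here.
    """
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    url = "s3://" + settings["bucket"] + "/" + filename
    lines = [
        "COPY " + table,
        "      FROM " + sql_literal(url),
        "      FORMAT CSV",
        "      TIMEFORMAT 'auto'",
        "      REGION " + sql_literal(settings["region"]),
        "      ACCESS_KEY_ID " + sql_literal(settings["access_key_id"]),
        "      SECRET_ACCESS_KEY " + sql_literal(settings["secret_access_key"]),
    ]
    return "\n".join(lines) + ";"
```

Passing ``env`` and ``aws_dir`` explicitly keeps the sketch testable without touching the real ``~/.aws`` directory; calling ``resolve_s3_settings()`` with no arguments reads the process environment and the home directory instead.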