From 007003647d6bd16726917f44d635c5d7902c0104 Mon Sep 17 00:00:00 2001
From: Dimitri Fontaine
Date: Fri, 14 Dec 2018 18:21:34 +0900
Subject: [PATCH] Improve Redshift support documentation.

---
 docs/index.rst                     |  3 +-
 docs/intro.rst                     |  3 ++
 docs/ref/pgsql-redshift-source.rst | 12 -----
 docs/ref/pgsql-redshift-target.rst | 10 -----
 docs/ref/pgsql-redshift.rst        | 70 ++++++++++++++++++++++++++++++
 5 files changed, 74 insertions(+), 24 deletions(-)
 delete mode 100644 docs/ref/pgsql-redshift-source.rst
 delete mode 100644 docs/ref/pgsql-redshift-target.rst
 create mode 100644 docs/ref/pgsql-redshift.rst

diff --git a/docs/index.rst b/docs/index.rst
index 3fb2f9a..aac16f7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -24,8 +24,7 @@ Welcome to pgloader's documentation!
    ref/mssql
    ref/pgsql
    ref/pgsql-citus-target
-   ref/pgsql-redshift-source
-   ref/pgsql-redshift-target
+   ref/pgsql-redshift
    ref/transforms
 
    bugreport
diff --git a/docs/intro.rst b/docs/intro.rst
index 2a098d9..ed981b7 100644
--- a/docs/intro.rst
+++ b/docs/intro.rst
@@ -10,10 +10,13 @@ the data into the server, and manages errors by filling a pair of
 pgloader knows how to read data from different kind of sources:
 
   * Files
+
     * CSV
     * Fixed Format
     * DBF
+
   * Databases
+
     * SQLite
     * MySQL
     * MS SQL Server
diff --git a/docs/ref/pgsql-redshift-source.rst b/docs/ref/pgsql-redshift-source.rst
deleted file mode 100644
index b69b6d9..0000000
--- a/docs/ref/pgsql-redshift-source.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-Migrating a Redhift Database to PostgreSQL
-==========================================
-
-This command instructs pgloader to load data from a database connection.
-Automatic discovery of the schema is supported, including build of the
-indexes, primary and foreign keys constraints. A default set of casting
-rules are provided and might be overloaded and appended to by the command.
-
-The command and behavior are the same as when migration from a PostgreSQL
-database source. pgloader automatically discovers that it's talking to a
-Redshift database by parsing the output of the `SELECT version()` SQL query.
-
diff --git a/docs/ref/pgsql-redshift-target.rst b/docs/ref/pgsql-redshift-target.rst
deleted file mode 100644
index 50cc356..0000000
--- a/docs/ref/pgsql-redshift-target.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-Migrating a PostgreSQL Database to Redshift
-===========================================
-
-This command instructs pgloader to load data from a database connection.
-Automatic discovery of the schema is supported, including build of the
-indexes, primary and foreign keys constraints. A default set of casting
-rules are provided and might be overloaded and appended to by the command.
-
-
-TODO: add details about S3 credentials and bucket configuration.
diff --git a/docs/ref/pgsql-redshift.rst b/docs/ref/pgsql-redshift.rst
new file mode 100644
index 0000000..09d73e1
--- /dev/null
+++ b/docs/ref/pgsql-redshift.rst
@@ -0,0 +1,70 @@
+Support for Redshift in pgloader
+================================
+
+The command and behavior are the same as when migrating from a PostgreSQL
+database source. pgloader automatically discovers that it's talking to a
+Redshift database by parsing the output of the `SELECT version()` SQL query.
+
+Redshift as a data source
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Redshift is a variant of PostgreSQL version 8.0.2, which allows pgloader to
+work with only a very small amount of adaptation in the catalog queries
+used. In other words, migrating from Redshift to PostgreSQL works just the
+same as when migrating from a PostgreSQL data source, including the
+connection string specification.
+
+Redshift as a data destination
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Redshift variant of PostgreSQL 8.0.2 does not have support for the
+``COPY FROM STDIN`` feature that pgloader normally relies upon. To use COPY
+with Redshift, the data must first be made available in an S3 bucket.
+
+First, pgloader must authenticate to Amazon S3. pgloader uses the following
+setup for that:
+
+  - ``~/.aws/config``
+
+    This INI formatted file contains sections with your default region and
+    other global values relevant to using the S3 API. pgloader parses it to
+    get the region when it's set up in the ``default`` INI section.
+
+    The environment variable ``AWS_DEFAULT_REGION`` can be used to override
+    the configuration file value.
+
+  - ``~/.aws/credentials``
+
+    This INI formatted file contains your authentication setup to Amazon,
+    with the properties ``aws_access_key_id`` and ``aws_secret_access_key``
+    in the section ``default``. pgloader parses this file for those keys,
+    and uses their values when communicating with Amazon S3.
+
+    The environment variables ``AWS_ACCESS_KEY_ID`` and
+    ``AWS_SECRET_ACCESS_KEY`` can be used to override the file's values.
+
+  - ``AWS_S3_BUCKET_NAME``
+
+    Finally, the value of the environment variable ``AWS_S3_BUCKET_NAME`` is
+    used by pgloader as the name of the S3 bucket to which it uploads the
+    files to COPY into the Redshift database. The bucket name defaults to
+    ``pgloader``.
+
+Then pgloader works as usual; see the other sections of the documentation
+for the details, depending on the data source (files, other databases, etc).
+When preparing the data for PostgreSQL, pgloader now uploads each batch into
+a single CSV file, then issues a command such as the following per batch:
+
+::
+
+    COPY
+    FROM 's3:///'
+    FORMAT CSV
+    TIMEFORMAT 'auto'
+    REGION ''
+    ACCESS_KEY_ID ''
+    SECRET_ACCESS_KEY '';
+
+This is the only difference from a PostgreSQL core version, where pgloader
+can rely on the classic ``COPY FROM STDIN`` command, which allows sending
+data through the already established connection to PostgreSQL.
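
The S3 settings lookup documented in docs/ref/pgsql-redshift.rst (environment variables take precedence over the INI files, and the bucket name falls back to ``pgloader``) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not pgloader's own code, which is written in Common Lisp. The function names `resolve_s3_settings` and `_ini_value`, the returned dictionary keys, and the `region` INI key are assumptions made for the example; only the file paths, the credential property names, the environment variable names, and the `pgloader` default come from the documentation in this patch.

```python
import configparser
import os
from pathlib import Path


def _ini_value(path, section, key):
    """Return `key` from `section` of an INI file, or None when absent."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped without raising
    if parser.has_option(section, key):
        return parser.get(section, key)
    return None


def resolve_s3_settings(env=None, home=None):
    """Hypothetical helper mirroring the documented lookup order.

    An environment variable, when set and non-empty, overrides the value
    found in the `default` section of the matching INI file.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    config = home / ".aws" / "config"
    credentials = home / ".aws" / "credentials"

    return {
        # `region` as the INI key name is an assumption of this sketch.
        "region": env.get("AWS_DEFAULT_REGION")
        or _ini_value(config, "default", "region"),
        "access_key_id": env.get("AWS_ACCESS_KEY_ID")
        or _ini_value(credentials, "default", "aws_access_key_id"),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY")
        or _ini_value(credentials, "default", "aws_secret_access_key"),
        # The bucket has no INI source; it defaults to "pgloader".
        "bucket": env.get("AWS_S3_BUCKET_NAME") or "pgloader",
    }
```

Calling `resolve_s3_settings()` with no arguments reads the current user's environment and home directory; passing `env` and `home` explicitly keeps the lookup testable without touching real credentials.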