It's now possible to use pgloader to migrate from PostgreSQL to PostgreSQL.
That might be useful for several reasons, including applying user-defined
cast rules at COPY time, or just moving from one hosted solution to another.
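For instance, a minimal load command for such a migration could look like
this sketch; the connection strings and the cast rule are made-up examples:

    LOAD DATABASE
         FROM pgsql://appuser@old-hosting.example.com/appdb
         INTO pgsql://appuser@new-hosting.example.com/appdb
         -- hypothetical user-defined cast applied while the data is COPYed
         CAST type timestamp to timestamptz;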
* Add dockerfiles to .dockerignore
Otherwise changes to the Dockerfiles would invalidate the build cache
* Rewrite Dockerfile
- Fix deprecated MAINTAINER instruction
- Move maintainer label to the bottom (improves caching)
- Tidy up apt-get
- Use COPY instead of ADD
see https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy
- Remove WORKDIR instruction (we don't really need this)
- Combine remaining RUN layers to reduce layer count
- Move final binary instead of copying (reduces image size)
* Use -slim image and multistage build
Reduce size by using multistage builds and the -slim image.
Use debian:stable instead of a specific code name (future-proof).
A sketch of the resulting layout follows this list.
* [cosmetic] indent Dockerfile instructions
Make it easier to see where a new build stage begins
* Rewrite Dockerfile.ccl
Apply the same changes to Dockerfile.ccl as we did for Dockerfile
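As promised above, a sketch of the multistage layout; the package list,
build target, and binary path are assumptions, not the exact contents of
the new Dockerfile:

    # build stage
    FROM debian:stable-slim AS build
    RUN apt-get update \
     && apt-get install -y --no-install-recommends \
            ca-certificates curl make sbcl \
     && rm -rf /var/lib/apt/lists/*
    COPY ./ /opt/src/pgloader
    RUN make -C /opt/src/pgloader pgloader

    # runtime stage: only the built binary is carried over, and the
    # maintainer label sits at the bottom to keep caching friendly
    FROM debian:stable-slim
    COPY --from=build /opt/src/pgloader/build/bin/pgloader /usr/local/bin/pgloader
    LABEL maintainer="..."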
Given the variety of ways to set up default behavior for datetime and
timestamp data types in MySQL, we need yet more default casting rules. It
might be time to think about a more principled way to solve the problem, but
on the other hand, this ad-hoc one also comes with full overriding
flexibility for the end user.
Fixes #811.
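End users keep full override power with a CAST clause in the load command;
here is an illustrative rule in the style of the pgloader docs:

    LOAD DATABASE
         FROM mysql://root@localhost/db
         INTO pgsql://user@localhost/db
         CAST type datetime to timestamptz
              drop default drop not null
              using zero-dates-to-null;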
Travis spotted a bug with CCL that I failed to see, and that happens with
Clozure-CL but not with SBCL apparently:
2018-07-03T21:04:11.053795Z FATAL The value "\\\"", derived from the initarg :DELIMITER, can not be used to set the value of the slot CL-CSV::DELIMITER in #<CL-CSV::READ-DISPATCH-TABLE-ENTRY #x30200143DDCD>, because it is not of type (VECTOR (OR (MEMBER T NIL) CHARACTER)).
To fix, prefer the syntax #(#\\ #\") rather than "\\\"".
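Spelled out, the difference is the literal's element type; the slot type
below is copied from the CCL error message:

    ;; the string "\\\"" is a (VECTOR CHARACTER), which CCL refuses to
    ;; match against the declared slot type
    ;; (VECTOR (OR (MEMBER T NIL) CHARACTER)); writing the same two
    ;; characters as a general vector literal is accepted by CCL and SBCL
    #(#\\ #\")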
It's easy to avoid the warning about an unused lexical variable with the
proper declaration, which I failed to install before because of a syntax
error when I tried. Let's fix it now that I realise what was wrong.
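For reference, the shape of such a declaration; the function and variable
names here are hypothetical:

    (lambda (unused-arg row)
      ;; tell the compiler this binding is intentionally unused
      (declare (ignore unused-arg))
      (process-row row))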
When the LOAD command does not provide the name of the target table for a
DBF source, we can get the name of the table from the DBF file itself. That
feature got broken, here's a fix.
Fix #805.
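So a command along these lines works again, creating and loading a table
named after the .dbf file; the file name and connection string are
illustrative:

    LOAD DBF
         FROM data/reg2013.dbf
         INTO pgsql://user@localhost/dbname
         WITH truncate, create table;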
Redshift's COPY doesn't read from STDIN; instead, it needs to parse CSV
files. On the other hand, as we don't have to implement the COPY protocol
from within pgloader for Redshift (because it's using S3 as the data
source, and not STDIN), we don't need the level of control that we are
using when talking to a normal PostgreSQL.
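For context, the kind of COPY that Redshift runs on its side reads from S3,
along these lines; the bucket path and credentials are placeholders:

    COPY target_table
      FROM 's3://bucket/path/part'
      CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
      FORMAT AS CSV;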
It's a little more involved than what was done previously. In particular we
need to pay attention to MySQL varchar(x) and transform them into something
big enough when counting bytes rather than chars, like varchar(3x).
Then there's the "text" datatype to take into account, and some more.
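Concretely, the sizing rule amounts to something like the following; the
table is made up, and the 3x factor matches MySQL's 3-bytes-per-character
utf8 encoding:

    -- the MySQL source declares a length in characters:
    --   name varchar(10)
    -- Redshift counts bytes, so the target column gets widened:
    CREATE TABLE example (name varchar(30));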
Due to how type casting matching is implemented in pgloader, we need to add
two more MySQL casting rules in the default pgloader set to handle
specifically the case when a datetime or timestamp column in MySQL has the
"extra" bit of information "on update current timestamp".
That's because for a casting rule to match, both the type definition and the
casting rule must have the :on-update-current-timestamp property set the
same way, so that the existing default rules would not apply.
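The new default rules look something like this sketch; the plist layout is
illustrative, only the :on-update-current-timestamp property name is taken
from the actual change:

    (:source (:type "timestamp" :on-update-current-timestamp t)
     :target (:type "timestamptz")
     :using pgloader.transforms::zero-dates-to-null)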
Also add some schema-level support by disabling our usual index and
constraint support when the target is Redshift, because it doesn't support
those parts of SQL.
The S3 parameters are read from either the process environment variables or
from the AWS configuration files in ~/.aws.
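That presumably means the standard AWS settings, along these lines:

    # environment variables
    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...

    # or the usual configuration file, ~/.aws/credentials
    [default]
    aws_access_key_id = ...
    aws_secret_access_key = ...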
Redshift looks like a very old PostgreSQL (8.0.2) with some extra features
and a very limited selection of data types. In this patch we parse the
PostgreSQL version() function output and automatically determine if we're
connected to Redshift.
When connected to Redshift, we then dumb down our target catalogs to the
subset of data types that Redshift actually does support.
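The version() output that triggers the Redshift code path looks like this;
the build details and trailing Redshift version number vary from cluster to
cluster:

    SELECT version();
    -- PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
    -- 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1548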
Also, some catalog queries can't be done in Redshift, and 8.0 didn't have a
fully compliant VALUES statement, so we use a temporary table in places
where we used to use SELECT ... FROM (VALUES(...)) in pgloader.
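For example, a catalog lookup that used to inline its data with VALUES gets
rewritten in the spirit of the following; table and column names are
illustrative:

    -- rejected by Redshift (and pre-8.2 PostgreSQL):
    SELECT t.oid, t.typname
      FROM (VALUES (16, 'bool'), (20, 'int8')) AS t(oid, typname);

    -- portable rewrite:
    CREATE TEMP TABLE type_names (oid int, typname text);
    INSERT INTO type_names VALUES (16, 'bool');
    INSERT INTO type_names VALUES (20, 'int8');
    SELECT oid, typname FROM type_names;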
COPYing data to Redshift isn't possible with just this set of changes,
because Redshift also doesn't support the COPY FROM STDIN form. COPY sources
are limited, and another patch will have to be cooked to prepare the data
from pgloader into a format and location that Redshift knows how to handle.
At least, it's possible to migrate a database schema to Redshift already.
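Such a schema-only migration can be sketched as follows; the MySQL source
and the Redshift cluster endpoint are made up:

    LOAD DATABASE
         FROM mysql://user@localhost/sakila
         INTO pgsql://admin@example.abc123.us-east-1.redshift.amazonaws.com:5439/dev
         WITH schema only;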