That helps both to give an overview of what pgloader is capable of doing
with a database migration, and to document that some sources don't have
full support for some features yet.
The latter is not tested yet, but should have no impact if not used. Given
how rare it is that I get a chance to play around with a MS SQL instance
anyway, it might be better to push blind changes for it when it doesn't
impact existing features…
We have a lot of new features to document. This is a first patch about that;
some more work remains to be done. That said, it's better than nothing already.
The previous fix was wrong for missing the point: rather than unquote column
names in the table definition when matching the column names in the index
definition, we should in the first place have quoted the index column names
when needed.
Fixes #872 for real this time.
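As a rough illustration of the approach described above, here is a minimal sketch in Python (pgloader itself is written in Common Lisp). The keyword set, the function names and the quoting rule are assumptions made for this example, not pgloader's actual code: the index column names get quoted with the same rule as the catalog column names, so both sides compare equal.

```python
import re

# Illustrative subset only; the real list of keywords is much longer.
RESERVED = {"order", "user", "table", "select"}
_PLAIN = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_if_needed(name):
    """Quote a column name when it is a keyword or not a plain identifier."""
    if name in RESERVED or not _PLAIN.match(name):
        return '"' + name.replace('"', '""') + '"'
    return name


def index_columns_in_table(index_columns, table_columns):
    """Match index columns against catalog column names.

    table_columns is assumed to hold names as stored in the catalog,
    that is, already quoted when needed.
    """
    quoted = [quote_if_needed(column) for column in index_columns]
    return [column for column in quoted if column in table_columns]
```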
If the source database is using a keyword (such as "order") as a column
name, then pgloader is going to quote this column name in its internal
catalogs. In that case, unquote the column in the pgloader catalogs when
matching it against the unquoted column name we have in the index
definition.
Fixes #872.
The Debian/Ubuntu packaging would defeat the quite simple regexp that we
use in pgloader to parse the PostgreSQL version string. To make it more
robust, make it more open to unforeseen strings.
See #800, see #810.
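To make the idea of a more tolerant version parser concrete, here is a minimal sketch in Python. The function name and the sample strings in the usage note are assumptions for illustration; this is not the regexp pgloader uses. It only looks for the first run of digits, with an optional minor part, and ignores whatever packaging suffix follows.

```python
import re

# Hypothetical pattern: a major number, optionally followed by ".minor".
_VERSION_RE = re.compile(r"(\d+)(?:\.(\d+))?")


def parse_pg_version(version_string):
    """Return (major, minor) as integers, or None when no number is found."""
    match = _VERSION_RE.search(version_string)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return (major, minor)
```

With this sketch, a packaged version string such as "10.1 (Ubuntu 10.1-1.pgdg16.04+1)" and a plain "10.1" would parse to the same pair.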
As for the other datetime types we have to use CONVERT at the SQL level in
order to get a format that PostgreSQL understands. This time the magic
number for it is 114.
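A minimal sketch of the idea, in Python: the helper below builds the SELECT expression for a column so that the conversion happens at the SQL level. The helper name and the style table are assumptions for illustration; only the style number 114 for the time type comes from this message.

```python
# Hypothetical mapping from MS SQL type name to CONVERT style number.
# Only the entry for "time" is taken from the commit message above.
CONVERT_STYLES = {"time": 114}


def select_expression(column_name, type_name):
    """Return the SQL expression used to fetch a column from MS SQL."""
    style = CONVERT_STYLES.get(type_name)
    if style is None:
        # No conversion needed: select the column as-is.
        return "[" + column_name + "]"
    return "convert(varchar, [{}], {})".format(column_name, style)
```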
This should make it easier to build pgloader with CCL rather than SBCL, all
from the bundle distribution, and also easier to support Windows.
In passing, add a new file in the bundle distribution: version.sexp should
contain a CL string containing the pgloader version string.
A user reported a case where pgloader fails to find the table an index has
been created on in pgloader catalogs. That's a weird case. For now, just
issue a warning about the situation and skip the index.
It might be that some random condition is signaled during process-catalogs,
causing the errors reported so far and that I can't reproduce. Let's add
some handler-case protection to have more clues about what could be
happening.
See #865, #800, #810, #859, #824.
We might have MS SQL failures at this stage, or even Redshift or other
PostgreSQL variants failing to execute our catalog queries. Handle
conditions by cleanly logging them and returning from copy-database without
doing anything. That's the best we can do here.
Fixes #605, fixes #757.
This time we directly call into the save-lisp-and-die feature of the
implementation. As pgloader only supports SBCL and CCL at the time being,
doing things without an abstraction layer is easy enough.
This needs more testing and a special version for the bundle case too. One
step at a time, etc.
Make it so that we generate a proper error message to the user when failing
to figure out the PATH to the distribution key, rather than failing with an
internal error about The value NIL is not of type PGLOADER.CATALOG:TABLE.
Our catalogs representation is designed to be circular, which helps
navigating the graph from anywhere when processing it. This means that we
need to have *print-circle* set to t in the pgloader image, otherwise we
might run into Control stack exhausted when trying to print out debug
information...
Fixes #865, #800, #810, #859, #824.
This patch adds an option --no-ssl-cert-verification that allows bypassing
OpenSSL server certificate verification. It's hopefully a temporary measure
that we set up in order to make progress when confronted with:
SSL verify error: 20 X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY
The real solution is of course to install the SSL certificates at a place
where pgloader will look for them, which defaults to
~/.postgresql/postgresql.crt at the moment. It's not clear what the story is
with the defaults from /etc/ssl, or how to make things happen in a better
way.
See #648, See #679, See #768, See #748, See #775.
This gives a default "null if" option to all the input columns at once, and
it's still possible to override the default per column.
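The default-plus-override behaviour can be sketched as follows, in Python. The data layout (a list of name/options pairs) and the function names are assumptions for this example and do not reflect pgloader's internal representation.

```python
def resolve_null_if(fields, default_null_if=None):
    """Compute the effective "null if" value for every input field.

    fields is a list of (name, options) pairs; a per-field "null_if"
    option wins over the default given for all fields at once.
    """
    return {
        name: options.get("null_if", default_null_if)
        for name, options in fields
    }


def apply_null_if(value, null_if):
    """Turn a field value into None when it equals its "null if" marker."""
    if null_if is not None and value == null_if:
        return None
    return value
```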
In passing, fix project-fields declarations that SBCL now complains about
when they're not true, such as declaring a vector when we might have :null
or nil. As a result, remove the (declare (optimize speed)) in the generated
field processing code.
A hostname can be written [::1] in .pgpass, without having to escape the
colon characters, and properly enclosed in square brackets, as is common
for IPv6 addresses.
Fixes #837.
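Here is a minimal sketch, in Python, of how a .pgpass line might be split so that a bracketed IPv6 host is accepted alongside backslash-escaped colons. The function name is hypothetical, and dropping the brackets from the returned host is an assumption of this example rather than documented pgloader behaviour.

```python
def split_pgpass_line(line):
    """Split a .pgpass line into its colon-separated fields.

    A backslash escapes the next character, and a leading [...] around
    the hostname protects the colons of an IPv6 address.
    """
    fields = []
    current = []
    in_brackets = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character: keep it literally.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == "[" and not current and not fields:
            # Opening bracket at the very start of the hostname field.
            in_brackets = True
        elif ch == "]" and in_brackets:
            in_brackets = False
        elif ch == ":" and not in_brackets:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields
```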
The code emitted by pgloader to transform input fields into PostgreSQL
column values was using too many optimization declarations, some of which
SBCL failed to follow through for lack of type marking in the generated
code.
As SBCL doesn't have enough information to be optimizing anyway, at least we
can make it so that we don't have a warning about it. The new code does that.
Fixes #803.
The code nowadays expects the keyword :auto-increment rather than a string
in order to process a column's extra bits of information as meaning that we
want to cast to a serial/bigserial datatype.
At this stage we don't even parse the details of the Redshift identity such
as the seed and step values and consider them the same as a MySQL
auto_increment extra description field.
Fixes #860 (again).
Redshift doesn't support array_agg(), unnest() and other very useful
PostgreSQL functions either. Redshift is from 8.0 times, so do things the
old way: parse the output of the index definition that we get from calling
pg_get_indexdef().
For that, this patch introduces the notion of SQL support that depends on
PostgreSQL major version. If no major-version specific query is found in the
pgloader source tree, then we use the generic one.
Fixes #860.
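The lookup rule for version-dependent SQL can be sketched in Python as below. The registry layout (a dictionary keyed by query name and major version) and the names used are assumptions for illustration; pgloader keeps its queries in its source tree, and the exact layout is not described here.

```python
def find_query(queries, name, major_version=None):
    """Pick the SQL text for a query name.

    queries maps (name, major_version) to SQL text, with the generic
    variant stored under (name, None). A major-version specific entry
    wins when present; otherwise the generic one is used.
    """
    if major_version is not None:
        specific = queries.get((name, major_version))
        if specific is not None:
            return specific
    return queries[(name, None)]
```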
It turns out that the rules about the names of users and databases are more
lax than pgloader would know, so it might be a good move for our DSN parsing
to accept more values and then let the source/target systems complain
when something goes wrong.
See #230 which got broken again somewhere.
With this patch, the following distribution rule
distribute companies using id
is equivalent to the following distribution rule set, given foreign keys in
the source schema:
distribute companies using id
distribute campaigns using company_id
distribute ads using company_id from campaigns
distribute clicks using company_id from ads, campaigns
distribute impressions using company_id from ads, campaigns
In the current code (of this patch) pgloader walks the foreign-keys
dependency tree and knows how to automatically derive distribution rules
from a single rule and the foreign keys.
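The derivation can be sketched as a breadth-first walk over the foreign keys, here in Python. This is a simplified illustration under stated assumptions: foreign keys are given as (table, column, referenced table, referenced column) tuples, each table is reached through a single path, and the function names are made up for this example.

```python
from collections import deque


def derive_rules(root_table, root_key, foreign_keys):
    """Derive distribution rules from one rule and the foreign keys.

    Returns a dict mapping table name to (distribution key, path), where
    path lists the tables to go through to find the distribution key.
    """
    rules = {root_table: (root_key, [])}
    queue = deque([root_table])
    while queue:
        parent = queue.popleft()
        parent_key, parent_path = rules[parent]
        for table, column, ref_table, ref_column in foreign_keys:
            if ref_table != parent or table in rules:
                continue
            if parent == root_table:
                if ref_column != root_key:
                    continue
                # Direct reference: the foreign key column is the key.
                rules[table] = (column, [])
            else:
                # Indirect reference: the key is found via the parent.
                rules[table] = (parent_key, [parent] + parent_path)
            queue.append(table)
    return rules


def format_rule(table, key, path):
    """Render a rule in the "distribute ... using ..." syntax."""
    rule = "distribute {} using {}".format(table, key)
    if path:
        rule += " from " + ", ".join(path)
    return rule
```

Given foreign keys from campaigns to companies, from ads to campaigns, and from clicks and impressions to ads, this sketch produces the five rules listed above from the single rule on companies.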
With this patch it's now actually possible to backfill the data on the fly
when using the new "distribute" commands. The schema is modified to add the
distribution key where specified, and changes to the primary and foreign
keys happen automatically. Then a JOIN is generated to get the data directly
during the COPY streaming to the Citus cluster.
The idea is for pgloader to tweak the schema from a description of the
sharding model, the distribute clause. Here's an example of such a clause:
distribute company using id
distribute campaign using company_id
distribute ads using company_id from campaign
distribute clicks using company_id from ads, campaign
Given such commands, pgloader adds the distribution key to the table when
needed, to the primary key definition of the table, and also to the foreign
keys that are pointing to the changed primary key.
Then when SELECTing the data from the source database, the idea is for
pgloader to automatically JOIN the base table with the source table in
which the distribution key is found, in case it was just added in the schema.
Finally, pgloader also calls the following Citus commands:
SELECT create_distributed_table('company', 'id');
SELECT create_distributed_table('campaign', 'company_id');
SELECT create_distributed_table('ads', 'company_id');
SELECT create_distributed_table('clicks', 'company_id');
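The mapping from distribute clauses to Citus commands can be sketched in Python as follows. The regular expression and the function names are assumptions made for this example; pgloader has its own command parser, which this does not reproduce.

```python
import re

# Hypothetical pattern for: distribute TABLE using KEY [from T1, T2, ...]
_RULE_RE = re.compile(r"^distribute\s+(\S+)\s+using\s+(\S+)(?:\s+from\s+(.+))?$")


def parse_rule(line):
    """Parse one distribute clause into (table, key, path tables)."""
    match = _RULE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a distribute clause: " + line)
    table, key, path = match.groups()
    tables = [name.strip() for name in path.split(",")] if path else []
    return table, key, tables


def citus_command(table, key):
    """Build the create_distributed_table() call for one rule."""
    def literal(text):
        # Double single quotes so the value is a valid SQL string literal.
        return "'" + text.replace("'", "''") + "'"

    return "SELECT create_distributed_table({}, {});".format(
        literal(table), literal(key)
    )
```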
The catalog queries used in pgloader have to be adjusted for Redshift
because this thing forked PostgreSQL 8.0, which is a long time ago now.
Also, we had a couple bugs here and there that were not really related to
Redshift support but were shown in that context.
Fixes #813.
It turns out that when trying to debug "decoding as", the SQLtype listing
support in sqltype-list was found broken, so this patch fixes it. It then
goes on to fix the DECODING AS filters support: we had switched to using
the better regexp-or-string filter struct but forgot to update the matching
code accordingly.
Fixes #665.
When dealing with PostgreSQL protocol compatible databases, often enough
they don't support the same catalogs as PostgreSQL itself. Redshift for
instance lacks foreign key support.
We now accept the more general string and regex match rules, but the code to
generate including and excluding lists from the catalogs had not been updated.
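As an illustration of string and regex match rules applied to including and excluding lists, here is a minimal sketch in Python. The MatchRule class and filter_tables function are hypothetical names for this example and are not pgloader's structures.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRule:
    """Either an exact string match or a regular expression match."""
    kind: str    # "string" or "regex"
    target: str

    def matches(self, name):
        if self.kind == "string":
            return name == self.target
        return re.search(self.target, name) is not None


def filter_tables(names, including=(), excluding=()):
    """Keep names allowed by the including rules and not excluded."""
    kept = []
    for name in names:
        if including and not any(rule.matches(name) for rule in including):
            continue
        if any(rule.matches(name) for rule in excluding):
            continue
        kept.append(name)
    return kept
```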