The concurrent nature of pgloader made it non-obvious where to implement the
timers properly, and as a result the tracking of how long it actually took to
transfer the data was... just wrong.
Rather than trying to measure the time spent in any particular piece of the
code, we now emit "start" and "stop" stats messages to the monitor thread at
the right places (which are way easier to find, in the worker threads) and
have the monitor figure out how long it really took.
Fix#506.
Now that we have fixed the output of the per-table total timing, we can show
only that timing by default. With more verbosity pgloader adds the extra
columns, and in computer-oriented formats (json, csv, copy) all the details
are always provided, of course.
See #506.
The handler-case form installed would catch any non-fatal warning and would
also fail to display any error to the user. Both behaviors are wrong, and
this patch fixes them, using *error-output* (that's stderr) explicitly for
anything that may happen while loading the user-provided code.
Fix#526.
When the intarray extension is installed, our PostgreSQL catalog query fails
because we then have more than one operator matching smallint[] <@ smallint[].
It is easy to avoid that problem by casting to integer[], smallint being an
implementation detail here anyway.
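The ambiguity and the cast can be illustrated with a minimal example
(hypothetical table, not the actual catalog query pgloader runs):
    create table t (a smallint[], b smallint[]);
    select a <@ b from t;                        -- ambiguous once intarray adds its own <@ operators
    select a::integer[] <@ b::integer[] from t;  -- always resolves to a single operator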
Fix#532.
Force double-quoting of object names in DROP INDEX commands by using the
format directive ~s. The names of the objects we are dropping usually come
from a PostgreSQL catalog, but might still require forced quoting, for
instance when they start with a digit, as shown in #530.
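The ~s directive prints its string argument surrounded by double quotes, so
for an index whose name begins with a digit (hypothetical name) the generated
DDL now looks roughly like this:
    DROP INDEX 1st_idx;     -- rejected: unquoted identifiers may not start with a digit
    DROP INDEX "1st_idx";   -- what the ~s-based formatting produces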
This fix certainly means we will have to review all the DDL formatting we do
in pgloader and apply a single method of quoting throughout. The simplest one
is of course to force-quote every object name in "", but it might not be the
smartest one (what if some sources send already-quoted object names? that
needs a check), and it's certainly not the prettiest way to go about it:
people usually like to avoid unnecessary quotes, calling them clutter.
Fix#530.
Given a test case and some reading of the FreeTDS source code, it appears
that the XML data type is sent on the wire as (unicode) text. This patch
makes pgloader aware of that and also revisits the choice of casting XML to
the PostgreSQL XML data type (thanks to the test case where we see it just
works without surprise).
Fix#503.
Add a cast rule to support tinyint being an “identity” type in MS SQL, which
means its values are derived from a sequence. We didn't address the whole
MS SQL integer type tower here, and suspect we will have to add more rules
in the future.
Fix#528, where I had access to a test case and could easily reproduce the
bug, thanks!
In advanced projections it could happen that we call the transformation
function for some input fields twice. This is a bug that manifests in
particular when the output of the transformation can't be used/parsed again
by the same function, as shown in the reported bug.
Fix#523.
We have to pay attention to the fact that column names in MS SQL don't
follow the same rules as in PostgreSQL and may e.g. begin with digits. Apply
the identifier case and quoting rules to index column names too.
Many options are now available to pgloader users, including shortcuts that
were not defined clearly enough. That could result in the wrong thing being
done at times.
In particular, when picking the "data only" option, indexes are not to be
dropped before loading the data, but pgloader would still try to create them
again at the end of the load, because the option that controls that behavior
defaults to true and is not impacted by the "data only" choice.
In this patch we review the logic and ensure it's applied in the same
fashion in the different phases of the database migration: preparation,
copying, rebuilding of indexes and completion of the database model.
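As a sketch of the intended behavior (connection strings are hypothetical), a
load such as the following is now expected to copy only the data, leaving the
existing indexes in place rather than dropping and re-creating them:
    LOAD DATABASE
         FROM mysql://user@localhost/source_db
         INTO postgresql:///target_db
    WITH data only;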
See also 96b2af6b2a where we began fixing
oddities but didn't go far enough.
The new version of the sakila database uses geometry-typed columns that
contain POINT data. Add support for that kind of data by copying what we
already did for the POINT data type.
In PostgreSQL it is possible at CREATE TABLE time to set some extra storage
parameters, the most useful of them in the context of pgloader being
FILLFACTOR. For the setting to be useful, it needs to be set at CREATE TABLE
time, before we load the data.
The BEFORE LOAD clause of the pgloader command allows running SQL scripts
that will be executed before the load, and even before the creation of the
target schema when pgloader does that, which is nice for other use cases but
does not help here.
Here we implement a new `ALTER TABLE` rule that one can set in the pgloader
command in order to change storage parameters at CREATE TABLE time:
ALTER TABLE NAMES MATCHING ~/\./ SET (fillfactor='40')
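For reference, on the PostgreSQL side the storage parameter belongs in the
CREATE TABLE statement itself, along these lines (hypothetical table):
    CREATE TABLE foo (id integer) WITH (fillfactor = 40);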
Fix#516.
We cast the MS SQL "int" type to "integer" in PostgreSQL, so add an entry in
our type name mapping where they are known to be equivalent, to avoid
WARNINGs about the situation in DATA ONLY loads.
When reading table names from PostgreSQL, we might find some that need
systematic quoting (such as names that begin with a digit). In that case,
when later comparing the catalogs to match source database table names
against PostgreSQL catalog table names, we need to unquote the PostgreSQL
table name we are using.
In passing, force the *identifier-case* to :none when reading object names
from the PostgreSQL catalogs.
The parallelism in pgloader is now smart enough to begin fetching data from
the next table while the previous one is not yet done being written to
PostgreSQL, but when doing so I introduced a bug in the way indexes are
taken care of.
Specifically, in schema-only mode of operations, we would wait for indexes
we skipped creating. The skipping is the bug here, so make sure we create
indexes even when we don't want to copy any data over.
Defining rules on standard symbols like CL:NAMESTRING is a bad idea
since other systems may do the same, inadvertently overwriting each
other's rules.
Furthermore, future esrap versions will probably prevent defining
rules whose names are symbols in locked packages, making this change
mandatory.
The availability of a test case for MS SQL allows making progress on this
limitation and adding support for the smalldatetime data type. It is
converted server-side with the same CONVERT expression as the longer
datetime data type.
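As an illustration of the idea (not necessarily the exact expression pgloader
emits), MS SQL can render a smalldatetime value as ISO 8601 text with CONVERT
and style 126:
    select convert(varchar(33), cast('2017-06-01 10:30' as smalldatetime), 126);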
Fixes#431.
In the case of targeting an already existing PostgreSQL database, columns
might have been reordered. Add the column name list to the COPY command we
send so that we figure the mapping out automatically.
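The command we send now looks roughly like the following (table and column
names are hypothetical):
    COPY public.actor (actor_id, first_name, last_name) FROM STDIN;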
Fixes#509.
It turns out that it's possible and not too complex, when using the
FreeTDS driver, to enforce the client encoding for MS SQL to be utf-8.
Document how to tweak ~/.freetds.conf to that end.
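A minimal sketch of the relevant ~/.freetds.conf settings (the TDS version is
only an example):
    [global]
        tds version = 7.4
        client charset = UTF-8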
When the column is known to be non-nullable, refrain from adding a null
default value to it. This also fixes the case of casting from an
[int] IDENTITY(1,1) NOT NULL
That otherwise got transformed into a
bigserial not null default NULL
Causing the error
Database error 42601: multiple default values specified for column ... of table ...
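With the fix, the same column definition comes out without the conflicting
default, i.e. simply:
    bigserial not null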
When updating the catalog support we forgot to fix the references to the
index and fkey name slots that are now provided centrally for the
catalog of all database source types.
Again, we don't have unit test cases for MS SQL, so that's a blind fix (but
at least it compiles).
See #343.
Avoid double quoting the schema names when used in PostgreSQL catalog
queries, where the identifiers are used as literal values and need to be
single-quoted.
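A minimal illustration with a hypothetical schema name (not the actual
catalog query):
    select nspname from pg_namespace where nspname = 'Sales';   -- literal value: single quotes
    select nspname from pg_namespace where nspname = "Sales";   -- wrong: "Sales" is read as a column reference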
Fix#476, again.
See #476 where it would have been helpful to see the PostgreSQL catalog
queries with `--log-min-messages sql` in the bug report. Also more
generally useful.
The code comment displayed in the release notes for 3.3.1 is reported to
be better at explaining the concurrency control than what we had in the
main documentation, so add it there.
Fix#496.
Now that we have a proper flush system for reporting the summary at the
proper time (see 7c5396f097), refrain from
also taking care of the reporting when stopping the monitor.
Adapt the regression driver code to flush the summary after loading the
expected data, which also provides better output.
When the summary output was sent to a file, stopping the monitor would also
create a backup file and replace our summary with an empty new file...
Fixes#499.
This new "sql" log level sits between NOTICE and INFO, allowing a complete
log of the SQL queries sent to the server while avoiding the very verbose
traffic of the DEBUG log level.
See #498.
This pgloader command clause allows migrating tables while changing the
schema they are found in, between their MySQL source database and their
PostgreSQL target database.
This changes the default behavior of pgloader with MySQL from always
targeting the 'public' schema to targeting by default a schema named the
same as the MySQL database. You can revert to the old behavior by adding a
rule:
ALTER SCHEMA 'dbname' RENAME TO 'public'
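For context, the rule goes with the rest of the load command, something along
these lines (connection strings are hypothetical):
    LOAD DATABASE
         FROM mysql://user@localhost/dbname
         INTO postgresql:///dbname
    ALTER SCHEMA 'dbname' RENAME TO 'public';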
We might want to add a patch to re-install the default behavior later.
Also see #489, where it used not to be possible to rename the schema at
migration time, causing strange errors (you needed to spot NIL as the schema
name in the "failed to find target table" messages).
A PostgreSQL index is always created in the same schema as the table it
is defined against, and the CREATE INDEX command doesn't accept schema
qualified index names.
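A quick illustration with hypothetical names:
    create index idx_actor_name on sakila.actor (last_name);          -- the index lands in schema sakila
    create index sakila.idx_actor_name on sakila.actor (last_name);   -- rejected: index names cannot be schema qualified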
Make it so that fatal errors are printed only once, and when possible
included in the usual log format as handled by our monitoring thread.
Also, improve error and summary reporting when we load from several
sources on the same command line.
All this work has been triggered because of an edge case where the OS return
value of the pgloader command was 0 (zero, success) although the file given
on the command line does not exist.
Fixes#486.
When the option "drop indexes" is used while loading data from a file, we
collect the indexes from the PostgreSQL catalogs and then issue DROP
commands against them before the load, then CREATE commands when it's
done.
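For reference, the option is enabled in the load command along these lines
(file, connection string and options are hypothetical):
    LOAD CSV
         FROM 'data.csv'
         INTO postgresql:///mydb?tablename=target
    WITH drop indexes, fields terminated by ',';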
The CREATE is done in parallel, and we create an lparallel kernel for
that. The kernel must have a worker-count of at least 1, and we were not
considering the case of 0 indexes on the target table.
Fix#484.