Use case: Django dissuades setting NULL “on string-based fields […]
because empty string values will always be stored as empty strings, not
as NULL. If a string-based field has null=True, that means it has two
possible values for »no data«: NULL, and the empty string. In most
cases, it’s redundant to have two possible values for »no data«; the
Django convention is to use the empty string, not NULL”.
pgloader already supports custom transformations which can be used to
replace NULL values in string-based columns with empty strings. Setting
NOT NULL constraint on those columns could possibly be achieved by
running a database query to extract their names and then generating
relevant ALTER TABLE statements, but a cast option in pgloader is a more
convenient way.
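A load command could then combine both, along these lines (a sketch: the null-to-empty-string function name is hypothetical here, and the exact cast options available depend on the pgloader version):

```
LOAD DATABASE
     FROM      mysql://root@localhost/djangodb
     INTO postgresql://localhost/djangodb

CAST type varchar to varchar keep typemod set not null
          using null-to-empty-string;
```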
This format of source file specifications is available for the CSV, COPY and FIXED formats, but was only documented for CSV. The paragraph is copied into each format's section in the hope of producing per-format man pages and web documentation in a fully automated way someday.
Fix#397.
In MySQL the information_schema.statistics table lists all indexes and
has a row per index column, which means that the index level properties
are duplicated on every row of the view.
Our query against that catalog was lazily relying on the classic and faulty MySQL behavior where GROUP BY allows non-aggregated columns to be reported even when the result isn't deterministic. This patch fixes that with a trick: the NON_UNIQUE column is 0 for a unique index and 1 otherwise, so we sum the values and compare the sum to 0.
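The idea can be sketched with a query along these lines (a simplified illustration, not the exact catalog query used by pgloader):

```sql
SELECT table_name, index_name,
       SUM(non_unique) = 0 AS is_unique,
       GROUP_CONCAT(column_name ORDER BY seq_in_index) AS columns
  FROM information_schema.statistics
 WHERE table_schema = 'mydb'
 GROUP BY table_name, index_name;
```

Every output column is either aggregated or part of the GROUP BY clause, so the result is deterministic.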
Fix#345 again.
It's possible in MySQL for a foreign key constraint definition to point to a non-existing table. In such a case, issue an error message and refrain from trying to reinstall the faulty foreign key definition.
The lack of error handling at this point apparently led to a frozen pgloader instance, I think because pgloader could not display the interactive debugger at the point where the error occurs.
See #328, also #337 that might be fixed here.
By default, pgloader will start as many parallel CREATE INDEX commands
as the maximum number of indexes you have on any single table that takes
part in the load.
As this number might be large enough to exhaust the target PostgreSQL server (e.g. maintenance_work_mem), we add an option to limit the parallelism to something reasonable when the source schema isn't.
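The new setting goes in the WITH clause of the load command, for instance (value picked arbitrarily):

```
LOAD DATABASE
     FROM      mysql://root@localhost/bigdb
     INTO postgresql://localhost/bigdb

WITH max parallel create index = 4;
```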
Fix#386 in which 150 indexes are found on a single source table.
It's always been possible to set application_name to anything really,
making it easier to follow the PostgreSQL queries made by pgloader.
Force that setting to 'pgloader' by default.
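With the setting in place, a running migration can be followed from PostgreSQL itself, e.g.:

```sql
SELECT pid, state, query
  FROM pg_stat_activity
 WHERE application_name = 'pgloader';
```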
Fix#387.
For some reason, with the default DYNSIZE and even when using the 64-bit Clozure-CL variant, I get a series of error messages like the ones below, so that I had to restrict the build to using only 256 MB:
Fatal error in "buildapp" : Fault during read of memory address #x7F8C37522668
Fatal error in "buildapp" : Fault during
Fatal error in "buildapp" : Stack overflow on temp stack.
Fatal error in "buildapp" : Fault during read of memory address #x7F8C37522668
It's worth trying something else as the limitation might be tied to my
local virtual build environment.
See #327 where the SBCL Garbage Collector is introducing problems which
might not appear at all when compiling with Clozure-CL instead.
FusionBox bought a Moral License and provided a test instance of SQL Server, against which it was easy to reproduce bugs.
Those got fixed thanks to their support!
The Windows default end of line is #\Return then #\Newline, and the parser gets to see both of them, so it needs to be prepared for that. See #159, which is all about Windows support.
The max function requires at least one argument, and in the case where we have no table to load it then fails badly, as shown here:
CL-USER> (handler-case
             (reduce #'max nil)
           (condition (c)
             (format nil "~a" c)))
"invalid number of arguments: 0"
Of course Common Lisp comes with a very easy way around that problem:
CL-USER> (reduce #'max nil :initial-value 0)
0
Fix#381.
It was broken by a recent commit that forced the internal table representation to always be an instance of the table structure, which wasn't yet true in the regression testing code.
In passing, re-indent a large portion of the function, which accounts
for most of the diff.
The function needs to return a string to be added to the COPY stream, and we still need to make sure that whatever is given here looks like an integer. Given the very dynamic nature of data types in SQLite, the integer-to-string function was already the default, but somehow its fixed version had failed to be published before.
It turns out recent changes broke the SQLite index support (when adding support for MS SQL partial/filtered indexes), so fix it by using the pgsql-index structure rather than the specific sqlite-idx one.
In passing, improve the detection of PRIMARY KEY indexes, which was still lacking. This work showed that the introspection done by pgloader was wrong: it's way crazier than we thought, so adjust the code to loop over PRAGMA calls for each object we inspect.
While adding PRAGMA calls, add support for foreign keys too, we have the
code infrastructure that makes it easy now.
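The introspection loop boils down to a handful of PRAGMA calls per inspected object, along these lines (table and index names made up for the example):

```sql
PRAGMA table_info('company');        -- columns, types, PRIMARY KEY parts
PRAGMA index_list('company');        -- indexes with their unique/origin flags
PRAGMA index_info('idx_company');    -- columns of one given index
PRAGMA foreign_key_list('company');  -- foreign key definitions
```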
Make it work on the second run, when the triggers and functions have already been deployed, by running the DROP of the function and trigger before we CREATE the table, then CREATE them again: we need to split the list again.
The newid() function seems to be equivalent to the newsequentialid() one
if I'm to believe issue #204, so let's just add that assumption in the
code.
Fix#204.
The WIP branch about better error messages made its way into the mainline code, so switch back to the mainline version as available directly in Quicklisp.
See https://github.com/nikodemus/esrap/issues/26.
The first error of a batch was lost somewhere in the recent changes. My current best guess is that the rewrite of the copy-batch function made the handler-bind form set up by the handling-pgsql-notices macro ineffective, but I can't see why that is.
See #85.
The Common Lisp default printer is nice enough to know how to print symbols as strings, but that won't cut it when the symbol :is-not-null needs to be printed out as "is not null", without the dashes.
See #365.
We should not block all processing just because we can't parse an index. The best we can do tonight is to try creating the index without its filter; ideally we would skip building the index entirely. That's for a later effort though, it's running late here.
See #365.
The only case with a test is the "([deleted]=(0))" case, which showed a tad too much of the current implementation of the MS SQL index filter parsing. Try to prepare better for the next filters.
Next step: adding some test cases.
See #365.
MS SQL has a notion of a "filtered index" that matches the notion of a
PostgreSQL partial index: the index only applies to the rows matching
the index WHERE clause, or filter.
The WHERE clauses in both cases are limited to simple expressions over a single row of a base table at a time, so we implement a limited WHERE clause parser for MS SQL filters and a transformation routine to rewrite the clause in PostgreSQL slang.
In passing, we transform the filter constants using the same
transformation functions as in the CAST rules, so that e.g. a MS SQL
bit(1) value that got transformed into a PostgreSQL boolean is properly
translated, as in the following example:
MS SQL: "([deleted]=(0))" (that's from the catalogs)
PostgreSQL: deleted = 'f'
Of course the parser is still only lightly tested; let's see what happens in the wild now.
(Should) Fix#365.
The implementation uses the dynamic binding *on-error-stop*, so it's also available when pgloader is used as a Common Lisp library.
The (not-all-that-) recent changes made to the error handling make that
implementation straightforward enough, so let's finally do it!
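From the command language, the same behavior should be reachable as a WITH option, as in this sketch:

```
LOAD DATABASE
     FROM      mysql://root@localhost/db
     INTO postgresql://localhost/db

WITH on error stop;
```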
Fix#85.
The PostgreSQL search_path allows multiple schemas, and we might even need that to be able to reference types and other tables. Allow setting more than one schema by using the fact that PostgreSQL schema names don't need to be individually quoted, and by passing the exact content of the SET search_path value down to PostgreSQL.
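A load command can now list several schemas in one go, e.g. (schema names made up):

```
LOAD DATABASE
     FROM      mssql://user@localhost/db
     INTO postgresql://localhost/db

SET search_path to 'dbo, public';
```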
Fix#359.
The previous code required a non-zero data length for all values returned by MS SQL, which makes no sense for text-like values (an empty string is ok). Also, the code was trimming spaces from both ends of syb-char data, and in testing that return type turns out to be used for varchar too.
Fix#366. Fix#368.
Once more we can't use an aggregate over a text column in MS SQL to
build the index definition from its catalog structure, so we have to do
that in the lisp part of the code.
Multi-column indexes are now supported, but filtered indexes still are a problem: the WHERE clause in MS SQL is not compatible with the PostgreSQL syntax (because of [names] and type casting).
For example we cast MS SQL bit to PostgreSQL boolean, so
WHERE ([deleted]=(0))
should be translated to
WHERE not deleted
And the code to do that is not included yet.
The following documentation page offers more examples of WHERE expressions we might want to support:
https://technet.microsoft.com/en-us/library/cc280372.aspx
WHERE EndDate IS NOT NULL
AND ComponentID = 5
AND StartDate > '01/01/2008'
EndDate IN ('20000825', '20000908', '20000918')
It might be worth automating the translation to PostgreSQL syntax and
operators, but it's not done in this patch.
See #365, where the created index will now be as follows, which is a
problem because of being UNIQUE: some existing data won't reload fine.
CREATE UNIQUE INDEX idx_<oid>_foo_name_unique ON dbo.foo (name, type, deleted);
Have a pretty-print option where we try to be nice to the reader, and don't use it in the CAST debug messages. Also allow working with the real maximum length of column names rather than hardcoding it to 22 characters...
Having been given a test instance of a MS SQL database made it possible to quickly fix a series of assorted bugs related to the schema handling of MS SQL databases. As it's the only source with a proper notion of schemas that pgloader currently supports, it's no surprise we had those bugs.
Fix#343. Fix#349. Fix#354.
It turns out sloppy SQL code made its way into pgloader, wherein the GROUP BY clause of the foreign key listing wasn't referencing the whole set of non-aggregated output columns.
Thanks to thiagokronig for the new query, which fixes#345.
The new ALTER TABLE facility makes it possible to act on tables found in the MySQL database before the migration happens. In this patch the only provided actions are RENAME TO and SET SCHEMA, which fixes#224.
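For instance (matching patterns and names made up for illustration):

```
LOAD DATABASE
     FROM      mysql://root@localhost/db
     INTO postgresql://localhost/db

ALTER TABLE NAMES MATCHING 'mv_orders' RENAME TO 'orders'
ALTER TABLE NAMES MATCHING ~/./ SET SCHEMA 'legacy';
```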
In order to be able to provide the same option for MS SQL users, we will have to make it work at the SCHEMA level (ALTER SCHEMA ... RENAME TO ...) and modify the internal schema struct so that the schema slot of our table instances is a schema instance rather than its name.
Lacking a MS SQL test database and instance, the facility is not yet provided for that source type.