As shown in #476, it is sometimes necessary to quote identifier names even when
loading from a file, that is, when specifying the target table name in the
database URI.
To that end, allow the option "identifier case" to be used in the file-based
cases too. Fixes #476.
The example was still using a very old syntax for per-field options, and
even the current Debian package doesn't support this syntax anymore...
Update the docs to use the current syntax.
Fix #475.
I'm not sure if anyone is using those scripts anymore, but I suppose
keeping them known-broken isn't helping anyone either. This is a blind
fix in reaction to the latest comment in bug #131.
This was introduced recently when refactoring the match rules: I forgot to
update all the call sites, and the bug went unnoticed for a while, oops. Not
sure the fix is all we need to get the feature (alter schema rename to)
working again, but it allows the code to compile, and that's all I have time
to handle today.
See #466.
We added some confusion about which of src/utils/quoting.lisp and
src/pgsql/pgsql-ddl.lisp is responsible for quoting the SQL object names, and
as a result some migrations from MySQL with identifier case set to quote
were broken, as in #439.
To fix, remove any use of the format directive ~s in the PostgreSQL DDL
output methods: we consider that the quoting is to be decided in
apply-identifier-case, and then use ~a instead of ~s.
Fix #439.
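For illustration, here is the difference between the two directives at the
REPL; the DDL string is made up, but the quoting behaviour is standard Common
Lisp: ~s prints the name readably and adds its own double quotes, while ~a
prints it as-is, so quotes only appear when apply-identifier-case put them in
the name itself.

    CL-USER> (format nil "create table ~s (...);" "MyTable")
    "create table \"MyTable\" (...);"
    CL-USER> (format nil "create table ~a (...);" "\"MyTable\"")
    "create table \"MyTable\" (...);"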
In the MySQL source we have explicit support for both string equality
and regexps in the INCLUDING and EXCLUDING clauses. This got broken
when the code moved to be shared with the ALTER TABLE implementation, because
we were no longer using the type system in the same way in all places.
To fix, create new abstractions for strings and regexps and use those
new structs in the proper way (thanks to defstruct and CLOS).
Fixes #441.
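A minimal sketch of the idea, with made-up names rather than pgloader's
actual definitions: dedicated structs make the intent explicit and let CLOS
dispatch pick the right comparison for each matching rule.

    (defstruct string-match target)
    (defstruct regex-match  target)

    (defgeneric matches-p (rule name)
      (:method ((rule string-match) name)
        ;; plain string comparison
        (string= (string-match-target rule) name))
      (:method ((rule regex-match) name)
        ;; regular expression matching, cl-ppcre style
        (cl-ppcre:scan (regex-match-target rule) name)))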
In cases where we have a WITH include drop option, we are generating
lots of SQL DROP statements. We may be running against an empty target
database or in other situations where the target object of the DROP command
might not exist. Add support for that case.
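As a hedged sketch (a simplified format string, not pgloader's actual DDL
methods), supporting that case boils down to emitting the IF EXISTS variant
of the statement:

    (defun format-drop-table (table-name &key if-exists cascade)
      "Build a DROP TABLE statement, optionally tolerant of a missing table."
      (format nil "DROP TABLE ~:[~;IF EXISTS ~]~a~:[~; CASCADE~];"
              if-exists table-name cascade))

    CL-USER> (format-drop-table "public.foo" :if-exists t)
    "DROP TABLE IF EXISTS public.foo;"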
In the FILENAME MATCHING case it might be good to have that information
available, which can also help explain some of the time spent. The example in
test/bossa.load currently loads data from 296 files in total...
The internal catalog representation is deeply recursive in order to
make it easy to traverse the catalog both downwards (catalog to schema
to tables) and upwards (table to its schema to its catalog).
As a consequence we need to set *print-circle* to non-nil when we're
going to log the catalogs, so turn it to non-nil before generating the
log messages.
While at it, add logging of such catalogs in the :data log verbosity
mode. The catalog output is very verbose, but it's easy to copy/paste it
from a bug report into being a live object we can inspect in the REPL,
thanks to Common Lisp's notion of a reader and a readable printer!
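As a reminder of why *print-circle* matters here, a minimal example with
made-up two-slot structs rather than pgloader's actual catalog:

    (defstruct catalog name schemas)
    (defstruct schema  name catalog)

    (let* ((cat (make-catalog :name "db"))
           (sch (make-schema :name "public" :catalog cat)))
      (setf (catalog-schemas cat) (list sch))
      (let ((*print-circle* t))
        ;; prints #1=#S(CATALOG :NAME "db" :SCHEMAS (#S(SCHEMA :NAME "public" :CATALOG #1#)))
        ;; without *print-circle* the printer would recurse forever
        (prin1 cat)))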
Calling one of the -with-timing forms from within a with-stats-collection
macro is redundant and will have the numbers counted twice. In this case the
double counting didn't actually happen, because the stats label had been
manually copied and one copy was borked by a typo.
When loading data into an existing PostgreSQL catalog, we DROP the
indexes for better performance of the data loading. Some of those indexes
are UNIQUE or even PRIMARY KEYS, and some FOREIGN KEYS might depend on
them in the PostgreSQL dependency tracking of the catalog.
We used to use the CASCADE option when dropping the indexes, which was hiding
a bug: if we exclude from the load some tables with foreign keys pointing to
tables we target, then we would DROP those foreign keys because of the
CASCADE option, but fail to install them again at the end of the load.
To prevent that from happening, pgloader now queries the PostgreSQL
pg_depend system catalog to list the “missing” foreign keys and adds them
to our internal catalog representation, from which we know to DROP then
CREATE the SQL objects at the proper times.
See #400, as this was an oversight in fixing that issue.
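Roughly, the lookup involved has this shape (a simplified sketch only, not
the exact query pgloader runs): find the foreign key constraints that
PostgreSQL records as depending on one of the unique indexes we are about to
drop.

    (defparameter *missing-fkeys-sql* "
    select c.conname, c.conrelid::regclass, pg_get_constraintdef(c.oid)
      from pg_constraint c
           join pg_depend d
             on d.classid = 'pg_constraint'::regclass
            and d.objid = c.oid
            and d.refclassid = 'pg_class'::regclass
     where c.contype = 'f'
       and d.refobjid = $1    -- oid of an index we are about to drop
    ")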
When we do have a condef (a constraint definition, in PostgreSQL catalog
slang), use it rather than trying to reinvent it from bits and pieces.
See #400, which this actually fixes now...
We used to enforce overly strict rules for a quoted field name in a CSV
load file; now we accept any character but a quote as part of the field
name.
Fixes #416.
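Something along these lines (a sketch, not the exact grammar in pgloader's
command parser): the quoted variant now accepts any character that is not a
double quote.

    (defrule quoted-field-name (and #\" (* (not #\")) #\")
      (:destructure (open name close)
        (declare (ignore open close))
        (text name)))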
Also known as the ORM case, it happens that other tools are used to
create the target schema. In that case pgloader's job is to fill in the
existing target tables with the data from the source tables.
We still focus on load speed, so pgloader will now DROP the
constraints (Primary Key, Unique, Foreign Keys) and indexes before
running the COPY statements, and re-install the schema it found in the
target database once the data load is done.
This behavior is activated when using the “create no tables” option, as
in the following test-case setup:

    with create no tables, include drop, truncate

Fixes #400, for which I got a test-case to play with!
Replace the ad-hoc code that was used before in the load-from-file code
path with our full internal catalog representation, and adjust the APIs to
that end.
The goal is to use catalogs everywhere in the PostgreSQL target API,
allowing us to reason explicitly about source and target catalogs;
see #400 for the main use case.
First, add indexes and foreign keys to the list of objects supported by
the shared catalog facility, where they were previously only found in the
pgsql schema-specific package for historical reasons.
Then also add to our internal catalog structures the notion of a trigger
and a stored procedure, allowing for cleaner support of advanced default
values in the MySQL cast functions.
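To give an idea of the shape of those additions (struct and slot names here
are illustrative, not pgloader's exact definitions):

    (defstruct index     name unique primary columns)
    (defstruct fkey      name columns foreign-table foreign-columns)
    (defstruct trigger   name action procedure)
    (defstruct procedure name returns language body)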
Now that we have a proper and complete catalog, review the pgsql module's
DDL output functions in terms of the catalog and rewrite the schema
creation support so that it takes direct benefit of our internal
catalog representation.
In passing, clean up the code organisation of the pgsql target support
module to make it easier to work with.
The next step consists of getting rid of src/pgsql/queries.lisp: this
facility should be replaced by a target catalog that we fetch the usual way,
thanks to the new src/pgsql/pgsql-schema.lisp file and its list-all-*
functions.
That will in turn allow for an explicit step of merging the pre-existing
PostgreSQL catalog when it's been created by tools other than pgloader,
that is, when migrating with the help of an ORM. See #400 for details.
The MSSQL index filters parser needs to parse digits and keep them as
text, but was piggybacking on the main parsers and on the fixed file format
positions parser by re-using the rule name "number".
My understanding was that by calling `defrule' in different packages one
would create separate sets of rules. That might have been wrong from the
beginning, or it might have changed in newer versions of esrap. Will have to
investigate more.
This fixes #434 while not applying the suggested code: the comment about
where to fix the bug is spot on.
Also, it should be noted that the regression test framework seems to be
failing us and returns success in that error case, despite code
installed to handle the situation properly. This will also need to be
investigated.
The other user-provided names (schema and table) were already quoted
using the quote_ident() PostgreSQL function, but the column name (attname
in the catalogs) was not.
Blind attempt to fix #425.
Use case: Django dissuades setting NULL “on string-based fields […]
because empty string values will always be stored as empty strings, not
as NULL. If a string-based field has null=True, that means it has two
possible values for »no data«: NULL, and the empty string. In most
cases, it’s redundant to have two possible values for »no data«; the
Django convention is to use the empty string, not NULL.”
pgloader already supports custom transformations, which can be used to
replace NULL values in string-based columns with empty strings. Setting a
NOT NULL constraint on those columns could possibly be achieved by
running a database query to extract their names and then generating the
relevant ALTER TABLE statements, but a cast option in pgloader is a more
convenient way.
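As an illustration of the transformation side of this (a sketch only; the
function name is made up for the example, and the new cast option is the
convenient part this change adds):

    (defun empty-string-when-null (value)
      "Project a SQL NULL (nil on the Lisp side) to the empty string, Django style."
      (if (null value) "" value))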
This format of source file specifications is available for the CSV, COPY and
FIXED formats, but was only documented for the CSV one. The paragraph is
copy/pasted around in the hope of producing per-format man pages and web
documentation in a fully automated way sometime.
Fix #397.
In MySQL the information_schema.statistics table lists all indexes and
has a row per index column, which means that the index level properties
are duplicated on every row of the view.
Our query against that catalog was lazily relying on the classic and
faulty MySQL behavior where GROUP BY allows non-aggregated columns
to be reported even when the result isn't deterministic. This patch
fixes that by using a trick: the NON_UNIQUE column is 0 for a unique
index and 1 otherwise, so we sum the numbers and test the result for
equality with 0.
Fix #345 again.
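In other words, a query of roughly this shape (a simplified sketch of the
idea, not the exact query in the source):

    (defparameter *list-indexes-sql* "
      select table_name, index_name,
             sum(non_unique) = 0 as is_unique,
             group_concat(column_name order by seq_in_index) as index_columns
        from information_schema.statistics
       where table_schema = ?
    group by table_name, index_name
    ")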
It's possible that in MySQL a foreign key constraint definition points to a
non-existing table. In such a case, issue an error message and refrain from
trying to reinstall the faulty foreign key definition.
The lack of error handling at this point apparently led to a frozen instance
of pgloader, I think because it could not display the interactive debugger at
the point where the error occurs.
See #328, and also #337, which might be fixed here.
By default, pgloader will start as many parallel CREATE INDEX commands
as the maximum number of indexes you have on any single table that takes
part in the load.
As this number might be great enough to exhaust the resources of the target
PostgreSQL server (e.g. maintenance_work_mem), we add an option to limit
that to something reasonable when the source schema isn't.
Fix #386, in which 150 indexes are found on a single source table.
It's always been possible to set application_name to anything really,
making it easier to follow the PostgreSQL queries made by pgloader.
Now force that setting to 'pgloader' by default.
Fix #387.
For some reason, with the default DYNSIZE and even when using the 64-bit
Clozure-CL variant, I get a series of error messages like the ones below,
so I had to restrict the build to using 256 MB only:

    Fatal error in "buildapp" : Fault during read of memory address #x7F8C37522668
    Fatal error in "buildapp" : Fault during
    Fatal error in "buildapp" : Stack overflow on temp stack.
    Fatal error in "buildapp" : Fault during read of memory address #x7F8C37522668
It's worth trying something else as the limitation might be tied to my
local virtual build environment.
See #327 where the SBCL Garbage Collector is introducing problems which
might not appear at all when compiling with Clozure-CL instead.
FusionBox bought a Moral License and helped test pgloader against a
test instance of SQL Server, with which it was easy to reproduce bugs.
Those got fixed thanks to their support!
The Windows default end of line is #\Return then #\Newline, and the parser
gets to see both of them, so it needs to be prepared for that. See #159,
which is all about Windows support.
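A sketch of how a grammar can be made ready for that (the rule name is made
up for the example, not taken from the source):

    (defrule line-ending (or (and #\Return #\Newline) #\Newline #\Return)
      (:constant :end-of-line))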
The max function requires at least 1 argument to be given, and in the
case where we have no table to load it then fails badly, as shown here:

    CL-USER> (handler-case
                 (reduce #'max nil)
               (condition (c)
                 (format nil "~a" c)))
    "invalid number of arguments: 0"

Of course Common Lisp comes with a very easy way around that problem:

    CL-USER> (reduce #'max nil :initial-value 0)
    0

Fix #381.
This was broken by a recent commit where we forced the internal
table representation to always be an instance of the table structure,
which wasn't yet true in the regression testing code path.
In passing, re-indent a large portion of the function, which accounts
for most of the diff.
The function needs to return a string to be added to the COPY stream, but we
still need to make sure that whatever is given here looks like an integer.
Given the very dynamic nature of data types in SQLite, the integer-to-string
function was already the default, but somehow its fixed version had failed
to be published before.
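Roughly what such a transformation has to cope with (a sketch of the idea
only, not a copy of pgloader's transforms):

    (defun integer-to-string (value)
      "Return a string that looks like an integer, whatever SQLite hands us."
      (typecase value
        (null    nil)
        (integer (princ-to-string value))
        (string  (let ((n (parse-integer value :junk-allowed t)))
                   (when n (princ-to-string n))))
        (t       (princ-to-string value))))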