In passing, fix the condition message to read the unknown IXF data types
as decimal numbers (rather than hexadecimal) as they are documented that
way in the IBM reference documentation.
To be able to use "t" (or "nil") as a column name, pgloader needs to be
able to generate lisp code where those symbols are available. It's
simple enough in that a Common Lisp package that doesn't :use :cl
fullfills the condition, so intern user symbols in a specially crafted
package that doesn't :use :cl.
Now, we still need to be able to run transformation code that is using
the :cl package symbols and the pgloader.transforms functions too. In
this commit we introduce a heuristic to pick symbols either as functions
from pgloader.transforms or anything else in pgloader.user-symbols.
And so that user code may use NIL too, we provide an override mechanism
to the intern-symbol heuristic and use it only when parsing user code,
not when producing Common Lisp code from the parsed load command.
When building from source you should really build from current's HEAD in
git master branch...
In passing, comment out the --self-update paragraph as it's know to be
broken unless you still have all the source dependencies at the right
place for ASDF to find them... making the feature developer only.
The date format wouldn't allow using colon (:) in the noise parts of it,
and would also insist that milliseconds should be on 4 digits and micro
seconds on 6 digits. Allow for "ragged" input and take however many
digits we actually find in the input.
When the list of columns of the PostgreSQL target table isn't given in
the load command, pgloader will happily query the system catalogs to get
that information. The list-columns query didn't get the memo about the
qualified table name format and the with-schema macro... fix#288.
The TimeZone parameter should be set both for input and for output in
order to match our expected result file. Let's try to set PGTZ in the
shell environment...
The cvs-parse-date test is failing on Travis because the server up there
in the Cloud isn't using the same timezone as my local machine. Let's
just force the timezone in the SET clause...
A useful use case for date parsing at tine input level is to parse
time (hour, minutes, seconds) rather than a full date (timestamp).
Improve the code so that it's possible to use the date format facility
even when the data field lacks the year/month/day information.
Fix#288.
This is a blind fix that I hope is the one needed, wherein -1 get sent
to unsigned-to-sign which fails to handle it as expected. The protocol
and driver are then sending unsigned ints and that's what we now handle
in the CFFI layer.
This allows to push the attempt to ease testing of it, and allows me
also to cut a release without modified files handling around.
The download section now explains why binaries are not to be found here
anymore. Also we add Redpill Linpro as a sponsor, and we add a pgloader
Moral License page in the spirit of the Varnish License. Let's see what
happens with that.
The cleanup introduced by eabfbb9cc8 did
break the case when we lack a target table name for the IXF load. Just
default to using the source name in that case.
It turns out that IXF format might embed particulars of the systems the
data comes from (Informix or DB2) and we need to process some strings so
that they are compatible with PostgreSQL ("CURRENT TIMESTAMP" here).
See #272 that should be fixed here, but being in the blind, don't close
the issue just yet.
Our IXF type matching appears to be incomplete, and pgloader would then
happily try to create a table column of type NIL in PostgreSQL: prevent
that from happening by raising an error condition early. Fix#289.
When parsing table names in the target URI, we are careful of splitting
the table and schema name and store them into a cons in that case. Not
all sources methods got the memo, clean that up.
See #182 and #186, a pull request I am now going to be able to accept.
Also see #287 that should be helped by being able to apply #186.
As seen in #287 the previous decision to force quoting to :none is
wrong, because index names in MS SQL source database might contain
spaces, and then need to be quoted.
Let's see what happens if we do it the usual way for MS SQL too, and
allow users to control the quoting behaviour of pgloader here.
pgloader has to parse external SQL files because of the driver we use,
Postmodern, only know how to deal with sending one query at a time. So
SQL parsing we do, and split the queries, and send them one after the
other to the server.
PostgreSQL allows String Constants with C-style Escapes to be used in
some situations, and the SQL parsing done in pgloader failed to support
that.
http://www.postgresql.org/docs/9.4/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE
This fixes#284.
In some cases pgloader total time computing is quite off: in the archive
case because it fails to take into account per-file before and after
sections, and in the general case when there's parallel work done.
This patch helps by storing the start time explicitely and using it at
the moment when the summary is displayed: no guessing any more. This is
only used in the archive case for now because I want some feedback.
On my machine with the usual test cases I have, the difference with and
without this patch is line-noise, something more serious has to be done:
let's push testing to the user by committing this early version of the
work.
Apparently I just forgot to apply any smartness whatsoever to SQLite
identifiers and just copied them as they are to PostgreSQL. Change that
by calling apply-identifier-case.
When the notion of a connection class with a generic set of method was
invented, the very flexible specification formats available for the file
based sources where not integrated into the new connection system.
This patch provides a new connection class md-connection with a specific
sub-protocol (after opening a connection, the caller is supposed to loop
around open-next-stream) so that it's possible to both properly fit into
the connection concept and to better share the code in between our three
implementation (csv, copy, fixed).
The dry run option will currently only check database connections, but
as that happens after having correctly parsed the load file, it allows
to also check that the command file is correct for the parser.
Note that the list load-data API isn't subject to the dry-run method.
In passing, we add some more API entry points to the connection objects
and we should actually clean the code base to use the new QUERY generic
all over the place. It's for another patch tho.
The COPY TEXT format accepts non printable characters with an escaped
sequence wherin pgloader can pass in the octal number for the character
in its encoding. When doing that with small numbers like \6 and the
non-printable character is then followed by other numbers, then it
becomes e.g. \646 which might not be part of the target encoding...
To fix, always left pad the character octal number with zeroes, so that
we now send in \00646 which COPY knows how to read: the char at \006
then 4 then 6.
Also copy the test case over to pgloader and run it in the test suite.
The user experience is greatly enhanced by this little change, where you
know from the logs that pgloader could actually connect rather than
thinking it might be still trying...
As reported by clisp maintainer (thanks jackdaniel!) when trying to load
pgloader, we had redoundant labels function names in places. Get rid of
those by pushing the new columns found directly at the end of the list,
avoiding the bulky code to then reverse the complex anonymous data
structure.
The Real Fix™ would be to define proper structures where to hold all
those database catalogs representation, but that's an invasive patch and
now isn't a good time to write it.
At least pgloader should load and run with clisp now.
The index and constraint names given by PostgreSQL catalogs should not
be second guessed, we need to just quote them. The identifier down
casing is interesting when we get identifiers from other system for a
migration, but are wrong when dropping existing indexes in PostgreSQL.
Also, the :null special value from Postmodern was routing the code
badly, just transform it manually to nil when fetching the index list,
manually.
The previous coding decided to add 2000 to the year as an integer if it
was below 2000, which parses 1999 as 3999. Oops. Trigger the correction
only when the date is given on 2 digits only, parsing 04 as 2004.
Dates given on 3 digits are kept as-is.
Playing with the *century* special parameter allows to cancel this
behavior, that maybe should be made entirely optional. It's just too
common to find current years on 2 digits only, sadly.
It's now possible to use several files in a BEFORE LOAD EXECUTE section,
and to mix DO and EXECUTE parts, bringing lots of flexibility in the
commands. Also it actually simplifies the parser.
MySQL names its primary keys "PRIMARY" and we need to always uniquify
this name even when the used asked pgloader to preserve index names.
Also, the create-indexes-again function now needs to ask for index names
to be preserved specifically.
The new option 'drop indexes' reuses the existing code to build all the
indexes in parallel but failed to properly account for that fact in the
summary report with timings.
While fixing this, also fix the SQL used to re-establish the indexes and
associated constraints to allow for parallel execution, the ALTER TABLE
statements would block in ACCESS EXCLUSIVE MODE otherwise and make our
efforts vain.
PostgreSQL rightfully forbifs DROP INDEX when the index is used to
enforce a constraint, the proper SQL to issue is then ALTER TABLE DROP
CONSTRAINT. Also, in such a case pg_dump issues a single ALTER TABLE ADD
CONSTRAINT statement to restore the situation.
Have pgloader do the same with indexes that are used to back a constraint.
In MySQL it's possible to have a bigint of 20 digits when using the
"unsigned" variant of the data type, whereas in PostgreSQL there's no
such variant and bigints are "limited" to the range -9223372036854775808
to +9223372036854775807 (19 digits numbers).
Fix the default casting rule to switch to PostgreSQL numeric in such cases.
The option doesn't seem relevant to the db3 source type which contains a
table definition: pgloader will create the table from scratch and no
indexes are going to be found.