This allows bypassing SSL when you don't need it, for instance over
localhost. It takes the same syntax as the PostgreSQL sslmode connection
string parameter.
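For instance, assuming the option is given as a parameter of the target
connection string, with the usual PostgreSQL values (disable, allow, prefer,
require, verify-ca, verify-full):

    pgsql://user@localhost/dbname?sslmode=disable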
In this commit we fail the guess faster, which allows testing against a much
larger sample. The sample is still hard-coded, but this time to 1000 lines.
Also add a test case, see #618.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes that represents. We now add that
information to the output.
The number of bytes presented is computed from the Unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
Use a generic function protocol to implement the human readable, verbose,
csv, copy and json reporting output formats. This is much cleaner and more
extensible than the previous way.
Use that new power to implement a real JSON output from the internal state
object.
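A minimal sketch of such a protocol; the function and format names here are
illustrative, not the actual pgloader API:

    (defgeneric report-summary (format state stream)
      (:documentation "Print a summary of STATE to STREAM in FORMAT."))

    (defmethod report-summary ((format (eql :human-readable)) state stream)
      (format stream "~a~%" state))

    (defmethod report-summary ((format (eql :json)) state stream)
      ;; assuming STATE is something yason knows how to encode
      (yason:encode state stream))

Adding a new output format is then just a matter of adding a method.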
It seems that SBCL still needs some help in deciding when to GC with very
large values. In a test case with a “data” column averaging 375 kB (up to
about 3 MB per datum), forcing the GC allows much larger batch size and
prefetch rows settings without entering ldb.
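The workaround boils down to forcing a full collection from time to time,
along these lines (a sketch with a hypothetical threshold and byte counter):

    #+sbcl
    (when (< +batch-bytes-gc-threshold+ bytes-in-batch)
      (sb-ext:gc :full t))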
The pgstate infrastructure already had lots of details about what's going
on, add to it the information about how many bytes are sent in every batch,
and use this information in the monitor when something long is happening to
display how many rows we sent from the beginning for this (supposedly) huge
table, along with bytes and speed (bytes per second).
Add a message every 20 batches so that the user knows it's still going on.
Also, in passing, fix some messages: the present tense is not precise enough
to tell whether the log refers to an event that has just been done or one
that is starting next.
The verbosity is not that easy to adjust. Remove useless messages and add a
new one telling when the COPY of a table is done. As we might have to wait
for some time for indexes to be built, keep the CREATE INDEX lines. Also
keep the ALTER TABLE lines, both for primary keys and foreign keys, again
because the user might have to wait for quite some time.
This feature has been requested several times, and I can't see any way to
fix the GETENV parsing mess that we have. In this patch the GETENV support
is retired and replaced with a templating system using the Mustache syntax.
To get back the GETENV feature, our implementation of the Mustache template
system adds support for fetching the template variable values from the OS
environment.
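A hypothetical example of a load file using the new syntax, with {{DBNAME}}
fetched from the environment:

    load database
         from mysql://root@localhost/{{DBNAME}}
         into pgsql://localhost/{{DBNAME}};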
Fixes #555, Fixes #609.
See #500, #477, #278.
It is sometimes needed to tweak MS SQL server parameters, such as the
textsize parameter, which allows fetching the whole content of a text or
binary column (not kidding).
Now it's possible to add such a line in the load file:
set mssql parameters textsize to '104857600'
Fixes #603.
This code path is exercised from the command line only, which means I don't
get to run it that often. And it's a pain to debug. So make it easier to run
`process-source-and-target` from the REPL.
Startup log messages could be lost because the monitor would be started but
not yet ready to process messages. Fix that by “warming up” the monitoring
thread: have it execute a small computation and, more importantly, wait for
the result to be received back, blocking.
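A minimal sketch of the warm-up idea, using lparallel and assuming a kernel
is already set up (names are illustrative, not the actual pgloader code):

    (let ((channel (lparallel:make-channel)))
      ;; have the thread evaluate something trivial...
      (lparallel:submit-task channel (lambda () 42))
      ;; ...and block until the result comes back: the thread is now ready
      (lparallel:receive-result channel))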
See #599 where parsing errors from a wrong URL were missed in the command
line output, quite disturbingly.
The previous patch fixed CREATE TYPE so that ENUM types are created in the
same schema as the table using them, but failed to update the DROP TYPE
statements to also target this schema...
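With the fix, both statements now target the same schema, along these lines
(type and schema names borrowed from the pagila example):

    DROP TYPE IF EXISTS pagila.mpaa_rating;
    CREATE TYPE pagila.mpaa_rating AS ENUM ('G', 'PG', 'PG-13', 'R', 'NC-17');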
Prior to this patch, pgloader wouldn't care about which schema it creates
extra types in. Extra types mainly come from the MySQL ENUM and SET support.
Now, pgloader creates those extra PostgreSQL ENUM types in the same schema
as the table using them, which is a more sound default.
MySQL ENUMs are cast to PostgreSQL ENUM types just fine, but sometimes
that's not what the user wants. When we have a CAST rule for an ENUM column,
recognize the fact and respect the user's choice.
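For instance, a load file can now override the default with a rule along
these lines (hypothetical table and column names):

    cast column film.rating to text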
Fixes #608.
The spelling in SQLite for the default value is "current_date"; instruct
pgloader about that. This commit also adds a test case to our sqlite.db
unit tests database.
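That is, a column definition such as the following (a hypothetical example)
is now recognized:

    CREATE TABLE events (day date DEFAULT current_date);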
Fixes #607.
Now when building a bundle file for source distribution of pgloader, always
test it by building a binary image from the bundle tarball in a test
directory. Also make it easy to target "latest" Quicklisp distribution with
the following spelling:
make BUNDLEDIST=latest bundle
When distributing a pgloader bundle we use the ql-dist facility. In a recent
commit we hand-picked the last known working Quicklisp distribution for
pgloader. Make it easy to target the "latest" known distribution or to
hard-code one from the Makefile or the bundle/ql.lisp file.
We use a CSV parser for the MySQL enum values, but the quote escaping wasn't
properly set up: MySQL quotes ENUM values with a single quote (') and uses
two of them ('') to escape a single quote found in the ENUM value itself.
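For instance, an ENUM definition containing a quote, as MySQL prints it
(a hypothetical column):

    size ENUM('small', 'medium', 'x''large')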
Fixes #597.
When the table is empty we get nil for min and max values of the id column.
In that case we don't compute a set of ranges and “cancel” concurrency
support for the empty table.
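A minimal sketch of the guard, with hypothetical names:

    ;; min-id and max-id are both nil when the table is empty
    (if (and min-id max-id)
        (compute-ranges min-id max-id)  ; split the work as usual
        nil)                            ; no ranges: single reader, no concurrency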
Fixes #596.
The with option “include drop” used to also apply to schemas, which is not
that useful, and problematic when trying to DROP SCHEMA public, because you
might not be connected as the owner of that schema.
Even if we don't target the public schema by default, users can choose to do
so thanks to our ALTER SCHEMA ... RENAME TO ... command.
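For instance (a hypothetical load file line):

    alter schema 'public' rename to 'pgloader'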
Fixes #594.
As we know how many columns we expect from the input file, it's possible to
read a sample (10 lines as of this patch) and try many different CSV reader
parameter combinations until we find one that works, i.e. one that returns
the right number of fields.
It is still possible of course to specify parameters on the command line or
in a load file if necessary, but it makes the simple case even simpler. As
simple as:
pgloader file.csv pgsql:///pgloader?tablename=target
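The guessing loop itself amounts to something like this sketch, where the
candidate parameter list and the parsing helper are hypothetical:

    (defun guess-csv-params (sample-lines expected-cols)
      "Return the first (separator quote escape) triple that parses every
    sample line into EXPECTED-COLS fields, or NIL when none matches."
      (loop for (sep quote escape) in *candidate-csv-params*
            when (every (lambda (line)
                          (= expected-cols
                             (length (parse-csv-line line sep quote escape))))
                        sample-lines)
              return (list sep quote escape)))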
From a load file, as soon as pgloader can retrieve the schema of the target
table the source field list defaults to the target column list. Let's apply
the same rules to the command line.
It was only offered for SQLite without good reason really, and tests show
that it works just as well with MySQL, of course. Offer the option there too.
See 3eab88b144 for details.
In b301aa9394 the "create schema" default
changed to true, which is a good idea. As a consequence, pgloader should
consider this operation only when "create tables" is set: we don't want to
start by creating target schemas in a target database that is said to be
ready to host the data.
We already have apply-identifier-case and *identifier-case* to decide how
and when to quote our SQL object names, so don't force extra quotes in
format strings: refrain from using ~s.
We sure can trust PostgreSQL to use names it knows how to handle. Still, it
will happily store names containing upper-case letters in its catalogs, and
in that case we must quote them.
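In other words (a toy illustration at the REPL, not the actual pgloader code):

    ;; ~s prints readably and adds quotes of its own:
    (format nil "CREATE TABLE ~s" "\"MyTable\"") ; => CREATE TABLE "\"MyTable\""
    ;; ~a leaves the already-quoted identifier alone:
    (format nil "CREATE TABLE ~a" "\"MyTable\"") ; => CREATE TABLE "MyTable"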
In the md-methods copy-database function, don't pretend we are able to
handle any condition when preparing the PostgreSQL schema; database-error is
really all we are dealing with there.
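That is, narrow the handler from the catch-all down to the specific
condition, along these lines (a sketch; the protected call and the logging
helper are hypothetical, the condition is cl-postgres's):

    (handler-case
        (prepare-pgsql-database pgconn catalog)
      ;; previously: (condition (c) ...), which is too broad
      (cl-postgres:database-error (e)
        (log-message :error "~a" e)))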
This change was long overdue. Ideally we would use something like the YeSQL
library for Clojure, but it seems like the cl-yesql equivalent is not ready
yet, and it depends on an experimental build system...
So this patch introduces a URL abstraction built on top of a hash table.
You can then reference src/pgsql/sql/list-all-columns.sql as
(sql "pgsql/list-all-columns.sql")
in the source code directly.
So for now the templating system is CL's format language. It is still an
improvement over embedded strings. Again, one step at a time.
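A minimal sketch of the mechanism, with hypothetical names:

    (defvar *sql-queries* (make-hash-table :test 'equal)
      "Maps relative .sql file names to their contents, read at build time.")

    (defun sql (name &rest args)
      "Look up the query string for NAME and fill it in with CL's FORMAT."
      (apply #'format nil (gethash name *sql-queries*) args))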
It might be that a column-type-name is actually an sqltype instance, and
then #'string= won't be happy. Prevent that now by discarding any smarts
when the type name does not satisfy stringp.
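The guard amounts to something like this (hypothetical comparison):

    (and (stringp column-type-name)
         (string= column-type-name "enum"))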
When pgloader fetches the index list from a source database, it doesn't
fetch information about access methods for the indexes: I don't even know if
the overlap between index access methods from one RDBMS to another covers
more than just btree...
It could happen that MySQL indexes a "geometry" column tho. This datatype is
converted automatically to "point" by pgloader, which is good. But the index
creation would fail with the following error message:
Database error 42704: data type point has no default operator class for access method "btree"
In this patch, when setting up the target schema we issue a PostgreSQL
catalog query to dynamically list those datatypes without btree support and
fetch their opclasses, with a hard-coded preference for GiST, then GIN, so
as to be able to automatically use the proper access method when btree isn't
available. And now pgloader transparently issues the proper statement:
CREATE INDEX idx_168468_idx_location ON pagila.address USING gist(location);
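The catalog query is along these lines (a sketch, not necessarily the exact
query pgloader uses):

    SELECT t.typname, am.amname, oc.opcname
      FROM pg_type t
      JOIN pg_opclass oc ON oc.opcintype = t.oid AND oc.opcdefault
      JOIN pg_am am ON am.oid = oc.opcmethod
     WHERE am.amname <> 'btree'
       AND NOT EXISTS (SELECT 1
                         FROM pg_opclass o2
                         JOIN pg_am a2 ON a2.oid = o2.opcmethod
                        WHERE o2.opcintype = t.oid
                          AND o2.opcdefault
                          AND a2.amname = 'btree');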
Currently this exploration is limited to indexes with a single column. To
implement the general case we would need a more complex lookup: we would
have to find the intersection of all the supported access methods for all
involved columns.
Of course we might need to do that someday. One step at a time is plenty
good enough tho.
In the complete PostgreSQL schema step, an error would be logged as you
would expect, but poorly handled: it would have the whole transaction rolled
back, meaning that a single primary key definition failure would cancel all
the others, plus the foreign keys, and also the triggers and comments.
It happens that other systems allow a primary key column to contain NULL
values, which is forbidden by the standard and enforced by PostgreSQL, so
that's not a theoretical concern here.
In cases when pgloader needs to build a new identifier from existing
ones (mainly for renaming indexes, because they are unique per-table in the
source database and unique per-schema in PostgreSQL), and we compose the new
name from already quoted strings, pgloader was doing the wrong thing.
Fix that by adding a build-identifier function that may unquote the parts
and then re-quote the new identifier properly (if needed).
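A minimal sketch of the idea; names and details are illustrative:

    (defun build-identifier (&rest parts)
      "Strip double-quotes from PARTS, join them, then re-quote as needed."
      (apply-identifier-case
       (format nil "~{~a~^_~}"
               (mapcar (lambda (part) (string-trim "\"" part)) parts))))

    ;; (build-identifier "\"idx\"" "\"MyTable\"")
    ;; => "idx_MyTable" quoted or downcased, depending on *identifier-case*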