Commit Graph

14 Commits

Author SHA1 Message Date
adrian
3da4422fb5 Fix a few typos in comments/strings 2023-05-01 10:18:30 +02:00
Dimitri Fontaine
9661c5874d Fix previous patch.
It's easy to avoid the warning about an unused lexical variable with the
proper declaration, which I failed to install before because of a syntax
error when I tried. Let's fix it now that I realise what was wrong.
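
For illustration, the kind of declaration meant here looks like the
following minimal Common Lisp sketch; the function and variable names are
made up, not the ones from the actual patch.

    ;; Hypothetical example: declare a lexical variable ignored so the
    ;; compiler stops warning about it.
    (defun log-row (row context)
      (declare (ignore context))   ; CONTEXT is deliberately unused
      (format t "~a~%" row))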
2018-06-23 00:50:35 +02:00
Dimitri Fontaine
8930734bea Ensure unquoted file names for logs and data.
The previous code could create files with unhelpful names such as the
following: "errors"/"err"."errors".log.

Fix #808.
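
As a sketch of the fix, stripping the surrounding double-quotes from an
identifier before it is used in a file name could look like this; the
helper name is hypothetical, not pgloader's actual function.

    ;; Hypothetical helper: return NAME without surrounding double-quotes.
    (defun unquote-identifier (name)
      (if (and (> (length name) 1)
               (char= (char name 0) #\")
               (char= (char name (1- (length name))) #\"))
          (subseq name 1 (1- (length name)))
          name))

    ;; (unquote-identifier "\"errors\"") => "errors"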
2018-06-22 23:02:07 +02:00
Dimitri Fontaine
8004a9dd59 Improve report output with bytes information.
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes that represents. We now add that
information to the output.

The number of bytes presented is computed from the unicode representation we
prepare in pgloader for each row before sending it down to PostgreSQL.
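
One way to compute such a byte count is to measure the UTF-8 octets of
each row's text representation, as in this sketch; babel is a real Common
Lisp encoding library, but this exact helper is an illustration, not
pgloader's actual code.

    ;; Hypothetical helper: number of UTF-8 octets in a row's COPY text.
    (defun row-bytes (row-text)
      (length (babel:string-to-octets row-text :encoding :utf-8)))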
2017-08-24 12:45:51 +02:00
Dimitri Fontaine
3b93ffa37a Rewrite the reporting support entirely.
Use a generic function protocol to implement the human readable, verbose,
csv, copy and json reporting output formats. This is much cleaner and more
extensible than the previous way.

Use that new power to implement a real JSON output from the internal state
object.
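
A minimal sketch of what such a generic function protocol can look like;
the names below are illustrative, not pgloader's actual API.

    ;; Hypothetical protocol: one generic function, one method per format.
    (defgeneric report-summary (format state stream)
      (:documentation "Write STATE to STREAM in the given output FORMAT."))

    (defmethod report-summary ((format (eql :human-readable)) state stream)
      (format stream "~a rows in ~a tables~%"
              (getf state :rows) (getf state :tables)))

    (defmethod report-summary ((format (eql :json)) state stream)
      ;; a real implementation would use a JSON library here
      (format stream "{\"rows\": ~a, \"tables\": ~a}~%"
              (getf state :rows) (getf state :tables)))

    ;; (report-summary :json (list :rows 1000 :tables 2) *standard-output*)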
2017-08-24 12:33:51 +02:00
Dimitri Fontaine
4f9eb8c06b Track bytes sent to PostgreSQL.
The pgstate infrastructure already had lots of details about what's going
on; add to it the information about how many bytes are sent in every
batch, and use that information in the monitor, when something long is
happening, to display how many rows we have sent so far for this
(supposedly) huge table, along with bytes and speed (bytes per second).
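
The speed computation the monitor can then display is simple enough to
sketch; the helper name and units here are assumptions.

    ;; Hypothetical helper: bytes per second, given a start timestamp
    ;; taken with GET-INTERNAL-REAL-TIME.
    (defun bytes-per-second (bytes start)
      (let ((seconds (/ (- (get-internal-real-time) start)
                        internal-time-units-per-second)))
        (if (plusp seconds) (float (/ bytes seconds)) 0.0)))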
2017-08-23 11:55:49 +02:00
Dimitri Fontaine
8254d63453 Fix incorrect per-table total time metrics.
The concurrent nature of pgloader made it non-obvious where to implement
the timers properly, and as a result the tracking of how long it took to
actually transfer the data was... just wrong.

Rather than trying to measure the time spent in any particular piece of the
code, we now emit "start" and "stop" stats messages to the monitor thread at
the right places (which are way easier to find, in the worker threads) and
have the monitor figure out how long it really took.

Fix #506.
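
A sketch of the start/stop message idea follows; the use of lparallel's
queue as the channel and the event format are both assumptions, not the
actual monitor plumbing.

    ;; Hypothetical plumbing: workers push timestamped events, the
    ;; monitor thread pops them and computes the elapsed time per table.
    (defvar *events* (lparallel.queue:make-queue))

    (defun send-event (kind table-name)
      (lparallel.queue:push-queue
       (list kind table-name (get-internal-real-time)) *events*))

    ;; worker thread:
    ;;   (send-event :start "geolite.blocks")
    ;;   ... COPY the data ...
    ;;   (send-event :stop  "geolite.blocks")
    ;; monitor thread: elapsed = :stop timestamp - :start timestamp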
2017-04-30 18:09:50 +02:00
Dimitri Fontaine
20ea1d78c4 Improve default summary readability.
Now that we have fixed the output of the per-table total timing, we can
show only that timing by default. With more verbosity pgloader will add the
extra columns, and in computer-oriented formats (json, csv, copy) all the
details are always provided, of course.

See #506.
2017-04-30 18:09:50 +02:00
Dimitri Fontaine
9e4938cea4 Implement PostgreSQL catalogs data structure.
In order to share more code between the different source types, finally
have a go at the quite horrible mess of anonymous data structures
floating around.

Having catalog and schema instances not only allows for code cleanup,
but will also allow implementing some bug fixes and wishlist items,
such as mapping tables from one schema to another.

Also, supporting database sources having a notion of "schema" (in
between "catalog" and "table") should get easier, including getting
on par with MySQL in the MS SQL support (materialized views have been
asked for already).

See #320, #316, #224 for references and a notion of progress being made.

In passing, also clean up the copy-databases methods for database source
types, so that they all use a fetch-metadata generic function along with
prepare-pgsql-database and complete-pgsql-database generic functions.
Actually, a single method does the job here.

The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn will call the specialized versions of
list-all-columns and friends implementations. Once the catalog has been
fetched, an explicit CAST call is then needed before we can continue.

Finally, the fields/columns/transforms slots in the copy objects are
still being used by the operative code, so the internal catalog
representation is only used up to the start of the data copy step, at
which point the copy class instances are all that's used.

This might be refactored again in a follow-up patch.
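
The shape of such a catalog representation might look like the following
sketch; the struct, slot and function names are illustrative, not the
actual pgloader definitions.

    ;; Hypothetical catalog/schema/table shapes.
    (defstruct table   name schema column-list)
    (defstruct schema  name catalog table-list)
    (defstruct catalog name schema-list)

    ;; Hypothetical entry point: each source type specializes this to
    ;; populate the catalog from its own introspection queries.
    (defgeneric fetch-metadata (copy catalog)
      (:documentation "Introspect COPY's source and populate CATALOG."))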
2015-12-30 21:53:01 +01:00
Dimitri Fontaine
187565b181 Add read/write separate stats.
Add metrics to determine where the time is spent in the current pgloader
code, so that it's possible to then optimize away the batch processing
as we do it today.

Given the following extract of the measures, it seems that doing the
data transformations in the reader thread isn't so bright an idea. More
to come.

          table name         total time       read     write
   -----------------     --------------  --------- ---------
             extract             2.014s
         before load             0.050s
               fetch             0.000s
   -----------------     --------------  --------- ---------
    geolite.location            16.090s    15.933s    5.732s
      geolite.blocks            28.896s    28.795s    5.312s
   -----------------     --------------  --------- ---------
          after load            37.772s
   -----------------     --------------  --------- ---------
   Total import time          1m25.082s    44.728s   11.044s
2015-10-11 21:35:19 +02:00
Dimitri Fontaine
96a33de084 Review the stats and reporting code organisation.
In order to later be able to have more worker threads sharing the
load (multiple readers and/or writers, maybe more specialized threads
too), have all the stats be managed centrally by a single thread. We
already have a "monitor" thread that gets passed log messages so that the
output buffer is not subject to race conditions; extend its use to also
deal with statistics messages.

In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.

Also, as a nice side effect of the code simplification and refactoring
this fixes #283 wherein the before/after sections of individual CSV
files within an ARCHIVE command were not counted in the reporting.
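
For the record, reducing the messaging to one message per batch could
look like the following sketch; all names here are made up.

    ;; Hypothetical batching of stats messages: count rows locally, then
    ;; send a single message per batch to the monitor thread.
    (defvar *rows-in-batch* 0)

    (defun note-row-read ()
      (incf *rows-in-batch*))

    (defun flush-batch-stats (table-name send-fn)
      (funcall send-fn (list :rows-read table-name *rows-in-batch*))
      (setf *rows-in-batch* 0))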
2015-10-05 01:46:29 +02:00
Dimitri Fontaine
0068a45e1c Fix parsing of qualified target table names, see #186.
We used to parse qualified table names as a simple string, which then
breaks attempts to be smart about how to quote identifiers. Some sources
are known to accept dots in quoted table names, and we need to be able
to process that properly without tripping on qualified table names too
late.

Current code might not be the best approach, as it's just using either a
cons or a string for table names internally, rather than defining a
proper data structure with a schema slot and a name slot.

Well, that's for a later cleanup patch; I happen to be lazy tonight.
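
For illustration, here is the cons-or-string representation mentioned
above and the printing that has to cope with it; the helper name is
hypothetical.

    ;; Hypothetical helper: TABLE is either "name" or ("schema" . "name").
    (defun format-table-name (table)
      (if (consp table)
          (format nil "~s.~s" (car table) (cdr table))
          (format nil "~s" table)))

    ;; (format-table-name '("public" . "my.table"))
    ;;   => the string "public"."my.table", identifiers kept quoted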
2015-04-17 23:22:30 +02:00
Dimitri Fontaine
ed853a7bea Allow pgloader to work on windows. 2014-11-06 22:12:20 +01:00
Dimitri Fontaine
2369a142a7 Refactor source code organisation.
In passing, fix a bug in the previous commit where left-over code would
cancel the whole new parsing code for advanced source field options.
2014-10-01 23:20:24 +02:00