It's easy to avoid the warning about an unused lexical variable with the
proper declaration, which I failed to install before because of a syntax
error when I tried. Let's fix it now that I realise what was wrong.
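For reference, a minimal sketch of the kind of declaration involved, with
hypothetical function and variable names (the actual spot in the pgloader
source differs):

    ;; without the IGNORE declaration the compiler warns about the
    ;; unused lexical variable EXTRA
    (defun example-callback (row extra)
      (declare (ignore extra))          ; the declaration that was missing
      (process-row row))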
Understanding the timings requires not only the number of rows copied into
each table but also how many bytes those rows represent. We now add that
information to the output.
The number of bytes presented is computed from the Unicode representation
we prepare in pgloader for each row before sending it down to PostgreSQL.
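As a rough sketch of that byte accounting, assuming the row has already
been rendered to a string and that a library such as babel handles the
encoding (the function name here is hypothetical):

    ;; count the bytes a row will occupy once encoded, before it is sent
    ;; down the COPY stream to PostgreSQL
    (defun row-bytes (row-string)
      (length (babel:string-to-octets row-string :encoding :utf-8)))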
Use a generic function protocol to implement the human readable, verbose,
csv, copy and json reporting output formats. This is much cleaner and more
extensible than the previous approach.
Use that new power to implement a real JSON output from the internal state
object.
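A sketch of what such a generic function protocol can look like,
dispatching on an EQL-specialized format argument; the accessors and the
use of yason for the JSON part are assumptions, not the actual pgloader
code:

    (defgeneric report-summary (state format stream)
      (:documentation "Print a summary of STATE to STREAM in FORMAT."))

    (defmethod report-summary (state (format (eql :human-readable)) stream)
      (format stream "~&~a rows imported in ~a~%"
              (state-rows state) (state-seconds state)))

    (defmethod report-summary (state (format (eql :json)) stream)
      ;; serialize the internal state object directly instead of
      ;; printing a table and massaging it afterwards
      (yason:encode (state-to-hash-table state) stream))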
The pgstate infrastructure already had lots of details about what's going
on. Add to it the number of bytes sent in every batch, and use that
information in the monitor, when something long is happening, to display
how many rows we have sent so far for this (supposedly) huge table, along
with the bytes and the speed (bytes per second).
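The throughput figure itself is a simple derivation from those counters; a
hedged sketch with hypothetical names:

    ;; given the running totals the monitor keeps for a table, display
    ;; rows, bytes and throughput while a long copy is in flight
    (defun report-progress (table-name rows bytes seconds)
      (format t "~&copying ~a: ~d rows, ~d bytes, ~,1f bytes/s~%"
              table-name rows bytes
              (if (zerop seconds) 0 (/ bytes seconds))))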
The concurrent nature of pgloader made it non-obvious where to implement
the timers properly, and as a result the tracking of how long it took to
actually transfer the data was... just wrong.
Rather than trying to measure the time spent in any particular piece of
the code, we now emit "start" and "stop" stats messages to the monitor
thread at the right places (which are much easier to find, in the worker
threads) and have the monitor figure out how long it really took.
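A minimal sketch of that message flow, with hypothetical names on both
sides (pgloader's actual messages and monitor code differ):

    ;; worker side: signal the boundaries and let the monitor do the math
    (send-event monitor (list :start table-name (get-internal-real-time)))
    ;; ... stream the data to PostgreSQL ...
    (send-event monitor (list :stop table-name (get-internal-real-time)))

    ;; monitor side: the elapsed time is the difference between the two
    ;; timestamps, in seconds
    (defun handle-timing-event (event start-times)
      (destructuring-bind (kind table timestamp) event
        (ecase kind
          (:start (setf (gethash table start-times) timestamp))
          (:stop  (/ (- timestamp (gethash table start-times))
                     internal-time-units-per-second)))))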
Fix #506.
Now that we have fixed the output of the per-table total timing, we show
only that timing by default. With more verbosity pgloader adds the extra
columns, and in computer-oriented formats (json, csv, copy) all the details
are of course always provided.
See #506.
In order to share more code between the different source types, finally
have a go at the quite horrible mess of anonymous data structures floating
around.
Having catalog and schema instances not only allows for code cleanup, but
will also allow us to implement some bug fixes and wishlist items such as
mapping tables from one schema to another.
Also, supporting database sources that have a notion of "schema" (in
between "catalog" and "table") should get easier, including getting on par
with MySQL in the MS SQL support (materialized views have been asked for
already).
See #320, #316, #224 for references and a notion of progress being made.
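To make the idea concrete, a sketch of what catalog and schema instances
might look like; the slot names are illustrative, not the actual pgloader
definitions:

    (defstruct catalog name schema-list)
    (defstruct schema  name catalog table-list)
    (defstruct table   name schema field-list column-list)

    ;; mapping a table from one schema to another becomes a couple of
    ;; slot updates rather than surgery on anonymous nested lists
    (defun move-table (table target-schema)
      (let ((source-schema (table-schema table)))
        (setf (schema-table-list source-schema)
              (remove table (schema-table-list source-schema)))
        (setf (table-schema table) target-schema)
        (push table (schema-table-list target-schema))))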
In passing, also clean up the copy-databases methods for the database
source types, so that they all use the fetch-metadata,
prepare-pgsql-database and complete-pgsql-database generic functions.
Actually, a single method does the job here.
The responsibility of introspecting the source to populate the internal
catalog/schema representation is now held by the fetch-metadata generic
function, which in turn calls the specialized implementations of
list-all-columns and friends. Once the catalog has been fetched, an
explicit CAST call is then needed before we can continue.
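Put together, the single method amounts to a driver over those generic
functions; a hedged sketch with simplified signatures and hypothetical
helper names:

    (defgeneric fetch-metadata (copy catalog)
      (:documentation "Introspect the source and fill in CATALOG."))

    ;; the single method that drives a whole database migration
    (defmethod copy-database ((copy db-copy) &key)
      (let ((catalog (make-catalog)))
        (fetch-metadata copy catalog)            ; list-all-columns and friends
        (apply-casting-rules catalog)            ; the explicit CAST step
        (prepare-pgsql-database copy catalog)    ; create schemas and tables
        (copy-data copy catalog)                 ; stream the rows over COPY
        (complete-pgsql-database copy catalog))) ; indexes, constraints, etc.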
Finally, the fields/columns/transforms slots in the copy objects are still
used by the operative code, so the internal catalog representation is only
used up to the start of the data copy step, at which point the copy class
instances are all that's used. This might be refactored again in a
follow-up patch.
Add metrics to determine where the time is spent in the current pgloader
code, so that it's then possible to optimize away the batch processing as
we do it today.
Given the following extract of the measures, it seems that doing the data
transformations in the reader thread isn't such a bright idea. More to
come.
       table name      total time       read      write
-----------------  --------------  ---------  ---------
          extract          2.014s
      before load          0.050s
            fetch          0.000s
-----------------  --------------  ---------  ---------
 geolite.location         16.090s    15.933s     5.732s
   geolite.blocks         28.896s    28.795s     5.312s
-----------------  --------------  ---------  ---------
       after load         37.772s
-----------------  --------------  ---------  ---------
Total import time       1m25.082s    44.728s    11.044s
In order to later be able to have more worker threads sharing the load
(multiple readers and/or writers, maybe more specialized threads too), have
all the stats managed centrally by a single thread. We already have a
"monitor" thread that gets passed log messages so that the output buffer is
not subject to race conditions; extend its use to also deal with statistics
messages.
In the current code, we send a message each time we read a row. In some
future commits we should probably reduce the messaging here to something
like one message per batch in the common case.
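A sketch of that centralisation, assuming an lparallel queue carries the
messages; the message shape and the stats containers are illustrative
only:

    ;; workers push small messages to the monitor queue instead of
    ;; touching shared counters directly
    (defparameter *monitor-queue* (lparallel.queue:make-queue))

    (defun update-stats (table-name &key (rows 0) (bytes 0))
      (lparallel.queue:push-queue (list :update table-name :rows rows :bytes bytes)
                                  *monitor-queue*))

    ;; the monitor thread is the only writer to the stats, so no locks
    ;; are needed around the counters themselves
    (defun monitor-loop (rows-table bytes-table)
      (loop for message = (lparallel.queue:pop-queue *monitor-queue*)
            do (destructuring-bind (kind table-name &key (rows 0) (bytes 0))
                   message
                 (declare (ignore kind))
                 (incf (gethash table-name rows-table 0) rows)
                 (incf (gethash table-name bytes-table 0) bytes))))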
Also, as a nice side effect of the code simplification and refactoring,
this fixes #283, wherein the before/after sections of individual CSV files
within an ARCHIVE command were not counted in the reporting.
We used to parse qualified table names as a simple string, which then
breaks attempts to be smart about how to quote identifiers. Some sources
are known to accept dots in quoted table names, and we need to be able to
process that properly without tripping on qualified table names too late.
The current code might not be the best approach, as it just uses either a
cons or a string for table names internally rather than defining a proper
data structure with schema and name slots. Well, that's for a later cleanup
patch; I happen to be lazy tonight.
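Still, a hedged sketch of how the cons-or-string representation can be
quoted safely, with hypothetical helper names; the later cleanup would
replace the cons with a structure carrying schema and name slots:

    ;; double-quote one identifier, doubling embedded quotes as SQL requires
    (defun quote-ident (name)
      (format nil "\"~a\""
              (with-output-to-string (s)
                (loop for char across name
                      do (when (char= char #\") (write-char #\" s))
                         (write-char char s)))))

    ;; a string is an unqualified name, a cons is (schema . table)
    (defun format-table-name (table-name)
      (etypecase table-name
        (string (quote-ident table-name))
        (cons   (format nil "~a.~a"
                        (quote-ident (car table-name))
                        (quote-ident (cdr table-name))))))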