From effa916b3175502738d0b7591a908f43538f6f19 Mon Sep 17 00:00:00 2001
From: Dimitri Fontaine
Date: Tue, 3 Jan 2017 23:13:01 +0100
Subject: [PATCH] Improve parallelism setup documentation.

The code comment displayed in the release notes for 3.3.1 is reported to be
better at explaining the concurrency control than what we had in the main
documentation, so add it there.

Fix #496.
---
 Makefile      |  2 +-
 pgloader.1    | 18 ++++++++++++++++--
 pgloader.1.md | 15 ++++++++++++---
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index ff553a0..a0c0486 100644
--- a/Makefile
+++ b/Makefile
@@ -230,4 +230,4 @@ latest:
 
 check: test ;
 
-.PHONY: test pgloader-standalone
+.PHONY: test pgloader-standalone docs
diff --git a/pgloader.1 b/pgloader.1
index 1e37079..a998a25 100644
--- a/pgloader.1
+++ b/pgloader.1
@@ -1,7 +1,7 @@
 .\" generated with Ronn/v0.7.3
 .\" http://github.com/rtomayko/ronn/tree/0.7.3
 .
-.TH "PGLOADER" "1" "December 2016" "ff" ""
+.TH "PGLOADER" "1" "January 2017" "ff" ""
 .
 .SH "NAME"
 \fBpgloader\fR \- PostgreSQL data loader
@@ -487,7 +487,21 @@ At the moment, the number of transformer and writer tasks are forced into being
 The parameter \fIworkers\fR allows to control how many worker threads are allowed to be active at any time (that\'s the parallelism level); and the parameter \fIconcurrency\fR allows to control how many tasks are started to handle the data (they may not all run at the same time, depending on the \fIworkers\fR setting)\.
 .
 .P
-With a \fIconcurrency\fR of 2, we start 1 reader thread, 2 transformer threads and 2 writer tasks, that\'s 5 concurrent tasks to schedule into \fIworkers\fR threads\.
+We allow \fIworkers\fR simultaneous workers to be active at the same time in the context of a single table\. A single unit of work consist of several kinds of workers:
+.
+.IP "\(bu" 4
+a reader getting raw data from the source,
+.
+.IP "\(bu" 4
+N transformers preparing raw data for PostgreSQL COPY protocol,
+.
+.IP "\(bu" 4
+N writers sending the data down to PostgreSQL\.
+.
+.IP "" 0
+.
+.P
+The N here is setup to the \fIconcurrency\fR parameter: with a \fICONCURRENCY\fR of 2, we start (+ 1 2 2) = 5 concurrent tasks, with a \fIconcurrency\fR of 4 we start (+ 1 4 4) = 9 concurrent tasks, of which only \fIworkers\fR may be active simultaneously\.
 .
 .P
 So with \fBworkers = 4, concurrency = 2\fR, the parallel scheduler will maintain active only 4 of the 5 tasks that are started\.
diff --git a/pgloader.1.md b/pgloader.1.md
index e164999..136c3a7 100644
--- a/pgloader.1.md
+++ b/pgloader.1.md
@@ -433,9 +433,18 @@ parameter *concurrency* allows to control how many tasks are started to
 handle the data (they may not all run at the same time, depending on the
 *workers* setting).
 
-With a *concurrency* of 2, we start 1 reader thread, 2 transformer threads
-and 2 writer tasks, that's 5 concurrent tasks to schedule into *workers*
-threads.
+We allow *workers* simultaneous workers to be active at the same time in the
+context of a single table. A single unit of work consist of several kinds of
+workers:
+
+ - a reader getting raw data from the source,
+ - N transformers preparing raw data for PostgreSQL COPY protocol,
+ - N writers sending the data down to PostgreSQL.
+
+The N here is setup to the *concurrency* parameter: with a *CONCURRENCY* of
+2, we start (+ 1 2 2) = 5 concurrent tasks, with a *concurrency* of 4 we
+start (+ 1 4 4) = 9 concurrent tasks, of which only *workers* may be active
+simultaneously.
 
 So with `workers = 4, concurrency = 2`, the parallel scheduler will
 maintain active only 4 of the 5 tasks that are started.
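
The task arithmetic described in the new documentation text can be sketched as follows. This is only an illustration of the counting rule (1 reader + N transformers + N writers, capped by *workers*); the function names `started_tasks` and `max_active_tasks` are hypothetical and are not part of pgloader.

```python
def started_tasks(concurrency: int) -> int:
    """Tasks started for one table: 1 reader, N transformers, N writers."""
    reader = 1
    transformers = concurrency
    writers = concurrency
    return reader + transformers + writers


def max_active_tasks(workers: int, concurrency: int) -> int:
    """Upper bound on tasks running at once: limited by the workers setting."""
    return min(workers, started_tasks(concurrency))


# Examples taken from the documentation text in this patch.
print(started_tasks(2))        # concurrency = 2
print(started_tasks(4))        # concurrency = 4
print(max_active_tasks(4, 2))  # workers = 4, concurrency = 2
```

Under these assumptions, `workers = 4, concurrency = 2` gives 5 started tasks, of which at most 4 are active at any time, matching the last sentence of the patched paragraph.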