pgloader/pgloader.asd
Dimitri Fontaine db947e1467 Rework reader and writer data exchange.
With this patch, the whole data massaging and final formatting into the
PostgreSQL COPY TEXT format is done by the reader thread, which publishes one
batch at a time to the communication channel, an lparallel.queue object.
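The COPY TEXT formatting that the reader now performs can be sketched as follows. This is a minimal, hypothetical illustration, not pgloader's own copy-format code. The function names are invented, NIL is assumed to stand for SQL NULL (written as \N, the COPY TEXT default), values are converted with princ-to-string for simplicity, and only backslash, tab, newline and carriage return are escaped.

```lisp
;;; Hypothetical sketch, not pgloader's own copy-format code: turn one row
;;; (a list of field values) into one line of PostgreSQL COPY TEXT format.

(defun copy-text-escape (value stream)
  "Write VALUE to STREAM for COPY TEXT: NIL (assumed to mean SQL NULL) is
written as \\N, and backslash, tab, newline and carriage return are escaped."
  (if (null value)
      (write-string "\\N" stream)
      (loop :for ch :across (princ-to-string value)
            :do (case ch
                  (#\\       (write-string "\\\\" stream))
                  (#\Tab     (write-string "\\t" stream))
                  (#\Newline (write-string "\\n" stream))
                  (#\Return  (write-string "\\r" stream))
                  (t         (write-char ch stream))))))

(defun format-copy-text-row (row)
  "Return ROW as a single COPY TEXT line (without the trailing newline),
with fields separated by tab characters."
  (with-output-to-string (s)
    (loop :for (value . rest) :on row
          :do (copy-text-escape value s)
          :when rest :do (write-char #\Tab s))))
```

For example, `(format-copy-text-row (list 42 "two words" nil))` returns 42, two words and \N separated by tab characters. Doing this work in the reader means the queue only ever holds finished text, never the raw row vectors.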

Before that, the raw vectors were pushed directly into the queue. That
offered more flexibility to adapt to the reader and writer IO rates and
capabilities, but hindered the Garbage Collector: data still in the queue
could not be collected, even when it was no longer needed.

The new model also uses less memory and gives better control over how much
data stays in memory. The new *concurrent-batches* parameter should be key
to processing huge rows.

The intent is to offer users a way to tune *concurrent-batches* down to 1
for sources with a massive per-row memory footprint. Even better would be to
find a way to adjust the setting automatically, without spending too much
time counting the bytes we're batching.
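The memory argument above can be illustrated with a small single-threaded sketch that uses neither threads nor lparallel. Assuming that at most a fixed number of batches are alive at once, the count of formatted rows held in memory is bounded by that number times the batch size. `make-batches` splits formatted lines into batches, and `max-rows-in-memory` computes the bound. All names here, including `*sketch-batch-rows*` and `*sketch-concurrent-batches*`, are hypothetical stand-ins. They are not pgloader's real *concurrent-batches* parameter, whose default value is not shown here.

```lisp
;;; Hypothetical sketch, not pgloader's own code: the reader groups
;;; already-formatted COPY TEXT lines into batches and publishes one batch
;;; at a time. Parameter names below are invented for illustration.

(defparameter *sketch-batch-rows* 3
  "Assumed number of rows per batch.")

(defparameter *sketch-concurrent-batches* 2
  "Assumed maximum number of batches alive at the same time.")

(defun make-batches (lines &optional (batch-rows *sketch-batch-rows*))
  "Split the list LINES into a list of vectors of at most BATCH-ROWS lines."
  (check-type batch-rows (integer 1))
  (loop :while lines
        :collect (coerce (loop :repeat batch-rows
                               :while lines
                               :collect (pop lines))
                         'vector)))

(defun max-rows-in-memory (&optional (batch-rows *sketch-batch-rows*)
                             (concurrent *sketch-concurrent-batches*))
  "Upper bound on formatted rows held at once, assuming that no more than
CONCURRENT batches of BATCH-ROWS rows each are alive at any time."
  (* batch-rows concurrent))
```

Setting the concurrency to 1, as suggested above for sources with a massive per-row footprint, reduces the bound to a single batch. With an illustrative batch size of 25000 rows, `(max-rows-in-memory 25000 1)` returns 25000.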

Preliminary tests show no noticeable impact on performance from this patch,
and even some improvements in some cases.
2014-01-25 23:54:49 +01:00

;;;; pgloader.asd
(asdf:defsystem #:pgloader
    :serial t
    :description "Load data into PostgreSQL"
    :author "Dimitri Fontaine <dimitri@2ndQuadrant.fr>"
    :license "The PostgreSQL Licence"
    :depends-on (#:uiop                    ; host system integration
                 #:cl-log                  ; logging
                 #:postmodern              ; PostgreSQL protocol implementation
                 #:cl-postgres             ; low level bits for COPY streaming
                 #:simple-date             ; FIXME: recheck dependency
                 #:qmynd                   ; MySQL protocol implementation
                 #:split-sequence          ; some parsing is made easy
                 #:cl-csv                  ; full CSV reader
                 #:cl-fad                  ; file and directories
                 #:lparallel               ; threads, workers, queues
                 #:esrap                   ; parser generator
                 #:alexandria              ; utils
                 #:drakma                  ; http client, download archives
                 #:zip                     ; support for zip archive files
                 #:flexi-streams           ; streams
                 #:com.informatimago.clext ; portable character-sets listings
                 #:usocket                 ; UDP / syslog
                 #:local-time              ; UDP date parsing
                 #:command-line-arguments  ; for the main function
                 #:abnf                    ; ABNF parser generator (for syslog)
                 #:db3                     ; DBF version 3 file reader
                 #:py-configparser         ; Read old-style INI config files
                 #:sqlite                  ; Query a SQLite file
                 #:trivial-backtrace       ; For --debug cli usage
                 #:cl-markdown             ; To produce the website
                 )
    :components
    ((:module "src"
      :components
      ((:file "params")
       (:file "package" :depends-on ("params"))
       (:file "logs" :depends-on ("package" "params"))
       (:file "monitor" :depends-on ("params"
                                     "package"
                                     "logs"))
       (:file "utils" :depends-on ("params"
                                   "package"
                                   "monitor"))
       ;; those are one-package-per-file
       (:file "transforms")
       (:file "queue" :depends-on ("params" "package"))
       (:file "parser" :depends-on ("package"
                                    "params"
                                    "transforms"
                                    "utils"
                                    "monitor"
                                    "pgsql"))
       (:file "parse-ini" :depends-on ("package"
                                       "params"
                                       "utils"))
       (:file "archive" :depends-on ("params"
                                     "package"
                                     "utils"
                                     "sources"
                                     "pgsql"))
       ;; package pgloader.pgsql
       (:module pgsql
        :depends-on ("package"
                     "params"
                     "queue"
                     "utils"
                     "logs"
                     "monitor")
        :components
        ((:file "copy-format")
         (:file "queries")
         (:file "schema")
         (:file "pgsql"
          :depends-on ("copy-format"
                       "queries"
                       "schema"))))
       ;; Source format specific implementations
       (:module sources
        :depends-on ("params"
                     "package"
                     "pgsql"
                     "utils"
                     "logs"
                     "monitor"
                     "queue"
                     "transforms")
        :components
        ((:file "sources")
         (:file "csv" :depends-on ("sources"))
         (:file "fixed" :depends-on ("sources"))
         (:file "db3" :depends-on ("sources"))
         (:file "sqlite" :depends-on ("sources"))
         (:file "syslog" :depends-on ("sources"))
         (:file "mysql-cast-rules")
         (:file "mysql-schema")
         (:file "mysql" :depends-on ("mysql-cast-rules"
                                     "mysql-schema"))))
       ;; the main entry file, used when building a stand-alone
       ;; executable image
       (:file "main" :depends-on ("params"
                                  "package"
                                  "monitor"
                                  "utils"
                                  "parser"
                                  "sources"))))
     ;; to produce the website
     (:module "web"
      :components
      ((:module src
        :components
        ((:file "docs")))))))