Allow - on the command line to process from standard input.

Also augment the documentation with examples of bare stdin reading and
of the advantages of Unix pipes to stream even remote archived content
down to PostgreSQL.
Dimitri Fontaine 2014-12-27 21:20:40 +01:00
parent f2bf5c4a62
commit 6d76bc57e3
4 changed files with 112 additions and 9 deletions


@ -220,6 +220,38 @@ For documentation about the available syntaxes for the \fB\-\-field\fR and \fB\-
.P
Note also that the PostgreSQL URI includes the target \fItablename\fR\.
.
.SS "Reading from STDIN"
File\-based pgloader sources can be loaded from the standard input, as in the following example:
.
.IP "" 4
.
.nf
pgloader \-\-type csv \e
\-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
\-\-with "skip header = 1" \e
\-\-with "fields terminated by \'\et\'" \e
\- \e
postgresql:///pgloader?districts_longlat \e
< test/data/2013_Gaz_113CDs_national\.txt
.
.fi
.
.IP "" 0
.
.P
The dash (\fB\-\fR) character as a source is used to mean \fIstandard input\fR, as usual in Unix command lines\. It\'s possible to stream compressed content to pgloader with this technique, using the Unix pipe:
.
.IP "" 4
.
.nf
gunzip \-c source\.gz | pgloader \-\-type csv \.\.\. \- pgsql:///target?foo
.
.fi
.
.IP "" 0
.
.SS "Loading from CSV available through HTTP"
The same command as just above can also be run if the CSV file happens to be found on a remote HTTP location:
.
@ -267,6 +299,32 @@ create table districts_longlat
.P
Also notice that the same command will work against an archived version of the same data, e\.g\. http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt\.gz\.
.
.P
Finally, it\'s important to note that pgloader first fetches the content from the HTTP URL into a local file, then expands the archive when it\'s recognized to be one, and only then processes the locally expanded file\.
.
.P
In some cases, either because pgloader has no direct support for your archive format or because expanding the archive is not feasible in your environment, you might want to \fIstream\fR the content straight from its remote location into PostgreSQL\. Here\'s how to do that, using the old battle\-tested Unix pipes trick:
.
.IP "" 4
.
.nf
curl http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt\.gz \e
| gunzip \-c \e
| pgloader \-\-type csv \e
\-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
\-\-with "skip header = 1" \e
\-\-with "fields terminated by \'\et\'" \e
\- \e
postgresql:///pgloader?districts_longlat
.
.fi
.
.IP "" 0
.
.P
Now the OS will take care of the streaming and buffering between the network and the commands and pgloader will take care of streaming the data down to PostgreSQL\.
.
.SS "Migrating from SQLite"
The following command will open the SQLite database, discover its table definitions including indexes and foreign keys, migrate those definitions while \fIcasting\fR the data type specifications to their PostgreSQL equivalent and then migrate the data over:
.


@ -184,6 +184,25 @@ For documentation about the available syntaxes for the `--field` and
Note also that the PostgreSQL URI includes the target *tablename*.
### Reading from STDIN
File-based pgloader sources can be loaded from the standard input, as in the
following example:
pgloader --type csv \
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
--with "skip header = 1" \
--with "fields terminated by '\t'" \
- \
postgresql:///pgloader?districts_longlat \
< test/data/2013_Gaz_113CDs_national.txt
The dash (`-`) character as a source is used to mean *standard input*, as
usual in Unix command lines. It's possible to stream compressed content to
pgloader with this technique, using the Unix pipe:
gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo
### Loading from CSV available through HTTP
The same command as just above can also be run if the CSV file happens to be
@ -222,6 +241,29 @@ Also notice that the same command will work against an archived version of
the same data, e.g.
http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz.
Finally, it's important to note that pgloader first fetches the content from
the HTTP URL into a local file, then expands the archive when it's
recognized to be one, and only then processes the locally expanded file.
In some cases, either because pgloader has no direct support for your
archive format or because expanding the archive is not feasible in
your environment, you might want to *stream* the content straight from its
remote location into PostgreSQL. Here's how to do that, using the old
battle-tested Unix pipes trick:
curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
| gunzip -c \
| pgloader --type csv \
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
--with "skip header = 1" \
--with "fields terminated by '\t'" \
- \
postgresql:///pgloader?districts_longlat
Now the OS will take care of the streaming and buffering between the network
and the commands and pgloader will take care of streaming the data down to
PostgreSQL.
### Migrating from SQLite
The following command will open the SQLite database, discover its tables
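The streaming behaviour documented above can be checked with standard tools alone. The sketch below is a hedged stand-in: it substitutes `tail` and `wc` for pgloader and uses a hypothetical sample file, so only the pipe mechanics are exercised, not the database load:

```shell
# Build a tiny gzip-compressed tab-separated sample (hypothetical data).
printf 'usps\tgeoid\nAL\t0101\n' > /tmp/gaz_sample.txt
gzip -c /tmp/gaz_sample.txt > /tmp/gaz_sample.txt.gz

# Decompress in-flight and skip the header line, as pgloader does with
# "skip header = 1"; no expanded file is ever written to disk.
gunzip -c /tmp/gaz_sample.txt.gz | tail -n +2 | wc -l   # counts 1 data row
```

The kernel's pipe buffering handles back-pressure between the stages, which is exactly what the documentation relies on when pgloader sits at the end of the pipe.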


@ -189,13 +189,14 @@
       (uiop:native-namestring filename))
     (declare (ignore abs paths no-path-p))
     (let ((dotted-parts (reverse (sq:split-sequence #\. filename))))
-      (destructuring-bind (extension name-or-ext &rest parts)
-          dotted-parts
-        (declare (ignore parts))
-        (if (string-equal "tar" name-or-ext) :archive
-            (loop :for (type . extensions) :in *data-source-filename-extensions*
-               :when (member extension extensions :test #'string-equal)
-               :return type))))))
+      (when (<= 2 (length dotted-parts))
+        (destructuring-bind (extension name-or-ext &rest parts)
+            dotted-parts
+          (declare (ignore parts))
+          (if (string-equal "tar" name-or-ext) :archive
+              (loop :for (type . extensions) :in *data-source-filename-extensions*
+                 :when (member extension extensions :test #'string-equal)
+                 :return type)))))))

 (defvar *parse-rule-for-source-types*
   '(:csv csv-file-source
@ -234,7 +235,8 @@
       (:filename (parse-filename-for-source-type url))
       (:http (parse-filename-for-source-type
               (puri:uri-path (puri:parse-uri url)))))))
-    (parse-source-string-for-type type source-string)))))))
+    (when type
+      (parse-source-string-for-type type source-string))))))))

 (defun parse-target-string (target-string)
   (parse 'pgsql-uri target-string))
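Both new guards protect the `-` source this commit introduces: a bare dash has no dot-separated parts to destructure, and no source type can be derived from it. A rough shell analogue of the extension check (the function name and type mapping are illustrative, not pgloader's own):

```shell
# Hypothetical restatement of parse-filename-for-source-type's logic:
# fewer than two dotted parts (e.g. "-") must short-circuit instead of
# being destructured into (extension name-or-ext ...).
detect_type() {
  case "$1" in
    *.tar|*.tar.*) echo archive ;;  # name-or-ext is "tar"
    *.csv)         echo csv ;;      # extension found in the known list
    *.*)           echo unknown ;;  # two parts, but no matching type
    *)             echo none ;;     # the (<= 2 (length dotted-parts)) guard
  esac
}

detect_type data.csv        # csv
detect_type backup.tar.gz   # archive
detect_type -               # none
```

The second guard, `(when type ...)`, plays the same role one level up: with no detected type there is no grammar rule to apply, so parsing is skipped rather than attempted on `nil`.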


@ -12,7 +12,8 @@
   (or (member char #.(quote (coerce "/\\:.-_!@#$%^&*()" 'list)))
       (alphanumericp char)))

-(defrule stdin (~ "stdin") (:constant (list :stdin nil)))
+(defrule stdin (or "-" (~ "stdin")) (:constant (list :stdin nil)))

 (defrule inline (~ "inline")
   (:lambda (i)
     (declare (ignore i))
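With this grammar change, `-` and `stdin` both parse to the `:stdin` source. The dash-means-standard-input spelling follows long-standing Unix convention, as a quick check with standard utilities shows:

```shell
# A `-` file argument conventionally means standard input: sort reads
# the piped data here rather than looking for a file named "-".
printf 'b\na\n' | sort - | head -n 1   # prints "a"
```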