pgloader/docs/ref/archive.rst
Dimitri Fontaine 925996000b
Improve pgloader docs (Table of Contents, titles, organisation). (#1440)
Make it easier to navigate our docs, which are dense enough to warrant
proper organisation and guided Table of Contents.
2022-10-18 17:28:34 +02:00

Archive (http, zip)
===================

This command instructs pgloader to load data from one or more files contained
in an archive. Currently the only supported archive format is *ZIP*, and the
archive may be downloaded from an *HTTP* URL.

Using advanced options and a load command file
----------------------------------------------

The command then would be:

::

    $ pgloader archive.load

And the contents of the ``archive.load`` file could be modeled on the
following:

::

    LOAD ARCHIVE
       FROM /Users/dim/Downloads/GeoLiteCity-latest.zip
       INTO postgresql:///ip4r

       BEFORE LOAD
         DO $$ create extension if not exists ip4r; $$,
            $$ create schema if not exists geolite; $$,

         EXECUTE 'geolite.sql'

       LOAD CSV
            FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
                 WITH ENCODING iso-8859-1
                 (
                    locId,
                    country,
                    region     null if blanks,
                    city       null if blanks,
                    postalCode null if blanks,
                    latitude,
                    longitude,
                    metroCode  null if blanks,
                    areaCode   null if blanks
                 )
            INTO postgresql:///ip4r?geolite.location
                 (
                    locid,country,region,city,postalCode,
                    location point using (format nil "(~a,~a)" longitude latitude),
                    metroCode,areaCode
                 )
            WITH skip header = 2,
                 fields optionally enclosed by '"',
                 fields escaped by double-quote,
                 fields terminated by ','

       AND LOAD CSV
            FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
                 WITH ENCODING iso-8859-1
                 (
                    startIpNum, endIpNum, locId
                 )
            INTO postgresql:///ip4r?geolite.blocks
                 (
                    iprange ip4r using (ip-range startIpNum endIpNum),
                    locId
                 )
            WITH skip header = 2,
                 fields optionally enclosed by '"',
                 fields escaped by double-quote,
                 fields terminated by ','

       FINALLY DO
         $$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;
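
To make the two ``using`` expressions in the load file above more concrete, here is a rough Python sketch of the text values they would produce. The first function mirrors Common Lisp ``(format nil "(~a,~a)" longitude latitude)``, which prints its two arguments inside parentheses, separated by a comma. The second one assumes that ``ip-range`` turns two integer strings into a dotted-quad ``start-end`` range as understood by the ip4r extension; that behavior is an assumption rather than something stated on this page, and both function names are hypothetical.

```python
import ipaddress


def point_text(longitude: str, latitude: str) -> str:
    """Python equivalent of (format nil "(~a,~a)" longitude latitude)."""
    return f"({longitude},{latitude})"


def ip_range_text(start_ip_num: str, end_ip_num: str) -> str:
    """Assumed behavior of ip-range: two integers become 'a.b.c.d-e.f.g.h'."""
    start = ipaddress.IPv4Address(int(start_ip_num))
    end = ipaddress.IPv4Address(int(end_ip_num))
    return f"{start}-{end}"


# CSV fields arrive as strings, so the sketch takes strings too.
print(point_text("-122.33", "47.61"))         # (-122.33,47.61)
print(ip_range_text("16777216", "16777471"))  # 1.0.0.0-1.0.0.255
```

The resulting strings are what PostgreSQL would then parse as ``point`` and ``ip4r`` values for the ``location`` and ``iprange`` columns.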

Common Clauses
--------------

Please refer to :ref:`common_clauses` for documentation about common
clauses.

Archive Source Specification: FROM
----------------------------------

Filename or HTTP URI to load the data from. When given an HTTP URL, the
linked file is downloaded locally before processing.

If the file is a `zip` file, the command line utility `unzip` is used to
expand the archive into files in `$TMPDIR`, or `/tmp` if `$TMPDIR` is unset
or set to a non-existent directory.

The sub commands that follow are then run from the top-level directory where
the archive has been expanded.
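
The directory selection rule described above can be sketched in Python as follows. This is only an illustration: pgloader itself calls the ``unzip`` utility, whereas the sketch uses Python's ``zipfile`` module, and the function names as well as the per-archive sub-directory are assumptions made for the example.

```python
import os
import tempfile
import zipfile


def extraction_root(environ=os.environ) -> str:
    """Return $TMPDIR when it names an existing directory, else /tmp."""
    tmpdir = environ.get("TMPDIR")
    if tmpdir and os.path.isdir(tmpdir):
        return tmpdir
    return "/tmp"


def expand_archive(archive_path: str, root: str) -> str:
    """Expand a zip archive under root and return the top-level directory."""
    # Assumption: one fresh sub-directory per archive; the text above only
    # says that files end up in $TMPDIR or /tmp.
    target = tempfile.mkdtemp(prefix="archive-", dir=root)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target
```

A caller would run ``expand_archive(path, extraction_root())`` and then resolve the file names of the sub commands relative to the returned directory.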

Archive Sub Commands
--------------------

- command [ *AND* command ... ]

  A series of commands against the contents of the archive. At the moment
  only `CSV`, `FIXED` and `DBF` commands are supported.

  Note that commands support the clause *FROM FILENAME MATCHING*,
  which allows the pgloader command not to depend on the exact names of
  the archive directories.

  The same clause can also be applied to several files by using the
  spelling *FROM ALL FILENAMES MATCHING* and a regular expression.

  The whole *matching* clause must follow this rule::

    FROM [ ALL FILENAMES | [ FIRST ] FILENAME ] MATCHING
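
As a rough mental model of the *matching* clause, the Python sketch below picks either the first or all of the expanded files whose path matches a regular expression. Several details are assumptions made for illustration: the pattern is searched anywhere in the path, Python's ``re`` module stands in for pgloader's own regular expression engine, "first" means first in sorted order, and the function name and the ``GeoLiteCity_20130604`` directory are hypothetical.

```python
import re
from typing import Iterable, List


def filenames_matching(paths: Iterable[str], pattern: str,
                       all_files: bool = False) -> List[str]:
    """Select expanded files whose path contains a match for pattern."""
    regex = re.compile(pattern)
    matches = [path for path in sorted(paths) if regex.search(path)]
    # FILENAME / FIRST FILENAME keeps one file, ALL FILENAMES keeps them all.
    return matches if all_files else matches[:1]


paths = [
    "GeoLiteCity_20130604/GeoLiteCity-Blocks.csv",
    "GeoLiteCity_20130604/GeoLiteCity-Location.csv",
]
print(filenames_matching(paths, r"GeoLiteCity-Location.csv"))
print(filenames_matching(paths, r"\.csv$", all_files=True))
```

Because the pattern only has to match part of the path, the load command keeps working when the archive's top-level directory name changes between releases.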

Archive Final SQL Commands
--------------------------

- *FINALLY DO*

  SQL queries to run once the data is loaded, such as `CREATE INDEX`.