Loading From an Archive
=======================

This command instructs pgloader to load data from one or more files contained
in an archive. Currently the only supported archive format is *ZIP*, and the
archive can be downloaded from an *HTTP* URL.

Using advanced options and a load command file
----------------------------------------------

The command then would be:

::

   $ pgloader archive.load

And the contents of the ``archive.load`` file could be modeled on the
following:

::

    LOAD ARCHIVE
       FROM /Users/dim/Downloads/GeoLiteCity-latest.zip
       INTO postgresql:///ip4r

       BEFORE LOAD
         DO $$ create extension if not exists ip4r; $$,
            $$ create schema if not exists geolite; $$,

         EXECUTE 'geolite.sql'

       LOAD CSV
            FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
                 WITH ENCODING iso-8859-1
                 (
                    locId,
                    country,
                    region     null if blanks,
                    city       null if blanks,
                    postalCode null if blanks,
                    latitude,
                    longitude,
                    metroCode  null if blanks,
                    areaCode   null if blanks
                 )
            INTO postgresql:///ip4r?geolite.location
                 (
                    locid,country,region,city,postalCode,
                    location point using (format nil "(~a,~a)" longitude latitude),
                    metroCode,areaCode
                 )
            WITH skip header = 2,
                 fields optionally enclosed by '"',
                 fields escaped by double-quote,
                 fields terminated by ','

      AND LOAD CSV
            FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
                 WITH ENCODING iso-8859-1
                 (
                    startIpNum, endIpNum, locId
                 )
            INTO postgresql:///ip4r?geolite.blocks
                 (
                    iprange ip4r using (ip-range startIpNum endIpNum),
                    locId
                 )
            WITH skip header = 2,
                 fields optionally enclosed by '"',
                 fields escaped by double-quote,
                 fields terminated by ','

       FINALLY DO
         $$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;

Common Clauses
--------------

Please refer to :ref:`common_clauses` for documentation about common
clauses.

Archive Source Specification: FROM
----------------------------------

Filename or HTTP URI to load the data from. When given an HTTP URL, the
linked file is downloaded locally before processing.

If the file is a `zip` file, the command line utility `unzip` is used to
expand the archive into files in `$TMPDIR`, or `/tmp` if `$TMPDIR` is unset
or set to a non-existing directory.

The following commands are then run from the top-level directory where the
archive has been expanded.

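
As a minimal sketch of the HTTP case, only the ``FROM`` line of the
``archive.load`` example changes. The host and path below are hypothetical
placeholders, not a real download location::

    LOAD ARCHIVE
       FROM http://example.com/downloads/GeoLiteCity-latest.zip
       INTO postgresql:///ip4r

The rest of the command (``BEFORE LOAD``, the ``LOAD CSV`` sub commands and
``FINALLY DO``) would stay as in the file-based example.
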
Archive Sub Commands
--------------------

- command [ *AND* command ... ]

  A series of commands against the contents of the archive. At the moment
  only `CSV`, `FIXED` and `DBF` commands are supported.

  Note that these commands support the *FROM FILENAME MATCHING* clause,
  which allows the pgloader command not to depend on the exact names of
  the archive directories.

  The same clause can also be applied to several files by using the
  spelling *FROM ALL FILENAMES MATCHING* and a regular expression.

  The whole *matching* clause must follow this rule::

      FROM [ ALL FILENAMES | [ FIRST ] FILENAME ] MATCHING

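
For illustration, here are the three spellings allowed by the *matching*
rule. The regular expressions are written between ``~/`` and ``/`` as in the
``archive.load`` example; the ``blocks-part`` file names in the last line are
hypothetical::

    FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
    FROM FIRST FILENAME MATCHING ~/GeoLiteCity-Location.csv/
    FROM ALL FILENAMES MATCHING ~/blocks-part.*csv/

Going by the wording of the rule, the first two spellings select a single
file, while the third applies the same command to every file in the expanded
archive whose name matches the regular expression.
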
Archive Final SQL Commands
--------------------------

- *FINALLY DO*

  SQL Queries to run once the data is loaded, such as `CREATE INDEX`.
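
As a sketch only, the following runs two queries once the load is done. It
assumes that *FINALLY DO* accepts the same comma-separated list of
dollar-quoted queries that ``BEFORE LOAD DO`` uses in the ``archive.load``
example; the ``analyze`` query is an illustrative addition::

    FINALLY DO
      $$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$,
      $$ analyze geolite.blocks; $$;
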