mirror of
https://github.com/dimitri/pgloader.git
synced 2025-08-07 23:07:00 +02:00
97 lines
4.6 KiB
ReStructuredText
Loading Fixed Width Data File with pgloader
-------------------------------------------

Some data providers still use a format where each column is specified with a
starting index position and a given length. Usually the columns are
blank-padded when the data is shorter than the full reserved range.

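
As a rough illustration of the layout described above, the following Python
sketch writes one blank-padded record and reads a column back by start
position and length. The three-column layout and its names are made up for
the example; they are not taken from any real data provider.

```python
# Hypothetical layout for illustration only: (name, start, length),
# with 0-based start positions.
LAYOUT = [("code", 0, 4), ("label", 4, 10), ("count", 14, 6)]


def write_record(values):
    """Blank-pad each value to the full width reserved for its column."""
    parts = []
    for name, _start, length in LAYOUT:
        text = str(values[name])
        if len(text) > length:
            raise ValueError(f"{name!r} does not fit in {length} characters")
        parts.append(text.ljust(length))
    return "".join(parts)


def read_field(record, start, length):
    """Return the raw characters stored in [start, start + length)."""
    return record[start:start + length]


record = write_record({"code": "AB12", "label": "Akron", "count": 521})
label = read_field(record, 4, 10)   # still carries its trailing blanks
print(repr(label), repr(label.rstrip()))
```

Every record has the same total width, so a reader only needs the start and
length of a column to find it; the trailing blanks are the padding that a
loader typically trims.
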
The Command
^^^^^^^^^^^

To load data with `pgloader <http://pgloader.io/>`_ you need to define the
operations in some detail in a *command*. Here's our example for loading
Fixed Width Data, using a file provided by the US census.

You can find more files from them at the
`Census 2000 Gazetteer Files <http://www.census.gov/geo/maps-data/data/gazetteer2000.html>`_.

Here's our command::

    LOAD ARCHIVE
       FROM http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip
       INTO postgresql:///pgloader

       BEFORE LOAD DO
         $$ drop table if exists places; $$,
         $$ create table places
           (
              usps      char(2)  not null,
              fips      char(2)  not null,
              fips_code char(5),
              loc_name  varchar(64)
           );
         $$

       LOAD FIXED
            FROM FILENAME MATCHING ~/places2k.txt/
                 WITH ENCODING latin1
                 (
                    usps      from   0 for  2,
                    fips      from   2 for  2,
                    fips_code from   4 for  5,
                    loc_name  from   9 for 64 [trim right whitespace],
                    p         from  73 for  9,
                    h         from  82 for  9,
                    land      from  91 for 14,
                    water     from 105 for 14,
                    ldm       from 119 for 12,
                    wtm       from 131 for 12,
                    lat       from 143 for 10,
                    long      from 153 for 11
                 )
            INTO postgresql:///pgloader?places
                 (
                    usps, fips, fips_code, loc_name
                 );

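
To make the field list easier to picture, here is a minimal Python sketch of
what a ``from START for LENGTH`` specification amounts to: slice each field
out of the line, optionally trim trailing blanks, and keep only the target
columns. It is an illustration under stated assumptions, not pgloader's
implementation (pgloader is written in Common Lisp). The helper names
``parse_line`` and ``project`` and the hand-built sample record are
hypothetical.

```python
# Field specs mirroring the first entries of the command above, keyed by the
# target table's column names: (name, start, length, trim_right).
FIELDS = [
    ("usps", 0, 2, False),
    ("fips", 2, 2, False),
    ("fips_code", 4, 5, False),
    ("loc_name", 9, 64, True),
    ("p", 73, 9, False),        # parsed, but not part of the target columns
]
TARGET_COLUMNS = ["usps", "fips", "fips_code", "loc_name"]


def parse_line(line, fields=FIELDS):
    """Slice every declared field out of one fixed-width line."""
    row = {}
    for name, start, length, trim_right in fields:
        value = line[start:start + length]
        row[name] = value.rstrip(" ") if trim_right else value
    return row


def project(row, columns=TARGET_COLUMNS):
    """Keep only the values that go into the target table, in column order."""
    return tuple(row[name] for name in columns)


# Assumption: a record rebuilt by hand for this sketch, with the name
# left-justified in 64 characters and the next field right-justified in 9.
sample = "AL" + "01" + "00124" + "Abbeville city".ljust(64) + "2987".rjust(9)
print(project(parse_line(sample)))
```

Fields that are declared but not listed among the target columns, such as
``p`` here, are simply left out of the projected row.
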
The Data
^^^^^^^^

This command allows loading the following file content, where we are only
showing the first few lines::

    AL0100124Abbeville city                                                       2987     1353      40301945        120383   15.560669    0.046480 31.566367 -85.251300
    AL0100460Adamsville city                                                      4965     2042      50779330         14126   19.606010    0.005454 33.590411 -86.949166
    AL0100484Addison town                                                          723      339       9101325             0    3.514041    0.000000 34.200042 -87.177851
    AL0100676Akron town                                                            521      239       1436797             0    0.554750    0.000000 32.876425 -87.740978
    AL0100820Alabaster city                                                      22619     8594      53023800        141711   20.472605    0.054715 33.231162 -86.823829
    AL0100988Albertville city                                                    17247     7090      67212867        258738   25.951034    0.099899 34.265362 -86.211261
    AL0101132Alexander City city                                                 15008     6855     100534344        433413   38.816529    0.167342 32.933157 -85.936008

Loading the data
^^^^^^^^^^^^^^^^

Let's start the ``pgloader`` command with our ``census-places.load`` command file::

    $ pgloader census-places.load
    ... LOG Starting pgloader, log system is ready.
    ... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/census-places.load"
    ... LOG Fetching 'http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip'
    ... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//places2k.zip'

           table name       read   imported     errors            time
    -----------------  ---------  ---------  ---------  --------------
             download          0          0          0          1.494s
              extract          0          0          0          1.013s
          before load          2          2          0          0.013s
    -----------------  ---------  ---------  ---------  --------------
               places      25375      25375          0          0.499s
    -----------------  ---------  ---------  ---------  --------------
    Total import time      25375      25375          0          3.019s

We can see that pgloader downloaded the file from its HTTP URL location,
then *unzipped* it before loading the data.

Note that the output of the command has been edited to facilitate its
browsing online.
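
The download, unzip and filename-matching sequence reported in the log above
can be sketched in Python as follows. This is only an illustration of the
flow, not how pgloader implements it: the archive is built in memory as a
stand-in for the HTTP download, and ``extract_matching`` and
``make_demo_archive`` are hypothetical helpers.

```python
import io
import re
import tempfile
import zipfile
from pathlib import Path


def extract_matching(archive_bytes, pattern, workdir):
    """Unzip an archive into workdir and return the files whose name matches."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        archive.extractall(workdir)
    regex = re.compile(pattern)
    return sorted(
        path for path in Path(workdir).rglob("*")
        if path.is_file() and regex.search(path.name)
    )


def make_demo_archive():
    """Stand-in for the HTTP download: a tiny zip archive built in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("places2k.txt", "AL0100124Abbeville city\n".encode("latin1"))
        archive.writestr("readme.txt", b"not loaded\n")
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as workdir:
    matches = extract_matching(make_demo_archive(), r"places2k.txt", workdir)
    names = [path.name for path in matches]
    # The command declares latin1, so the file is decoded with that encoding.
    first_line = matches[0].read_text(encoding="latin1").splitlines()[0]

print(names, first_line)
```

Only the file matching the pattern is picked up for loading; the other
archive member is extracted but ignored.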