Review website material, introduce pgloader cli operations.
parent 560c838d34
commit 087d4d28cb
.gitignore (vendored, 1 change)
@@ -12,4 +12,5 @@ web/howto/fixed.html
 web/howto/geolite.html
 web/howto/mysql.html
 web/howto/pgloader.1.html
+web/howto/quickstart.html
 web/howto/sqlite.html
pgloader.1 (12 changes)
@@ -307,12 +307,12 @@ The same command as just above can also be run if the CSV file happens to be fou
 .
 .nf
 
-\&\.pgloader \-\-type csv \e
-\-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
-\-\-with "skip header = 1" \e
-\-\-with "fields terminated by \'\et\'" \e
-http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt \e
-postgresql:///pgloader?districts_longlat
+pgloader \-\-type csv \e
+\-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
+\-\-with "skip header = 1" \e
+\-\-with "fields terminated by \'\et\'" \e
+http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt \e
+postgresql:///pgloader?districts_longlat
 .
 .fi
 .
pgloader.1.md

@@ -248,12 +248,12 @@ pgloader with this technique, using the Unix pipe:
 The same command as just above can also be run if the CSV file happens to be
 found on a remote HTTP location:
 
-    .pgloader --type csv \
-        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
-        --with "skip header = 1" \
-        --with "fields terminated by '\t'" \
-        http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
-        postgresql:///pgloader?districts_longlat
+    pgloader --type csv \
+        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
+        --with "skip header = 1" \
+        --with "fields terminated by '\t'" \
+        http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
+        postgresql:///pgloader?districts_longlat
 
 Some more options have to be used in that case, as the file contains a
 one-line header (most commonly that's column names, could be a copyright
@@ -36,9 +36,10 @@
       <div class="navbar-collapse collapse">
         <ul class="nav navbar-nav">
           <li><a href="../index.html">Home</a></li>
+          <li><a href="quickstart.html">Quick Start</a></li>
           <li><a href="pgloader.1.html">Reference documentation</a></li>
           <li class="dropdown active">
-            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Data Sources HowTos <b class="caret"></b></a>
+            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
             <ul class="dropdown-menu">
               <li class="dropdown-header">Plain Files</li>
               <li><a href="csv.html">CSV</a></li>
web/index.html

@@ -37,9 +37,10 @@
       <div class="navbar-collapse collapse">
         <ul class="nav navbar-nav">
           <li class="active"><a href="#">Home</a></li>
+          <li><a href="howto/quickstart.html">Quick Start</a></li>
           <li><a href="howto/pgloader.1.html">Reference documentation</a></li>
           <li class="dropdown">
-            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Data Sources HowTos <b class="caret"></b></a>
+            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
             <ul class="dropdown-menu">
               <li class="dropdown-header">Plain Files</li>
               <li><a href="howto/csv.html">CSV</a></li>
web/src/quickstart.md (new file, 134 lines)
@@ -0,0 +1,134 @@
# pgloader: a quickstart

In simple cases, pgloader is very easy to use.

## CSV

Load data from a CSV file into a pre-existing table in your database:

    pgloader --type csv \
        --field id --field field \
        --with truncate \
        --with "fields terminated by ','" \
        ./test/data/matching-1.csv \
        postgres:///pgloader?tablename=matching

In that example the whole load is driven from the command line, bypassing the need to write a command in the pgloader command syntax entirely. As there's no command, though, the extra information needed must be provided on the command line using the `--type`, `--field`, and `--with` switches.

For documentation about the available syntaxes for the `--field` and `--with` switches, please refer to the CSV section later in the man page.

Note also that the PostgreSQL URI includes the target *tablename*.
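The target table has to exist before the load. A minimal sketch of what it could look like, assuming `id` holds numbers and `field` holds free-form text (the column types here are a guess, adjust them to your data):

    create table matching
     (
       id    bigint,
       field text
     );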

## Reading from STDIN

File-based pgloader sources can be loaded from standard input, as in the following example:

    pgloader --type csv \
        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
        --with "skip header = 1" \
        --with "fields terminated by '\t'" \
        - \
        postgresql:///pgloader?districts_longlat \
        < test/data/2013_Gaz_113CDs_national.txt

The dash (`-`) character as a source is used to mean *standard input*, as usual in Unix command lines. It's possible to stream compressed content to pgloader with this technique, using a Unix pipe:

    gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo

## Loading from CSV available through HTTP

The same command as just above can also be run if the CSV file happens to be found on a remote HTTP location:

    pgloader --type csv \
        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
        --with "skip header = 1" \
        --with "fields terminated by '\t'" \
        http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
        postgresql:///pgloader?districts_longlat

Some more options have to be used in that case, as the file contains a one-line header (most commonly that's column names, but it could be a copyright notice). Also, in that case, we specify all the fields in a single `--field` option argument.

Again, the PostgreSQL target connection string must contain the *tablename* option, and you have to ensure that the target table exists and can fit the data. Here's the SQL command used in that example, in case you want to try it yourself:

    create table districts_longlat
     (
       usps        text,
       geoid       text,
       aland       bigint,
       awater      bigint,
       aland_sqmi  double precision,
       awater_sqmi double precision,
       intptlat    double precision,
       intptlong   double precision
     );

Also notice that the same command will work against an archived version of the same data.
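For instance, here's a sketch of the very same load pointed at a gzip-compressed copy of the file; the `.gz` URL is the one used in the streaming example below, and pgloader expands the archive before processing it:

    pgloader --type csv \
        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
        --with "skip header = 1" \
        --with "fields terminated by '\t'" \
        http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
        postgresql:///pgloader?districts_longlat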

## Streaming CSV data from an HTTP compressed file

Finally, it's important to note that pgloader first fetches the content from the HTTP URL into a local file, then expands the archive when it's recognized to be one, and only then processes the locally expanded file.

In some cases, either because pgloader has no direct support for your archive format or because expanding the archive is not feasible in your environment, you might want to *stream* the content straight from its remote location into PostgreSQL. Here's how to do that, using the battle-tested Unix pipes trick:

    curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
        | gunzip -c \
        | pgloader --type csv \
            --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
            --with "skip header = 1" \
            --with "fields terminated by '\t'" \
            - \
            postgresql:///pgloader?districts_longlat

Now the OS takes care of the streaming and buffering between the network and the commands, and pgloader takes care of streaming the data down to PostgreSQL.
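When the pipeline is done, a quick sanity check could look like the following, assuming `psql` can reach the `pgloader` database from the target URI above:

    psql -d pgloader -c "select count(*) from districts_longlat;"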

## Migrating from SQLite

The following command will open the SQLite database, discover its table definitions including indexes and foreign keys, migrate those definitions while *casting* the data type specifications to their PostgreSQL equivalents, and then migrate the data over:

    createdb newdb
    pgloader ./test/sqlite/sqlite.db postgresql:///newdb
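To get a first look at the result, listing the migrated tables with `psql` is enough (a sketch, assuming default connection settings):

    psql -d newdb -c '\dt'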

## Migrating from MySQL

Just create a database in which to host the MySQL data and definitions, and have pgloader do the migration for you in a single command line:

    createdb pagila
    pgloader mysql://user@localhost/sakila postgresql:///pagila
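When the MySQL server is remote or password protected, those details go into the source URI; the user, password, host, and port below are placeholders:

    pgloader mysql://user:password@mysql.example.com:3306/sakila postgresql:///pagila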

## Fetching an archived DBF file from an HTTP remote location

It's possible for pgloader to download a file from HTTP, unarchive it, and only then open it to discover the schema and load the data:

    createdb foo
    pgloader --type dbf http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip postgresql:///foo

Here it's not possible for pgloader to guess the kind of data source it's being given, so it's necessary to use the `--type` command line switch.
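The same `--type dbf` switch should also apply to a local copy of such an archive; a sketch, assuming the zip file above has already been downloaded into the current directory:

    pgloader --type dbf ./historiq2013.zip postgresql:///foo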