Review website material, introduce pgloader cli operations.
commit 087d4d28cb (parent 560c838d34)
.gitignore (vendored): 1 changed line

@@ -12,4 +12,5 @@ web/howto/fixed.html
 web/howto/geolite.html
 web/howto/mysql.html
 web/howto/pgloader.1.html
+web/howto/quickstart.html
 web/howto/sqlite.html
pgloader.1: 12 changed lines

@@ -307,12 +307,12 @@ The same command as just above can also be run if the CSV file happens to be found
 .
 .nf
 
-\&\.pgloader \-\-type csv \e
+pgloader \-\-type csv \e
     \-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
     \-\-with "skip header = 1" \e
     \-\-with "fields terminated by \'\et\'" \e
     http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt \e
     postgresql:///pgloader?districts_longlat
 .
 .fi
 .
@@ -248,12 +248,12 @@ pgloader with this technique, using the Unix pipe:
 The same command as just above can also be run if the CSV file happens to be
 found on a remote HTTP location:
 
-    .pgloader --type csv \
+    pgloader --type csv \
         --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
         --with "skip header = 1" \
         --with "fields terminated by '\t'" \
         http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
         postgresql:///pgloader?districts_longlat
 
 Some more options have to be used in that case, as the file contains a
 one-line header (most commonly that's column names, could be a copyright
@@ -36,9 +36,10 @@
       <div class="navbar-collapse collapse">
         <ul class="nav navbar-nav">
           <li><a href="../index.html">Home</a></li>
+          <li><a href="quickstart.html">Quick Start</a></li>
           <li><a href="pgloader.1.html">Reference documentation</a></li>
           <li class="dropdown active">
-            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Data Sources HowTos <b class="caret"></b></a>
+            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
             <ul class="dropdown-menu">
               <li class="dropdown-header">Plain Files</li>
               <li><a href="csv.html">CSV</a></li>
@@ -37,9 +37,10 @@
       <div class="navbar-collapse collapse">
         <ul class="nav navbar-nav">
           <li class="active"><a href="#">Home</a></li>
+          <li><a href="howto/quickstart.html">Quick Start</a></li>
           <li><a href="howto/pgloader.1.html">Reference documentation</a></li>
           <li class="dropdown">
-            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Data Sources HowTos <b class="caret"></b></a>
+            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
             <ul class="dropdown-menu">
               <li class="dropdown-header">Plain Files</li>
               <li><a href="howto/csv.html">CSV</a></li>
web/src/quickstart.md: 134 added lines (new file)

# pgloader: a quickstart

In simple cases, pgloader is very easy to use.

## CSV

Load data from a CSV file into a pre-existing table in your database:

    pgloader --type csv \
        --field id --field field \
        --with truncate \
        --with "fields terminated by ','" \
        ./test/data/matching-1.csv \
        postgres:///pgloader?tablename=matching

In that example the whole loading is driven from the command line,
bypassing the need for writing a command in the pgloader command syntax
entirely. As there's no command though, the extra information needed must
be provided on the command line using the `--type`, `--field`, and `--with`
switches.

For documentation about the available syntaxes for the `--field` and
`--with` switches, please refer to the CSV section later in the man page.

Note also that the PostgreSQL URI includes the target *tablename*.
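
The target table must exist before the load. Here's a minimal sketch of
what it could look like for this example; the `--field` switches only give
us the column names `id` and `field`, so the types below are assumptions:

    -- assumed target table for matching-1.csv; column types are guesses
    create table matching
    (
        id    bigint,
        field text
    );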

## Reading from STDIN

File-based pgloader sources can be loaded from the standard input, as in
the following example:

    pgloader --type csv \
        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
        --with "skip header = 1" \
        --with "fields terminated by '\t'" \
        - \
        postgresql:///pgloader?districts_longlat \
        < test/data/2013_Gaz_113CDs_national.txt

The dash (`-`) character as a source is used to mean *standard input*, as
usual in Unix command lines. It's possible to stream compressed content to
pgloader with this technique, using a Unix pipe:

    gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo

## Loading from CSV available through HTTP

The same command as just above can also be run if the CSV file happens to
be found on a remote HTTP location:

    pgloader --type csv \
        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
        --with "skip header = 1" \
        --with "fields terminated by '\t'" \
        http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
        postgresql:///pgloader?districts_longlat

Some more options have to be used in that case, as the file contains a
one-line header (most commonly that's column names, but it could be a
copyright notice). Also, in that case, we specify all the fields in a
single `--field` option argument.

Again, the PostgreSQL target connection string must contain the *tablename*
option, and you have to ensure that the target table exists and can fit the
data. Here's the SQL command used in that example, in case you want to try
it yourself:

    create table districts_longlat
    (
        usps        text,
        geoid       text,
        aland       bigint,
        awater      bigint,
        aland_sqmi  double precision,
        awater_sqmi double precision,
        intptlat    double precision,
        intptlong   double precision
    );

Also notice that the same command will work against an archived version of
the same data.
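
For instance, here's a sketch of the same load pointed at a gzip-compressed
copy of the file, reusing the `.txt.gz` URL from the streaming example
below and assuming gzip is among the archive formats pgloader recognizes:

    # same command, but the source is a compressed copy of the file
    pgloader --type csv \
        --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
        --with "skip header = 1" \
        --with "fields terminated by '\t'" \
        http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
        postgresql:///pgloader?districts_longlat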

## Streaming CSV data from an HTTP compressed file

Finally, it's important to note that pgloader first fetches the content
from the HTTP URL into a local file, then expands the archive when it's
recognized to be one, and only then processes the locally expanded file.

In some cases, either because pgloader has no direct support for your
archive format or maybe because expanding the archive is not feasible in
your environment, you might want to *stream* the content straight from its
remote location into PostgreSQL. Here's how to do that, using the old
battle-tested Unix pipes trick:

    curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
        | gunzip -c \
        | pgloader --type csv \
              --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
              --with "skip header = 1" \
              --with "fields terminated by '\t'" \
              - \
              postgresql:///pgloader?districts_longlat

Now the OS will take care of the streaming and buffering between the
network and the commands, and pgloader will take care of streaming the data
down to PostgreSQL.

## Migrating from SQLite

The following command will open the SQLite database, discover its table
definitions including indexes and foreign keys, migrate those definitions
while *casting* the data type specifications to their PostgreSQL
equivalents, and then migrate the data over:

    createdb newdb
    pgloader ./test/sqlite/sqlite.db postgresql:///newdb
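
To illustrate the *casting* step, here's a hypothetical SQLite table (not
one from the bundled test database) next to a plausible PostgreSQL shape
for it; the exact mapping depends on pgloader's cast rules:

    -- hypothetical SQLite source definition:
    --   CREATE TABLE album (id integer primary key, title text, added datetime);
    -- a plausible PostgreSQL equivalent after casting:
    create table album
    (
        id    bigint primary key,
        title text,
        added timestamptz
    );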

## Migrating from MySQL

Just create a database in which to host the MySQL data and definitions, and
have pgloader do the migration for you in a single command line:

    createdb pagila
    pgloader mysql://user@localhost/sakila postgresql:///pagila
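
Should the MySQL server require a password or run on a non-default host and
port, those details go in the source connection URI as usual; the
credentials and hostname below are placeholders:

    # user, password, and host here are placeholders
    pgloader mysql://user:password@mysql.example.com:3306/sakila postgresql:///pagila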

## Fetching an archived DBF file from an HTTP remote location

It's possible for pgloader to download a file from HTTP, unarchive it, and
only then open it to discover the schema and load the data:

    createdb foo
    pgloader --type dbf http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip postgresql:///foo

Here it's not possible for pgloader to guess the kind of data source it's
being given, so it's necessary to use the `--type` command line switch.