pgloader/docs/_build/html/tutorial/tutorial.html
Dimitri Fontaine 25c79dfebc Switch the documentation to using Sphinx.
The website is moving to pgloader.org and readthedocs.io is going to be
integrated. Let's see what happens. The docs build fine locally with the
sphinx tools and the docs/Makefile.

Having separate files for the documentation should help ease the maintenance
and add new topics, such as support for Common Lisp Hackers level docs,
which are currently missing.
2017-12-21 17:45:09 +01:00

893 lines
65 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PgLoader Tutorial &#8212; pgloader 3.4.1 documentation</title>
<link rel="stylesheet" href="../_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '3.4.1',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="PgLoader Reference Manual" href="../pgloader.html" />
<link rel="prev" title="Introduction" href="../intro.html" />
<link rel="stylesheet" href="../_static/custom.css" type="text/css" />
<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />
</head>
<body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="pgloader-tutorial">
<h1>PgLoader Tutorial<a class="headerlink" href="#pgloader-tutorial" title="Permalink to this headline"></a></h1>
<div class="section" id="pgloader-quick-start">
<h2>PgLoader Quick Start<a class="headerlink" href="#pgloader-quick-start" title="Permalink to this headline"></a></h2>
<p>In simple cases, pgloader is very easy to use.</p>
<div class="section" id="csv">
<h3>CSV<a class="headerlink" href="#csv" title="Permalink to this headline"></a></h3>
<p>Load data from a CSV file into a pre-existing table in your database:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>pgloader --type csv \
--field id --field field \
--with truncate \
--with &quot;fields terminated by &#39;,&#39;&quot; \
./test/data/matching-1.csv \
postgres:///pgloader?tablename=matching
</pre></div>
</div>
<p>In that example the whole loading is driven from the command line, bypassing
the need for writing a command in the pgloader command syntax entirely. As
theres no command though, the extra information needed must be provided on
the command line using the <cite>type</cite> and <cite>field</cite> and <cite>with</cite> switches.</p>
<p>For documentation about the available syntaxes for the <cite>field</cite> and
<cite>with</cite> switches, please refer to the CSV section later in the man page.</p>
<p>Note also that the PostgreSQL URI includes the target <em>tablename</em>.</p>
</div>
<div class="section" id="reading-from-stdin">
<h3>Reading from STDIN<a class="headerlink" href="#reading-from-stdin" title="Permalink to this headline"></a></h3>
<p>File based pgloader sources can be loaded from the standard input, as in the
following example:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>pgloader --type csv \
--field &quot;usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong&quot; \
--with &quot;skip header = 1&quot; \
--with &quot;fields terminated by &#39;\t&#39;&quot; \
- \
postgresql:///pgloader?districts_longlat \
&lt; test/data/2013_Gaz_113CDs_national.txt
</pre></div>
</div>
<p>The dash (<cite>-</cite>) character as a source is used to mean <em>standard input</em>, as
usual in Unix command lines. Its possible to stream compressed content to
pgloader with this technique, using the Unix pipe:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo
</pre></div>
</div>
</div>
<div class="section" id="loading-from-csv-available-through-http">
<h3>Loading from CSV available through HTTP<a class="headerlink" href="#loading-from-csv-available-through-http" title="Permalink to this headline"></a></h3>
<p>The same command as just above can also be run if the CSV file happens to be
found on a remote HTTP location:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>pgloader --type csv \
--field &quot;usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong&quot; \
--with &quot;skip header = 1&quot; \
--with &quot;fields terminated by &#39;\t&#39;&quot; \
http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
postgresql:///pgloader?districts_longlat
</pre></div>
</div>
<p>Some more options have to be used in that case, as the file contains a
one-line header (most commonly thats column names, could be a copyright
notice). Also, in that case, we specify all the fields right into a single
<cite>field</cite> option argument.</p>
<p>Again, the PostgreSQL target connection string must contain the <em>tablename</em>
option and you have to ensure that the target table exists and may fit the
data. Heres the SQL command used in that example in case you want to try it
yourself:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">create</span> <span class="n">table</span> <span class="n">districts_longlat</span>
<span class="p">(</span>
<span class="n">usps</span> <span class="n">text</span><span class="p">,</span>
<span class="n">geoid</span> <span class="n">text</span><span class="p">,</span>
<span class="n">aland</span> <span class="n">bigint</span><span class="p">,</span>
<span class="n">awater</span> <span class="n">bigint</span><span class="p">,</span>
<span class="n">aland_sqmi</span> <span class="n">double</span> <span class="n">precision</span><span class="p">,</span>
<span class="n">awater_sqmi</span> <span class="n">double</span> <span class="n">precision</span><span class="p">,</span>
<span class="n">intptlat</span> <span class="n">double</span> <span class="n">precision</span><span class="p">,</span>
<span class="n">intptlong</span> <span class="n">double</span> <span class="n">precision</span>
<span class="p">);</span>
</pre></div>
</div>
<p>Also notice that the same command will work against an archived version of
the same data.</p>
</div>
<div class="section" id="streaming-csv-data-from-an-http-compressed-file">
<h3>Streaming CSV data from an HTTP compressed file<a class="headerlink" href="#streaming-csv-data-from-an-http-compressed-file" title="Permalink to this headline"></a></h3>
<p>Finally, its important to note that pgloader first fetches the content from
the HTTP URL it to a local file, then expand the archive when its
recognized to be one, and only then processes the locally expanded file.</p>
<p>In some cases, either because pgloader has no direct support for your
archive format or maybe because expanding the archive is not feasible in
your environment, you might want to <em>stream</em> the content straight from its
remote location into PostgreSQL. Heres how to do that, using the old battle
tested Unix Pipes trick:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
| gunzip -c \
| pgloader --type csv \
--field &quot;usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong&quot;
--with &quot;skip header = 1&quot; \
--with &quot;fields terminated by &#39;\t&#39;&quot; \
- \
postgresql:///pgloader?districts_longlat
</pre></div>
</div>
<p>Now the OS will take care of the streaming and buffering between the network
and the commands and pgloader will take care of streaming the data down to
PostgreSQL.</p>
</div>
<div class="section" id="migrating-from-sqlite">
<h3>Migrating from SQLite<a class="headerlink" href="#migrating-from-sqlite" title="Permalink to this headline"></a></h3>
<p>The following command will open the SQLite database, discover its tables
definitions including indexes and foreign keys, migrate those definitions
while <em>casting</em> the data type specifications to their PostgreSQL equivalent
and then migrate the data over:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">createdb</span> <span class="n">newdb</span>
<span class="n">pgloader</span> <span class="o">./</span><span class="n">test</span><span class="o">/</span><span class="n">sqlite</span><span class="o">/</span><span class="n">sqlite</span><span class="o">.</span><span class="n">db</span> <span class="n">postgresql</span><span class="p">:</span><span class="o">///</span><span class="n">newdb</span>
</pre></div>
</div>
</div>
<div class="section" id="migrating-from-mysql">
<h3>Migrating from MySQL<a class="headerlink" href="#migrating-from-mysql" title="Permalink to this headline"></a></h3>
<p>Just create a database where to host the MySQL data and definitions and have
pgloader do the migration for you in a single command line:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">createdb</span> <span class="n">pagila</span>
<span class="n">pgloader</span> <span class="n">mysql</span><span class="p">:</span><span class="o">//</span><span class="n">user</span><span class="nd">@localhost</span><span class="o">/</span><span class="n">sakila</span> <span class="n">postgresql</span><span class="p">:</span><span class="o">///</span><span class="n">pagila</span>
</pre></div>
</div>
</div>
<div class="section" id="fetching-an-archived-dbf-file-from-a-http-remote-location">
<h3>Fetching an archived DBF file from a HTTP remote location<a class="headerlink" href="#fetching-an-archived-dbf-file-from-a-http-remote-location" title="Permalink to this headline"></a></h3>
<p>Its possible for pgloader to download a file from HTTP, unarchive it, and
only then open it to discover the schema then load the data:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">createdb</span> <span class="n">foo</span>
<span class="n">pgloader</span> <span class="o">--</span><span class="nb">type</span> <span class="n">dbf</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">www</span><span class="o">.</span><span class="n">insee</span><span class="o">.</span><span class="n">fr</span><span class="o">/</span><span class="n">fr</span><span class="o">/</span><span class="n">methodes</span><span class="o">/</span><span class="n">nomenclatures</span><span class="o">/</span><span class="n">cog</span><span class="o">/</span><span class="n">telechargement</span><span class="o">/</span><span class="mi">2013</span><span class="o">/</span><span class="n">dbf</span><span class="o">/</span><span class="n">historiq2013</span><span class="o">.</span><span class="n">zip</span> <span class="n">postgresql</span><span class="p">:</span><span class="o">///</span><span class="n">foo</span>
</pre></div>
</div>
<p>Here its not possible for pgloader to guess the kind of data source its
being given, so its necessary to use the <cite>type</cite> command line switch.</p>
</div>
</div>
<div class="section" id="loading-csv-data-with-pgloader">
<h2>Loading CSV Data with pgloader<a class="headerlink" href="#loading-csv-data-with-pgloader" title="Permalink to this headline"></a></h2>
<p>CSV means <em>comma separated values</em> and is often found with quite varying
specifications. pgloader allows you to describe those specs in its command.</p>
<div class="section" id="the-command">
<h3>The Command<a class="headerlink" href="#the-command" title="Permalink to this headline"></a></h3>
<p>To load data with [pgloader](<a class="reference external" href="http://pgloader.io/">http://pgloader.io/</a>) you need to define in a
<em>command</em> the operations in some details. Heres our example for loading CSV
data:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>LOAD CSV
FROM &#39;path/to/file.csv&#39; (x, y, a, b, c, d)
INTO postgresql:///pgloader?csv (a, b, d, c)
WITH truncate,
skip header = 1,
fields optionally enclosed by &#39;&quot;&#39;,
fields escaped by double-quote,
fields terminated by &#39;,&#39;
SET client_encoding to &#39;latin1&#39;,
work_mem to &#39;12MB&#39;,
standard_conforming_strings to &#39;on&#39;
BEFORE LOAD DO
$$ drop table if exists csv; $$,
$$ create table csv (
a bigint,
b bigint,
c char(2),
d text
);
$$;
</pre></div>
</div>
</div>
<div class="section" id="the-data">
<h3>The Data<a class="headerlink" href="#the-data" title="Permalink to this headline"></a></h3>
<p>This command allows loading the following CSV file content:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>Header, with a © sign
&quot;2.6.190.56&quot;,&quot;2.6.190.63&quot;,&quot;33996344&quot;,&quot;33996351&quot;,&quot;GB&quot;,&quot;United Kingdom&quot;
&quot;3.0.0.0&quot;,&quot;4.17.135.31&quot;,&quot;50331648&quot;,&quot;68257567&quot;,&quot;US&quot;,&quot;United States&quot;
&quot;4.17.135.32&quot;,&quot;4.17.135.63&quot;,&quot;68257568&quot;,&quot;68257599&quot;,&quot;CA&quot;,&quot;Canada&quot;
&quot;4.17.135.64&quot;,&quot;4.17.142.255&quot;,&quot;68257600&quot;,&quot;68259583&quot;,&quot;US&quot;,&quot;United States&quot;
&quot;4.17.143.0&quot;,&quot;4.17.143.15&quot;,&quot;68259584&quot;,&quot;68259599&quot;,&quot;CA&quot;,&quot;Canada&quot;
&quot;4.17.143.16&quot;,&quot;4.18.32.71&quot;,&quot;68259600&quot;,&quot;68296775&quot;,&quot;US&quot;,&quot;United States&quot;
</pre></div>
</div>
</div>
<div class="section" id="loading-the-data">
<h3>Loading the data<a class="headerlink" href="#loading-the-data" title="Permalink to this headline"></a></h3>
<p>Heres how to start loading the data. Note that the ouput here has been
edited so as to facilitate its browsing online:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader csv.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/csv.load&quot;
table name read imported errors time
----------------- --------- --------- --------- --------------
before load 2 2 0 0.039s
----------------- --------- --------- --------- --------------
csv 6 6 0 0.019s
----------------- --------- --------- --------- --------------
Total import time 6 6 0 0.058s
</pre></div>
</div>
</div>
<div class="section" id="the-result">
<h3>The result<a class="headerlink" href="#the-result" title="Permalink to this headline"></a></h3>
<p>As you can see, the command described above is filtering the input and only
importing some of the columns from the example data file. Heres what gets
loaded in the PostgreSQL database:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">pgloader</span><span class="c1"># table csv;</span>
<span class="n">a</span> <span class="o">|</span> <span class="n">b</span> <span class="o">|</span> <span class="n">c</span> <span class="o">|</span> <span class="n">d</span>
<span class="o">----------+----------+----+----------------</span>
<span class="mi">33996344</span> <span class="o">|</span> <span class="mi">33996351</span> <span class="o">|</span> <span class="n">GB</span> <span class="o">|</span> <span class="n">United</span> <span class="n">Kingdom</span>
<span class="mi">50331648</span> <span class="o">|</span> <span class="mi">68257567</span> <span class="o">|</span> <span class="n">US</span> <span class="o">|</span> <span class="n">United</span> <span class="n">States</span>
<span class="mi">68257568</span> <span class="o">|</span> <span class="mi">68257599</span> <span class="o">|</span> <span class="n">CA</span> <span class="o">|</span> <span class="n">Canada</span>
<span class="mi">68257600</span> <span class="o">|</span> <span class="mi">68259583</span> <span class="o">|</span> <span class="n">US</span> <span class="o">|</span> <span class="n">United</span> <span class="n">States</span>
<span class="mi">68259584</span> <span class="o">|</span> <span class="mi">68259599</span> <span class="o">|</span> <span class="n">CA</span> <span class="o">|</span> <span class="n">Canada</span>
<span class="mi">68259600</span> <span class="o">|</span> <span class="mi">68296775</span> <span class="o">|</span> <span class="n">US</span> <span class="o">|</span> <span class="n">United</span> <span class="n">States</span>
<span class="p">(</span><span class="mi">6</span> <span class="n">rows</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="loading-fixed-width-data-file-with-pgloader">
<h2>Loading Fixed Width Data File with pgloader<a class="headerlink" href="#loading-fixed-width-data-file-with-pgloader" title="Permalink to this headline"></a></h2>
<p>Some data providers still use a format where each column is specified with a
starting index position and a given length. Usually the columns are
blank-padded when the data is shorter than the full reserved range.</p>
<div class="section" id="id1">
<h3>The Command<a class="headerlink" href="#id1" title="Permalink to this headline"></a></h3>
<p>To load data with [pgloader](<a class="reference external" href="http://pgloader.io/">http://pgloader.io/</a>) you need to define in a
<em>command</em> the operations in some details. Heres our example for loading
Fixed Width Data, using a file provided by the US census.</p>
<p>You can find more files from them at the
[Census 2000 Gazetteer Files](<a class="reference external" href="http://www.census.gov/geo/maps-data/data/gazetteer2000.html">http://www.census.gov/geo/maps-data/data/gazetteer2000.html</a>).</p>
<p>Heres our command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>LOAD ARCHIVE
FROM http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip
INTO postgresql:///pgloader
BEFORE LOAD DO
$$ drop table if exists places; $$,
$$ create table places
(
usps char(2) not null,
fips char(2) not null,
fips_code char(5),
loc_name varchar(64)
);
$$
LOAD FIXED
FROM FILENAME MATCHING ~/places2k.txt/
WITH ENCODING latin1
(
usps from 0 for 2,
fips from 2 for 2,
fips_code from 4 for 5,
&quot;LocationName&quot; from 9 for 64 [trim right whitespace],
p from 73 for 9,
h from 82 for 9,
land from 91 for 14,
water from 105 for 14,
ldm from 119 for 14,
wtm from 131 for 14,
lat from 143 for 10,
long from 153 for 11
)
INTO postgresql:///pgloader?places
(
usps, fips, fips_code, &quot;LocationName&quot;
);
</pre></div>
</div>
</div>
<div class="section" id="id2">
<h3>The Data<a class="headerlink" href="#id2" title="Permalink to this headline"></a></h3>
<p>This command allows loading the following file content, where we are only
showing the first couple of lines:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">AL0100124Abbeville</span> <span class="n">city</span> <span class="mi">2987</span> <span class="mi">1353</span> <span class="mi">40301945</span> <span class="mi">120383</span> <span class="mf">15.560669</span> <span class="mf">0.046480</span> <span class="mf">31.566367</span> <span class="o">-</span><span class="mf">85.251300</span>
<span class="n">AL0100460Adamsville</span> <span class="n">city</span> <span class="mi">4965</span> <span class="mi">2042</span> <span class="mi">50779330</span> <span class="mi">14126</span> <span class="mf">19.606010</span> <span class="mf">0.005454</span> <span class="mf">33.590411</span> <span class="o">-</span><span class="mf">86.949166</span>
<span class="n">AL0100484Addison</span> <span class="n">town</span> <span class="mi">723</span> <span class="mi">339</span> <span class="mi">9101325</span> <span class="mi">0</span> <span class="mf">3.514041</span> <span class="mf">0.000000</span> <span class="mf">34.200042</span> <span class="o">-</span><span class="mf">87.177851</span>
<span class="n">AL0100676Akron</span> <span class="n">town</span> <span class="mi">521</span> <span class="mi">239</span> <span class="mi">1436797</span> <span class="mi">0</span> <span class="mf">0.554750</span> <span class="mf">0.000000</span> <span class="mf">32.876425</span> <span class="o">-</span><span class="mf">87.740978</span>
<span class="n">AL0100820Alabaster</span> <span class="n">city</span> <span class="mi">22619</span> <span class="mi">8594</span> <span class="mi">53023800</span> <span class="mi">141711</span> <span class="mf">20.472605</span> <span class="mf">0.054715</span> <span class="mf">33.231162</span> <span class="o">-</span><span class="mf">86.823829</span>
<span class="n">AL0100988Albertville</span> <span class="n">city</span> <span class="mi">17247</span> <span class="mi">7090</span> <span class="mi">67212867</span> <span class="mi">258738</span> <span class="mf">25.951034</span> <span class="mf">0.099899</span> <span class="mf">34.265362</span> <span class="o">-</span><span class="mf">86.211261</span>
<span class="n">AL0101132Alexander</span> <span class="n">City</span> <span class="n">city</span> <span class="mi">15008</span> <span class="mi">6855</span> <span class="mi">100534344</span> <span class="mi">433413</span> <span class="mf">38.816529</span> <span class="mf">0.167342</span> <span class="mf">32.933157</span> <span class="o">-</span><span class="mf">85.936008</span>
</pre></div>
</div>
</div>
<div class="section" id="id3">
<h3>Loading the data<a class="headerlink" href="#id3" title="Permalink to this headline"></a></h3>
<p>Lets start the <cite>pgloader</cite> command with our <cite>census-places.load</cite> command file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader census-places.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/census-places.load&quot;
... LOG Fetching &#39;http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip&#39;
... LOG Extracting files from archive &#39;//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//places2k.zip&#39;
table name read imported errors time
----------------- --------- --------- --------- --------------
download 0 0 0 1.494s
extract 0 0 0 1.013s
before load 2 2 0 0.013s
----------------- --------- --------- --------- --------------
places 25375 25375 0 0.499s
----------------- --------- --------- --------- --------------
Total import time 25375 25375 0 3.019s
</pre></div>
</div>
<p>We can see that pgloader did download the file from its HTTP URL location
then <em>unziped</em> it before the loading itself.</p>
<p>Note that the output of the command has been edited to facilitate its
browsing online.</p>
</div>
</div>
<div class="section" id="loading-maxmind-geolite-data-with-pgloader">
<h2>Loading MaxMind Geolite Data with pgloader<a class="headerlink" href="#loading-maxmind-geolite-data-with-pgloader" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="http://www.maxmind.com/">MaxMind</a> provides a free dataset for
geolocation, which is quite popular. Using pgloader you can download the
lastest version of it, extract the CSV files from the archive and load their
content into your database directly.</p>
<div class="section" id="id4">
<h3>The Command<a class="headerlink" href="#id4" title="Permalink to this headline"></a></h3>
<p>To load data with pgloader you need to define in a <em>command</em> the operations
in some details. Heres our example for loading the Geolite data:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>/*
* Loading from a ZIP archive containing CSV files. The full test can be
* done with using the archive found at
* http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
*
* And a very light version of this data set is found at
* http://pgsql.tapoueh.org/temp/foo.zip for quick testing.
*/
LOAD ARCHIVE
FROM http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
INTO postgresql:///ip4r
BEFORE LOAD DO
$$ create extension if not exists ip4r; $$,
$$ create schema if not exists geolite; $$,
$$ create table if not exists geolite.location
(
locid integer primary key,
country text,
region text,
city text,
postalcode text,
location point,
metrocode text,
areacode text
);
$$,
$$ create table if not exists geolite.blocks
(
iprange ip4r,
locid integer
);
$$,
$$ drop index if exists geolite.blocks_ip4r_idx; $$,
$$ truncate table geolite.blocks, geolite.location cascade; $$
LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
WITH ENCODING iso-8859-1
(
locId,
country,
region null if blanks,
city null if blanks,
postalCode null if blanks,
latitude,
longitude,
metroCode null if blanks,
areaCode null if blanks
)
INTO postgresql:///ip4r?geolite.location
(
locid,country,region,city,postalCode,
location point using (format nil &quot;(~a,~a)&quot; longitude latitude),
metroCode,areaCode
)
WITH skip header = 2,
fields optionally enclosed by &#39;&quot;&#39;,
fields escaped by double-quote,
fields terminated by &#39;,&#39;
AND LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
WITH ENCODING iso-8859-1
(
startIpNum, endIpNum, locId
)
INTO postgresql:///ip4r?geolite.blocks
(
iprange ip4r using (ip-range startIpNum endIpNum),
locId
)
WITH skip header = 2,
fields optionally enclosed by &#39;&quot;&#39;,
fields escaped by double-quote,
fields terminated by &#39;,&#39;
FINALLY DO
$$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;
</pre></div>
</div>
<p>Note that while the <em>Geolite</em> data is using a pair of integers (<em>start</em>,
<em>end</em>) to represent <em>ipv4</em> data, we use the very poweful <a class="reference external" href="https://github.com/RhodiumToad/ip4r">ip4r</a> PostgreSQL Extension instead.</p>
<p>The transformation from a pair of integers into an IP is done dynamically by
the pgloader process.</p>
<p>Also, the location is given as a pair of <em>float</em> columns for the <em>longitude</em>
and the <em>latitude</em> where PostgreSQL offers the
<a class="reference external" href="http://www.postgresql.org/docs/9.3/interactive/functions-geometry.html">point</a>
datatype, so the pgloader command here will actually transform the data on
the fly to use the appropriate data type and its input representation.</p>
</div>
<div class="section" id="id5">
<h3>Loading the data<a class="headerlink" href="#id5" title="Permalink to this headline"></a></h3>
<p>Heres how to start loading the data. Note that the ouput here has been
edited so as to facilitate its browsing online:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader archive.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/archive.load&quot;
... LOG Fetching &#39;http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip&#39;
... LOG Extracting files from archive &#39;//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//GeoLiteCity-latest.zip&#39;
table name read imported errors time
----------------- --------- --------- --------- --------------
download 0 0 0 11.592s
extract 0 0 0 1.012s
before load 6 6 0 0.019s
----------------- --------- --------- --------- --------------
geolite.location 470387 470387 0 7.743s
geolite.blocks 1903155 1903155 0 16.332s
----------------- --------- --------- --------- --------------
finally 1 1 0 31.692s
----------------- --------- --------- --------- --------------
Total import time 2373542 2373542 0 1m8.390s
</pre></div>
</div>
<p>The timing of course includes the transformation of the <em>1.9 million</em> pairs
of integer into a single <em>ipv4 range</em> each. The <em>finally</em> step consists of
creating the <em>GiST</em> specialized index as given in the main command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">CREATE</span> <span class="n">INDEX</span> <span class="n">blocks_ip4r_idx</span> <span class="n">ON</span> <span class="n">geolite</span><span class="o">.</span><span class="n">blocks</span> <span class="n">USING</span> <span class="n">gist</span><span class="p">(</span><span class="n">iprange</span><span class="p">);</span>
</pre></div>
</div>
<p>That index will then be used to speed up queries wanting to find which
recorded geolocation contains a specific IP address:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">ip4r</span><span class="o">&gt;</span> <span class="n">select</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">geolite.location</span> <span class="n">l</span>
<span class="n">join</span> <span class="n">geolite</span><span class="o">.</span><span class="n">blocks</span> <span class="n">b</span> <span class="n">using</span><span class="p">(</span><span class="n">locid</span><span class="p">)</span>
<span class="n">where</span> <span class="n">iprange</span> <span class="o">&gt;&gt;=</span> <span class="s1">&#39;8.8.8.8&#39;</span><span class="p">;</span>
<span class="o">-</span><span class="p">[</span> <span class="n">RECORD</span> <span class="mi">1</span> <span class="p">]</span><span class="o">------------------</span>
<span class="n">locid</span> <span class="o">|</span> <span class="mi">223</span>
<span class="n">country</span> <span class="o">|</span> <span class="n">US</span>
<span class="n">region</span> <span class="o">|</span>
<span class="n">city</span> <span class="o">|</span>
<span class="n">postalcode</span> <span class="o">|</span>
<span class="n">location</span> <span class="o">|</span> <span class="p">(</span><span class="o">-</span><span class="mi">97</span><span class="p">,</span><span class="mi">38</span><span class="p">)</span>
<span class="n">metrocode</span> <span class="o">|</span>
<span class="n">areacode</span> <span class="o">|</span>
<span class="n">iprange</span> <span class="o">|</span> <span class="mf">8.8</span><span class="o">.</span><span class="mf">8.8</span><span class="o">-</span><span class="mf">8.8</span><span class="o">.</span><span class="mf">37.255</span>
<span class="n">Time</span><span class="p">:</span> <span class="mf">0.747</span> <span class="n">ms</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="loading-dbase-files-with-pgloader">
<h2>Loading dBase files with pgloader<a class="headerlink" href="#loading-dbase-files-with-pgloader" title="Permalink to this headline"></a></h2>
<p>The dBase format is still in use in some places as modern tools such as
<em>Filemaker</em> and <em>Excel</em> offer some level of support for it. Speaking of
support in modern tools, pgloader is right there on the list too!</p>
<div class="section" id="id6">
<h3>The Command<a class="headerlink" href="#id6" title="Permalink to this headline"></a></h3>
<p>To load data with [pgloader](<a class="reference external" href="http://pgloader.io/">http://pgloader.io/</a>) you need to define in a
<em>command</em> the operations in some details. Heres our example for loading a
dBase file, using a file provided by the french administration.</p>
<p>You can find more files from them at the <a class="reference external" href="http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement.asp">Insee</a>
website.</p>
<p>Heres our command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">LOAD</span> <span class="n">DBF</span>
<span class="n">FROM</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">www</span><span class="o">.</span><span class="n">insee</span><span class="o">.</span><span class="n">fr</span><span class="o">/</span><span class="n">fr</span><span class="o">/</span><span class="n">methodes</span><span class="o">/</span><span class="n">nomenclatures</span><span class="o">/</span><span class="n">cog</span><span class="o">/</span><span class="n">telechargement</span><span class="o">/</span><span class="mi">2013</span><span class="o">/</span><span class="n">dbf</span><span class="o">/</span><span class="n">historiq2013</span><span class="o">.</span><span class="n">zip</span>
<span class="n">INTO</span> <span class="n">postgresql</span><span class="p">:</span><span class="o">///</span><span class="n">pgloader</span>
<span class="n">WITH</span> <span class="n">truncate</span><span class="p">,</span> <span class="n">create</span> <span class="n">table</span>
<span class="n">SET</span> <span class="n">client_encoding</span> <span class="n">TO</span> <span class="s1">&#39;latin1&#39;</span><span class="p">;</span>
</pre></div>
</div>
<p>Note that here pgloader will benefit from the meta-data information found in
the dBase file to create a PostgreSQL table capable of hosting the data as
described, then load the data.</p>
</div>
<div class="section" id="id7">
<h3>Loading the data<a class="headerlink" href="#id7" title="Permalink to this headline"></a></h3>
<p>Lets start the <cite>pgloader</cite> command with our <cite>dbf-zip.load</cite> command file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader dbf-zip.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/dbf-zip.load&quot;
... LOG Fetching &#39;http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip&#39;
... LOG Extracting files from archive &#39;//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//historiq2013.zip&#39;
table name read imported errors time
----------------- --------- --------- --------- --------------
download 0 0 0 0.167s
extract 0 0 0 1.010s
create, truncate 0 0 0 0.071s
----------------- --------- --------- --------- --------------
historiq2013 9181 9181 0 0.658s
----------------- --------- --------- --------- --------------
Total import time 9181 9181 0 1.906s
</pre></div>
</div>
<p>We can see that <a class="reference external" href="http://pgloader.io">pgloader</a> did download the file from
its HTTP URL location then <em>unziped</em> it before the loading itself.</p>
<p>Note that the output of the command has been edited to facilitate its
browsing online.</p>
</div>
</div>
<div class="section" id="loading-sqlite-files-with-pgloader">
<h2>Loading SQLite files with pgloader<a class="headerlink" href="#loading-sqlite-files-with-pgloader" title="Permalink to this headline"></a></h2>
<p>The SQLite database is a respected solution to manage your data with. Its
embeded nature makes it a source of migrations when a projects now needs to
handle more concurrency, which [PostgreSQL](<a class="reference external" href="http://www.postgresql.org/">http://www.postgresql.org/</a>) is
very good at. pgloader can help you there.</p>
<div class="section" id="in-a-single-command-line">
<h3>In a Single Command Line<a class="headerlink" href="#in-a-single-command-line" title="Permalink to this headline"></a></h3>
<p>You can</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ createdb chinook
$ pgloader https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite pgsql:///chinook
</pre></div>
</div>
<p>Done! All with the schema, data, constraints, primary keys and foreign keys,
etc. We also see an error with the Chinook schema that contains several
primary key definitions against the same table, which is not accepted by
PostgreSQL:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="mi">2017</span><span class="o">-</span><span class="mi">06</span><span class="o">-</span><span class="mi">20</span><span class="n">T16</span><span class="p">:</span><span class="mi">18</span><span class="p">:</span><span class="mf">59.019000</span><span class="o">+</span><span class="mi">02</span><span class="p">:</span><span class="mi">00</span> <span class="n">LOG</span> <span class="n">Data</span> <span class="n">errors</span> <span class="ow">in</span> <span class="s1">&#39;/private/tmp/pgloader/&#39;</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">06</span><span class="o">-</span><span class="mi">20</span><span class="n">T16</span><span class="p">:</span><span class="mi">18</span><span class="p">:</span><span class="mf">59.236000</span><span class="o">+</span><span class="mi">02</span><span class="p">:</span><span class="mi">00</span> <span class="n">LOG</span> <span class="n">Fetching</span> <span class="s1">&#39;https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite&#39;</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">06</span><span class="o">-</span><span class="mi">20</span><span class="n">T16</span><span class="p">:</span><span class="mi">19</span><span class="p">:</span><span class="mf">00.664000</span><span class="o">+</span><span class="mi">02</span><span class="p">:</span><span class="mi">00</span> <span class="n">ERROR</span> <span class="n">Database</span> <span class="n">error</span> <span class="mi">42</span><span class="n">P16</span><span class="p">:</span> <span class="n">multiple</span> <span class="n">primary</span> <span class="n">keys</span> <span class="k">for</span> <span class="n">table</span> <span class="s2">&quot;playlisttrack&quot;</span> <span class="n">are</span> <span class="ow">not</span> <span class="n">allowed</span>
<span class="n">QUERY</span><span class="p">:</span> <span class="n">ALTER</span> <span class="n">TABLE</span> <span class="n">playlisttrack</span> <span class="n">ADD</span> <span class="n">PRIMARY</span> <span class="n">KEY</span> <span class="n">USING</span> <span class="n">INDEX</span> <span class="n">idx_66873_sqlite_autoindex_playlisttrack_1</span><span class="p">;</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">06</span><span class="o">-</span><span class="mi">20</span><span class="n">T16</span><span class="p">:</span><span class="mi">19</span><span class="p">:</span><span class="mf">00.665000</span><span class="o">+</span><span class="mi">02</span><span class="p">:</span><span class="mi">00</span> <span class="n">LOG</span> <span class="n">report</span> <span class="n">summary</span> <span class="n">reset</span>
<span class="n">table</span> <span class="n">name</span> <span class="n">read</span> <span class="n">imported</span> <span class="n">errors</span> <span class="n">total</span> <span class="n">time</span>
<span class="o">-----------------------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">--------------</span>
<span class="n">fetch</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.877</span><span class="n">s</span>
<span class="n">fetch</span> <span class="n">meta</span> <span class="n">data</span> <span class="mi">33</span> <span class="mi">33</span> <span class="mi">0</span> <span class="mf">0.033</span><span class="n">s</span>
<span class="n">Create</span> <span class="n">Schemas</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.003</span><span class="n">s</span>
<span class="n">Create</span> <span class="n">SQL</span> <span class="n">Types</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.006</span><span class="n">s</span>
<span class="n">Create</span> <span class="n">tables</span> <span class="mi">22</span> <span class="mi">22</span> <span class="mi">0</span> <span class="mf">0.043</span><span class="n">s</span>
<span class="n">Set</span> <span class="n">Table</span> <span class="n">OIDs</span> <span class="mi">11</span> <span class="mi">11</span> <span class="mi">0</span> <span class="mf">0.012</span><span class="n">s</span>
<span class="o">-----------------------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">--------------</span>
<span class="n">album</span> <span class="mi">347</span> <span class="mi">347</span> <span class="mi">0</span> <span class="mf">0.023</span><span class="n">s</span>
<span class="n">artist</span> <span class="mi">275</span> <span class="mi">275</span> <span class="mi">0</span> <span class="mf">0.023</span><span class="n">s</span>
<span class="n">customer</span> <span class="mi">59</span> <span class="mi">59</span> <span class="mi">0</span> <span class="mf">0.021</span><span class="n">s</span>
<span class="n">employee</span> <span class="mi">8</span> <span class="mi">8</span> <span class="mi">0</span> <span class="mf">0.018</span><span class="n">s</span>
<span class="n">invoice</span> <span class="mi">412</span> <span class="mi">412</span> <span class="mi">0</span> <span class="mf">0.031</span><span class="n">s</span>
<span class="n">genre</span> <span class="mi">25</span> <span class="mi">25</span> <span class="mi">0</span> <span class="mf">0.021</span><span class="n">s</span>
<span class="n">invoiceline</span> <span class="mi">2240</span> <span class="mi">2240</span> <span class="mi">0</span> <span class="mf">0.034</span><span class="n">s</span>
<span class="n">mediatype</span> <span class="mi">5</span> <span class="mi">5</span> <span class="mi">0</span> <span class="mf">0.025</span><span class="n">s</span>
<span class="n">playlisttrack</span> <span class="mi">8715</span> <span class="mi">8715</span> <span class="mi">0</span> <span class="mf">0.040</span><span class="n">s</span>
<span class="n">playlist</span> <span class="mi">18</span> <span class="mi">18</span> <span class="mi">0</span> <span class="mf">0.016</span><span class="n">s</span>
<span class="n">track</span> <span class="mi">3503</span> <span class="mi">3503</span> <span class="mi">0</span> <span class="mf">0.111</span><span class="n">s</span>
<span class="o">-----------------------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">--------------</span>
<span class="n">COPY</span> <span class="n">Threads</span> <span class="n">Completion</span> <span class="mi">33</span> <span class="mi">33</span> <span class="mi">0</span> <span class="mf">0.313</span><span class="n">s</span>
<span class="n">Create</span> <span class="n">Indexes</span> <span class="mi">22</span> <span class="mi">22</span> <span class="mi">0</span> <span class="mf">0.160</span><span class="n">s</span>
<span class="n">Index</span> <span class="n">Build</span> <span class="n">Completion</span> <span class="mi">22</span> <span class="mi">22</span> <span class="mi">0</span> <span class="mf">0.027</span><span class="n">s</span>
<span class="n">Reset</span> <span class="n">Sequences</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.017</span><span class="n">s</span>
<span class="n">Primary</span> <span class="n">Keys</span> <span class="mi">12</span> <span class="mi">0</span> <span class="mi">1</span> <span class="mf">0.013</span><span class="n">s</span>
<span class="n">Create</span> <span class="n">Foreign</span> <span class="n">Keys</span> <span class="mi">11</span> <span class="mi">11</span> <span class="mi">0</span> <span class="mf">0.040</span><span class="n">s</span>
<span class="n">Create</span> <span class="n">Triggers</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.000</span><span class="n">s</span>
<span class="n">Install</span> <span class="n">Comments</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mf">0.000</span><span class="n">s</span>
<span class="o">-----------------------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">---------</span> <span class="o">--------------</span>
<span class="n">Total</span> <span class="kn">import</span> <span class="nn">time</span> <span class="mi">15607</span> <span class="mi">15607</span> <span class="mi">0</span> <span class="mf">1.669</span><span class="n">s</span>
</pre></div>
</div>
<p>You may need to have special cases to take care of tho. In advanced case you
can use the pgloader command.</p>
</div>
<div class="section" id="id8">
<h3>The Command<a class="headerlink" href="#id8" title="Permalink to this headline"></a></h3>
<p>To load data with [pgloader](<a class="reference external" href="http://pgloader.io/">http://pgloader.io/</a>) you need to define in a
<em>command</em> the operations in some details. Heres our command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">load</span> <span class="n">database</span>
<span class="kn">from</span> <span class="s1">&#39;sqlite/Chinook_Sqlite_AutoIncrementPKs.sqlite&#39;</span>
<span class="n">into</span> <span class="n">postgresql</span><span class="p">:</span><span class="o">///</span><span class="n">pgloader</span>
<span class="k">with</span> <span class="n">include</span> <span class="n">drop</span><span class="p">,</span> <span class="n">create</span> <span class="n">tables</span><span class="p">,</span> <span class="n">create</span> <span class="n">indexes</span><span class="p">,</span> <span class="n">reset</span> <span class="n">sequences</span>
<span class="nb">set</span> <span class="n">work_mem</span> <span class="n">to</span> <span class="s1">&#39;16MB&#39;</span><span class="p">,</span> <span class="n">maintenance_work_mem</span> <span class="n">to</span> <span class="s1">&#39;512 MB&#39;</span><span class="p">;</span>
</pre></div>
</div>
<p>Note that here pgloader will benefit from the meta-data information found in
the SQLite file to create a PostgreSQL database capable of hosting the data
as described, then load the data.</p>
</div>
<div class="section" id="id9">
<h3>Loading the data<a class="headerlink" href="#id9" title="Permalink to this headline"></a></h3>
<p>Lets start the <cite>pgloader</cite> command with our <cite>sqlite.load</cite> command file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader sqlite.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/sqlite.load&quot;
... WARNING Postgres warning: table &quot;album&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;artist&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;customer&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;employee&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;genre&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;invoice&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;invoiceline&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;mediatype&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;playlist&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;playlisttrack&quot; does not exist, skipping
... WARNING Postgres warning: table &quot;track&quot; does not exist, skipping
table name read imported errors time
---------------------- --------- --------- --------- --------------
create, truncate 0 0 0 0.052s
Album 347 347 0 0.070s
Artist 275 275 0 0.014s
Customer 59 59 0 0.014s
Employee 8 8 0 0.012s
Genre 25 25 0 0.018s
Invoice 412 412 0 0.032s
InvoiceLine 2240 2240 0 0.077s
MediaType 5 5 0 0.012s
Playlist 18 18 0 0.008s
PlaylistTrack 8715 8715 0 0.071s
Track 3503 3503 0 0.105s
index build completion 0 0 0 0.000s
---------------------- --------- --------- --------- --------------
Create Indexes 20 20 0 0.279s
reset sequences 0 0 0 0.043s
---------------------- --------- --------- --------- --------------
Total streaming time 15607 15607 0 0.476s
</pre></div>
</div>
<p>We can see that <a class="reference external" href="http://pgloader.io">pgloader</a> did download the file from
its HTTP URL location then <em>unziped</em> it before loading it.</p>
<p>Also, the <em>WARNING</em> messages we see here are expected as the PostgreSQL
database is empty when running the command, and pgloader is using the SQL
commands <cite>DROP TABLE IF EXISTS</cite> when the given command uses the <cite>include
drop</cite> option.</p>
<p>Note that the output of the command has been edited to facilitate its
browsing online.</p>
</div>
</div>
<div class="section" id="migrating-from-mysql-to-postgresql">
<h2>Migrating from MySQL to PostgreSQL<a class="headerlink" href="#migrating-from-mysql-to-postgresql" title="Permalink to this headline"></a></h2>
<p>If you want to migrate your data over to <a class="reference external" href="http://www.postgresql.org">PostgreSQL</a> from MySQL then pgloader is the tool of
choice!</p>
<p>Most tools around are skipping the main problem with migrating from MySQL,
which is to do with the type casting and data sanitizing that needs to be
done. pgloader will not leave you alone on those topics.</p>
<div class="section" id="id11">
<h3>In a Single Command Line<a class="headerlink" href="#id11" title="Permalink to this headline"></a></h3>
<p>As an example, we will use the f1db database from &lt;<a class="reference external" href="http://ergast.com/mrd/">http://ergast.com/mrd/</a>&gt;
which which provides a historical record of motor racing data for
non-commercial purposes. You can either use their API or download the whole
database at <a class="reference external" href="http://ergast.com/downloads/f1db.sql.gz">http://ergast.com/downloads/f1db.sql.gz</a>. Once youve done that load the
database in MySQL:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ mysql -u root
&gt; create database f1db;
&gt; source f1db.sql
</pre></div>
</div>
<p>Now lets migrate this database into PostgreSQL in a single command line:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ createdb f1db
$ pgloader mysql://root@localhost/f1db pgsql:///f1db
</pre></div>
</div>
<p>Done! All with schema, table definitions, constraints, indexes, primary
keys, <em>auto_increment</em> columns turned into <em>bigserial</em> , foreign keys,
comments, and if you had some MySQL default values such as <em>ON UPDATE
CURRENT_TIMESTAMP</em> they would have been translated to a <a class="reference external" href="https://www.postgresql.org/docs/current/static/plpgsql-trigger.html">PostgreSQL before
update trigger</a>
automatically.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader mysql://root@localhost/f1db pgsql:///f1db
2017-06-16T08:56:14.064000+02:00 LOG Main logs in &#39;/private/tmp/pgloader/pgloader.log&#39;
2017-06-16T08:56:14.068000+02:00 LOG Data errors in &#39;/private/tmp/pgloader/&#39;
2017-06-16T08:56:19.542000+02:00 LOG report summary reset
table name read imported errors total time
------------------------- --------- --------- --------- --------------
fetch meta data 33 33 0 0.365s
Create Schemas 0 0 0 0.007s
Create SQL Types 0 0 0 0.006s
Create tables 26 26 0 0.068s
Set Table OIDs 13 13 0 0.012s
------------------------- --------- --------- --------- --------------
f1db.constructorresults 11011 11011 0 0.205s
f1db.circuits 73 73 0 0.150s
f1db.constructors 208 208 0 0.059s
f1db.constructorstandings 11766 11766 0 0.365s
f1db.drivers 841 841 0 0.268s
f1db.laptimes 413578 413578 0 2.892s
f1db.driverstandings 31420 31420 0 0.583s
f1db.pitstops 5796 5796 0 2.154s
f1db.races 976 976 0 0.227s
f1db.qualifying 7257 7257 0 0.228s
f1db.seasons 68 68 0 0.527s
f1db.results 23514 23514 0 0.658s
f1db.status 133 133 0 0.130s
------------------------- --------- --------- --------- --------------
COPY Threads Completion 39 39 0 4.303s
Create Indexes 20 20 0 1.497s
Index Build Completion 20 20 0 0.214s
Reset Sequences 0 10 0 0.058s
Primary Keys 13 13 0 0.012s
Create Foreign Keys 0 0 0 0.000s
Create Triggers 0 0 0 0.001s
Install Comments 0 0 0 0.000s
------------------------- --------- --------- --------- --------------
Total import time 506641 506641 0 5.547s
</pre></div>
</div>
<p>You may need to have special cases to take care of tho, or views that you
want to materialize while doing the migration. In advanced case you can use
the pgloader command.</p>
</div>
<div class="section" id="id12">
<h3>The Command<a class="headerlink" href="#id12" title="Permalink to this headline"></a></h3>
<p>To load data with pgloader you need to define in a <em>command</em> the operations
in some details. Heres our example for loading the <a class="reference external" href="http://dev.mysql.com/doc/sakila/en/">MySQL Sakila Sample
Database</a>.</p>
<p>Heres our command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>load database
from mysql://root@localhost/sakila
into postgresql:///sakila
WITH include drop, create tables, no truncate,
create indexes, reset sequences, foreign keys
SET maintenance_work_mem to &#39;128MB&#39;, work_mem to &#39;12MB&#39;, search_path to &#39;sakila&#39;
CAST type datetime to timestamptz
drop default drop not null using zero-dates-to-null,
type date drop not null drop default using zero-dates-to-null
MATERIALIZE VIEWS film_list, staff_list
-- INCLUDING ONLY TABLE NAMES MATCHING ~/film/, &#39;actor&#39;
-- EXCLUDING TABLE NAMES MATCHING ~&lt;ory&gt;
BEFORE LOAD DO
$$ create schema if not exists sakila; $$;
</pre></div>
</div>
<p>Note that here pgloader will benefit from the meta-data information found in
the MySQL database to create a PostgreSQL database capable of hosting the
data as described, then load the data.</p>
<p>In particular, some specific <em>casting rules</em> are given here, to cope with
date values such as <cite>0000-00-00</cite> that MySQL allows and PostgreSQL rejects
for not existing in our calendar. Its possible to add per-column casting
rules too, which is useful is some of your <cite>tinyint</cite> are in fact <cite>smallint</cite>
while some others are in fact <cite>boolean</cite> values.</p>
<p>Finaly note that we are using the <em>MATERIALIZE VIEWS</em> clause of pgloader:
the selected views here will be migrated over to PostgreSQL <em>with their
contents</em>.</p>
<p>Its possible to use the <em>MATERIALIZE VIEWS</em> clause and give both the name
and the SQL (in MySQL dialect) definition of view, then pgloader creates the
view before loading the data, then drops it again at the end.</p>
<p>## Loading the data</p>
<p>Lets start the <cite>pgloader</cite> command with our <cite>sakila.load</cite> command file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader sakila.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/sakila.load&quot;
&lt;WARNING: table &quot;xxx&quot; does not exists have been edited away&gt;
table name read imported errors time
---------------------- --------- --------- --------- --------------
before load 1 1 0 0.007s
fetch meta data 45 45 0 0.402s
create, drop 0 36 0 0.208s
---------------------- --------- --------- --------- --------------
actor 200 200 0 0.071s
address 603 603 0 0.035s
category 16 16 0 0.018s
city 600 600 0 0.037s
country 109 109 0 0.023s
customer 599 599 0 0.073s
film 1000 1000 0 0.135s
film_actor 5462 5462 0 0.236s
film_category 1000 1000 0 0.070s
film_text 1000 1000 0 0.080s
inventory 4581 4581 0 0.136s
language 6 6 0 0.036s
payment 16049 16049 0 0.539s
rental 16044 16044 0 0.648s
staff 2 2 0 0.041s
store 2 2 0 0.036s
film_list 997 997 0 0.247s
staff_list 2 2 0 0.135s
Index Build Completion 0 0 0 0.000s
---------------------- --------- --------- --------- --------------
Create Indexes 41 41 0 0.964s
Reset Sequences 0 1 0 0.035s
Foreign Keys 22 22 0 0.254s
---------------------- --------- --------- --------- --------------
Total import time 48272 48272 0 3.502s
</pre></div>
</div>
<p>The <em>WARNING</em> messages we see here are expected as the PostgreSQL database
is empty when running the command, and pgloader is using the SQL commands
<cite>DROP TABLE IF EXISTS</cite> when the given command uses the <cite>include drop</cite>
option.</p>
<p>Note that the output of the command has been edited to facilitate its
browsing online.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper"><div class="relations">
<h3>Related Topics</h3>
<ul>
<li><a href="../index.html">Documentation overview</a><ul>
<li>Previous: <a href="../intro.html" title="previous chapter">Introduction</a></li>
<li>Next: <a href="../pgloader.html" title="next chapter">PgLoader Reference Manual</a></li>
</ul></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3>Quick search</h3>
<form class="search" action="../search.html" method="get">
<div><input type="text" name="q" /></div>
<div><input type="submit" value="Go" /></div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
&copy;2017, Dimitri Fontaine.
|
Powered by <a href="http://sphinx-doc.org/">Sphinx 1.6.5</a>
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.10</a>
|
<a href="../_sources/tutorial/tutorial.rst.txt"
rel="nofollow">Page source</a>
</div>
</body>
</html>