pgloader/docs/_build/html/tutorial/geolite.html
Dimitri Fontaine 25c79dfebc Switch the documentation to using Sphinx.
The website is moving to pgloader.org and readthedocs.io is going to be
integrated. Let's see what happens. The docs build fine locally with the
sphinx tools and the docs/Makefile.

Having separate files for the documentation should help ease the maintenance
and add new topics, such as support for Common Lisp Hackers level docs,
which are currently missing.
2017-12-21 17:45:09 +01:00

239 lines
11 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Loading MaxMind Geolite Data with pgloader &#8212; pgloader 3.4.1 documentation</title>
<link rel="stylesheet" href="../_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '3.4.1',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="stylesheet" href="../_static/custom.css" type="text/css" />
<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />
</head>
<body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="loading-maxmind-geolite-data-with-pgloader">
<h1>Loading MaxMind Geolite Data with pgloader<a class="headerlink" href="#loading-maxmind-geolite-data-with-pgloader" title="Permalink to this headline"></a></h1>
<p><a class="reference external" href="http://www.maxmind.com/">MaxMind</a> provides a free dataset for
geolocation, which is quite popular. Using pgloader you can download the
lastest version of it, extract the CSV files from the archive and load their
content into your database directly.</p>
<div class="section" id="the-command">
<h2>The Command<a class="headerlink" href="#the-command" title="Permalink to this headline"></a></h2>
<p>To load data with pgloader you need to define in a <em>command</em> the operations
in some details. Heres our example for loading the Geolite data:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>/*
* Loading from a ZIP archive containing CSV files. The full test can be
* done with using the archive found at
* http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
*
* And a very light version of this data set is found at
* http://pgsql.tapoueh.org/temp/foo.zip for quick testing.
*/
LOAD ARCHIVE
FROM http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
INTO postgresql:///ip4r
BEFORE LOAD DO
$$ create extension if not exists ip4r; $$,
$$ create schema if not exists geolite; $$,
$$ create table if not exists geolite.location
(
locid integer primary key,
country text,
region text,
city text,
postalcode text,
location point,
metrocode text,
areacode text
);
$$,
$$ create table if not exists geolite.blocks
(
iprange ip4r,
locid integer
);
$$,
$$ drop index if exists geolite.blocks_ip4r_idx; $$,
$$ truncate table geolite.blocks, geolite.location cascade; $$
LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
WITH ENCODING iso-8859-1
(
locId,
country,
region null if blanks,
city null if blanks,
postalCode null if blanks,
latitude,
longitude,
metroCode null if blanks,
areaCode null if blanks
)
INTO postgresql:///ip4r?geolite.location
(
locid,country,region,city,postalCode,
location point using (format nil &quot;(~a,~a)&quot; longitude latitude),
metroCode,areaCode
)
WITH skip header = 2,
fields optionally enclosed by &#39;&quot;&#39;,
fields escaped by double-quote,
fields terminated by &#39;,&#39;
AND LOAD CSV
FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
WITH ENCODING iso-8859-1
(
startIpNum, endIpNum, locId
)
INTO postgresql:///ip4r?geolite.blocks
(
iprange ip4r using (ip-range startIpNum endIpNum),
locId
)
WITH skip header = 2,
fields optionally enclosed by &#39;&quot;&#39;,
fields escaped by double-quote,
fields terminated by &#39;,&#39;
FINALLY DO
$$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;
</pre></div>
</div>
<p>Note that while the <em>Geolite</em> data is using a pair of integers (<em>start</em>,
<em>end</em>) to represent <em>ipv4</em> data, we use the very poweful <a class="reference external" href="https://github.com/RhodiumToad/ip4r">ip4r</a> PostgreSQL Extension instead.</p>
<p>The transformation from a pair of integers into an IP is done dynamically by
the pgloader process.</p>
<p>Also, the location is given as a pair of <em>float</em> columns for the <em>longitude</em>
and the <em>latitude</em> where PostgreSQL offers the
<a class="reference external" href="http://www.postgresql.org/docs/9.3/interactive/functions-geometry.html">point</a>
datatype, so the pgloader command here will actually transform the data on
the fly to use the appropriate data type and its input representation.</p>
</div>
<div class="section" id="loading-the-data">
<h2>Loading the data<a class="headerlink" href="#loading-the-data" title="Permalink to this headline"></a></h2>
<p>Heres how to start loading the data. Note that the ouput here has been
edited so as to facilitate its browsing online:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ pgloader archive.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file &quot;/Users/dim/dev/pgloader/test/archive.load&quot;
... LOG Fetching &#39;http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip&#39;
... LOG Extracting files from archive &#39;//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//GeoLiteCity-latest.zip&#39;
table name read imported errors time
----------------- --------- --------- --------- --------------
download 0 0 0 11.592s
extract 0 0 0 1.012s
before load 6 6 0 0.019s
----------------- --------- --------- --------- --------------
geolite.location 470387 470387 0 7.743s
geolite.blocks 1903155 1903155 0 16.332s
----------------- --------- --------- --------- --------------
finally 1 1 0 31.692s
----------------- --------- --------- --------- --------------
Total import time 2373542 2373542 0 1m8.390s
</pre></div>
</div>
<p>The timing of course includes the transformation of the <em>1.9 million</em> pairs
of integer into a single <em>ipv4 range</em> each. The <em>finally</em> step consists of
creating the <em>GiST</em> specialized index as given in the main command:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">CREATE</span> <span class="n">INDEX</span> <span class="n">blocks_ip4r_idx</span> <span class="n">ON</span> <span class="n">geolite</span><span class="o">.</span><span class="n">blocks</span> <span class="n">USING</span> <span class="n">gist</span><span class="p">(</span><span class="n">iprange</span><span class="p">);</span>
</pre></div>
</div>
<p>That index will then be used to speed up queries wanting to find which
recorded geolocation contains a specific IP address:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">ip4r</span><span class="o">&gt;</span> <span class="n">select</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">geolite.location</span> <span class="n">l</span>
<span class="n">join</span> <span class="n">geolite</span><span class="o">.</span><span class="n">blocks</span> <span class="n">b</span> <span class="n">using</span><span class="p">(</span><span class="n">locid</span><span class="p">)</span>
<span class="n">where</span> <span class="n">iprange</span> <span class="o">&gt;&gt;=</span> <span class="s1">&#39;8.8.8.8&#39;</span><span class="p">;</span>
<span class="o">-</span><span class="p">[</span> <span class="n">RECORD</span> <span class="mi">1</span> <span class="p">]</span><span class="o">------------------</span>
<span class="n">locid</span> <span class="o">|</span> <span class="mi">223</span>
<span class="n">country</span> <span class="o">|</span> <span class="n">US</span>
<span class="n">region</span> <span class="o">|</span>
<span class="n">city</span> <span class="o">|</span>
<span class="n">postalcode</span> <span class="o">|</span>
<span class="n">location</span> <span class="o">|</span> <span class="p">(</span><span class="o">-</span><span class="mi">97</span><span class="p">,</span><span class="mi">38</span><span class="p">)</span>
<span class="n">metrocode</span> <span class="o">|</span>
<span class="n">areacode</span> <span class="o">|</span>
<span class="n">iprange</span> <span class="o">|</span> <span class="mf">8.8</span><span class="o">.</span><span class="mf">8.8</span><span class="o">-</span><span class="mf">8.8</span><span class="o">.</span><span class="mf">37.255</span>
<span class="n">Time</span><span class="p">:</span> <span class="mf">0.747</span> <span class="n">ms</span>
</pre></div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper"><div class="relations">
<h3>Related Topics</h3>
<ul>
<li><a href="../index.html">Documentation overview</a><ul>
</ul></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3>Quick search</h3>
<form class="search" action="../search.html" method="get">
<div><input type="text" name="q" /></div>
<div><input type="submit" value="Go" /></div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
&copy;2017, Dimitri Fontaine.
|
Powered by <a href="http://sphinx-doc.org/">Sphinx 1.6.5</a>
&amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.10</a>
|
<a href="../_sources/tutorial/geolite.rst.txt"
rel="nofollow">Page source</a>
</div>
</body>
</html>