Rename web/ into docs/
This lets us benefit from GitHub Pages without having to maintain a separate orphaned branch.
163
docs/howto/csv.html
Normal file
@ -0,0 +1,163 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>Loading CSV Data with pgloader</h1><p>CSV stands for <em>comma separated values</em>, a format whose specifications vary quite a bit in practice. pgloader lets you describe those specifics in its command. </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.io/">pgloader</a> you need to describe the operations in some detail in a <em>command</em>. Here's our example for loading CSV data: </p><pre><code>LOAD CSV
   FROM 'path/to/file.csv' (x, y, a, b, c, d)
   INTO postgresql:///pgloader?csv (a, b, d, c)

   WITH truncate,
        skip header = 1,
        fields optionally enclosed by '"',
        fields escaped by double-quote,
        fields terminated by ','

    SET client_encoding to 'latin1',
        work_mem to '12MB',
        standard_conforming_strings to 'on'

 BEFORE LOAD DO
  $$ drop table if exists csv; $$,
  $$ create table csv (
      a bigint,
      b bigint,
      c char(2),
      d text
     );
  $$; </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><h2>The Data</h2><p>This command allows loading the following CSV file content: </p><pre><code>Header, with a © sign
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","Canada"
"4.17.143.16","4.18.32.71","68259600","68296775","US","United States" </code></pre><h2>Loading the data</h2><p>Here's how to start loading the data. Note that the output here has been edited to make it easier to read online. </p><pre><code>$ pgloader csv.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/csv.load"

       table name       read   imported     errors            time
-----------------  ---------  ---------  ---------  --------------
      before load          2          2          0          0.039s
-----------------  ---------  ---------  ---------  --------------
              csv          6          6          0          0.019s
-----------------  ---------  ---------  ---------  --------------
Total import time          6          6          0          0.058s </code></pre><h2>The result</h2><p>As you can see, the command described above filters the input and only imports some of the columns from the example data file. Here's what gets loaded into the PostgreSQL database: </p><pre><code>pgloader# table csv;
    a     |    b     | c  |       d
----------+----------+----+----------------
 33996344 | 33996351 | GB | United Kingdom
 50331648 | 68257567 | US | United States
 68257568 | 68257599 | CA | Canada
 68257600 | 68259583 | US | United States
 68259584 | 68259599 | CA | Canada
 68259600 | 68296775 | US | United States
(6 rows) </code></pre> </div>
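The column filtering performed by the command above can be pictured with a minimal Python sketch. This is only an illustration, not pgloader's implementation: it assumes that source fields are matched to target columns by name, which is consistent with the result shown above, and the names `project_rows`, `SOURCE_FIELDS`, `TARGET_COLUMNS` and `SAMPLE` are hypothetical.

```python
import csv
import io

# Field names as declared in the FROM clause, in file order.
SOURCE_FIELDS = ["x", "y", "a", "b", "c", "d"]
# Column names as declared in the INTO clause.
TARGET_COLUMNS = ["a", "b", "d", "c"]


def project_rows(text, skip_header=1):
    """Yield one tuple per data row, keeping only the target columns."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    for index, row in enumerate(reader):
        if index < skip_header:
            continue  # mirrors "skip header = 1"
        record = dict(zip(SOURCE_FIELDS, row))
        # Fields x and y are read but never sent to the target table.
        yield tuple(record[name] for name in TARGET_COLUMNS)


# First lines of the example data file shown above.
SAMPLE = (
    'Header, with a © sign\n'
    '"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"\n'
    '"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"\n'
)

rows = list(project_rows(SAMPLE))
for row in rows:
    print(row)
```

Each tuple follows the order of the INTO column list (a, b, d, c), so the country name comes before the two-letter code even though the table declares `c` before `d`.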
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
134
docs/howto/dBase.html
Normal file
@ -0,0 +1,134 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>Loading dBase files with pgloader</h1><p>The dBase format is still in use in some places, as modern tools such as <em>FileMaker</em> and <em>Excel</em> offer some level of support for it. Speaking of support in modern tools, pgloader is right there on the list too! </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.io/">pgloader</a> you need to describe the operations in some detail in a <em>command</em>. Here's our example for loading a dBase file, using a file provided by the French administration. </p><p>You can find more files from them at the <a href="http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement.asp">Insee</a> website. </p><p>Here's our command: </p><pre><code>LOAD DBF
    FROM http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip
    INTO postgresql:///pgloader
    WITH truncate, create table
     SET client_encoding TO 'latin1'; </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><p>Note that here pgloader uses the metadata found in the dBase file to create a PostgreSQL table capable of hosting the data as described, and then loads the data. </p><h2>Loading the data</h2><p>Let's start the <code>pgloader</code> command with our <code>dbf-zip.load</code> command file: </p><pre><code>$ pgloader dbf-zip.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/dbf-zip.load"
... LOG Fetching 'http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip'
... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//historiq2013.zip'

       table name       read   imported     errors            time
-----------------  ---------  ---------  ---------  --------------
         download          0          0          0          0.167s
          extract          0          0          0          1.010s
 create, truncate          0          0          0          0.071s
-----------------  ---------  ---------  ---------  --------------
     historiq2013       9181       9181          0          0.658s
-----------------  ---------  ---------  ---------  --------------
Total import time       9181       9181          0          1.906s </code></pre><p>We can see that <a href="http://pgloader.io/">pgloader</a> downloaded the file from its HTTP URL and then <em>unzipped</em> it before loading it. </p><p>Note that the output of the command has been edited to make it easier to read online. </p> </div>
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
171
docs/howto/fixed.html
Normal file
@ -0,0 +1,171 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>Loading Fixed Width Data File with pgloader</h1><p>Some data providers still use a format where each column is specified with a starting index position and a given length. Usually the columns are blank-padded when the data is shorter than the full reserved range. </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.io/">pgloader</a> you need to describe the operations in some detail in a <em>command</em>. Here's our example for loading fixed-width data, using a file provided by the US census. </p><p>You can find more files from them at the <a href="http://www.census.gov/geo/maps-data/data/gazetteer2000.html">Census 2000 Gazetteer Files</a>. </p><p>Here's our command: </p><pre><code>LOAD ARCHIVE
   FROM http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip
   INTO postgresql:///pgloader

   BEFORE LOAD DO
     $$ drop table if exists places; $$,
     $$ create table places
       (
          usps           char(2)  not null,
          fips           char(2)  not null,
          fips_code      char(5),
          "LocationName" varchar(64)
       );
     $$

   LOAD FIXED
        FROM FILENAME MATCHING ~/places2k.txt/
             WITH ENCODING latin1
             (
                usps           from   0 for  2,
                fips           from   2 for  2,
                fips_code      from   4 for  5,
                "LocationName" from   9 for 64 [trim right whitespace],
                p              from  73 for  9,
                h              from  82 for  9,
                land           from  91 for 14,
                water          from 105 for 14,
                ldm            from 119 for 12,
                wtm            from 131 for 12,
                lat            from 143 for 10,
                long           from 153 for 11
             )
        INTO postgresql:///pgloader?places
             (
                usps, fips, fips_code, "LocationName"
             ); </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><h2>The Data</h2><p>This command allows loading the following file content, where we are only showing the first few lines: </p><pre><code>AL0100124Abbeville city 2987 1353 40301945 120383 15.560669 0.046480 31.566367 -85.251300
AL0100460Adamsville city 4965 2042 50779330 14126 19.606010 0.005454 33.590411 -86.949166
AL0100484Addison town 723 339 9101325 0 3.514041 0.000000 34.200042 -87.177851
AL0100676Akron town 521 239 1436797 0 0.554750 0.000000 32.876425 -87.740978
AL0100820Alabaster city 22619 8594 53023800 141711 20.472605 0.054715 33.231162 -86.823829
AL0100988Albertville city 17247 7090 67212867 258738 25.951034 0.099899 34.265362 -86.211261
AL0101132Alexander City city 15008 6855 100534344 433413 38.816529 0.167342 32.933157 -85.936008 </code></pre><h2>Loading the data</h2><p>Let's start the <code>pgloader</code> command with our <code>census-places.load</code> command file: </p><pre><code>$ pgloader census-places.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/census-places.load"
... LOG Fetching 'http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip'
... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//places2k.zip'

       table name       read   imported     errors            time
-----------------  ---------  ---------  ---------  --------------
         download          0          0          0          1.494s
          extract          0          0          0          1.013s
      before load          2          2          0          0.013s
-----------------  ---------  ---------  ---------  --------------
           places      25375      25375          0          0.499s
-----------------  ---------  ---------  ---------  --------------
Total import time      25375      25375          0          3.019s </code></pre><p>We can see that <a href="http://pgloader.io/">pgloader</a> downloaded the file from its HTTP URL and then <em>unzipped</em> it before loading it. </p><p>Note that the output of the command has been edited to make it easier to read online. </p> </div>
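The `from <start> for <length>` field definitions used in the command can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not pgloader's implementation: positions are taken as zero-based offsets, only the four loaded fields are parsed, and the sample line is rebuilt here with blank padding to the declared widths. The names `FIELDS`, `TRIM_RIGHT`, `parse_fixed` and `sample_line` are hypothetical.

```python
# (name, zero-based start, length), copied from the first four field
# definitions of the LOAD FIXED clause.
FIELDS = [
    ("usps", 0, 2),
    ("fips", 2, 2),
    ("fips_code", 4, 5),
    ("LocationName", 9, 64),
]
# Fields declared with [trim right whitespace].
TRIM_RIGHT = {"LocationName"}


def parse_fixed(line):
    """Slice one fixed-width line into a dict of field values."""
    record = {}
    for name, start, length in FIELDS:
        value = line[start:start + length]
        if name in TRIM_RIGHT:
            value = value.rstrip()
        record[name] = value
    return record


# Assumed layout: the place name is padded with blanks to 64 characters
# and the population count is right-justified in 9 characters.
sample_line = (
    "AL" + "01" + "00124" + "Abbeville city".ljust(64) + "2987".rjust(9)
)

record = parse_fixed(sample_line)
print(record)
```

Trimming only the name field keeps the two-letter and five-digit codes exactly as they appear in the file, which matters for `char(2)` and `char(5)` columns.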
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
228
docs/howto/geolite.html
Normal file
@ -0,0 +1,228 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>Loading MaxMind Geolite Data with pgloader</h1><p><a href="http://www.maxmind.com/">MaxMind</a> provides a free dataset for geolocation, which is quite popular. Using pgloader you can download the latest version of it, extract the CSV files from the archive and load their content into your database directly. </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.io/">pgloader</a> you need to describe the operations in some detail in a <em>command</em>. Here's our example for loading the Geolite data: </p><pre><code>/*
 * Loading from a ZIP archive containing CSV files. The full test can be
 * done using the archive found at
 * http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
 *
 * And a very light version of this data set is found at
 * http://pgsql.tapoueh.org/temp/foo.zip for quick testing.
 */

LOAD ARCHIVE
   FROM http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
   INTO postgresql:///ip4r

   BEFORE LOAD DO
     $$ create extension if not exists ip4r; $$,
     $$ create schema if not exists geolite; $$,
     $$ create table if not exists geolite.location
       (
          locid      integer primary key,
          country    text,
          region     text,
          city       text,
          postalcode text,
          location   point,
          metrocode  text,
          areacode   text
       );
     $$,
     $$ create table if not exists geolite.blocks
       (
          iprange    ip4r,
          locid      integer
       );
     $$,
     $$ drop index if exists geolite.blocks_ip4r_idx; $$,
     $$ truncate table geolite.blocks, geolite.location cascade; $$

   LOAD CSV
        FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
             WITH ENCODING iso-8859-1
             (
                locId,
                country,
                region     null if blanks,
                city       null if blanks,
                postalCode null if blanks,
                latitude,
                longitude,
                metroCode  null if blanks,
                areaCode   null if blanks
             )
        INTO postgresql:///ip4r?geolite.location
             (
                locid,country,region,city,postalCode,
                location point using (format nil "(~a,~a)" longitude latitude),
                metroCode,areaCode
             )
        WITH skip header = 2,
             fields optionally enclosed by '"',
             fields escaped by double-quote,
             fields terminated by ','

  AND LOAD CSV
        FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
             WITH ENCODING iso-8859-1
             (
                startIpNum, endIpNum, locId
             )
        INTO postgresql:///ip4r?geolite.blocks
             (
                iprange ip4r using (ip-range startIpNum endIpNum),
                locId
             )
        WITH skip header = 2,
             fields optionally enclosed by '"',
             fields escaped by double-quote,
             fields terminated by ','

   FINALLY DO
     $$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$; </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><p>Note that while the <em>Geolite</em> data uses a pair of integers (<em>start</em>, <em>end</em>) to represent <em>ipv4</em> data, we use the very powerful <a href="https://github.com/RhodiumToad/ip4r">ip4r</a> PostgreSQL extension instead. </p><p>The transformation from a pair of integers into an IP range is done dynamically by the pgloader process. </p><p>Also, the location is given as a pair of <em>float</em> columns for the <em>longitude</em> and the <em>latitude</em>, whereas PostgreSQL offers the <a href="http://www.postgresql.org/docs/9.3/interactive/functions-geometry.html">point</a> data type, so the pgloader command here transforms the data on the fly to use the appropriate data type and its input representation. </p><h2>Loading the data</h2><p>Here's how to start loading the data. Note that the output here has been edited to make it easier to read online. </p><pre><code>$ pgloader archive.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/archive.load"
... LOG Fetching 'http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip'
... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//GeoLiteCity-latest.zip'

       table name       read   imported     errors            time
-----------------  ---------  ---------  ---------  --------------
         download          0          0          0         11.592s
          extract          0          0          0          1.012s
      before load          6          6          0          0.019s
-----------------  ---------  ---------  ---------  --------------
 geolite.location     470387     470387          0          7.743s
   geolite.blocks    1903155    1903155          0         16.332s
-----------------  ---------  ---------  ---------  --------------
          finally          1          1          0         31.692s
-----------------  ---------  ---------  ---------  --------------
Total import time    2373542    2373542          0        1m8.390s </code></pre><p>The timing of course includes the transformation of the <em>1.9 million</em> pairs of integers into a single <em>ipv4 range</em> each. The <em>finally</em> step consists of creating the specialized <em>GiST</em> index as given in the main command: </p><pre><code>CREATE INDEX blocks_ip4r_idx ON geolite.blocks USING gist(iprange); </code></pre><p>That index will then be used to speed up queries that look for the recorded geolocation containing a specific IP address: </p><pre><code>ip4r> select *
|
||||
from geolite.location l
|
||||
join geolite.blocks b using(locid)
|
||||
where iprange >>= '8.8.8.8';
|
||||
|
||||
-[ RECORD 1 ]------------------
|
||||
locid | 223
|
||||
country | US
|
||||
region |
|
||||
city |
|
||||
postalcode |
|
||||
location | (-97,38)
|
||||
metrocode |
|
||||
areacode |
|
||||
iprange | 8.8.8.8-8.8.37.255
|
||||
|
||||
Time: 0.747 ms </code></pre> </div>
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
170
docs/howto/mysql.html
Normal file
@ -0,0 +1,170 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>Migrating from MySQL with pgloader</h1><p>If you want to migrate your data over to <a href="http://www.postgresql.org">PostgreSQL</a> from MySQL then pgloader is the tool of choice! </p><p>Most migration tools skip the main problem with migrating from MySQL: the type casting and data sanitizing that needs to be done. pgloader handles both for you. </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.tapoueh.org/">pgloader</a> you need to describe the operations in some detail in a <em>command</em>. Here's our command for loading the <a href="http://dev.mysql.com/doc/sakila/en/">MySQL Sakila Sample Database</a>: </p><pre><code>load database
|
||||
from mysql://root@localhost/sakila
|
||||
into postgresql:///sakila
|
||||
|
||||
WITH include drop, create tables, no truncate,
|
||||
create indexes, reset sequences, foreign keys
|
||||
|
||||
SET maintenance_work_mem to '128MB', work_mem to '12MB', search_path to 'sakila'
|
||||
|
||||
CAST type datetime to timestamptz
|
||||
drop default drop not null using zero-dates-to-null,
|
||||
type date drop not null drop default using zero-dates-to-null
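-- Hedged sketch, not part of the original command: a per-column casting
-- rule could be appended to the CAST list (after a comma) along the lines
-- of the commented line below. The column name "film.is_active" is
-- hypothetical, and the exact syntax should be checked against the
-- reference manual for your pgloader version.
-- column film.is_active to boolean drop typemod using tinyint-to-boolean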
|
||||
|
||||
MATERIALIZE VIEWS film_list, staff_list
|
||||
|
||||
-- INCLUDING ONLY TABLE NAMES MATCHING ~/film/, 'actor'
|
||||
-- EXCLUDING TABLE NAMES MATCHING ~<ory>
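-- Hedged sketch, not part of the original command: MATERIALIZE VIEWS can
-- also take a view name together with its definition in MySQL dialect.
-- The view name "film_count" and its query are hypothetical.
-- MATERIALIZE VIEWS film_count AS $$ select count(*) as n from film; $$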
|
||||
|
||||
BEFORE LOAD DO
|
||||
$$ create schema if not exists sakila; $$; </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><p>Note that here pgloader uses the meta-data found in the MySQL database to create a PostgreSQL database capable of hosting the data as described, then loads the data. </p><p>In particular, some specific <em>casting rules</em> are given here to cope with date values such as <code>0000-00-00</code>, which MySQL allows and PostgreSQL rejects because they do not exist in our calendar. It's possible to add per-column casting rules too, which is useful if some of your <code>tinyint</code> columns are in fact <code>smallint</code> while others are in fact <code>boolean</code> values. </p><p>Finally, note that we are using the <em>MATERIALIZE VIEWS</em> clause of pgloader: the selected views here will be migrated over to PostgreSQL <em>with their contents</em>. </p><p>It's also possible to give the <em>MATERIALIZE VIEWS</em> clause both the name and the SQL definition (in MySQL dialect) of a view; pgloader then creates the view before loading the data and drops it again at the end. </p><h2>Loading the data</h2><p>Let's start the <code>pgloader</code> command with our <code>sakila.load</code> command file: </p><pre><code>$ pgloader sakila.load
|
||||
... LOG Starting pgloader, log system is ready.
|
||||
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/sakila.load"
|
||||
<WARNING: table "xxx" does not exist messages have been edited away>
|
||||
|
||||
table name read imported errors time
|
||||
---------------------- --------- --------- --------- --------------
|
||||
before load 1 1 0 0.007s
|
||||
fetch meta data 45 45 0 0.402s
|
||||
create, drop 0 36 0 0.208s
|
||||
---------------------- --------- --------- --------- --------------
|
||||
actor 200 200 0 0.071s
|
||||
address 603 603 0 0.035s
|
||||
category 16 16 0 0.018s
|
||||
city 600 600 0 0.037s
|
||||
country 109 109 0 0.023s
|
||||
customer 599 599 0 0.073s
|
||||
film 1000 1000 0 0.135s
|
||||
film_actor 5462 5462 0 0.236s
|
||||
film_category 1000 1000 0 0.070s
|
||||
film_text 1000 1000 0 0.080s
|
||||
inventory 4581 4581 0 0.136s
|
||||
language 6 6 0 0.036s
|
||||
payment 16049 16049 0 0.539s
|
||||
rental 16044 16044 0 0.648s
|
||||
staff 2 2 0 0.041s
|
||||
store 2 2 0 0.036s
|
||||
film_list 997 997 0 0.247s
|
||||
staff_list 2 2 0 0.135s
|
||||
Index Build Completion 0 0 0 0.000s
|
||||
---------------------- --------- --------- --------- --------------
|
||||
Create Indexes 41 41 0 0.964s
|
||||
Reset Sequences 0 1 0 0.035s
|
||||
Foreign Keys 22 22 0 0.254s
|
||||
---------------------- --------- --------- --------- --------------
|
||||
Total import time 48272 48272 0 3.502s </code></pre><p>The <em>WARNING</em> messages mentioned here are expected: the PostgreSQL database is empty when running the command, and pgloader issues <code>DROP TABLE IF EXISTS</code> SQL commands when the given command uses the <code>include drop</code> option. </p><p>Note that the output of the command has been edited to make it easier to browse online. </p> </div>
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
391
docs/howto/pgloader.1.html
Normal file
152
docs/howto/quickstart.html
Normal file
@ -0,0 +1,152 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>pgloader: a quickstart</h1><p>In simple cases, pgloader is very easy to use. </p><h2>CSV</h2><p>Load data from a CSV file into a pre-existing table in your database: </p><pre><code>pgloader --type csv \
|
||||
--field id --field field \
|
||||
--with truncate \
|
||||
--with "fields terminated by ','" \
|
||||
./test/data/matching-1.csv \
|
||||
postgres:///pgloader?tablename=matching </code></pre><p>In that example the whole loading is driven from the command line, bypassing the need for writing a command in the pgloader command syntax entirely. As there's no command though, the extra information needed must be provided on the command line using the <code>--type</code> and <code>--field</code> and <code>--with</code> switches. </p><p>For documentation about the available syntaxes for the <code>--field</code> and <code>--with</code> switches, please refer to the CSV section later in the man page. </p><p>Note also that the PostgreSQL URI includes the target <em>tablename</em>. </p><h2>Reading from STDIN</h2><p>File based pgloader sources can be loaded from the standard input, as in the following example: </p><pre><code>pgloader --type csv \
|
||||
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
|
||||
--with "skip header = 1" \
|
||||
--with "fields terminated by '\t'" \
|
||||
- \
|
||||
postgresql:///pgloader?districts_longlat \
|
||||
< test/data/2013_Gaz_113CDs_national.txt </code></pre><p>The dash (<code>-</code>) character as a source is used to mean <em>standard input</em>, as usual in Unix command lines. It's possible to stream compressed content to pgloader with this technique, using the Unix pipe: </p><pre><code>gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo </code></pre><h2>Loading from CSV available through HTTP</h2><p>The same command as just above can also be run if the CSV file happens to be found on a remote HTTP location: </p><pre><code>pgloader --type csv \
|
||||
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
|
||||
--with "skip header = 1" \
|
||||
--with "fields terminated by '\t'" \
|
||||
http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt \
|
||||
postgresql:///pgloader?districts_longlat </code></pre><p>Some more options have to be used in that case, as the file contains a one-line header (most commonly column names, though it could be a copyright notice). Also, in that case, we specify all the fields in a single <code>--field</code> option argument. </p><p>Again, the PostgreSQL target connection string must contain the <em>tablename</em> option, and you have to ensure that the target table exists and can hold the data. Here's the SQL command used in that example in case you want to try it yourself: </p><pre><code>create table districts_longlat
|
||||
(
|
||||
usps text,
|
||||
geoid text,
|
||||
aland bigint,
|
||||
awater bigint,
|
||||
aland_sqmi double precision,
|
||||
awater_sqmi double precision,
|
||||
intptlat double precision,
|
||||
intptlong double precision
|
||||
); </code></pre><p>Also notice that the same command will work against an archived version of the same data. </p><h2>Streaming CSV data from an HTTP compressed file</h2><p>Finally, it's important to note that pgloader first fetches the content from the HTTP URL to a local file, then expands the archive when it's recognized to be one, and only then processes the locally expanded file. </p><p>In some cases, either because pgloader has no direct support for your archive format or because expanding the archive is not feasible in your environment, you might want to <em>stream</em> the content straight from its remote location into PostgreSQL. Here's how to do that, using the old battle-tested Unix pipes trick: </p><pre><code>curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
|
||||
| gunzip -c \
|
||||
| pgloader --type csv \
|
||||
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
|
||||
--with "skip header = 1" \
|
||||
--with "fields terminated by '\t'" \
|
||||
- \
|
||||
postgresql:///pgloader?districts_longlat </code></pre><p>Now the OS will take care of the streaming and buffering between the network and the commands, and pgloader will take care of streaming the data down to PostgreSQL. </p><h2>Migrating from SQLite</h2><p>The following command will open the SQLite database, discover its table definitions including indexes and foreign keys, migrate those definitions while <em>casting</em> the data type specifications to their PostgreSQL equivalents, and then migrate the data over: </p><pre><code>createdb newdb
|
||||
pgloader ./test/sqlite/sqlite.db postgresql:///newdb </code></pre><h2>Migrating from MySQL</h2><p>Just create a database to host the MySQL data and definitions and have pgloader do the migration for you in a single command line: </p><pre><code>createdb pagila
|
||||
pgloader mysql://user@localhost/sakila postgresql:///pagila </code></pre><h2>Fetching an archived DBF file from a remote HTTP location</h2><p>It's possible for pgloader to download a file from HTTP, unarchive it, and only then open it to discover the schema and load the data: </p><pre><code>createdb foo
|
||||
pgloader --type dbf http://www.insee.fr/fr/methodes/nomenclatures/cog/telechargement/2013/dbf/historiq2013.zip postgresql:///foo </code></pre><p>Here it's not possible for pgloader to guess the kind of data source it's being given, so it's necessary to use the <code>--type</code> command line switch. </p> </div>
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
155
docs/howto/sqlite.html
Normal file
@ -0,0 +1,155 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta name="description" content="">
|
||||
<meta name="author" content="">
|
||||
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
|
||||
|
||||
<title>pgloader</title>
|
||||
|
||||
<!-- Bootstrap core CSS -->
|
||||
<link href="../dist/css/bootstrap.css" rel="stylesheet">
|
||||
|
||||
<!-- Custom styles for this template -->
|
||||
<link href="../dist/carousel.css" rel="stylesheet">
|
||||
</head>
|
||||
<!-- NAVBAR
|
||||
================================================== -->
|
||||
<body>
|
||||
<div class="navbar-wrapper">
|
||||
<div class="container">
|
||||
|
||||
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
|
||||
<div class="container">
|
||||
<div class="navbar-header">
|
||||
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
|
||||
<span class="sr-only">Toggle navigation</span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
<span class="icon-bar"></span>
|
||||
</button>
|
||||
<a class="navbar-brand" href="../index.html">pgloader</a>
|
||||
</div>
|
||||
<div class="navbar-collapse collapse">
|
||||
<ul class="nav navbar-nav">
|
||||
<li><a href="../index.html">Home</a></li>
|
||||
<li><a href="quickstart.html">Quick Start</a></li>
|
||||
<li><a href="pgloader.1.html">Reference documentation</a></li>
|
||||
<li class="dropdown active">
|
||||
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
|
||||
<ul class="dropdown-menu">
|
||||
<li class="dropdown-header">Plain Files</li>
|
||||
<li><a href="csv.html">CSV</a></li>
|
||||
<li><a href="fixed.html">Fixed format</a></li>
|
||||
<li><a href="geolite.html">Geolite</a></li>
|
||||
<li class="divider"></li>
|
||||
<li class="dropdown-header">Databases</li>
|
||||
<li><a href="dBase.html">dBase</a></li>
|
||||
<li><a href="sqlite.html">SQLite</a></li>
|
||||
<li><a href="mysql.html">MySQL</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="../download.html">Download</a></li>
|
||||
<li><a href="../sponsors.html">Sponsors</a></li>
|
||||
<li><a href="../pgloader-moral-license.html">License</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- an empty carousel -->
|
||||
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
|
||||
<div class="carousel-inner" style="height: 100px">
|
||||
<div class="item active" style="height: 100px">
|
||||
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
|
||||
<!-- <div class="container"> -->
|
||||
<!-- <div class="carousel-caption"> -->
|
||||
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
|
||||
<!-- <p></p> -->
|
||||
<!-- </div> -->
|
||||
<!-- </div> -->
|
||||
</div>
|
||||
</div>
|
||||
</div><!-- /.carousel -->
|
||||
|
||||
<div class="container">
|
||||
<div class="row">
|
||||
<div class="col-md-2"> </div>
|
||||
<div class="col-md-8">
|
||||
<h1>Loading SQLite files with pgloader</h1><p>The SQLite database is a respected solution to manage your data with. Its embedded nature makes it a source of migrations when a project now needs to handle more concurrency, which <a href="http://www.postgresql.org/">PostgreSQL</a> is very good at. pgloader can help you there. </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.io/">pgloader</a> you need to describe the operations in some detail in a <em>command</em>. Here's our command: </p><pre><code>load database
|
||||
from 'sqlite/Chinook_Sqlite_AutoIncrementPKs.sqlite'
|
||||
into postgresql:///pgloader
|
||||
|
||||
with include drop, create tables, create indexes, reset sequences
|
||||
|
||||
set work_mem to '16MB', maintenance_work_mem to '512 MB'; </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><p>Note that here pgloader will benefit from the meta-data information found in the SQLite file to create a PostgreSQL database capable of hosting the data as described, then load the data. </p><h2>Loading the data</h2><p>Let's start the <code>pgloader</code> command with our <code>sqlite.load</code> command file: </p><pre><code>$ pgloader sqlite.load
|
||||
... LOG Starting pgloader, log system is ready.
|
||||
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/sqlite.load"
|
||||
... WARNING Postgres warning: table "album" does not exist, skipping
|
||||
... WARNING Postgres warning: table "artist" does not exist, skipping
|
||||
... WARNING Postgres warning: table "customer" does not exist, skipping
|
||||
... WARNING Postgres warning: table "employee" does not exist, skipping
|
||||
... WARNING Postgres warning: table "genre" does not exist, skipping
|
||||
... WARNING Postgres warning: table "invoice" does not exist, skipping
|
||||
... WARNING Postgres warning: table "invoiceline" does not exist, skipping
|
||||
... WARNING Postgres warning: table "mediatype" does not exist, skipping
|
||||
... WARNING Postgres warning: table "playlist" does not exist, skipping
|
||||
... WARNING Postgres warning: table "playlisttrack" does not exist, skipping
|
||||
... WARNING Postgres warning: table "track" does not exist, skipping
|
||||
table name read imported errors time
|
||||
---------------------- --------- --------- --------- --------------
|
||||
create, truncate 0 0 0 0.052s
|
||||
Album 347 347 0 0.070s
|
||||
Artist 275 275 0 0.014s
|
||||
Customer 59 59 0 0.014s
|
||||
Employee 8 8 0 0.012s
|
||||
Genre 25 25 0 0.018s
|
||||
Invoice 412 412 0 0.032s
|
||||
InvoiceLine 2240 2240 0 0.077s
|
||||
MediaType 5 5 0 0.012s
|
||||
Playlist 18 18 0 0.008s
|
||||
PlaylistTrack 8715 8715 0 0.071s
|
||||
Track 3503 3503 0 0.105s
|
||||
index build completion 0 0 0 0.000s
|
||||
---------------------- --------- --------- --------- --------------
|
||||
Create Indexes 20 20 0 0.279s
|
||||
reset sequences 0 0 0 0.043s
|
||||
---------------------- --------- --------- --------- --------------
|
||||
Total streaming time 15607 15607 0 0.476s </code></pre><p>We can see that <a href="http://pgloader.io">pgloader</a> did download the file from its HTTP URL location then <em>unzipped</em> it before loading it. </p><p>Also, the <em>WARNING</em> messages we see here are expected: the PostgreSQL database is empty when running the command, and pgloader issues <code>DROP TABLE IF EXISTS</code> SQL commands when the given command uses the <code>include drop</code> option. </p><p>Note that the output of the command has been edited to make it easier to browse online. </p> </div>
|
||||
<div class="col-md-2"> </div>
|
||||
</div>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<footer>
|
||||
<p class="pull-right"><a href="#">Back to top</a></p>
|
||||
<p>© 2013-2014 Dimitri Fontaine. ·</p>
|
||||
</footer>
|
||||
|
||||
</div><!-- /.container -->
|
||||
|
||||
|
||||
<!-- Bootstrap core JavaScript
|
||||
================================================== -->
|
||||
<!-- Placed at the end of the document so the pages load faster -->
|
||||
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
|
||||
<script src="../dist/js/bootstrap.min.js"></script>
|
||||
<!-- <script src="docs-assets/js/holder.js"></script> -->
|
||||
|
||||
<script>
|
||||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||||
|
||||
ga('create', 'UA-47059482-2', 'tapoueh.org');
|
||||
ga('send', 'pageview');
|
||||
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
@ -6,19 +6,19 @@
|
||||
(in-package #:pgloader.docs)
|
||||
|
||||
(defparameter *docs-sources-directory*
|
||||
- (asdf:system-relative-pathname :pgloader "web/src/"))
|
||||
+ (asdf:system-relative-pathname :pgloader "docs/src/"))
|
||||
|
||||
(defparameter *docs-output-directory*
|
||||
- (asdf:system-relative-pathname :pgloader "web/howto/"))
|
||||
+ (asdf:system-relative-pathname :pgloader "docs/howto/"))
|
||||
|
||||
(defparameter *reference*
|
||||
(asdf:system-relative-pathname :pgloader "pgloader.1.md"))
|
||||
|
||||
(defparameter *header*
|
||||
- (asdf:system-relative-pathname :pgloader "web/howto/header.html"))
|
||||
+ (asdf:system-relative-pathname :pgloader "docs/howto/header.html"))
|
||||
|
||||
(defparameter *footer*
|
||||
- (asdf:system-relative-pathname :pgloader "web/howto/footer.html"))
|
||||
+ (asdf:system-relative-pathname :pgloader "docs/howto/footer.html"))
|
||||
|
||||
(defun build-page (file &optional target)
|
||||
"Build the HTML page from the markdown source FILE into the HTML TARGET."
|
||||
@ -225,7 +225,7 @@
|
||||
"regress"))))
|
||||
|
||||
;; to produce the website
|
||||
- (:module "web"
|
||||
+ (:module "docs"
|
||||
:components
|
||||
((:module src
|
||||
:components
|
||||
|
||||