pgloader/docs/howto/fixed.html
Dimitri Fontaine 9e574ce884 Rename web/ into docs/
This allows to benefit from github pages without having to maintain a
separate orphaned branch.
2016-08-19 20:55:29 +02:00

172 lines
9.2 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="">
<link rel="shortcut icon" href="../../docs-assets/ico/favicon.png">
<title>pgloader</title>
<!-- Bootstrap core CSS -->
<link href="../dist/css/bootstrap.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="../dist/carousel.css" rel="stylesheet">
</head>
<!-- NAVBAR
================================================== -->
<body>
<div class="navbar-wrapper">
<div class="container">
<div class="navbar navbar-inverse navbar-static-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../index.html">pgloader</a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="../index.html">Home</a></li>
<li><a href="quickstart.html">Quick Start</a></li>
<li><a href="pgloader.1.html">Reference documentation</a></li>
<li class="dropdown active">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Advanced HowTos <b class="caret"></b></a>
<ul class="dropdown-menu">
<li class="dropdown-header">Plain Files</li>
<li><a href="csv.html">CSV</a></li>
<li><a href="fixed.html">Fixed format</a></li>
<li><a href="geolite.html">Geolite</a></li>
<li class="divider"></li>
<li class="dropdown-header">Databases</li>
<li><a href="dBase.html">dBase</a></li>
<li><a href="sqlite.html">SQLite</a></li>
<li><a href="mysql.html">MySQL</a></li>
</ul>
</li>
<li><a href="../download.html">Download</a></li>
<li><a href="../sponsors.html">Sponsors</a></li>
<li><a href="../pgloader-moral-license.html">License</a></li>
</ul>
</div>
</div>
</div>
</div>
</div>
<!-- an empty carousel -->
<div id="myCarousel" class="carousel slide" data-ride="carousel" style="height: 100px">
<div class="carousel-inner" style="height: 100px">
<div class="item active" style="height: 100px">
<img data-src="holder.js/900x100/auto/#777:#7a7a7a" style="height: 100px">
<!-- <div class="container"> -->
<!-- <div class="carousel-caption"> -->
<!-- <h1>Load data into PostgreSQL. Fast.</h1> -->
<!-- <p></p> -->
<!-- </div> -->
<!-- </div> -->
</div>
</div>
</div><!-- /.carousel -->
<div class="container">
<div class="row">
<div class="col-md-2"> </div>
<div class="col-md-8">
<h1>Loading Fixed Width Data File with pgloader</h1><p>Some data providers still use a format where each column is specified with a starting index position and a given length. Usually the columns are blank-padded when the data is shorter than the full reserved range. </p><h2>The Command</h2><p>To load data with <a href="http://pgloader.io/">pgloader</a> you need to define in a <em>command</em> the operations in some details. Here's our example for loading Fixed Width Data, using a file provided by the US census. </p><p>You can find more files from them at the <a href="http://www.census.gov/geo/maps-data/data/gazetteer2000.html">Census 2000 Gazetteer Files</a>. </p><p>Here's our command: </p><pre><code>LOAD ARCHIVE
FROM http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip
INTO postgresql:///pgloader
BEFORE LOAD DO
$$ drop table if exists places; $$,
$$ create table places
(
usps char(2) not null,
fips char(2) not null,
fips_code char(5),
loc_name varchar(64)
);
$$
LOAD FIXED
FROM FILENAME MATCHING ~/places2k.txt/
WITH ENCODING latin1
(
usps from 0 for 2,
fips from 2 for 2,
fips_code from 4 for 5,
"LocationName" from 9 for 64 [trim right whitespace],
p from 73 for 9,
h from 82 for 9,
land from 91 for 14,
water from 105 for 14,
ldm from 119 for 14,
wtm from 131 for 14,
lat from 143 for 10,
long from 153 for 11
)
INTO postgresql:///pgloader?places
(
usps, fips, fips_code, "LocationName"
); </code></pre><p>You can see the full list of options in the <a href="pgloader.1.html">pgloader reference manual</a>, with a complete description of the options you see here. </p><h2>The Data</h2><p>This command allows loading the following file content, where we are only showing the first couple of lines: </p><pre><code>AL0100124Abbeville city 2987 1353 40301945 120383 15.560669 0.046480 31.566367 -85.251300
AL0100460Adamsville city 4965 2042 50779330 14126 19.606010 0.005454 33.590411 -86.949166
AL0100484Addison town 723 339 9101325 0 3.514041 0.000000 34.200042 -87.177851
AL0100676Akron town 521 239 1436797 0 0.554750 0.000000 32.876425 -87.740978
AL0100820Alabaster city 22619 8594 53023800 141711 20.472605 0.054715 33.231162 -86.823829
AL0100988Albertville city 17247 7090 67212867 258738 25.951034 0.099899 34.265362 -86.211261
AL0101132Alexander City city 15008 6855 100534344 433413 38.816529 0.167342 32.933157 -85.936008 </code></pre><h2>Loading the data</h2><p>Let's start the <code>pgloader</code> command with our <code>census-places.load</code> command file: </p><pre><code>$ pgloader census-places.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/census-places.load"
... LOG Fetching 'http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip'
... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//places2k.zip'
table name read imported errors time
----------------- --------- --------- --------- --------------
download 0 0 0 1.494s
extract 0 0 0 1.013s
before load 2 2 0 0.013s
----------------- --------- --------- --------- --------------
places 25375 25375 0 0.499s
----------------- --------- --------- --------- --------------
Total import time 25375 25375 0 3.019s </code></pre><p>We can see that <a href="pgloader">http://pgloader.io</a> did download the file from its HTTP URL location then <em>unziped</em> it before the loading itself. </p><p>Note that the output of the command has been edited to facilitate its browsing online. </p> </div>
<div class="col-md-2"> </div>
</div>
<!-- FOOTER -->
<footer>
<p class="pull-right"><a href="#">Back to top</a></p>
<p>&copy; 2013-2014 Dimitri Fontaine. &middot;</p>
</footer>
</div><!-- /.container -->
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
<script src="../dist/js/bootstrap.min.js"></script>
<!-- <script src="docs-assets/js/holder.js"></script> -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-47059482-2', 'tapoueh.org');
ga('send', 'pageview');
</script>
</body>
</html>