mirror of
https://github.com/dimitri/pgloader.git
synced 2025-08-11 00:36:59 +02:00
Updated pgloader (a little hack: we transform to utf-8 every line we read into the datafile, if the user specified that this datafile is *not* in utf-8). Updated documentation with few words of explanation about the new .conf parameter "file_is_utf8"
190 lines
5.2 KiB
ReStructuredText
190 lines
5.2 KiB
ReStructuredText
========
|
|
pgloader
|
|
========
|
|
|
|
:Author:
|
|
Jean-Paul Argudo <jean-paul.argudo@dalibo.com>
|
|
|
|
:Version:
|
|
$Id: doc_pgloader.rest,v 1.2 2006-03-28 21:39:25 jpargudo Exp $
|
|
:Type:
|
|
User manual
|
|
|
|
:Comment:
|
|
pgLoader v.1.x documentation (install, usage and example)
|
|
|
|
:Licence:
|
|
BSD
|
|
|
|
About
|
|
=====
|
|
|
|
pgloader (http://pgfoundry.org/projects/pgloader/) is a new project allowing
|
|
you to import data in a PostgreSQL database.
|
|
|
|
You have to launch pgloader as many times you have tables. pgloader handles
|
|
just one table at a time.
|
|
|
|
All bad records are put together in a file, with a logfile explaining origins
|
|
of errors.
|
|
|
|
|
|
Installation
|
|
============
|
|
|
|
Under Debian, the current installation is a bit tricky (as per 200510xx): ::
|
|
|
|
wget http://debian.wow-vision.com.sg/debian/pool/main/p/postgresql libpgtcl_7.4.7-6sarge1_i386.deb
|
|
dpgk -i libpgtcl_7.4.7-6sarge1_i386.deb
|
|
apt-get install tcllib
|
|
wget http://pgfoundry.org/frs/download.php/233/pgloader-1.0.tar.gz
|
|
tar zxvf pgloader-1.0.tar.gz
|
|
|
|
Then you can eventually put the binary into /usr/local/bin to facilitate
|
|
comandlines: ::
|
|
|
|
$ cp pgloader-1.0/pgloader /usr/local/bin
|
|
|
|
Principle
|
|
=========
|
|
|
|
You must fill two files per table:
|
|
|
|
* a parameter file, let's call it <table>.conf
|
|
* a datafile, let's call it <table>.data
|
|
|
|
You need also all necessary parameters to the db connexion you want to use:
|
|
|
|
Common ones are the following:
|
|
|
|
* host : name of the server where your PostgreSQL db lives (localhost ?)
|
|
* user : username (you?)
|
|
* password : username's password (mybigsecret)
|
|
* dbname : name of the PostgreSQL db
|
|
|
|
This parameters are put together in a double-quoted string:
|
|
|
|
"host=localhost user=me password=mybigsecret dbname=mydatabase"
|
|
|
|
This string as the same type that PQconnectdb awaits for in the libpq. Its
|
|
complete documentation can be read at:
|
|
http://www.postgresql.org/docs/current/static/libpq.html#LIBPQ-CONNECT
|
|
|
|
You can for sure add much more parameters, depending your db configuration.
|
|
|
|
Example
|
|
=======
|
|
|
|
We want to insert records in "foo" table: ::
|
|
|
|
test=> \d foo
|
|
Table «public.foo»
|
|
Colonne | Type | Modificateurs
|
|
---------+---------+---------------
|
|
a | integer | not null
|
|
b | date |
|
|
c | text |
|
|
Index :
|
|
«foo_pkey» PRIMARY KEY, btree (a)
|
|
|
|
The datafile
|
|
------------
|
|
|
|
Our datafile "foo.data" as following records: ::
|
|
|
|
1;1987-12-04;"This is a test of data file"
|
|
2;2005-03-02;"diziz'another test with som'o'lil'quotes"
|
|
42;;"No need to date this"
|
|
67;1999-01-02;Oops I didn't escape this string?!
|
|
|
|
Please note that:
|
|
|
|
* fields are separated with a semicolon
|
|
* you can handle presence of empty data: the empty field is represented with
|
|
two semicolons following
|
|
* we have a record per line
|
|
* theres is no other line separator excepted \n
|
|
* dates are in ISO format: YYYY-MM-DD (a fix is coming to handle "set datestyle
|
|
to" in the conf file)
|
|
* you can escape strings, optionnaly, double quoting them
|
|
|
|
Configuration file
|
|
------------------
|
|
|
|
The corresponding file "foo.conf" for the above datafile is the following: ::
|
|
|
|
# ----
|
|
# Conversion parameter file for pgloader
|
|
#
|
|
# Possible file formats:
|
|
# COPY native PostgreSQL COPY format (default)
|
|
# CSV Comma separated variables
|
|
# MSCSV Comma separated variables alternate format
|
|
#
|
|
# The COPY command is constructed from the table_name, the
|
|
# table_columns and the eventual nulls string definition.
|
|
#
|
|
# The default column separator character is comma.
|
|
# ----
|
|
|
|
table_name = foo
|
|
table_columns = a,b,c
|
|
file_format = CSV
|
|
group_size = 1000
|
|
file_sepchar = ;
|
|
#nulls = NULL
|
|
quote = "
|
|
file_is_utf8 = 0
|
|
|
|
Note that separation character is set to ";" and that quoting is specifyied
|
|
with the character double-quote: "
|
|
|
|
Inserts will be commited each 1000, per blocks of 1000 rows at a time.
|
|
|
|
The datafile nor the database is in utf-8, so the parameter *file_is_utf8* is
|
|
set to 0. Set it to 1 otherwise: when both database and datafile are in utf-8.
|
|
|
|
Since ``pgctl`` internals run in utf-8, the data must be converted *on the
|
|
fly* to utf-8 when reading the datafile, thats why pgloader needs to know how
|
|
is the datafile like, utf-8 or not.
|
|
|
|
|
|
pgloader execution
|
|
------------------
|
|
|
|
The execution is quite simple: ::
|
|
|
|
$ pgloader foo.conf foo.data "host=localhost user=me password=mybigsecret \
|
|
dbname=mydatabase"
|
|
|
|
4 row(s) loaded
|
|
0 row(s) rejected
|
|
|
|
A simple verification of what has been inserted: ::
|
|
|
|
test=> select * from foo ;
|
|
a | b | c
|
|
----+------------+------------------------------------------
|
|
1 | 1987-12-04 | This is a test of data file
|
|
2 | 2005-03-02 | diziz'another test with som'o'lil'quotes
|
|
42 | | No need to date this
|
|
67 | 1999-01-02 | Oops I didn't escape this string?!
|
|
(4 lines)
|
|
|
|
**Note**: You will find this example in the doc/example/ directory.
|
|
|
|
when errors occurs
|
|
------------------
|
|
|
|
Check the following:
|
|
|
|
* if your configuration file is not okay, pgloader will tell you whats wrong
|
|
|
|
* if you have a problem with the data you try to import, you'll find in the
|
|
.rej file data that have bee rejected. In the .rejlog file given problems
|
|
will be explicited: a group of error messages per rejected row.
|
|
|
|
Then you'll have to correct errors in .rej file and import *that* file like all
|
|
the others: don't reimport anything else, all the good data is already in the
|
|
box :)
|