mirror of https://github.com/dimitri/pgloader.git
synced 2025-08-11 16:57:00 +02:00

Merge pull request #98 from cbbrowne/master

Some wordsmithing on the docs

commit 6e324a1f74

README.md (33 changed lines)
@@ -2,29 +2,30 @@
 
 pgloader is a data loading tool for PostgreSQL, using the `COPY` command.
 
-Its main avantage over just using `COPY` or `\copy` and over using a
-*Foreign Data Wrapper* is the transaction behaviour, where *pgloader* will
-keep a separate file of rejected data and continue trying to `copy` good
-data in your database.
+Its main advantage over just using `COPY` or `\copy`, and over using a
+*Foreign Data Wrapper*, is its transaction behaviour, where *pgloader*
+will keep a separate file of rejected data, but continue trying to
+`copy` good data in your database.
 
-The default PostgreSQL behaviour is transactional, which means that any
-erroneous line in the input data (file or remote database) will stop the
-bulk load for the whole table.
+The default PostgreSQL behaviour is transactional, which means that
+*any* erroneous line in the input data (file or remote database) will
+stop the entire bulk load for the table.
 
-pgloader also implements data reformating, the main example of that being a
-transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to
-PostgreSQL `NULL` value (because our calendar never had a *year zero*).
+pgloader also implements data reformatting, a typical example of that
+being the transformation of MySQL datestamps `0000-00-00` and
+`0000-00-00 00:00:00` to PostgreSQL `NULL` value (because our calendar
+never had a *year zero*).
 
 ## Versioning
 
-The pgloader version 1.x from a long time ago had been developped in `TCL`.
-When faced with maintaining that code, the new emerging development team
-(hi!) picked `python` instead because that made sense at the time. So
-pgloader version 2.x were in python.
+The pgloader version 1.x from a long time ago was developed in `TCL`.
+When faced with maintaining that code, the new emerging development
+team (hi!) picked `python` instead because that made sense at the
+time. So pgloader version 2.x were in python.
 
 The current version of pgloader is the 3.x series, which is written in
-[Common Lisp](http://cliki.net/) for better development flexibility, run
-time performances, real threading.
+[Common Lisp](http://cliki.net/) for better development flexibility,
+runtime performance, and support of real threading.
 
 The versioning is now following the Emacs model, where any X.0 release
 number means you're using a development version (alpha, beta, or release
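The zero-date reformatting described in this hunk amounts to a per-value transform. Below is a minimal Python sketch of the idea, assuming values arrive as strings and using `None` to stand for PostgreSQL `NULL`; `zero_date_to_null` is a hypothetical name, and this is not pgloader's own Common Lisp code.

```python
from typing import Optional

# The two MySQL "zero" datestamps named in the text.
MYSQL_ZERO_DATES = {"0000-00-00", "0000-00-00 00:00:00"}


def zero_date_to_null(value: Optional[str]) -> Optional[str]:
    """Hypothetical transform: map MySQL zero dates to None (SQL NULL).

    Any other value, including None, is passed through unchanged.
    """
    if value in MYSQL_ZERO_DATES:
        return None
    return value


if __name__ == "__main__":
    rows = ["2014-07-01", "0000-00-00", "0000-00-00 00:00:00", None]
    print([zero_date_to_null(v) for v in rows])
    # ['2014-07-01', None, None, None]
```

Mapping to `NULL` rather than to some sentinel date avoids pretending that a *year zero* exists in the target column.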
@@ -6,12 +6,13 @@
 
 ## DESCRIPTION
 
-pgloader loads data from different sources into PostgreSQL. It can tranform
-the data it reads on the fly and send raw SQL before and after the loading.
-It uses the `COPY` PostgreSQL protocol to stream the data into the server,
-and manages errors by filling a pair fo *reject.dat* and *reject.log* files.
+pgloader loads data from various sources into PostgreSQL. It can
+transform the data it reads on the fly and submit raw SQL before and
+after the loading. It uses the `COPY` PostgreSQL protocol to stream
+the data into the server, and manages errors by filling a pair of
+*reject.dat* and *reject.log* files.
 
-pgloader operates from commands which are read from files:
+pgloader operates using commands which are read from files:
 
 pgloader commands.load
 
@@ -108,12 +109,12 @@ database of your setup. The filenames are the target table, and their
 extensions are `.dat` for the rejected data and `.log` for the file
 containing the full PostgreSQL client side logs about the rejected data.
 
-The `.dat` file is formated in PostgreSQL the text COPY format as documented
+The `.dat` file is formatted in PostgreSQL the text COPY format as documented
 in [http://www.postgresql.org/docs/9.2/static/sql-copy.html#AEN66609]().
 
 ## A NOTE ABOUT PERFORMANCES
 
-pgloader has been developped with performances in mind, to be able to cope
+pgloader has been developed with performances in mind, to be able to cope
 with ever growing needs in loading large amounts of data into PostgreSQL.
 
 The basic architecture it uses is the old Unix pipe model, where a thread is
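The `.dat` reject file is written in PostgreSQL's text `COPY` format. As a rough illustration of that format, based on the PostgreSQL `COPY` documentation and not on pgloader's code, the sketch below encodes one row with the default settings: fields are tab-separated, `NULL` is written as `\N`, and backslash, newline, carriage return and tab inside a value are backslash-escaped. The helper names `copy_text_field` and `copy_text_row` are hypothetical.

```python
from typing import Iterable, Optional

# Characters escaped in COPY text format under the default settings
# (tab delimiter, \N as the NULL marker).
_ESCAPES = {
    "\\": "\\\\",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def copy_text_field(value: Optional[str]) -> str:
    """Encode one field; None becomes the default NULL marker \\N."""
    if value is None:
        return "\\N"
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def copy_text_row(values: Iterable[Optional[str]]) -> str:
    """Encode a whole row: tab-separated fields terminated by a newline."""
    return "\t".join(copy_text_field(v) for v in values) + "\n"


if __name__ == "__main__":
    print(repr(copy_text_row(["1", None, "tab\there", "back\\slash"])))
```

Escaping the backslash itself first-class (rather than only the tab and newline) is what keeps a value such as a literal `\N` string distinguishable from a real `NULL`.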
@@ -265,7 +266,7 @@ Where:
 Can contain any character, including colon (`:`) which must then be
 doubled (`::`) and at-sign (`@`) which must then be doubled (`@@`).
 
-When ommited, the *user* name defaults to the value of the `PGUSER`
+When omitted, the *user* name defaults to the value of the `PGUSER`
 environment variable, and if it is unset, the value of the `USER`
 environment variable.
 
@@ -275,15 +276,15 @@ Where:
 be doubled (`@@`). To leave the password empty, when the *user* name
 ends with at at sign, you then have to use the syntax user:@.
 
-When ommited, the *password* defaults to the value of the `PGPASSWORD`
-environement variable if it is set, otherwise the password is left
+When omitted, the *password* defaults to the value of the `PGPASSWORD`
+environment variable if it is set, otherwise the password is left
 unset.
 
 - *netloc*
 
-Can be either a hostname in dotted notation, or an ipv4, or an unix
+Can be either a hostname in dotted notation, or an ipv4, or an Unix
 domain socket path. Empty is the default network location, under a
-system providing *unix domain socket* that method is prefered, otherwise
+system providing *unix domain socket* that method is preferred, otherwise
 the *netloc* default to `localhost`.
 
 It's possible to force the *unix domain socket* path by using the syntax
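The user-name rules quoted in the last two hunks (double `:` as `::` and `@` as `@@`, and use the `user:@` form when the name ends with an at sign) can be illustrated with two small Python helpers. This is a sketch under assumptions: `escape_user` and `build_url` are hypothetical names, and the `postgresql://user@netloc/dbname` shape is assumed from the connection-string documentation these hunks belong to.

```python
def escape_user(user: str) -> str:
    """Double ':' and '@' inside a user name (hypothetical helper)."""
    return user.replace(":", "::").replace("@", "@@")


def build_url(user: str, host: str, dbname: str) -> str:
    """Assemble a password-less postgresql:// URL (hypothetical helper).

    A user name ending with '@' gets the explicit empty-password form
    'user:@', as the text requires; otherwise a plain '@' suffices.
    """
    sep = ":@" if user.endswith("@") else "@"
    return f"postgresql://{escape_user(user)}{sep}{host}/{dbname}"


if __name__ == "__main__":
    print(build_url("ops:team@corp", "localhost", "pgloader"))
    # postgresql://ops::team@@corp@localhost/pgloader
    print(build_url("bob@", "localhost", "pgloader"))
    # postgresql://bob@@:@localhost/pgloader
```

Without the `:@` form, a user `bob@` would escape to `bob@@` and be followed by the separator `@`, leaving three at-signs in a row that a parser could not split unambiguously.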
@@ -292,7 +293,7 @@ Where:
 
 postgresql://unix:/tmp:54321/dbname
 
-The *netloc* defaults to the value of the `PGHOST` environement
+The *netloc* defaults to the value of the `PGHOST` environment
 variable, and if it is unset, to either the default `unix` socket path
 when running on a Unix system, and `localhost` otherwise.
 
@@ -302,11 +303,11 @@ Where:
 digits and the punctuation signs comma (`,`), dash (`-`) and underscore
 (`_`).
 
-When ommited, the *dbname* defaults to the value of the environment
+When omitted, the *dbname* defaults to the value of the environment
 variable `PGDATABASE`, and if that is unset, to the *user* value as
 determined above.
 
-- The only optionnal parameter should be a possibly qualified table name.
+- The only optional parameter should be a possibly qualified table name.
 
 ### Regular Expressions
 
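The fallback chain for omitted connection fields, spread over the hunks above, can be summarized in one function: *user* from `PGUSER` then `USER`, *password* from `PGPASSWORD` or left unset, *netloc* from `PGHOST` then the Unix socket default or `localhost`, and *dbname* from `PGDATABASE` then the resolved *user*. A minimal Python sketch follows; `ConnParams`, `resolve_defaults` and its `socket_dir` parameter are hypothetical, and `socket_dir` stands in for whatever default socket location the system uses, since the text does not name one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ConnParams:
    user: Optional[str]
    password: Optional[str]
    netloc: str
    dbname: Optional[str]


def resolve_defaults(
    user: Optional[str] = None,
    password: Optional[str] = None,
    netloc: Optional[str] = None,
    dbname: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    socket_dir: Optional[str] = None,
) -> ConnParams:
    """Fill omitted fields following the fallback order in the text.

    socket_dir is an assumption of this sketch: pass the default Unix
    domain socket location on Unix systems, or None elsewhere.
    """
    user = user or env.get("PGUSER") or env.get("USER")
    # An explicit empty password ('user:@') is kept; only None falls back.
    password = password if password is not None else env.get("PGPASSWORD")
    netloc = netloc or env.get("PGHOST") or socket_dir or "localhost"
    # dbname falls back to the *resolved* user, as "determined above".
    dbname = dbname or env.get("PGDATABASE") or user
    return ConnParams(user, password, netloc, dbname)
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment.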
@@ -383,7 +384,7 @@ The global batch behaviour options are:
 
 Supporting more than a single batch being sent at a time is on the TODO
 list of pgloader, but is not implemented yet. This option is about
-controling the memory needs of pgloader as a trade-off to the
+controlling the memory needs of pgloader as a trade-off to the
 performances characteristics, and not about parallel activity of
 pgloader.
 
@@ -523,7 +524,7 @@ The `csv` format command accepts the following clauses and options:
 Takes a single character as argument, which must be found inside
 single quotes, and might be given as the printable character itself,
 the special value \t to denote a tabulation character, or `0x` then
-an hexadecimal value read as the ascii code for the character.
+an hexadecimal value read as the ASCII code for the character.
 
 This character is used as the quoting character in the `CSV` file,
 and defaults to double-quote.
@@ -548,7 +549,7 @@ The `csv` format command accepts the following clauses and options:
 Takes a single character as argument, which must be found inside
 single quotes, and might be given as the printable character itself,
 the special value \t to denote a tabulation character, or `0x` then
-an hexadecimal value read as the ascii code for the character.
+an hexadecimal value read as the ASCII code for the character.
 
 This character is used as the *field separator* when reading the
 `CSV` data.
@@ -558,7 +559,7 @@ The `csv` format command accepts the following clauses and options:
 Takes a single character as argument, which must be found inside
 single quotes, and might be given as the printable character itself,
 the special value \t to denote a tabulation character, or `0x` then
-an hexadecimal value read as the ascii code for the character.
+an hexadecimal value read as the ASCII code for the character.
 
 This character is used to recognize *end-of-line* condition when
 reading the `CSV` data.
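The three `csv` options above (quoting character, field separator, end-of-line) share one argument syntax. A minimal Python sketch of decoding it, following the rules as quoted: the argument sits in single quotes and is either one printable character, the two characters `\t` for a tab, or `0x` plus a hexadecimal ASCII code. `parse_char_option` is a hypothetical name, not part of pgloader.

```python
def parse_char_option(arg: str) -> str:
    """Decode a quoted single-character option (hypothetical helper).

    Accepted spellings, per the option text:
      '<c>'    a single printable character, e.g. ','
      '\\t'    backslash-t, meaning a tabulation character
      '0x<hh>' hexadecimal ASCII code, e.g. '0x2c' for a comma
    """
    if len(arg) < 3 or arg[0] != "'" or arg[-1] != "'":
        raise ValueError(f"expected a single-quoted argument, got {arg!r}")
    body = arg[1:-1]
    if body == "\\t":
        return "\t"
    if body.startswith("0x") and len(body) > 2:
        code = int(body[2:], 16)  # raises ValueError on non-hex digits
        if code > 0x7F:
            raise ValueError(f"not an ASCII code: {body}")
        return chr(code)
    if len(body) == 1 and body.isprintable():
        return body
    raise ValueError(f"not a single character: {body!r}")
```

For instance, `parse_char_option("'0x2c'")` and `parse_char_option("','")` both yield a comma, which is why the hexadecimal form is mostly useful for non-printable separators.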
@@ -942,7 +943,7 @@ The `database` command accepts the following clauses and options:
 
 - *no truncate*
 
-When this topion is listed, pgloader issues no `TRUNCATE` command.
+When this option is listed, pgloader issues no `TRUNCATE` command.
 
 - *create tables*
 
@@ -1072,8 +1073,8 @@ The `database` command accepts the following clauses and options:
 existing default expression in the MySQL database for columns of the
 source type from the `CREATE TABLE` statement it generates.
 
-The spelling *keep default* explicitely prevents that behavior and
-can be used to overlad the default casting rules.
+The spelling *keep default* explicitly prevents that behaviour and
+can be used to overload the default casting rules.
 
 - *drop not null*, *keep not null*
 
@@ -1082,8 +1083,8 @@ The `database` command accepts the following clauses and options:
 MySQL datatype when it creates the tables in the PostgreSQL
 database.
 
-The spelling *keep not null* explicitely prevents that behavior and
-can be used to overlad the default casting rules.
+The spelling *keep not null* explicitly prevents that behaviour and
+can be used to overload the default casting rules.
 
 - *drop typemod*, *keep typemod*
 
@@ -1092,13 +1093,13 @@ The `database` command accepts the following clauses and options:
 the datatype definition found in the MySQL columns of the source
 type when it created the tables in the PostgreSQL database.
 
-The spelling *keep typemod* explicitely prevents that behavior and
-can be used to overlad the default casting rules.
+The spelling *keep typemod* explicitly prevents that behaviour and
+can be used to overload the default casting rules.
 
 - *using*
 
 This option takes as its single argument the name of a function to
-be found un the `pgloader.transforms` Common Lisp package. See above
+be found in the `pgloader.transforms` Common Lisp package. See above
 for details.
 
 It's possible to augment a default cast rule (such as one that
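The *drop/keep default*, *drop/keep not null* and *drop/keep typemod* options each strip one part of a column definition in the generated `CREATE TABLE`. The Python sketch below models only that effect on a single column; the `Column` type and both functions are hypothetical, it treats `False` as "keep" for every option (in pgloader the default depends on the cast rule), and it is not how pgloader represents cast rules.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    pg_type: str  # e.g. "numeric(10,2)"
    not_null: bool = False
    default: Optional[str] = None


def apply_cast_options(
    col: Column,
    drop_default: bool = False,
    drop_not_null: bool = False,
    drop_typemod: bool = False,
) -> Column:
    """Return a copy of col with the requested parts removed."""
    pg_type = col.pg_type
    if drop_typemod:
        # Strip a trailing "(...)" type modifier: numeric(10,2) -> numeric.
        pg_type = re.sub(r"\s*\([^)]*\)\s*$", "", pg_type)
    return replace(
        col,
        pg_type=pg_type,
        not_null=col.not_null and not drop_not_null,
        default=None if drop_default else col.default,
    )


def column_ddl(col: Column) -> str:
    """Render the column as it might appear inside CREATE TABLE."""
    parts = [col.name, col.pg_type]
    if col.default is not None:
        parts.append(f"DEFAULT {col.default}")
    if col.not_null:
        parts.append("NOT NULL")
    return " ".join(parts)
```

Keeping each option independent mirrors the text: every *keep* spelling cancels exactly one *drop* without affecting the others.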
@@ -1167,11 +1168,11 @@ the following limitations:
 
 - Views are not migrated,
 
-Supporting views might require implemeting a full SQL parser for the
+Supporting views might require implementing a full SQL parser for the
 MySQL dialect with a porting engine to rewrite the SQL against
 PostgreSQL, including renaming functions and changing some constructs.
 
-While it's not theorically impossible, don't hold your breath.
+While it's not theoretically impossible, don't hold your breath.
 
 - Triggers are not migrated
 
@@ -1181,7 +1182,7 @@ the following limitations:
 
 It's simple enough to implement, just not on the priority list yet.
 
-- Of the geometric datatypes, onle the `POINT` database has been covered.
+- Of the geometric datatypes, only the `POINT` database has been covered.
 The other ones should be easy enough to implement now, it's just not
 done yet.
 
@@ -1209,7 +1210,7 @@ Numbers:
 - type double to double precision drop typemod
 
 - type numeric to numeric keep typemod
-- type decimal to deciman keep typemod
+- type decimal to decimal keep typemod
 
 Texts:
 
@@ -1314,7 +1315,7 @@ The `sqlite` command accepts the following clauses and options:
 
 - *no truncate*
 
-When this topion is listed, pgloader issues no `TRUNCATE` command.
+When this option is listed, pgloader issues no `TRUNCATE` command.
 
 - *create tables*
 
@@ -1375,7 +1376,7 @@ The `sqlite` command accepts the following clauses and options:
 
 - *EXCLUDING TABLE NAMES MATCHING*
 
-Introduce a comma separated list of table names or *rugular expression*
+Introduce a comma separated list of table names or *regular expression*
 used to exclude table names from the migration. This filter only applies
 to the result of the *INCLUDING* filter.
 
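The ordering stated here, where *EXCLUDING* only filters what *INCLUDING* already kept, can be sketched in Python as two passes over the table list. Assumptions, all hypothetical: a filter entry is either an exact table name or a compiled regex, a regex matches anywhere in the name (`re.search`), and an empty *INCLUDING* list keeps every table.

```python
import re
from typing import Iterable, List, Sequence, Union

# A filter entry is either an exact table name or a compiled regex.
Matcher = Union[str, re.Pattern]


def matches(name: str, matcher: Matcher) -> bool:
    """Exact comparison for plain names, regex search for patterns."""
    if isinstance(matcher, str):
        return name == matcher
    return matcher.search(name) is not None


def select_tables(
    tables: Iterable[str],
    including: Sequence[Matcher] = (),
    excluding: Sequence[Matcher] = (),
) -> List[str]:
    """Apply INCLUDING first, then EXCLUDING to what INCLUDING kept."""
    # Assumption: an empty INCLUDING list keeps every table.
    included = [
        t for t in tables
        if not including or any(matches(t, m) for m in including)
    ]
    return [t for t in included if not any(matches(t, m) for m in excluding)]
```

With tables `users`, `users_archive` and `orders`, including `^users` and excluding the exact name `users_archive` leaves only `users`.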
@@ -1454,7 +1455,7 @@ The provided transformation functions are:
 
 - *right-trimg*
 
-Remove whitespaces at end of string.
+Remove whitespace at end of string.
 
 - *byte-vector-to-bytea*
 
@@ -1464,9 +1465,9 @@ The provided transformation functions are:
 
 ## LOAD MESSAGES
 
-This command is still experimental and allows to receive messages in UDP
-with a syslod like format, and depending on matching rules load named parts
-them to a destination table.
+This command is still experimental and allows receiving messages via
+UDP using a syslog like format, and, depending on rule matching, loads
+named portions of the data stream into a destination table.
 
 LOAD MESSAGES
 FROM syslog://localhost:10514/
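The text leaves the message layout and the rule syntax of this experimental command unspecified, so the following Python sketch only illustrates the general idea under stated assumptions: messages follow a BSD-syslog (RFC 3164 style) layout, a hypothetical `RULES` table pairs a destination table with a regex, and the regex's named groups become the row's columns. The `localhost:10514` endpoint from the example is used only as a default for the hypothetical `serve` loop, which prints rows where a real loader would insert them.

```python
import re
import socket
from typing import Dict, List, Optional, Tuple

# Assumed BSD-syslog layout: <PRI>Mmm dd hh:mm:ss host tag[pid]: message
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Hypothetical rule table: destination table -> pattern whose named
# groups become the columns of the loaded row.
RULES: List[Tuple[str, re.Pattern]] = [
    ("syslog_messages", SYSLOG_RE),
]


def route(line: str) -> Optional[Tuple[str, Dict[str, Optional[str]]]]:
    """Return (table, row) for the first matching rule, or None."""
    for table, pattern in RULES:
        match = pattern.match(line)
        if match:
            return table, match.groupdict()
    return None


def serve(host: str = "localhost", port: int = 10514) -> None:
    """Receive UDP datagrams forever; printing stands in for loading."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            routed = route(data.decode("utf-8", errors="replace").rstrip("\n"))
            if routed is not None:
                print(routed)
```

Keeping parsing (`route`) separate from the network loop (`serve`) lets the matching rules be exercised without opening a socket.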