From f0ea64d627f1f4c5db45d3205f0c98e94b6f0e0f Mon Sep 17 00:00:00 2001
From: Christopher Browne
Date: Wed, 23 Jul 2014 18:34:51 -0400
Subject: [PATCH 1/2] Wordsmithing; fixing typos and such

---
 README.md     | 33 ++++++++++++-----------
 pgloader.1.md | 73 ++++++++++++++++++++++++++-------------------------
 2 files changed, 54 insertions(+), 52 deletions(-)

diff --git a/README.md b/README.md
index 032fb6c..3e4bfe0 100644
--- a/README.md
+++ b/README.md
@@ -2,29 +2,30 @@
 
 pgloader is a data loading tool for PostgreSQL, using the `COPY` command.
 
-Its main avantage over just using `COPY` or `\copy` and over using a
-*Foreign Data Wrapper* is the transaction behaviour, where *pgloader* will
-keep a separate file of rejected data and continue trying to `copy` good
-data in your database.
+Its main advantage over just using `COPY` or `\copy`, and over using a
+*Foreign Data Wrapper*, is its transaction behaviour, where *pgloader*
+will keep a separate file of rejected data, but continue trying to
+`copy` good data in your database.
 
-The default PostgreSQL behaviour is transactional, which means that any
-erroneous line in the input data (file or remote database) will stop the
-bulk load for the whole table.
+The default PostgreSQL behaviour is transactional, which means that
+*any* erroneous line in the input data (file or remote database) will
+stop the entire bulk load for the table.
 
-pgloader also implements data reformating, the main example of that being a
-transformation from MySQL dates `0000-00-00` and `0000-00-00 00:00:00` to
-PostgreSQL `NULL` value (because our calendar never had a *year zero*).
+pgloader also implements data reformatting, a typical example of that
+being the transformation of MySQL datestamps `0000-00-00` and
+`0000-00-00 00:00:00` to PostgreSQL `NULL` value (because our calendar
+never had a *year zero*).
 
 ## Versioning
 
-The pgloader version 1.x from a long time ago had been developped in `TCL`.
-When faced with maintaining that code, the new emerging development team
-(hi!) picked `python` instead because that made sense at the time. So
-pgloader version 2.x were in python.
+The pgloader version 1.x from a long time ago was developed in `TCL`.
+When faced with maintaining that code, the new emerging development
+team (hi!) picked `python` instead because that made sense at the
+time. So pgloader version 2.x were in python.
 
 The current version of pgloader is the 3.x series, which is written in
-[Common Lisp](http://cliki.net/) for better development flexibility, run
-time performances, real threading.
+[Common Lisp](http://cliki.net/) for better development flexibility,
+runtime performance, and support of real threading.
 
 The versioning is now following the Emacs model, where any X.0 release
 number means you're using a development version (alpha, beta, or release
diff --git a/pgloader.1.md b/pgloader.1.md
index 5d63b05..d7c9499 100644
--- a/pgloader.1.md
+++ b/pgloader.1.md
@@ -6,12 +6,13 @@
 
 ## DESCRIPTION
 
-pgloader loads data from different sources into PostgreSQL. It can tranform
-the data it reads on the fly and send raw SQL before and after the loading.
-It uses the `COPY` PostgreSQL protocol to stream the data into the server,
-and manages errors by filling a pair fo *reject.dat* and *reject.log* files.
+pgloader loads data from various sources into PostgreSQL. It can
+transform the data it reads on the fly and submit raw SQL before and
+after the loading. It uses the `COPY` PostgreSQL protocol to stream
+the data into the server, and manages errors by filling a pair of
+*reject.dat* and *reject.log* files.
 
-pgloader operates from commands which are read from files:
+pgloader operates using commands which are read from files:
 
     pgloader commands.load
 
@@ -108,12 +109,12 @@ database of your setup.
 The filenames are the target table, and their extensions are `.dat` for the
 rejected data and `.log` for the file containing the full PostgreSQL client
 side logs about the rejected data.
-The `.dat` file is formated in PostgreSQL the text COPY format as documented
+The `.dat` file is formatted in PostgreSQL the text COPY format as documented
 in [http://www.postgresql.org/docs/9.2/static/sql-copy.html#AEN66609]().
 
 ## A NOTE ABOUT PERFORMANCES
 
-pgloader has been developped with performances in mind, to be able to cope
+pgloader has been developed with performances in mind, to be able to cope
 with ever growing needs in loading large amounts of data into PostgreSQL.
 
 The basic architecture it uses is the old Unix pipe model, where a thread is
@@ -173,7 +174,7 @@ Some clauses are common to all commands:
 
     Then *INTO* option also supports an optional comma separated list of
     target columns, which are either the name of an input *field* or the
-    whitespace separated list of the target column name, its PostgreSQL data
+    white space separated list of the target column name, its PostgreSQL data
     type and a *USING* expression.
 
     The *USING* expression can be any valid Common Lisp form and will be
@@ -251,7 +252,7 @@ Where:
     Can contain any character, including colon (`:`) which must then be
     doubled (`::`) and at-sign (`@`) which must then be doubled (`@@`).
 
-    When ommited, the *user* name defaults to the value of the `PGUSER`
+    When omitted, the *user* name defaults to the value of the `PGUSER`
     environment variable, and if it is unset, the value of the `USER`
     environment variable.
 
@@ -261,15 +262,15 @@ Where:
     be doubled (`@@`). To leave the password empty, when the *user* name
     ends with at at sign, you then have to use the syntax user:@.
 
-    When ommited, the *password* defaults to the value of the `PGPASSWORD`
-    environement variable if it is set, otherwise the password is left
+    When omitted, the *password* defaults to the value of the `PGPASSWORD`
+    environment variable if it is set, otherwise the password is left
     unset.
 
   - *netloc*
 
-    Can be either a hostname in dotted notation, or an ipv4, or an unix
+    Can be either a hostname in dotted notation, or an ipv4, or an Unix
     domain socket path. Empty is the default network location, under a
-    system providing *unix domain socket* that method is prefered, otherwise
+    system providing *unix domain socket* that method is preferred, otherwise
     the *netloc* default to `localhost`.
 
     It's possible to force the *unix domain socket* path by using the syntax
@@ -278,7 +279,7 @@ Where:
 
         postgresql://unix:/tmp:54321/dbname
 
-    The *netloc* defaults to the value of the `PGHOST` environement
+    The *netloc* defaults to the value of the `PGHOST` environment
     variable, and if it is unset, to either the default `unix` socket path
     when running on a Unix system, and `localhost` otherwise.
 
@@ -288,11 +289,11 @@ Where:
     digits and the punctuation signs comma (`,`), dash (`-`) and underscore
     (`_`).
 
-    When ommited, the *dbname* defaults to the value of the environment
+    When omitted, the *dbname* defaults to the value of the environment
     variable `PGDATABASE`, and if that is unset, to the *user* value as
     determined above.
 
-  - The only optionnal parameter should be a possibly qualified table name.
+  - The only optional parameter should be a possibly qualified table name.
 
 ### Regular Expressions
 
@@ -369,7 +370,7 @@ The global batch behaviour options are:
 
     Supporting more than a single batch being sent at a time is on the TODO
     list of pgloader, but is not implemented yet. This option is about
-    controling the memory needs of pgloader as a trade-off to the
+    controlling the memory needs of pgloader as a trade-off to the
     performances characteristics, and not about parallel activity of
     pgloader.
 
@@ -509,7 +510,7 @@ The `csv` format command accepts the following clauses and options:
         Takes a single character as argument, which must be found inside
         single quotes, and might be given as the printable character itself,
         the special value \t to denote a tabulation character, or `0x` then
-        an hexadecimal value read as the ascii code for the character.
+        an hexadecimal value read as the ASCII code for the character.
 
         This character is used as the quoting character in the `CSV` file,
         and defaults to double-quote.
@@ -534,7 +535,7 @@ The `csv` format command accepts the following clauses and options:
         Takes a single character as argument, which must be found inside
         single quotes, and might be given as the printable character itself,
         the special value \t to denote a tabulation character, or `0x` then
-        an hexadecimal value read as the ascii code for the character.
+        an hexadecimal value read as the ASCII code for the character.
 
         This character is used as the *field separator* when reading the
         `CSV` data.
@@ -544,7 +545,7 @@ The `csv` format command accepts the following clauses and options:
         Takes a single character as argument, which must be found inside
         single quotes, and might be given as the printable character itself,
         the special value \t to denote a tabulation character, or `0x` then
-        an hexadecimal value read as the ascii code for the character.
+        an hexadecimal value read as the ASCII code for the character.
 
         This character is used to recognize *end-of-line* condition when
         reading the `CSV` data.
@@ -928,7 +929,7 @@ The `database` command accepts the following clauses and options:
 
       - *no truncate*
 
-        When this topion is listed, pgloader issues no `TRUNCATE` command.
+        When this option is listed, pgloader issues no `TRUNCATE` command.
 
       - *create tables*
 
@@ -1058,8 +1059,8 @@ The `database` command accepts the following clauses and options:
         existing default expression in the MySQL database for columns of the
         source type from the `CREATE TABLE` statement it generates.
 
-        The spelling *keep default* explicitely prevents that behavior and
-        can be used to overlad the default casting rules.
+        The spelling *keep default* explicitly prevents that behaviour and
+        can be used to overload the default casting rules.
 
       - *drop not null*, *keep not null*
 
@@ -1068,8 +1069,8 @@ The `database` command accepts the following clauses and options:
         MySQL datatype when it creates the tables in the PostgreSQL
         database.
 
-        The spelling *keep not null* explicitely prevents that behavior and
-        can be used to overlad the default casting rules.
+        The spelling *keep not null* explicitly prevents that behaviour and
+        can be used to overload the default casting rules.
 
       - *drop typemod*, *keep typemod*
 
@@ -1078,13 +1079,13 @@ The `database` command accepts the following clauses and options:
         the datatype definition found in the MySQL columns of the source
         type when it created the tables in the PostgreSQL database.
 
-        The spelling *keep typemod* explicitely prevents that behavior and
-        can be used to overlad the default casting rules.
+        The spelling *keep typemod* explicitly prevents that behaviour and
+        can be used to overload the default casting rules.
 
       - *using*
 
         This option takes as its single argument the name of a function to
-        be found un the `pgloader.transforms` Common Lisp package. See above
+        be found in the `pgloader.transforms` Common Lisp package. See above
         for details.
 
         It's possible to augment a default cast rule (such as one that
@@ -1153,11 +1154,11 @@ the following limitations:
 
   - Views are not migrated,
 
-    Supporting views might require implemeting a full SQL parser for the
+    Supporting views might require implementing a full SQL parser for the
     MySQL dialect with a porting engine to rewrite the SQL against
     PostgreSQL, including renaming functions and changing some constructs.
 
-    While it's not theorically impossible, don't hold your breath.
+    While it's not theoretically impossible, don't hold your breath.
 
   - Triggers are not migrated
 
@@ -1167,7 +1168,7 @@ the following limitations:
 
     It's simple enough to implement, just not on the priority list yet.
 
-  - Of the geometric datatypes, onle the `POINT` database has been covered.
+  - Of the geometric datatypes, only the `POINT` database has been covered.
     The other ones should be easy enough to implement now, it's just not
     done yet.
 
@@ -1195,7 +1196,7 @@ Numbers:
   - type double to double precision drop typemod
 
   - type numeric to numeric keep typemod
-    type decimal to deciman keep typemod
+    type decimal to decimal keep typemod
 
 Texts:
 
@@ -1300,7 +1301,7 @@ The `sqlite` command accepts the following clauses and options:
 
       - *no truncate*
 
-        When this topion is listed, pgloader issues no `TRUNCATE` command.
+        When this option is listed, pgloader issues no `TRUNCATE` command.
 
       - *create tables*
 
@@ -1361,7 +1362,7 @@ The `sqlite` command accepts the following clauses and options:
 
   - *EXCLUDING TABLE NAMES MATCHING*
 
-    Introduce a comma separated list of table names or *rugular expression*
+    Introduce a comma separated list of table names or *regular expression*
     used to exclude table names from the migration. This filter only applies
     to the result of the *INCLUDING* filter.
 
@@ -1440,7 +1441,7 @@ The provided transformation functions are:
 
   - *right-trimg*
 
-    Remove whitespaces at end of string.
+    Remove whitespace at end of string.
 
   - *byte-vector-to-bytea*
 
@@ -1451,7 +1452,7 @@ The provided transformation functions are:
 ## LOAD MESSAGES
 
 This command is still experimental and allows to receive messages in UDP
-with a syslod like format, and depending on matching rules load named parts
+with a syslog like format, and depending on matching rules load named parts
 them to a destination table.
 
     LOAD MESSAGES

From ccb22d410bc127bc201c22b409571168304bd003 Mon Sep 17 00:00:00 2001
From: Christopher Browne
Date: Thu, 24 Jul 2014 11:03:33 -0400
Subject: [PATCH 2/2] more wordsmithing

---
 pgloader.1.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pgloader.1.md b/pgloader.1.md
index d7c9499..44db012 100644
--- a/pgloader.1.md
+++ b/pgloader.1.md
@@ -1451,9 +1451,9 @@ The provided transformation functions are:
 
 ## LOAD MESSAGES
 
-This command is still experimental and allows to receive messages in UDP
-with a syslog like format, and depending on matching rules load named parts
-them to a destination table.
+This command is still experimental and allows receiving messages via
+UDP using a syslog like format, and, depending on rule matching, loads
+named portions of the data stream into a destination table.
 
     LOAD MESSAGES
         FROM syslog://localhost:10514/
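
Note on the `csv` option text touched by PATCH 1/2: the man page says the quoting, field-separator and line-terminator arguments are a single character found inside single quotes, given as the printable character itself, the special value `\t`, or `0x` followed by a hexadecimal ASCII code. The Python below is only an illustrative sketch of that rule, not pgloader's actual (Common Lisp) parser. The function name `parse_char_argument` is hypothetical, and the sketch assumes the surrounding single quotes have already been stripped.

```python
import string


def parse_char_argument(spec: str) -> str:
    """Turn the text found between the single quotes into one character.

    Hypothetical helper that mirrors the rule stated in the man page;
    it is not taken from pgloader's source.
    """
    # The special value \t (a backslash followed by 't') denotes a tab.
    if spec == "\\t":
        return "\t"

    # `0x` followed by hexadecimal digits is read as an ASCII code.
    if spec.startswith("0x"):
        digits = spec[2:]
        if not digits or any(c not in string.hexdigits for c in digits):
            raise ValueError(f"malformed hexadecimal value: {spec!r}")
        code = int(digits, 16)
        if code > 0x7F:
            raise ValueError(f"not an ASCII code: {spec!r}")
        return chr(code)

    # Otherwise the argument must be the printable character itself.
    if len(spec) == 1 and spec.isprintable():
        return spec

    raise ValueError(f"cannot read {spec!r} as a single character")
```

Under these assumptions, `parse_char_argument("0x22")` yields the double-quote character, which the man page names as the default quoting character.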