Rework CSV and Fixed files source fields options, see #116.

It's now possible to use a comma separator when using more than one
source field option at the same time, and for better readability the
options are now enclosed in square brackets.
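
As an illustration only (this is not pgloader's parser, which is the Lisp
grammar changed in this commit), the bracketed, comma-separated form can be
read with a few lines of Python. The function name `parse_option_list` is
hypothetical, and the naive split on commas is an assumption that would break
on an option whose argument itself contains a comma.

```python
def parse_option_list(text):
    """Split '[opt1, opt2]' into a list of normalized option strings.

    Hypothetical helper for illustration; not part of pgloader.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected options enclosed in square brackets")
    inner = text[1:-1]
    # Assumption: no option argument contains a comma.
    options = [" ".join(part.split()) for part in inner.split(",")]
    if any(not option for option in options):
        raise ValueError("empty option in list")
    return options


print(parse_option_list("[null if blanks, trim right whitespace]"))
```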

Also, it's now possible to spell out the "from" and "for" keywords in the
source field definitions, making the load file easier to read and maintain,
as in this full example:

          (
           a from  0 for 10,
           b from 10 for  8,
           c from 18 for  8,
           d from 26 for 17 [null if blanks, trim right whitespace]
          )
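
To make the meaning of the "from" and "for" pairs concrete, here is a minimal
Python sketch of how such a layout slices one line of input. It is not
pgloader's implementation: the function names are hypothetical, positions are
treated as character offsets (the documentation speaks of bytes, which is the
same thing for the ASCII sample data), and treating an empty value as blank is
an assumption of this sketch.

```python
# Field layout from the example above: (name, start, length).
FIELDS = [("a", 0, 10), ("b", 10, 8), ("c", 18, 8), ("d", 26, 17)]


def null_if_blanks_trim_right(value):
    """Hypothetical stand-in for [null if blanks, trim right whitespace]."""
    if value.strip(" ") == "":
        return None  # assumption: an empty value counts as blank too
    return value.rstrip()


def read_fixed(line):
    """Slice one line into fields: 'x from S for N' becomes line[S:S+N]."""
    row = {name: line[start:start + length] for name, start, length in FIELDS}
    row["d"] = null_if_blanks_trim_right(row["d"])
    return row


print(read_fixed("01234567892008052011431250firstline"))
```

With the first sample line from the documentation, field `a` is the first ten
characters, `b` and `c` are the next two groups of eight, and `d` is whatever
remains of the following seventeen, with trailing whitespace removed.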
Dimitri Fontaine 2014-10-01 18:32:40 +02:00
parent ea97fc4659
commit ac55d71401
4 changed files with 63 additions and 15 deletions

@@ -1,7 +1,7 @@
 .\" generated with Ronn/v0.7.3
 .\" http://github.com/rtomayko/ronn/tree/0.7.3
 .
-.TH "PGLOADER" "1" "September 2014" "ff" ""
+.TH "PGLOADER" "1" "October 2014" "ff" ""
 .
 .SH "NAME"
 \fBpgloader\fR \- PostgreSQL data loader
@@ -516,7 +516,7 @@ The optional \fIIN DIRECTORY\fR clause allows specifying which directory to walk
 The \fIFROM\fR option also supports an optional comma separated list of \fIfield\fR names describing what is expected in the \fBCSV\fR data file, optionally introduced by the clause \fBHAVING FIELDS\fR\.
 .
 .IP
-Each field name can be either only one name or a name following with specific reader options for that field\. Supported per\-field reader options are:
+Each field name can be either only one name or a name following with specific reader options for that field, enclosed in square brackets and comma\-separated\. Supported per\-field reader options are:
 .
 .IP "\(bu" 4
 \fIterminated by\fR
@@ -639,7 +639,13 @@ This command instructs pgloader to load data from a text file containing columns
 .nf
 LOAD FIXED
-FROM inline (a 0 10, b 10 8, c 18 8, d 26 17)
+FROM inline
+(
+a from 0 for 10,
+b from 10 for 8,
+c from 18 for 8,
+d from 26 for 17 [null if blanks, trim right whitespace]
+)
 INTO postgresql:///pgloader?fixed
 (
 a, b,
@@ -666,6 +672,8 @@ BEFORE LOAD DO
 01234567892008052011431250firstline
 01234562008052115182300left blank\-padded
 12345678902008052208231560another line
+2345609872014092914371500
+2345678902014092914371520
 .
 .fi
 .
@@ -715,6 +723,11 @@ Position in the line where to start reading that field\'s value\. Can be entered
 .IP
 How many bytes to read from the \fIstart\fR position to read that field\'s value\. Same format as \fIstart\fR\.
 .
+.IP "" 0
+.
+.IP
+Those optional parameters can enclosed in square brackets and comma\-separated:
+.
 .IP "\(bu" 4
 \fIterminated by\fR
 .

@@ -465,8 +465,8 @@ The `csv` format command accepts the following clauses and options:
 optionally introduced by the clause `HAVING FIELDS`.
 Each field name can be either only one name or a name following with
-specific reader options for that field. Supported per-field reader
-options are:
+specific reader options for that field, enclosed in square brackets and
+comma-separated. Supported per-field reader options are:
 - *terminated by*
@@ -576,7 +576,13 @@ This command instructs pgloader to load data from a text file containing
 columns arranged in a *fixed size* manner. Here's an example:
 LOAD FIXED
-FROM inline (a 0 10, b 10 8, c 18 8, d 26 17)
+FROM inline
+(
+a from 0 for 10,
+b from 10 for 8,
+c from 18 for 8,
+d from 26 for 17 [null if blanks, trim right whitespace]
+)
 INTO postgresql:///pgloader?fixed
 (
 a, b,
@@ -603,6 +609,8 @@ columns arranged in a *fixed size* manner. Here's an example:
 01234567892008052011431250firstline
 01234562008052115182300left blank-padded
 12345678902008052208231560another line
+2345609872014092914371500
+2345678902014092914371520
 The `fixed` format command accepts the following clauses and options:
@@ -642,6 +650,9 @@ The `fixed` format command accepts the following clauses and options:
 How many bytes to read from the *start* position to read that
 field's value. Same format as *start*.
+Those optional parameters can enclosed in square brackets and
+comma-separated:
 - *terminated by*
 See the description of *field terminated by* below.

@@ -120,6 +120,8 @@
 (def-keyword-rule "left")
 (def-keyword-rule "right")
 (def-keyword-rule "whitespace")
+(def-keyword-rule "from")
+(def-keyword-rule "for")
 (def-keyword-rule "skip")
 (def-keyword-rule "header")
 (def-keyword-rule "null")
@@ -1721,6 +1723,28 @@ load database
 option-trim-left-whitespace
 option-trim-right-whitespace))
+(defrule another-csv-field-option (and comma csv-field-option)
+(:lambda (field-option)
+(destructuring-bind (comma option) field-option
+(declare (ignore comma))
+option)))
+(defrule open-square-bracket (and ignore-whitespace #\[ ignore-whitespace)
+(:constant :open-square-bracket))
+(defrule close-square-bracket (and ignore-whitespace #\] ignore-whitespace)
+(:constant :close-square-bracket))
+(defrule csv-field-option-list (and open-square-bracket
+csv-field-option
+(* another-csv-field-option)
+close-square-bracket)
+(:lambda (option)
+(destructuring-bind (open opt1 opts close) option
+(declare (ignore open close))
+(alexandria:alist-plist `(,opt1 ,@opts)))))
+(defrule csv-field-options (? (or csv-field-option csv-field-option-list)))
 (defrule csv-field-options (* csv-field-option)
 (:lambda (options)
 (alexandria:alist-plist options)))
@@ -2038,11 +2062,11 @@ load database
 (defrule number (or hex-number dec-number))
-(defrule field-start-position (and ignore-whitespace number)
-(:destructure (ws pos) (declare (ignore ws)) pos))
+(defrule field-start-position (and (? kw-from) ignore-whitespace number)
+(:destructure (from ws pos) (declare (ignore from ws)) pos))
-(defrule fixed-field-length (and ignore-whitespace number)
-(:destructure (ws len) (declare (ignore ws)) len))
+(defrule fixed-field-length (and (? kw-for) ignore-whitespace number)
+(:destructure (for ws len) (declare (ignore for ws)) len))
 (defrule fixed-source-field (and csv-field-name
 field-start-position fixed-field-length

@@ -13,11 +13,11 @@
 LOAD FIXED
 FROM inline
-( -- col start length opts
-a 0 10,
-b 10 8,
-c 18 8,
-d 26 17 null if blanks trim right whitespace
+(
+a from 0 for 10,
+b from 10 for 8,
+c from 18 for 8,
+d from 26 for 17 [null if blanks, trim right whitespace]
 )
 INTO postgresql:///pgloader?fixed
 (