# ABNF DEFINITION OF ABNF

This Common Lisp library implements a parser generator for the ABNF grammar
format as described in [RFC 2234](http://tools.ietf.org/html/rfc2234). The
generated parser is a regular expression scanner provided by the
[cl-ppcre](http://weitz.de/cl-ppcre/) library, which means that it cannot
parse recursive grammar definitions. One such definition is the ABNF grammar
itself, as given by the RFC. Fortunately, since you have this library, you
most probably don't need to generate another parser to handle that particular
grammar.

## Installation

The system is Quicklisp ready.

    $ cd ~/quicklisp/local-projects/
    $ git clone git://git.tapoueh.org/pgloader.git
    * (ql:quickload "abnf")

Currently the ABNF system is maintained as part of the `pgloader` tool, as a
central piece of its syslog message parser facility.

## Usage

The `parse-abnf-grammar` function expects the grammar to parse as a string,
along with the name of the top-level rule you're interested in, given as a
symbol or a string. You can also give a list of rule names that you want to
capture; they will be captured in the order in which they are needed to
expand the given top-level rule.

The `parse-abnf-grammar` function returns a `cl-ppcre` scanner.

~~~ {#example.lisp .commonlisp .numberLines}
(defvar *timestamp-abnf*
  "   TIMESTAMP       = NILVALUE / FULL-DATE \"T\" FULL-TIME
      FULL-DATE       = DATE-FULLYEAR \"-\" DATE-MONTH \"-\" DATE-MDAY
      DATE-FULLYEAR   = 4DIGIT
      DATE-MONTH      = 2DIGIT  ; 01-12
      DATE-MDAY       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                                ; month/year
      FULL-TIME       = PARTIAL-TIME TIME-OFFSET
      PARTIAL-TIME    = TIME-HOUR \":\" TIME-MINUTE \":\" TIME-SECOND
                        [TIME-SECFRAC]
      TIME-HOUR       = 2DIGIT  ; 00-23
      TIME-MINUTE     = 2DIGIT  ; 00-59
      TIME-SECOND     = 2DIGIT  ; 00-59
      TIME-SECFRAC    = \".\" 1*6DIGIT
      TIME-OFFSET     = \"Z\" / TIME-NUMOFFSET
      TIME-NUMOFFSET  = (\"+\" / \"-\") TIME-HOUR \":\" TIME-MINUTE

      NILVALUE        = \"-\"
"
  "A timestamp ABNF grammar.")

(let ((scanner (abnf:parse-abnf-grammar *timestamp-abnf*
                                        :timestamp
                                        :registering-rules '(:full-date))))
  (cl-ppcre:register-groups-bind (date)
      (scanner "2013-09-08T00:02:03.123456Z+02:00")
    date))
~~~

In the previous usage example the `let` block returns `"2013-09-08"`.

## ABNF grammar

This library supports the ABNF grammar as given in RFC 2234, with additional
support for plain regular expressions.

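As a small illustration of the repetition syntax combined with a predefined
rule such as `DIGIT`, here is a minimal sketch of a two-part version number
grammar. The names `*version-abnf*`, `VERSION`, `MAJOR` and `MINOR` are
hypothetical and chosen only for this example. The calls are the same ones
used in the usage example, and the sketch assumes that the `abnf` and
`cl-ppcre` systems are already loaded.

~~~ {.commonlisp}
;; Sketch only: assumes (ql:quickload "abnf") has been run beforehand.
;; The grammar below is hypothetical; DIGIT is not declared because it is
;; one of the predefined core rules.
(defvar *version-abnf*
  "   VERSION  = MAJOR \".\" MINOR
      MAJOR    = 1*DIGIT   ; one or more digits
      MINOR    = 1*DIGIT   ; one or more digits
"
  "A hypothetical MAJOR.MINOR version number grammar.")

;; Build a scanner for the VERSION rule and ask for MAJOR and MINOR to be
;; captured as register groups, then bind them from a sample string.
(let ((scanner (abnf:parse-abnf-grammar *version-abnf*
                                        :version
                                        :registering-rules '(:major :minor))))
  (cl-ppcre:register-groups-bind (major minor)
      (scanner "1.42")
    (list major minor)))
~~~

The captured groups are expected to come back as strings, in the order in
which the registered rules are expanded from the top-level rule, so `major`
is bound before `minor` here.
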
### Parsed grammar

Here's the RFC syntax:

    rulelist       =  1*( rule / (*c-wsp c-nl) )
    rule           =  rulename defined-as elements c-nl
                           ; continues if next line starts
                           ;  with white space
    rulename       =  ALPHA *(ALPHA / DIGIT / "-")
    defined-as     =  *c-wsp ("=" / "=/") *c-wsp
                           ; basic rules definition and
                           ;  incremental alternatives
    elements       =  alternation *c-wsp
    c-wsp          =  WSP / (c-nl WSP)
    c-nl           =  comment / CRLF
                           ; comment or newline
    comment        =  ";" *(WSP / VCHAR) CRLF
    alternation    =  concatenation
                      *(*c-wsp "/" *c-wsp concatenation)
    concatenation  =  repetition *(1*c-wsp repetition)
    repetition     =  [repeat] element
    repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)
    element        =  rulename / group / option /
                      char-val / num-val / prose-val / regex
                           ; regex is an addition of this lib,
                           ;  see the Regex Support section
    group          =  "(" *c-wsp alternation *c-wsp ")"
    option         =  "[" *c-wsp alternation *c-wsp "]"
    char-val       =  DQUOTE *(%x20-21 / %x23-7E) DQUOTE
                           ; quoted string of SP and VCHAR
                           ;  without DQUOTE
    num-val        =  "%" (bin-val / dec-val / hex-val)
    bin-val        =  "b" 1*BIT
                      [ 1*("." 1*BIT) / ("-" 1*BIT) ]
                           ; series of concatenated bit values
                           ;  or single ONEOF range
    dec-val        =  "d" 1*DIGIT
                      [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
    hex-val        =  "x" 1*HEXDIG
                      [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
    prose-val      =  "<" *(%x20-3D / %x3F-7E) ">"
                           ; bracketed string of SP and VCHAR
                           ;  without angles
                           ; prose description, to be used as
                           ;  last resort

### Core rules

These parts of the grammar are always provided; they are the *default* rules
of the ABNF definition.

    ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
    BIT            =  "0" / "1"
    CHAR           =  %x01-7F
                           ; any 7-bit US-ASCII character,
                           ;  excluding NUL
    CR             =  %x0D
                           ; carriage return
    CRLF           =  CR LF
                           ; Internet standard newline
    CTL            =  %x00-1F / %x7F
                           ; controls
    DIGIT          =  %x30-39
                           ; 0-9
    DQUOTE         =  %x22
                           ; " (Double Quote)
    HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
    HTAB           =  %x09
                           ; horizontal tab
    LF             =  %x0A
                           ; linefeed
    LWSP           =  *(WSP / CRLF WSP)
                           ; linear white space (past newline)
    OCTET          =  %x00-FF
                           ; 8 bits of data
    SP             =  %x20

### Regex Support

We add support for plain regexp in the `element` rule. A regexp is expected
to follow the form:

    regex = "~" delimiter expression delimiter

The *expression* shouldn't contain the *delimiter* of course, and the allowed
delimiters are `~//`, `~[]`, `~{}`, `~()`, `~<>`, `~""`, `~''`, `~||` and
`~##`. If you have to build a regexp with more than one of those delimiters
in it, you can just concatenate multiple parts together like in this example:

    complex-regex = ~/foo{bar}/ ~{baz/quux}

That will be used in exactly the same way as the following example:

    complex-regex = ~