13 KiB
title |
---|
(Un)quotation in layered pub/sub networks |
Tony Garnock-Jones tonyg@leastfixedpoint.com
June 2020.
Introduction
There's potential for confusion whenever patterns are drawn from the same set that the values to be matched come from.
In layered pub/sub networks like Syndicate, publisher-subscriber relationships are set up using messages carried over the publish-subscribe medium itself.
Subscribers build messages containing patterns describing other messages they are interested in.
But in general, these messages travel across multiple layered pub/sub dataspaces. To avoid confusion, we have to be very clear about when a piece of data denotes a pattern and when it is just a piece of data.
In this post, I'll describe the approach I want to take to solve this problem for Syndicate in the general case.
What's the problem?
Consider regular expressions. The regex ab*c
matches ac
,
abc
, abbc
, and so on; but what if we wanted to match the
literal string ab*c
? We have to escape the star in the pattern,
writing the pattern as ab\*c
.
Quotation
If we want to use a regex engine to search for a literal string supplied by someone else---perhaps a user, perhaps an API client---we have to take care to quote the literal string, in order to avoid confusion caused by metacharacters:1
Literal string | Quoted form |
---|---|
no metachars |
no metachars |
ab*c |
ab\*c |
... |
\.\.\. |
(hello \$\ [world]) |
\(hello \\\$\\ \[world\]\) |
(check-equal? (regexp-quote "ab*c") "ab\\*c")
(check-equal? (regexp-quote "...") "\\.\\.\\.")
(check-equal? (regexp-quote "(hello \\$\\ [world])") "\\(hello \\\\\\$\\\\ \\[world\\]\\)")
Invariant
The invariant we want to rely on is
every string, after quotation, becomes a regular expression that matches exactly and only the original string.2
(define (regexp-quotation-invariant original-string some-other-string)
(equal? (string=? original-string some-other-string)
(regexp-match? (regexp-quote original-string) some-other-string)))
We could use Racket's [random data
generation](https://docs.racket-lang.org/data/Enumerations.html?q=data%2Fen#%28mod-path._data%2Fenumerate%29)
tools to perform randomized testing of this invariant.
Of course, this same problem crops up in any pattern language where embedding of values in patterns makes sense, so let's generalize accordingly:
every value, after quotation, becomes a pattern that matches exactly and only the original value.
Quotation of pub/sub patterns.
In Syndicate, we want patterns over assertions (and thus over messages) to be able to match arbitrary Preserves atoms, sequences, and records.
However, to date, there have been special exceptions. The special
records capture/1
and discard/0
, embedded in patterns, have
been interpreted as defining pattern variables and wildcards,
respectively. There has been no way to construct a pattern that
matches capture/1
or discard/0
records themselves.
Previously, I tried adding a quote/1
record to the set of
special record types. Experimentation demonstrated
that patterns with quote/1
were able to escape the limitations
of patterns without quote/1
. Also, it's a theorem
that quoting is "sensible": that unquotation after quotation of a
pattern always equals the original pattern.
In this post, I'll explore an alternative I think might be better:
letting patterns match anything except a single special record,
unquote/1
. Patterns use unquote/1
to escape into a
metalanguage where special operators like capture, discard, and
so on can be discussed.
Definitions
We'll work with a simplified data language. Instead of full Preserves, which includes lists, maps, records, a variety of atoms and so forth, we'll stick to just records and atoms.
Value
s are the things our Pattern
s are to match.
Value
s that are Lists
denote records: the first Value
is the
label of the record, and the remaining Value
s are its fields.
(define (record? x)
(and (list? x)
(positive? (length x))
(andmap value? x)))
(define (record-label x) (car x))
(define (record-fields x) (cdr x))
(define (record-arity x) (length (record-fields x)))
(define (record-field x n) (list-ref x (+ n 1)))
Atoms, then, are the non-List
Value
s.
(define (atom? x)
(or (string? x)
(symbol? x)
(number? x)))
Value
s are the records and the atoms, and nothing else.
(define (value? x)
(or (atom? x)
(record? x)))
We will frequently need to decide whether a Value
is a record
with a particular label and arity.
(define (record/class? x label arity)
(and (record? x)
(equal? (record-label x) label)
(= (record-arity x) arity)))
A Pattern
is a Value
with certain additional constraints on
the Operation
s that appear within an unquote/1
record.
(define (pattern? x)
(and (value? x)
(or (atom? x)
(if (record/class? x 'unquote 1)
(operation? (record-field x 0))
(andmap pattern? x)))))
A valid Operation
is either a capture/1
, containing another
pattern; a discard/0
, denoting wildcard; or quote/1
, which
promotes an arbitrary Value
to a literal-matching Pattern
.
This last one is what allows us to write patterns that match
unquote/1
values without tripping over ourselves.
(define (operation? x)
(cond [(record/class? x 'capture 1) (pattern? (record-field x 0))]
[(record/class? x 'discard 0) #t]
[(record/class? x 'quote 1) (value? (record-field x 0))]
[else #f]))
Examples.
(check-true (pattern? 123))
(check-true (pattern? "Hello"))
(check-true (pattern? 'capture))
(check-true (pattern? 'unquote))
(check-true (pattern? '(Present "Tony")))
(check-true (pattern? '(Says "Tony" "Hello, world!")))
(check-true (pattern? '(Says "Tony" (capture (discard)))))
(check-true (pattern? '(Says "Tony" (unquote (capture (discard))))))
Since pattern?
subsumes value?
, we see that each of these is
both a Value
and a Pattern
. However, the following term is a
Value
but not a Pattern
, because it does not conform to the
definition of the unquote
d Operation
s.
(define non-pattern '(Says "Tony" (unquote (noise))))
(check-true (value? non-pattern))
(check-false (pattern? non-pattern))
We may use quote
in conjunction with unquote
to specify a
pattern that will match our non-pattern
example:
(check-true (pattern? '(Says "Tony" ((unquote (quote unquote)) (noise)))))
We may also abuse the Lisp notation for quotation and quasiquotation when writing such patterns, which helps make them a little shorter and perhaps easier to read:
(check-true (pattern? '(Says "Tony" (,'unquote (noise)))))
Finally, this term isn't even a Value
: it includes an empty list
within it.
(check-false (value? '(Says "Tony" ())))
Pattern matching
Let's call our matching operator Match
, with a capital M
to
distinguish it from Racket's own match
syntax.
Match
, given a pat
tern and a val
ue, should answer either a
List
of the captured values in depth-first, left-to-right order,
if the pattern matches the value, or #f
if the arguments do not
match.
;; ;; Precondition: (and (pattern? pat) (value? val))
;; (define (Match pat val)
;; ;;; First of all, let's take care of matching atoms.
;; (cond [(atom? pat) (if (equal? pat val) '() #f)]
;; ;;; Next, the special pattern record `unquote/1` and the operators it
;; ;;; contains.
;; [(record/class? pat 'unquote 1)
;; (define op (record-field pat 0))
;; ;;; A `capture/1` should prepend the current `val`ue to a succeeding
;; ;;; submatch, and should propagate a failing submatch.
;; (cond [(record/class? op 'capture 1)
;; (define r (Match (record-field op 0) val))
;; (and r (cons val r))]
;; ;;; A `discard/0` should accept any `val`ue, yielding a success result
;; ;;; containing no captures.
;; [(record/class? op 'discard 0)
;; '()]
;; ;;; Finally, a `quote/1` should match exactly its argument and nothing
;; ;;; else.
;; [(record/class? op 'quote 1)
;; (if (equal? (record-field op 0) val) '() #f)]
;; (struct DISCARD ())
;; (struct CAPTURE (p))
;; (define (q p)
;; (match p
;; [(DISCARD) `(discard)]
;; [(CAPTURE p) `(capture ,(q p))]
;; [`(discard) `('discard)]
;; [`(capture ,p) `('capture ,(q p))]
;; [`(quote ,p) `('quote ,(q p))]
;; [`(,pa . ,pd) `(,(q pa) . ,(q pd))]
;; [a a]))
;; (define (uq p)
;; (match p
;; [`(discard) (DISCARD)]
;; [`(capture ,p) (CAPTURE (uq p))]
;; [`(quote ,p) p]
;; [`(,pa . ,pd) `(,(uq pa) . ,(uq pd))]
;; [a a]))
;; ;; Goal: (uq (q p)) === p
;; ;; (define (m p v k)
;; ;; (match* (p v)
;; ;; [(`(discard) _) (k '())]
;; ;; [(`(capture ,p) v) (m p v (lambda (bs) (cons v bs)))]
;; ;; [(`(quote ,p) v) (and (equal? p v) (k '()))]
;; ;; [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))]
;; ;; [(p v) (and (equal? p v) (k '()))]))
;; (define (m p v k)
;; (match* (p v)
;; [((DISCARD) _) (k '())]
;; [((CAPTURE p) v) (m p v (lambda (bs) (cons v bs)))]
;; [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))]
;; [(p v) (and (equal? p v) (k '()))]))
;; (module+ test
;; (require rackunit)
;; ;; (define (M p v) (m p v values))
;; (define (M p v) (m (uq p) v values))
;; (check-equal? (M `(capture 1) 1) '(1))
;; (check-equal? (M `(discard) 1) '())
;; (check-equal? (M 1 1) '())
;; (check-equal? (M 2 1) #f)
;; (check-equal? (M `(capture 2) 1) #f)
;; (check-equal? (M `(record (capture (discard))) `(record value)) '(value))
;; (check-equal? (M `(record (capture (discard))) `(record 'value)) '('value))
;; (check-equal? (M `(record (capture (discard))) `(record (capture (discard))))
;; '((capture (discard))))
;; (check-equal? (M `(record ('capture (discard))) `(record (capture (discard))))
;; '())
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record (capture (discard))))
;; '((capture (discard))))
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record value))
;; #f)
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record 'value))
;; #f)
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record ('capture (discard))))
;; #f)
;; (check-equal? (M `(record '(capture (discard))) `(record (capture (discard))))
;; '())
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record (capture (discard))))
;; '((capture (discard))))
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record value))
;; #f)
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record 'value))
;; #f)
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record '(capture (discard))))
;; #f)
;; (check-equal? (M `(record ('quote value)) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ('quote value))) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ('quote value))) `(record value)) #f)
;; (check-equal? (M `(record (capture ('quote value))) `(record 'value)) '('value))
;; (check-equal? (M `(record (capture ('quote value))) `(record 'notvalue)) #f)
;; (check-equal? (M `(record ('quote value)) `(record 'value)) '())
;; (check-equal? (M `(record (capture ('quote value))) `(record ('quote value))) #f)
;; (check-equal? (M `(record ''value) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ''value)) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ''value)) `(record value)) #f)
;; (check-equal? (M `(record (capture ''value)) `(record 'value)) '('value))
;; (check-equal? (M `(record (capture ''value)) `(record 'notvalue)) #f)
;; (check-equal? (M `(record ''value) `(record 'value)) '())
;; (check-equal? (M `(record (capture ''value)) `(record ''value)) #f)
;; (check-equal? (q (CAPTURE 1)) `(capture 1))
;; (check-equal? (q (DISCARD)) `(discard))
;; (check-equal? (q `(capture 1)) `('capture 1))
;; (check-equal? (q `(discard)) `('discard))
;; )
-
Racket supplies
regexp-quote
(docs) for this purpose: ↩︎ -
Expressing this in code is straightforward: ↩︎