preserves/unquoting.md

13 KiB

title
(Un)quotation in layered pub/sub networks

Tony Garnock-Jones tonyg@leastfixedpoint.com
June 2020.

Introduction

There's potential for confusion whenever patterns are drawn from the same set that the values to be matched come from.

In layered pub/sub networks like Syndicate, publisher-subscriber relationships are set up using messages carried over the publish-subscribe medium itself.

Subscribers build messages containing patterns describing other messages they are interested in.

But in general, these messages travel across multiple layered pub/sub dataspaces. To avoid confusion, we have to be very clear about when a piece of data denotes a pattern and when it is just a piece of data.

In this post, I'll describe the approach I want to take to solve this problem for Syndicate in the general case.

What's the problem?

Consider regular expressions. The regex ab*c matches ac, abc, abbc, and so on; but what if we wanted to match the literal string ab*c? We have to escape the star in the pattern, writing the pattern as ab\*c.

Quotation

If we want to use a regex engine to search for a literal string supplied by someone else---perhaps a user, perhaps an API client---we have to take care to quote the literal string, in order to avoid confusion caused by metacharacters:1

Literal string Quoted form
no metachars no metachars
ab*c ab\*c
... \.\.\.
(hello \$\ [world]) \(hello \\\$\\ \[world\]\)
(check-equal? (regexp-quote "ab*c")                  "ab\\*c")
(check-equal? (regexp-quote "...")                   "\\.\\.\\.")
(check-equal? (regexp-quote "(hello \\$\\ [world])") "\\(hello \\\\\\$\\\\ \\[world\\]\\)")

Invariant

The invariant we want to rely on is

every string, after quotation, becomes a regular expression that matches exactly and only the original string.2

(define (regexp-quotation-invariant original-string some-other-string)
  (equal? (string=? original-string some-other-string)
          (regexp-match? (regexp-quote original-string) some-other-string)))
We could use Racket's [random data
generation](https://docs.racket-lang.org/data/Enumerations.html?q=data%2Fen#%28mod-path._data%2Fenumerate%29)
tools to perform randomized testing of this invariant.

Of course, this same problem crops up in any pattern language where embedding of values in patterns makes sense, so let's generalize accordingly:

every value, after quotation, becomes a pattern that matches exactly and only the original value.

Quotation of pub/sub patterns.

In Syndicate, we want patterns over assertions (and thus over messages) to be able to match arbitrary Preserves atoms, sequences, and records.

However, to date, there have been special exceptions. The special records capture/1 and discard/0, embedded in patterns, have been interpreted as defining pattern variables and wildcards, respectively. There has been no way to construct a pattern that matches capture/1 or discard/0 records themselves.

Previously, I tried adding a quote/1 record to the set of special record types. Experimentation demonstrated that patterns with quote/1 were able to escape the limitations of patterns without quote/1. Also, it's a theorem that quoting is "sensible": that unquotation after quotation of a pattern always equals the original pattern.

In this post, I'll explore an alternative I think might be better: letting patterns match anything except a single special record, unquote/1. Patterns use unquote/1 to escape into a metalanguage where special operators like capture, discard, and so on can be discussed.

Definitions

We'll work with a simplified data language. Instead of full Preserves, which includes lists, maps, records, a variety of atoms and so forth, we'll stick to just records and atoms.

Values are the things our Patterns are to match.

Values that are Lists denote records: the first Value is the label of the record, and the remaining Values are its fields.

(define (record? x)
  (and (list? x)
       (positive? (length x))
       (andmap value? x)))

(define (record-label x) (car x))
(define (record-fields x) (cdr x))
(define (record-arity x) (length (record-fields x)))
(define (record-field x n) (list-ref x (+ n 1)))

Atoms, then, are the non-List Values.

(define (atom? x)
  (or (string? x)
      (symbol? x)
      (number? x)))

Values are the records and the atoms, and nothing else.

(define (value? x)
  (or (atom? x)
      (record? x)))

We will frequently need to decide whether a Value is a record with a particular label and arity.

(define (record/class? x label arity)
  (and (record? x)
       (equal? (record-label x) label)
       (= (record-arity x) arity)))

A Pattern is a Value with certain additional constraints on the Operations that appear within an unquote/1 record.

(define (pattern? x)
  (and (value? x)
       (or (atom? x)
           (if (record/class? x 'unquote 1)
               (operation? (record-field x 0))
               (andmap pattern? x)))))

A valid Operation is either a capture/1, containing another pattern; a discard/0, denoting wildcard; or quote/1, which promotes an arbitrary Value to a literal-matching Pattern. This last one is what allows us to write patterns that match unquote/1 values without tripping over ourselves.

(define (operation? x)
  (cond [(record/class? x 'capture 1) (pattern? (record-field x 0))]
        [(record/class? x 'discard 0) #t]
        [(record/class? x 'quote 1) (value? (record-field x 0))]
        [else #f]))

Examples.

(check-true (pattern? 123))
(check-true (pattern? "Hello"))
(check-true (pattern? 'capture))
(check-true (pattern? 'unquote))
(check-true (pattern? '(Present "Tony")))
(check-true (pattern? '(Says "Tony" "Hello, world!")))
(check-true (pattern? '(Says "Tony" (capture (discard)))))
(check-true (pattern? '(Says "Tony" (unquote (capture (discard))))))

Since pattern? subsumes value?, we see that each of these is both a Value and a Pattern. However, the following term is a Value but not a Pattern, because it does not conform to the definition of the unquoted Operations.

(define non-pattern '(Says "Tony" (unquote (noise))))
(check-true (value? non-pattern))
(check-false (pattern? non-pattern))

We may use quote in conjunction with unquote to specify a pattern that will match our non-pattern example:

(check-true (pattern? '(Says "Tony" ((unquote (quote unquote)) (noise)))))

We may also abuse the Lisp notation for quotation and quasiquotation when writing such patterns, which helps make them a little shorter and perhaps easier to read:

(check-true (pattern? '(Says "Tony" (,'unquote (noise)))))

Finally, this term isn't even a Value: it includes an empty list within it.

(check-false (value? '(Says "Tony" ())))

Pattern matching

Let's call our matching operator Match, with a capital M to distinguish it from Racket's own match syntax.

Match, given a pattern and a value, should answer either a List of the captured values in depth-first, left-to-right order, if the pattern matches the value, or #f if the arguments do not match.

;; ;; Precondition: (and (pattern? pat) (value? val))
;; (define (Match pat val)

;; ;;; First of all, let's take care of matching atoms.

;;   (cond [(atom? pat) (if (equal? pat val) '() #f)]

;; ;;; Next, the special pattern record `unquote/1` and the operators it
;; ;;; contains.

;;         [(record/class? pat 'unquote 1)
;;          (define op (record-field pat 0))

;; ;;; A `capture/1` should prepend the current `val`ue to a succeeding
;; ;;; submatch, and should propagate a failing submatch.

;;          (cond [(record/class? op 'capture 1)
;;                 (define r (Match (record-field op 0) val))
;;                 (and r (cons val r))]

;; ;;; A `discard/0` should accept any `val`ue, yielding a success result
;; ;;; containing no captures.

;;                [(record/class? op 'discard 0)
;;                 '()]

;; ;;; Finally, a `quote/1` should match exactly its argument and nothing
;; ;;; else.

;;                [(record/class? op 'quote 1)
;;                 (if (equal? (record-field op 0) val) '() #f)]

;; (struct DISCARD ())
;; (struct CAPTURE (p))

;; (define (q p)
;;   (match p
;;     [(DISCARD) `(discard)]
;;     [(CAPTURE p) `(capture ,(q p))]
;;     [`(discard) `('discard)]
;;     [`(capture ,p) `('capture ,(q p))]
;;     [`(quote ,p) `('quote ,(q p))]
;;     [`(,pa . ,pd) `(,(q pa) . ,(q pd))]
;;     [a a]))

;; (define (uq p)
;;   (match p
;;     [`(discard) (DISCARD)]
;;     [`(capture ,p) (CAPTURE (uq p))]
;;     [`(quote ,p) p]
;;     [`(,pa . ,pd) `(,(uq pa) . ,(uq pd))]
;;     [a a]))

;; ;; Goal: (uq (q p)) === p

;; ;; (define (m p v k)
;; ;;   (match* (p v)
;; ;;     [(`(discard) _) (k '())]
;; ;;     [(`(capture ,p) v) (m p v (lambda (bs) (cons v bs)))]
;; ;;     [(`(quote ,p) v) (and (equal? p v) (k '()))]
;; ;;     [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))]
;; ;;     [(p v) (and (equal? p v) (k '()))]))

;; (define (m p v k)
;;   (match* (p v)
;;     [((DISCARD) _) (k '())]
;;     [((CAPTURE p) v) (m p v (lambda (bs) (cons v bs)))]
;;     [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))]
;;     [(p v) (and (equal? p v) (k '()))]))

;; (module+ test
;;   (require rackunit)
;;   ;; (define (M p v) (m p v values))
;;   (define (M p v) (m (uq p) v values))
;;   (check-equal? (M `(capture 1) 1) '(1))
;;   (check-equal? (M `(discard) 1) '())
;;   (check-equal? (M 1 1) '())
;;   (check-equal? (M 2 1) #f)
;;   (check-equal? (M `(capture 2) 1) #f)

;;   (check-equal? (M `(record (capture (discard))) `(record value)) '(value))
;;   (check-equal? (M `(record (capture (discard))) `(record 'value)) '('value))
;;   (check-equal? (M `(record (capture (discard))) `(record (capture (discard))))
;;                 '((capture (discard))))

;;   (check-equal? (M `(record ('capture (discard))) `(record (capture (discard))))
;;                 '())
;;   (check-equal? (M `(record (capture ('capture (discard)))) `(record (capture (discard))))
;;                 '((capture (discard))))
;;   (check-equal? (M `(record (capture ('capture (discard)))) `(record value))
;;                 #f)
;;   (check-equal? (M `(record (capture ('capture (discard)))) `(record 'value))
;;                 #f)
;;   (check-equal? (M `(record (capture ('capture (discard)))) `(record ('capture (discard))))
;;                 #f)

;;   (check-equal? (M `(record '(capture (discard))) `(record (capture (discard))))
;;                 '())
;;   (check-equal? (M `(record (capture '(capture (discard)))) `(record (capture (discard))))
;;                 '((capture (discard))))
;;   (check-equal? (M `(record (capture '(capture (discard)))) `(record value))
;;                 #f)
;;   (check-equal? (M `(record (capture '(capture (discard)))) `(record 'value))
;;                 #f)
;;   (check-equal? (M `(record (capture '(capture (discard)))) `(record '(capture (discard))))
;;                 #f)

;;   (check-equal? (M `(record ('quote value)) `(record (capture (discard)))) #f)
;;   (check-equal? (M `(record (capture ('quote value))) `(record (capture (discard)))) #f)
;;   (check-equal? (M `(record (capture ('quote value))) `(record value)) #f)
;;   (check-equal? (M `(record (capture ('quote value))) `(record 'value)) '('value))
;;   (check-equal? (M `(record (capture ('quote value))) `(record 'notvalue)) #f)
;;   (check-equal? (M `(record ('quote value)) `(record 'value)) '())
;;   (check-equal? (M `(record (capture ('quote value))) `(record ('quote value))) #f)

;;   (check-equal? (M `(record ''value) `(record (capture (discard)))) #f)
;;   (check-equal? (M `(record (capture ''value)) `(record (capture (discard)))) #f)
;;   (check-equal? (M `(record (capture ''value)) `(record value)) #f)
;;   (check-equal? (M `(record (capture ''value)) `(record 'value)) '('value))
;;   (check-equal? (M `(record (capture ''value)) `(record 'notvalue)) #f)
;;   (check-equal? (M `(record ''value) `(record 'value)) '())
;;   (check-equal? (M `(record (capture ''value)) `(record ''value)) #f)

;;   (check-equal? (q (CAPTURE 1)) `(capture 1))
;;   (check-equal? (q (DISCARD)) `(discard))
;;   (check-equal? (q `(capture 1)) `('capture 1))
;;   (check-equal? (q `(discard)) `('discard))

;;   )

  1. Racket supplies regexp-quote (docs) for this purpose: ↩︎

  2. Expressing this in code is straightforward: ↩︎