preserves/unquoting.rkt

359 lines
14 KiB
Racket

;;; ---
;;; title: (Un)quotation in layered pub/sub networks
;;; ---
;;; <!--
#lang racket
(require rackunit)
(require racket/contract)
;;; -->
;;; Tony Garnock-Jones <tonyg@leastfixedpoint.com>
;;; June 2020.
;;; ## Introduction
;;; There's potential for confusion whenever *patterns* are drawn from
;;; the same set that the *values to be matched* come from.
;;; In layered pub/sub networks like
;;; [Syndicate](https://syndicate-lang.org/), publisher-subscriber
;;; relationships are set up using messages carried over the
;;; publish-subscribe medium itself.
;;; Subscribers build messages containing patterns describing other
;;; messages they are interested in.
;;; But in general, these messages travel across multiple layered
;;; pub/sub dataspaces. To avoid confusion, we have to be very clear
;;; about when a piece of data denotes a pattern and when it is just a
;;; piece of data.
;;; In this post, I'll describe the approach I want to take to solve
;;; this problem for Syndicate in the general case.
;;; ### What's the problem?
;;; Consider regular expressions. The regex `ab*c` matches `ac`,
;;; `abc`, `abbc`, and so on; but what if we wanted to match the
;;; literal string `ab*c`? We have to escape the star in the pattern,
;;; writing the pattern as `ab\*c`.
;;; #### Quotation
;;; If we want to use a regex engine to search for a literal string
;;; supplied by someone else---perhaps a user, perhaps an API
;;; client---we have to take care to *quote* the literal string, in
;;; order to avoid confusion caused by metacharacters:[^racket-regexp-quote]
;;; | Literal string | Quoted form |
;;; | --- | --- |
;;; | `no metachars` | `no metachars` |
;;; | `ab*c` | `ab\*c` |
;;; | `...` | `\.\.\.` |
;;; | `(hello \$\ [world])` | `\(hello \\\$\\ \[world\]\)` |
;;; [^racket-regexp-quote]: Racket supplies
;;; [`regexp-quote`](https://github.com/racket/racket/blob/5bb837661c12a9752c6a99f952c0e1b267645b33/racket/collects/racket/private/string.rkt#L38-L51)
;;; ([docs](https://docs.racket-lang.org/reference/regexp.html?q=reg%20quo#(def._((lib._racket%2Fprivate%2Fbase..rkt)._regexp-quote))))
;;; for this purpose:
(check-equal? (regexp-quote "ab*c") "ab\\*c")
(check-equal? (regexp-quote "...") "\\.\\.\\.")
(check-equal? (regexp-quote "(hello \\$\\ [world])") "\\(hello \\\\\\$\\\\ \\[world\\]\\)")
;;; #### Invariant
;;; The invariant we want to rely on is
;;;
;;; > every string, after quotation, becomes a regular expression that matches
;;; > exactly and only the original string.[^racket-regexp-quotation-invariant]
;;; [^racket-regexp-quotation-invariant]:
;;; Expressing this in code is straightforward:
(define (regexp-quotation-invariant original-string some-other-string)
(equal? (string=? original-string some-other-string)
(regexp-match? (regexp-quote original-string) some-other-string)))
;;; We could use Racket's [random data
;;; generation](https://docs.racket-lang.org/data/Enumerations.html?q=data%2Fen#%28mod-path._data%2Fenumerate%29)
;;; tools to perform randomized testing of this invariant.
;;; Of course, this same problem crops up in any pattern language
;;; where embedding of values in patterns makes sense, so let's
;;; generalize accordingly:
;;;
;;; > every value, after quotation, becomes a pattern that matches
;;; > exactly and only the original value.
;;; ### Quotation of pub/sub patterns.
;;; In [Syndicate](https://syndicate-lang.org/), we want patterns over
;;; assertions (and thus over messages) to be able to match arbitrary
;;; [Preserves](.) atoms, sequences, and records.
;;; However, to date, there have been special exceptions. The special
;;; records `capture/1` and `discard/0`, embedded in patterns, have
;;; been interpreted as defining pattern variables and wildcards,
;;; respectively. There has been no way to construct a pattern that
;;; matches `capture/1` or `discard/0` records themselves.
;;; Previously, I tried adding a `quote/1` record to the set of
;;; special record types. Experimentation [demonstrated](quoting.rkt)
;;; that patterns with `quote/1` were able to escape the limitations
;;; of patterns without `quote/1`. Also, it's [a theorem](quoting.v)
;;; that quoting is "sensible": that unquotation after quotation of a
;;; pattern always equals the original pattern.
;;;
;;; In this post, I'll explore an alternative I think might be better:
;;; letting patterns match anything except a *single* special record,
;;; `unquote/1`. Patterns use `unquote/1` to escape into a
;;; *metalanguage* where special operators like capture, discard, and
;;; so on can be discussed.
;;; ## Definitions
;;;
;;; We'll work with a simplified data language. Instead of full
;;; Preserves, which includes lists, maps, records, a variety of atoms
;;; and so forth, we'll stick to just records and atoms.
;;;
;;; `Value`s are the things our `Pattern`s are to match.
;;;
;;; `Value`s that are `Lists` denote records: the first `Value` is the
;;; label of the record, and the remaining `Value`s are its fields.
(define (record? x)
(and (list? x)
(positive? (length x))
(andmap value? x)))
(define (record-label x) (car x))
(define (record-fields x) (cdr x))
(define (record-arity x) (length (record-fields x)))
(define (record-field x n) (list-ref x (+ n 1)))
;;; Atoms, then, are the non-`List` `Value`s.
(define (atom? x)
(or (string? x)
(symbol? x)
(number? x)))
;;; `Value`s are the records and the atoms, and nothing else.
(define (value? x)
(or (atom? x)
(record? x)))
;;; We will frequently need to decide whether a `Value` is a record
;;; with a particular label and arity.
(define (record/class? x label arity)
(and (record? x)
(equal? (record-label x) label)
(= (record-arity x) arity)))
;;; A `Pattern` is a `Value` with certain additional constraints on
;;; the `Operation`s that appear within an `unquote/1` record.
(define (pattern? x)
(and (value? x)
(or (atom? x)
(if (record/class? x 'unquote 1)
(operation? (record-field x 0))
(andmap pattern? x)))))
;;; A valid `Operation` is either a `capture/1`, containing another
;;; pattern; a `discard/0`, denoting wildcard; or `quote/1`, which
;;; promotes an *arbitrary* `Value` to a literal-matching `Pattern`.
;;; This last one is what allows us to write patterns that match
;;; `unquote/1` values without tripping over ourselves.
(define (operation? x)
(cond [(record/class? x 'capture 1) (pattern? (record-field x 0))]
[(record/class? x 'discard 0) #t]
[(record/class? x 'quote 1) (value? (record-field x 0))]
[else #f]))
;;; **Examples**.
(check-true (pattern? 123))
(check-true (pattern? "Hello"))
(check-true (pattern? 'capture))
(check-true (pattern? 'unquote))
(check-true (pattern? '(Present "Tony")))
(check-true (pattern? '(Says "Tony" "Hello, world!")))
(check-true (pattern? '(Says "Tony" (capture (discard)))))
(check-true (pattern? '(Says "Tony" (unquote (capture (discard))))))
;;; Since `pattern?` subsumes `value?`, we see that each of these is
;;; both a `Value` and a `Pattern`. However, the following term is a
;;; `Value` but not a `Pattern`, because it does not conform to the
;;; definition of the `unquote`d `Operation`s.
;;; <!--
(let ()
;;; -->
(define non-pattern '(Says "Tony" (unquote (noise))))
(check-true (value? non-pattern))
(check-false (pattern? non-pattern))
;;; <!--
)
;;; -->
;;; We may use `quote` in conjunction with `unquote` to specify a
;;; pattern that will match our `non-pattern` example:
(check-true (pattern? '(Says "Tony" ((unquote (quote unquote)) (noise)))))
;;; We may also abuse the Lisp notation for quotation and
;;; quasiquotation when writing such patterns, which helps make them a
;;; little shorter and perhaps easier to read:
(check-true (pattern? '(Says "Tony" (,'unquote (noise)))))
;;; Finally, this term isn't even a `Value`: it includes an empty list
;;; within it.
(check-false (value? '(Says "Tony" ())))
;;; ## Pattern matching
;;;
;;; Let's call our matching operator `Match`, with a capital `M` to
;;; distinguish it from Racket's own `match` syntax.
;;;
;;; `Match`, given a `pat`tern and a `val`ue, should answer either a
;;; `List` of the captured values in depth-first, left-to-right order,
;;; if the pattern matches the value, or `#f` if the arguments do not
;;; match.
;; ;; Precondition: (and (pattern? pat) (value? val))
;; (define (Match pat val)
;; ;;; First of all, let's take care of matching atoms.
;; (cond [(atom? pat) (if (equal? pat val) '() #f)]
;; ;;; Next, the special pattern record `unquote/1` and the operators it
;; ;;; contains.
;; [(record/class? pat 'unquote 1)
;; (define op (record-field pat 0))
;; ;;; A `capture/1` should prepend the current `val`ue to a succeeding
;; ;;; submatch, and should propagate a failing submatch.
;; (cond [(record/class? op 'capture 1)
;; (define r (Match (record-field op 0) val))
;; (and r (cons val r))]
;; ;;; A `discard/0` should accept any `val`ue, yielding a success result
;; ;;; containing no captures.
;; [(record/class? op 'discard 0)
;; '()]
;; ;;; Finally, a `quote/1` should match exactly its argument and nothing
;; ;;; else.
;; [(record/class? op 'quote 1)
;; (if (equal? (record-field op 0) val) '() #f)]
;;; ---------------------------------------------------------------------------
;; (struct DISCARD ())
;; (struct CAPTURE (p))
;; (define (q p)
;; (match p
;; [(DISCARD) `(discard)]
;; [(CAPTURE p) `(capture ,(q p))]
;; [`(discard) `('discard)]
;; [`(capture ,p) `('capture ,(q p))]
;; [`(quote ,p) `('quote ,(q p))]
;; [`(,pa . ,pd) `(,(q pa) . ,(q pd))]
;; [a a]))
;; (define (uq p)
;; (match p
;; [`(discard) (DISCARD)]
;; [`(capture ,p) (CAPTURE (uq p))]
;; [`(quote ,p) p]
;; [`(,pa . ,pd) `(,(uq pa) . ,(uq pd))]
;; [a a]))
;; ;; Goal: (uq (q p)) === p
;; ;; (define (m p v k)
;; ;; (match* (p v)
;; ;; [(`(discard) _) (k '())]
;; ;; [(`(capture ,p) v) (m p v (lambda (bs) (cons v bs)))]
;; ;; [(`(quote ,p) v) (and (equal? p v) (k '()))]
;; ;; [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))]
;; ;; [(p v) (and (equal? p v) (k '()))]))
;; (define (m p v k)
;; (match* (p v)
;; [((DISCARD) _) (k '())]
;; [((CAPTURE p) v) (m p v (lambda (bs) (cons v bs)))]
;; [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))]
;; [(p v) (and (equal? p v) (k '()))]))
;; (module+ test
;; (require rackunit)
;; ;; (define (M p v) (m p v values))
;; (define (M p v) (m (uq p) v values))
;; (check-equal? (M `(capture 1) 1) '(1))
;; (check-equal? (M `(discard) 1) '())
;; (check-equal? (M 1 1) '())
;; (check-equal? (M 2 1) #f)
;; (check-equal? (M `(capture 2) 1) #f)
;; (check-equal? (M `(record (capture (discard))) `(record value)) '(value))
;; (check-equal? (M `(record (capture (discard))) `(record 'value)) '('value))
;; (check-equal? (M `(record (capture (discard))) `(record (capture (discard))))
;; '((capture (discard))))
;; (check-equal? (M `(record ('capture (discard))) `(record (capture (discard))))
;; '())
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record (capture (discard))))
;; '((capture (discard))))
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record value))
;; #f)
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record 'value))
;; #f)
;; (check-equal? (M `(record (capture ('capture (discard)))) `(record ('capture (discard))))
;; #f)
;; (check-equal? (M `(record '(capture (discard))) `(record (capture (discard))))
;; '())
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record (capture (discard))))
;; '((capture (discard))))
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record value))
;; #f)
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record 'value))
;; #f)
;; (check-equal? (M `(record (capture '(capture (discard)))) `(record '(capture (discard))))
;; #f)
;; (check-equal? (M `(record ('quote value)) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ('quote value))) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ('quote value))) `(record value)) #f)
;; (check-equal? (M `(record (capture ('quote value))) `(record 'value)) '('value))
;; (check-equal? (M `(record (capture ('quote value))) `(record 'notvalue)) #f)
;; (check-equal? (M `(record ('quote value)) `(record 'value)) '())
;; (check-equal? (M `(record (capture ('quote value))) `(record ('quote value))) #f)
;; (check-equal? (M `(record ''value) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ''value)) `(record (capture (discard)))) #f)
;; (check-equal? (M `(record (capture ''value)) `(record value)) #f)
;; (check-equal? (M `(record (capture ''value)) `(record 'value)) '('value))
;; (check-equal? (M `(record (capture ''value)) `(record 'notvalue)) #f)
;; (check-equal? (M `(record ''value) `(record 'value)) '())
;; (check-equal? (M `(record (capture ''value)) `(record ''value)) #f)
;; (check-equal? (q (CAPTURE 1)) `(capture 1))
;; (check-equal? (q (DISCARD)) `(discard))
;; (check-equal? (q `(capture 1)) `('capture 1))
;; (check-equal? (q `(discard)) `('discard))
;; )