;;; --- ;;; title: (Un)quotation in layered pub/sub networks ;;; --- ;;; ;;; Tony Garnock-Jones ;;; June 2020. ;;; ## Introduction ;;; There's potential for confusion whenever *patterns* are drawn from ;;; the same set that the *values to be matched* come from. ;;; In layered pub/sub networks like ;;; [Syndicate](https://syndicate-lang.org/), publisher-subscriber ;;; relationships are set up using messages carried over the ;;; publish-subscribe medium itself. ;;; Subscribers build messages containing patterns describing other ;;; messages they are interested in. ;;; But in general, these messages travel across multiple layered ;;; pub/sub dataspaces. To avoid confusion, we have to be very clear ;;; about when a piece of data denotes a pattern and when it is just a ;;; piece of data. ;;; In this post, I'll describe the approach I want to take to solve ;;; this problem for Syndicate in the general case. ;;; ### What's the problem? ;;; Consider regular expressions. The regex `ab*c` matches `ac`, ;;; `abc`, `abbc`, and so on; but what if we wanted to match the ;;; literal string `ab*c`? We have to escape the star in the pattern, ;;; writing the pattern as `ab\*c`. ;;; #### Quotation ;;; If we want to use a regex engine to search for a literal string ;;; supplied by someone else---perhaps a user, perhaps an API ;;; client---we have to take care to *quote* the literal string, in ;;; order to avoid confusion caused by metacharacters:[^racket-regexp-quote] ;;; | Literal string | Quoted form | ;;; | --- | --- | ;;; | `no metachars` | `no metachars` | ;;; | `ab*c` | `ab\*c` | ;;; | `...` | `\.\.\.` | ;;; | `(hello \$\ [world])` | `\(hello \\\$\\ \[world\]\)` | ;;; [^racket-regexp-quote]: Racket supplies ;;; [`regexp-quote`](https://github.com/racket/racket/blob/5bb837661c12a9752c6a99f952c0e1b267645b33/racket/collects/racket/private/string.rkt#L38-L51) ;;; ([docs](https://docs.racket-lang.org/reference/regexp.html?q=reg%20quo#(def._((lib._racket%2Fprivate%2Fbase..rkt)._regexp-quote)))) ;;; for this purpose: (check-equal? (regexp-quote "ab*c") "ab\\*c") (check-equal? (regexp-quote "...") "\\.\\.\\.") (check-equal? (regexp-quote "(hello \\$\\ [world])") "\\(hello \\\\\\$\\\\ \\[world\\]\\)") ;;; #### Invariant ;;; The invariant we want to rely on is ;;; ;;; > every string, after quotation, becomes a regular expression that matches ;;; > exactly and only the original string.[^racket-regexp-quotation-invariant] ;;; [^racket-regexp-quotation-invariant]: ;;; Expressing this in code is straightforward: (define (regexp-quotation-invariant original-string some-other-string) (equal? (string=? original-string some-other-string) (regexp-match? (regexp-quote original-string) some-other-string))) ;;; We could use Racket's [random data ;;; generation](https://docs.racket-lang.org/data/Enumerations.html?q=data%2Fen#%28mod-path._data%2Fenumerate%29) ;;; tools to perform randomized testing of this invariant. ;;; Of course, this same problem crops up in any pattern language ;;; where embedding of values in patterns makes sense, so let's ;;; generalize accordingly: ;;; ;;; > every value, after quotation, becomes a pattern that matches ;;; > exactly and only the original value. ;;; ### Quotation of pub/sub patterns. ;;; In [Syndicate](https://syndicate-lang.org/), we want patterns over ;;; assertions (and thus over messages) to be able to match arbitrary ;;; [Preserves](.) atoms, sequences, and records. ;;; However, to date, there have been special exceptions. The special ;;; records `capture/1` and `discard/0`, embedded in patterns, have ;;; been interpreted as defining pattern variables and wildcards, ;;; respectively. There has been no way to construct a pattern that ;;; matches `capture/1` or `discard/0` records themselves. ;;; Previously, I tried adding a `quote/1` record to the set of ;;; special record types. Experimentation [demonstrated](quoting.rkt) ;;; that patterns with `quote/1` were able to escape the limitations ;;; of patterns without `quote/1`. Also, it's [a theorem](quoting.v) ;;; that quoting is "sensible": that unquotation after quotation of a ;;; pattern always equals the original pattern. ;;; ;;; In this post, I'll explore an alternative I think might be better: ;;; letting patterns match anything except a *single* special record, ;;; `unquote/1`. Patterns use `unquote/1` to escape into a ;;; *metalanguage* where special operators like capture, discard, and ;;; so on can be discussed. ;;; ## Definitions ;;; ;;; We'll work with a simplified data language. Instead of full ;;; Preserves, which includes lists, maps, records, a variety of atoms ;;; and so forth, we'll stick to just records and atoms. ;;; ;;; `Value`s are the things our `Pattern`s are to match. ;;; ;;; `Value`s that are `Lists` denote records: the first `Value` is the ;;; label of the record, and the remaining `Value`s are its fields. (define (record? x) (and (list? x) (positive? (length x)) (andmap value? x))) (define (record-label x) (car x)) (define (record-fields x) (cdr x)) (define (record-arity x) (length (record-fields x))) (define (record-field x n) (list-ref x (+ n 1))) ;;; Atoms, then, are the non-`List` `Value`s. (define (atom? x) (or (string? x) (symbol? x) (number? x))) ;;; `Value`s are the records and the atoms, and nothing else. (define (value? x) (or (atom? x) (record? x))) ;;; We will frequently need to decide whether a `Value` is a record ;;; with a particular label and arity. (define (record/class? x label arity) (and (record? x) (equal? (record-label x) label) (= (record-arity x) arity))) ;;; A `Pattern` is a `Value` with certain additional constraints on ;;; the `Operation`s that appear within an `unquote/1` record. (define (pattern? x) (and (value? x) (or (atom? x) (if (record/class? x 'unquote 1) (operation? (record-field x 0)) (andmap pattern? x))))) ;;; A valid `Operation` is either a `capture/1`, containing another ;;; pattern; a `discard/0`, denoting wildcard; or `quote/1`, which ;;; promotes an *arbitrary* `Value` to a literal-matching `Pattern`. ;;; This last one is what allows us to write patterns that match ;;; `unquote/1` values without tripping over ourselves. (define (operation? x) (cond [(record/class? x 'capture 1) (pattern? (record-field x 0))] [(record/class? x 'discard 0) #t] [(record/class? x 'quote 1) (value? (record-field x 0))] [else #f])) ;;; **Examples**. (check-true (pattern? 123)) (check-true (pattern? "Hello")) (check-true (pattern? 'capture)) (check-true (pattern? 'unquote)) (check-true (pattern? '(Present "Tony"))) (check-true (pattern? '(Says "Tony" "Hello, world!"))) (check-true (pattern? '(Says "Tony" (capture (discard))))) (check-true (pattern? '(Says "Tony" (unquote (capture (discard)))))) ;;; Since `pattern?` subsumes `value?`, we see that each of these is ;;; both a `Value` and a `Pattern`. However, the following term is a ;;; `Value` but not a `Pattern`, because it does not conform to the ;;; definition of the `unquote`d `Operation`s. ;;; (define non-pattern '(Says "Tony" (unquote (noise)))) (check-true (value? non-pattern)) (check-false (pattern? non-pattern)) ;;; ;;; We may use `quote` in conjunction with `unquote` to specify a ;;; pattern that will match our `non-pattern` example: (check-true (pattern? '(Says "Tony" ((unquote (quote unquote)) (noise))))) ;;; We may also abuse the Lisp notation for quotation and ;;; quasiquotation when writing such patterns, which helps make them a ;;; little shorter and perhaps easier to read: (check-true (pattern? '(Says "Tony" (,'unquote (noise))))) ;;; Finally, this term isn't even a `Value`: it includes an empty list ;;; within it. (check-false (value? '(Says "Tony" ()))) ;;; ## Pattern matching ;;; ;;; Let's call our matching operator `Match`, with a capital `M` to ;;; distinguish it from Racket's own `match` syntax. ;;; ;;; `Match`, given a `pat`tern and a `val`ue, should answer either a ;;; `List` of the captured values in depth-first, left-to-right order, ;;; if the pattern matches the value, or `#f` if the arguments do not ;;; match. ;; ;; Precondition: (and (pattern? pat) (value? val)) ;; (define (Match pat val) ;; ;;; First of all, let's take care of matching atoms. ;; (cond [(atom? pat) (if (equal? pat val) '() #f)] ;; ;;; Next, the special pattern record `unquote/1` and the operators it ;; ;;; contains. ;; [(record/class? pat 'unquote 1) ;; (define op (record-field pat 0)) ;; ;;; A `capture/1` should prepend the current `val`ue to a succeeding ;; ;;; submatch, and should propagate a failing submatch. ;; (cond [(record/class? op 'capture 1) ;; (define r (Match (record-field op 0) val)) ;; (and r (cons val r))] ;; ;;; A `discard/0` should accept any `val`ue, yielding a success result ;; ;;; containing no captures. ;; [(record/class? op 'discard 0) ;; '()] ;; ;;; Finally, a `quote/1` should match exactly its argument and nothing ;; ;;; else. ;; [(record/class? op 'quote 1) ;; (if (equal? (record-field op 0) val) '() #f)] ;;; --------------------------------------------------------------------------- ;; (struct DISCARD ()) ;; (struct CAPTURE (p)) ;; (define (q p) ;; (match p ;; [(DISCARD) `(discard)] ;; [(CAPTURE p) `(capture ,(q p))] ;; [`(discard) `('discard)] ;; [`(capture ,p) `('capture ,(q p))] ;; [`(quote ,p) `('quote ,(q p))] ;; [`(,pa . ,pd) `(,(q pa) . ,(q pd))] ;; [a a])) ;; (define (uq p) ;; (match p ;; [`(discard) (DISCARD)] ;; [`(capture ,p) (CAPTURE (uq p))] ;; [`(quote ,p) p] ;; [`(,pa . ,pd) `(,(uq pa) . ,(uq pd))] ;; [a a])) ;; ;; Goal: (uq (q p)) === p ;; ;; (define (m p v k) ;; ;; (match* (p v) ;; ;; [(`(discard) _) (k '())] ;; ;; [(`(capture ,p) v) (m p v (lambda (bs) (cons v bs)))] ;; ;; [(`(quote ,p) v) (and (equal? p v) (k '()))] ;; ;; [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))] ;; ;; [(p v) (and (equal? p v) (k '()))])) ;; (define (m p v k) ;; (match* (p v) ;; [((DISCARD) _) (k '())] ;; [((CAPTURE p) v) (m p v (lambda (bs) (cons v bs)))] ;; [(`(,pa . ,pd) `(,va . ,vd)) (m pa va (lambda (bs) (m pd vd (lambda (cs) (k (append bs cs))))))] ;; [(p v) (and (equal? p v) (k '()))])) ;; (module+ test ;; (require rackunit) ;; ;; (define (M p v) (m p v values)) ;; (define (M p v) (m (uq p) v values)) ;; (check-equal? (M `(capture 1) 1) '(1)) ;; (check-equal? (M `(discard) 1) '()) ;; (check-equal? (M 1 1) '()) ;; (check-equal? (M 2 1) #f) ;; (check-equal? (M `(capture 2) 1) #f) ;; (check-equal? (M `(record (capture (discard))) `(record value)) '(value)) ;; (check-equal? (M `(record (capture (discard))) `(record 'value)) '('value)) ;; (check-equal? (M `(record (capture (discard))) `(record (capture (discard)))) ;; '((capture (discard)))) ;; (check-equal? (M `(record ('capture (discard))) `(record (capture (discard)))) ;; '()) ;; (check-equal? (M `(record (capture ('capture (discard)))) `(record (capture (discard)))) ;; '((capture (discard)))) ;; (check-equal? (M `(record (capture ('capture (discard)))) `(record value)) ;; #f) ;; (check-equal? (M `(record (capture ('capture (discard)))) `(record 'value)) ;; #f) ;; (check-equal? (M `(record (capture ('capture (discard)))) `(record ('capture (discard)))) ;; #f) ;; (check-equal? (M `(record '(capture (discard))) `(record (capture (discard)))) ;; '()) ;; (check-equal? (M `(record (capture '(capture (discard)))) `(record (capture (discard)))) ;; '((capture (discard)))) ;; (check-equal? (M `(record (capture '(capture (discard)))) `(record value)) ;; #f) ;; (check-equal? (M `(record (capture '(capture (discard)))) `(record 'value)) ;; #f) ;; (check-equal? (M `(record (capture '(capture (discard)))) `(record '(capture (discard)))) ;; #f) ;; (check-equal? (M `(record ('quote value)) `(record (capture (discard)))) #f) ;; (check-equal? (M `(record (capture ('quote value))) `(record (capture (discard)))) #f) ;; (check-equal? (M `(record (capture ('quote value))) `(record value)) #f) ;; (check-equal? (M `(record (capture ('quote value))) `(record 'value)) '('value)) ;; (check-equal? (M `(record (capture ('quote value))) `(record 'notvalue)) #f) ;; (check-equal? (M `(record ('quote value)) `(record 'value)) '()) ;; (check-equal? (M `(record (capture ('quote value))) `(record ('quote value))) #f) ;; (check-equal? (M `(record ''value) `(record (capture (discard)))) #f) ;; (check-equal? (M `(record (capture ''value)) `(record (capture (discard)))) #f) ;; (check-equal? (M `(record (capture ''value)) `(record value)) #f) ;; (check-equal? (M `(record (capture ''value)) `(record 'value)) '('value)) ;; (check-equal? (M `(record (capture ''value)) `(record 'notvalue)) #f) ;; (check-equal? (M `(record ''value) `(record 'value)) '()) ;; (check-equal? (M `(record (capture ''value)) `(record ''value)) #f) ;; (check-equal? (q (CAPTURE 1)) `(capture 1)) ;; (check-equal? (q (DISCARD)) `(discard)) ;; (check-equal? (q `(capture 1)) `('capture 1)) ;; (check-equal? (q `(discard)) `('discard)) ;; )