synit-manual/src/protocols/syndicate/dataspacePatterns.md

131 lines
5.4 KiB
Markdown

# Patterns over assertions
- [`[syndicate-protocols]/schemas/dataspacePatterns.prs`](https://git.syndicate-lang.org/syndicate-lang/syndicate-protocols/src/branch/main/schemas/dataspacePatterns.prs)
Each [subscription record](./dataspace.md) asserted at a dataspace entity contains a pattern
over [Preserves values](../../guide/preserves.md#grammar-of-values).
The pattern language is carefully chosen to be reasonably expressive without closing the door
to efficient indexing of dataspace contents.[^efficient-indexing]
## Interpretation of patterns
A pattern is matched against a candidate input value. Matching can either fail or succeed; if
matching succeeds, a sequence of (numbered) bindings is produced. Each binding in the sequence
corresponds to a (possibly-nested) [binding pattern](#binding) in the overall pattern.
## Example
Consider the pattern:
```preserves
<arr {0:<lit 1> 1:<bind <arr {0:<bind <_>> 1:<_>}>> 2:<_>}>
```
Each of the following values yields different results when matched against it:
- `[1 2 3]` fails, because `2` is not an array.
- `[1 [2 3] 4]` succeeds, yielding a binding sequence `[[2 3] 2]`, because the outer `bind`
captures the whole `[2 3]` array, and the inner (nested) `bind` captures the `2`.
- `[1 [2] 5]` fails, because `[2]` lacks an element at index 1.
- However, `[1 [2 3 4] 5]` succeeds, yielding a binding sequence `[[2 3 4] 2]`, because
`[2 3 4]` has *at least* the expected two elements at indexes 0 and 1.
- `[1 [<x> <y>] []]` succeeds, yielding a binding sequence `[[<x> <y>] <x>]`. Each discard
pattern (`<_>`) ignores the specific input it is given.
## Abstract syntax of patterns
A *pattern* may be either a *discard*, a (nested) *binding*, a *literal*, or a *compound*.
```preserves-schema
Pattern = DDiscard / DBind / DLit / DCompound .
```
### Discard
A *discard pattern* matches any input value.
```preserves-schema
DDiscard = <_>.
```
### Binding
A *binding pattern* speculatively pushes the portion of the input under consideration onto the
end of the binding sequence being built, and then recursively evaluates its subpattern. If the
subpattern succeeds, so does the overall binding pattern (keeping the binding); otherwise, the
speculative addition to the binding sequence is undone, and the overall binding pattern fails.
```preserves-schema
DBind = <bind @pattern Pattern>.
```
### Literal
A *literal pattern* matches any **atomic** Preserves value. In order to match a literal
**compound** value, a combination of compound and literal patterns must be used.
```preserves-schema
DLit = <lit @value AnyAtom>.
AnyAtom =
/ @bool bool
/ @float float
/ @double double
/ @int int
/ @string string
/ @bytes bytes
/ @symbol symbol
/ @embedded #:any
.
```
### Compound
Each *compound pattern* first checks the type of its input: a `rec` pattern fails unless it is
given a Record, an `arr` demands a Sequence and a `dict` only matches a Dictionary.
```preserves-schema
DCompound = <rec @label any @fields { int: Pattern ...:... }>
/ <arr @items { int: Pattern ...:... }>
/ <dict @entries { any: Pattern ...:... }> .
```
If the type check fails, the pattern match fails. Otherwise, matching continues:
- `rec` patterns compare the label of the input Record against the `label` field in the
pattern; unless they match literally and exactly, the overall match fails. Otherwise,
subpatterns in `fields` are considered in increasing order of key. Each key in `fields` is
the index of an input record field to examine. Each subpattern is matched against the
corresponding field in the input, failing if no such field exists. The overall pattern thus
ignores any position in the input record for which no subpattern exists.
- `arr` patterns are similar, except without the need for a label check. Subpatterns in
`items` are processed in order of key. Each key is interpreted as an element index. Excess
or unexamined input elements are ignored.
- `dict` patterns consider each of the key/subpattern pairs in `entries` in turn, according to
the Preserves order of the keys.[^dict-pattern-ordering] If any given key from the pattern
is not present in the input value, matching fails. Otherwise, matching proceeds recursively.
The pattern ignores keys in the input value that are not mentioned in the pattern.
These rules treat Record, Sequence and Dictionary values similarly: matching will succeed if
such values have *more* than the structure and information required of them by a given pattern,
but not if they have less. This allows for protocol extension: for example, records with
"extra" fields will continue to match.
---
[^efficient-indexing]: Most implementations of Syndicated Actor Model dataspaces use an
efficient index datastructure [described here](https://git.syndicate-lang.org/syndicate-lang/syndicate-rkt/src/commit/90c4c60699069b496491b81ee63b5a45ffd638cb/syndicate/HOWITWORKS.md).
[^dict-pattern-ordering]: The ordering of visiting of keys in a `dict` pattern is important
because bindings are *numbered* in this pattern language, not named. Recall that `<dict {a:
<bind <_>>, b: <bind <_>>}>` is an identical Preserves value to `<dict {b: <<bind <_>>, a:
<bind <_>>}>`, so to guarantee consistent binding results, we must choose some
deterministic order for visiting the subpatterns of the `dict`. (In this example, `a` will
be visited before `b`, because `a` < `b`).