---
no_site_title: true
title: "Preserves Path"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
August 2021. Version 0.1.0.

Selectors over XML documents can move into attributes, into text, or
into children. Preserves documents don't have attributes, but they do
have children generally and keyed children in particular. You might
want to move into the child with a particular key (a number for
sequences, or a general value for dictionaries); into all keys; or
into all mapped-to values, i.e. children (n.b. not just for sequences
and dictionaries, but also for sets).

## Selector
A sequence of steps, applied one after the other, flatmap-style.

    step ...            ;; Applies steps one after the other, flatmap-style

Each step transforms an input document into zero or more related
documents. A step is an axis or a filter.
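
The flatmap-style application described above can be sketched in Python. This is only an illustration, not part of the specification: the `Step` type, `run_selector`, and the two example steps are hypothetical names, and Python lists and dicts stand in for Preserves sequences and dictionaries.

```python
from typing import Callable, Iterable, List

# Hypothetical model: a step maps one input value to zero or more outputs.
Step = Callable[[object], Iterable[object]]


def run_selector(steps: List[Step], inputs: Iterable[object]) -> List[object]:
    """Apply each step in turn, flattening the outputs (flatmap-style)."""
    current = list(inputs)
    for step in steps:
        current = [out for value in current for out in step(value)]
    return current


def children(value: object) -> Iterable[object]:
    """Axis-like step: move into list elements or dict values."""
    if isinstance(value, list):
        return value
    if isinstance(value, dict):
        return value.values()
    return []


def only_ints(value: object) -> Iterable[object]:
    """Filter-like step: keep the input if it is an int, otherwise drop it."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    return [value] if is_int else []


# Two moves into children, then a filter; inputs with no outputs simply vanish.
result = run_selector([children, children, only_ints], [[[1, "a"], [2, 3.5], []]])
```

Here an axis may yield many outputs per input and a filter yields at most one, which is why both fit the same "zero or more outputs" shape.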
## Predicates
Predicates: interpret selectors as truth-functions over inputs
(nonempty output meaning truth), and compose them using and, not, or,
etc.
Precedence groupings from highest to lowest. Within a grouping, no
mixed precedence is permitted.

    selector            ;; Applies steps one after the other, flatmap-style
    ! pred              ;; "not" of a predicate
    pred + pred + ...   ;; "or" of predicates
    pred & pred & ...   ;; "and" of predicates

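
As a rough sketch of the "nonempty output meaning truth" reading, the Python below wraps a selector as a truth-function and composes predicates. The helper names (`as_predicate`, `p_not`, `p_or`, `p_and`) and the two example selectors are hypothetical, not defined by this document.

```python
from typing import Callable, Iterable

Selector = Callable[[object], Iterable[object]]
Predicate = Callable[[object], bool]


def as_predicate(selector: Selector) -> Predicate:
    """True when the selector yields at least one output for the input.

    Counts outputs rather than testing their truthiness, so a selector
    that yields 0 or False still counts as nonempty.
    """
    return lambda value: any(True for _ in selector(value))


def p_not(pred: Predicate) -> Predicate:
    """`! pred`"""
    return lambda value: not pred(value)


def p_or(*preds: Predicate) -> Predicate:
    """`pred + pred + ...`"""
    return lambda value: any(p(value) for p in preds)


def p_and(*preds: Predicate) -> Predicate:
    """`pred & pred & ...`"""
    return lambda value: all(p(value) for p in preds)


# Hypothetical selectors used only for illustration.
is_string = as_predicate(lambda v: [v] if isinstance(v, str) else [])
has_elements = as_predicate(lambda v: v if isinstance(v, list) else [])

string_or_nonempty_list = p_or(is_string, has_elements)
nonempty_list_only = p_and(has_elements, p_not(is_string))
```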
## Axes
Axes: move around, applying filters after moving

    /                   ;; Moves into immediate children (values / fields)
    //                  ;; Flattens children recursively
    . key               ;; Moves into named child
    .^                  ;; Moves into record label
    .keys               ;; Moves into *keys* rather than values
    .length             ;; Moves into the number of keys
    .annotations        ;; Moves into any annotations that might be present
    .embedded           ;; Moves into the representation of an embedded value

Sets have children, but no keys/length; Strings, ByteStrings and
Symbols have no children, but have keys/length.
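
To make the children/keys distinction concrete, here is a minimal Python sketch of a few of the axes. It rests on several assumptions that are not in the text above: the `Record` class is a hypothetical stand-in for Preserves records, Python lists, dicts, sets, strings and bytes stand in for the corresponding Preserves types, record and sequence keys are taken to be zero-based indices, and string keys are taken to be character indices.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Preserves record: a label plus fields."""
    label: object
    fields: tuple


def children(v) -> List[object]:
    """`/`: record fields, sequence or set elements, dictionary values."""
    if isinstance(v, Record):
        return list(v.fields)
    if isinstance(v, (list, set, frozenset)):
        return list(v)
    if isinstance(v, dict):
        return list(v.values())
    return []  # strings, byte strings and other atoms have no children


def descendants(v) -> Iterator[object]:
    """`//`: the value itself, then everything reachable through children."""
    yield v
    for c in children(v):
        yield from descendants(c)


def keys(v) -> List[object]:
    """`.keys`: indices or dictionary keys; sets have none."""
    if isinstance(v, Record):
        return list(range(len(v.fields)))
    if isinstance(v, (list, str, bytes)):
        return list(range(len(v)))
    if isinstance(v, dict):
        return list(v.keys())
    return []


def length(v) -> List[int]:
    """`.length`: the number of keys, for values that have keys."""
    if isinstance(v, (Record, list, str, bytes, dict)):
        return [len(keys(v))]
    return []


def child(v, key) -> List[object]:
    """`. key`: the child stored under `key`, if any."""
    if isinstance(v, dict):
        return [v[key]] if key in v else []
    items = v.fields if isinstance(v, Record) else v if isinstance(v, list) else None
    if items is not None and isinstance(key, int) and 0 <= key < len(items):
        return [items[key]]
    return []


doc = Record("point", (1, [2, 3], {"a": "xy"}))
```

With this model, `children(frozenset({1}))` is `[1]` while `keys(frozenset({1}))` is empty, and `children("xy")` is empty while `length("xy")` is `[2]`, matching the asymmetry noted above.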
## Filters
Filters: narrow down a selection without moving

    *               ;; Accepts all
    [!]             ;; Rejects all (just a use of `[pred]`)

    eq literal      ;; Matches values (equal to/less than/greater than/etc.) the literal
    = literal
    ne literal
    != literal
    lt literal
    gt literal
    le literal
    ge literal

    re regex        ;; Matches strings and symbols by POSIX extended regular expression
    =r regex

    [pred]          ;; Applies predicate to each input; keeps inputs yielding truth

    ^ literal       ;; Matches a record having the literal as its label -- equivalent to [.^ = literal]

    ~real           ;; Promotes int and float to double, passes on double unchanged, rejects others
                    ;; Out-of-range ints (too big or too small) become various double infinities
                    ;; Converting high-magnitude ints causes loss of precision
    ~int            ;; Converts float and double to closest integer, where possible
                    ;; NaN and infinities are rejected

    bool            ;; Type filters
    float
    double
    int
    string
    bytes
    symbol
    rec
    seq
    set
    dict
    embedded

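
The numeric conversion filters `~real` and `~int` can be sketched as below. This is a hedged illustration only: `to_real` and `to_int` are hypothetical names, Python's `float` (an IEEE 754 double) stands in for both Preserves float and double, and two behaviours are assumptions rather than statements from the text: `~int` passing ints through unchanged, and ties being rounded to the nearest even integer (Python's `round`).

```python
import math
from typing import List


def to_real(v) -> List[float]:
    """`~real`: ints become doubles, doubles pass through, others are rejected."""
    if isinstance(v, bool):
        return []  # bool is an int subclass in Python; it is not a number here
    if isinstance(v, float):
        return [v]
    if isinstance(v, int):
        try:
            return [float(v)]  # may lose precision for high-magnitude ints
        except OverflowError:
            # Out-of-range ints become an infinity of the matching sign.
            return [math.inf if v > 0 else -math.inf]
    return []


def to_int(v) -> List[int]:
    """`~int`: finite doubles become the closest integer; NaN/inf are rejected."""
    if isinstance(v, bool):
        return []
    if isinstance(v, int):
        return [v]  # assumption: ints pass through unchanged
    if isinstance(v, float) and math.isfinite(v):
        return [round(v)]  # assumption: ties round to even
    return []
```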
Design choice: Which regular expression dialect to choose? [CDDL (RFC
8610) goes for XML Schema regular
expressions](https://www.rfc-editor.org/rfc/rfc8610.html#section-3.8.3),
which seems like a very sensible choice. The discussion in section
3.8.3 of RFC 8610 makes some good points. A couple of things that
occurred to me: (1) the dialect should be backreference-free, allowing
matching by "[text-directed
engines](https://www.regular-expressions.info/engine.html)"; (2) it
should be very widely implemented; (3) it should cover regular
languages and no more; (4) it should be easy to implement.
## Transformers
e.g. stringify results; sequenceify results (see "+" operator); setify
results (see "/" and "&" operators); join stringified results with a
separator
## Tool design
When processing multiple input documents sequentially, a user will
sometimes want a list of results for each document, a set of results
for each document, or a single sequence of outputs flattened across
all input documents. (A flattened set doesn't make sense for
streaming, since the input documents arrive in a sequence; it could
work if the inputs were treated as a set represented as a sequence
and the outputs were buffered in a single large set.)
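
The three output shapes mentioned above could look like the following Python generators. The function names and the example selector are hypothetical, and Python's `set` stands in for a Preserves set (so results must be hashable in this sketch).

```python
from typing import Callable, Iterable, Iterator, List, Set

Selector = Callable[[object], Iterable[object]]


def per_document_lists(select: Selector, docs: Iterable[object]) -> Iterator[List[object]]:
    """One list of results per input document, in input order."""
    for doc in docs:
        yield list(select(doc))


def per_document_sets(select: Selector, docs: Iterable[object]) -> Iterator[Set[object]]:
    """One set of results per input document (duplicates collapse)."""
    for doc in docs:
        yield set(select(doc))


def flattened(select: Selector, docs: Iterable[object]) -> Iterator[object]:
    """A single stream of results across all documents; works while streaming."""
    for doc in docs:
        yield from select(doc)


def select_ints(doc):
    """Hypothetical selector: the integer elements of a list document."""
    return [x for x in doc if isinstance(x, int)]


docs = [[1, 1, "a"], [2], []]
lists = list(per_document_lists(select_ints, docs))
sets = list(per_document_sets(select_ints, docs))
flat = list(flattened(select_ints, docs))
```

All three are generators, so each can emit output as soon as one input document has been processed; a single flattened set, by contrast, would need every result buffered before anything could be emitted.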
## Examples
Consider the following Preserves Path selectors, intended to run
against the [Preserves codec test suite document](tests/samples.pr):

- `.annotations ^ Documentation . 0 /`

  This selects each of the elements (mostly text strings) in the list
  of the `Documentation` record annotating the test suite document
  itself.

  First, `.annotations` focuses on the annotations of the document.
  Then, `^ Documentation` selects only annotations that are records
  with label `Documentation`. Then, `. 0` selects the first field in
  each record. Finally, `/` replaces each selected value with a
  sequence of its children.

- `// [.^ [= Test + = NondeterministicTest]] [. 1 rec]`

  This selects every deterministic or nondeterministic test case
  where the expected value is a record.

  First, `//` recursively selects *every* descendant subvalue of the
  root (inclusive). Then, two filters are applied, one after the
  other. The first, `[.^ [= Test + = NondeterministicTest]]`, selects
  record labels, and then filters out all but `Test` and
  `NondeterministicTest`. Then, the second, `[. 1 rec]`, filters out
  all but those where the second field is a record.
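
The second selector can be traced by hand with a small Python model. This is an illustrative sketch, not an implementation of Preserves Path: the `Record` class, the helper functions, and the tiny sample document are all hypothetical, record labels are modelled as plain strings rather than symbols, and the real test suite document is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Preserves record."""
    label: str
    fields: tuple


def descendants(v):
    """`//`: the root itself, then every nested subvalue."""
    yield v
    if isinstance(v, Record):
        kids = v.fields
    elif isinstance(v, (list, tuple)):
        kids = v
    elif isinstance(v, dict):
        kids = v.values()
    else:
        kids = ()
    for k in kids:
        yield from descendants(k)


def is_test(v) -> bool:
    """`[.^ [= Test + = NondeterministicTest]]`: keep records with either label."""
    return isinstance(v, Record) and v.label in ("Test", "NondeterministicTest")


def second_field_is_record(v) -> bool:
    """`[. 1 rec]`, restricted here to records, which is all the first filter passes."""
    return isinstance(v, Record) and len(v.fields) > 1 and isinstance(v.fields[1], Record)


def select(doc):
    return [v for v in descendants(doc) if is_test(v) and second_field_is_record(v)]


# A made-up document shaped loosely like a suite of test cases.
doc = Record("TestCases", ({
    "a": Record("Test", (b"\x01", Record("capture", ()))),
    "b": Record("Test", (b"\x02", 5)),
    "c": Record("NondeterministicTest", (b"\x03", Record("r", ()))),
},))

matches = select(doc)
```

Cases `a` and `c` survive both filters; case `b` is dropped because its second field is an integer rather than a record.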