---
no_site_title: true
title: "Preserves Path"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
August 2021. Version 0.1.0.

In XML documents, a path can move into attributes, into text, or into
children.

Preserves documents don't have attributes, but they do have children
generally and keyed children in particular. You might want to move
into the child with a particular key (a number, for sequences, or a
general value, for dictionaries); into all keys; or into all
mapped-to values, i.e. children (n.b. not just for sequences and
dicts, but also for sets).

## Selector

A sequence of steps, applied one after the other, flatmap-style.

    step ...        ;; Applies steps one after the other, flatmap-style

Each step transforms an input document into zero or more related
documents. A step is an axis or a filter.

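The flatmap-style application of steps can be sketched in Python as follows. This is only an illustration, not the reference implementation: `run_selector`, `children` and `is_int` are hypothetical names, and Python lists stand in for Preserves sequences.

```python
from itertools import chain

def run_selector(steps, inputs):
    """Apply each step to every current value and concatenate the outputs."""
    current = list(inputs)
    for step in steps:
        current = list(chain.from_iterable(step(v) for v in current))
    return current

def children(v):
    """Hypothetical axis: move into the elements of a list."""
    return list(v) if isinstance(v, list) else []

def is_int(v):
    """Hypothetical filter: keep integers, drop everything else."""
    return [v] if isinstance(v, int) and not isinstance(v, bool) else []

doc = [[1, "a"], [2, [3]], "b"]
print(run_selector([children, children, is_int], [doc]))  # [1, 2]
```

Each step returns a list of zero or more values, so an axis can fan out and a filter can drop its input without any special casing.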
## Predicates

Predicates: interpret selectors as truth-functions over inputs
(nonempty output meaning truth), and compose them using and, not, or,
etc.

Precedence groupings from highest to lowest. Within a grouping, no
mixed precedence is permitted.

    selector            ;; Applies steps one after the other, flatmap-style

    ! pred              ;; "not" of a predicate

    pred + pred + ...   ;; "or" of predicates
    pred & pred & ...   ;; "and" of predicates

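A minimal sketch of the "nonempty output means truth" reading, assuming selectors are modelled as Python functions that return a list of results. All of the names here (`as_pred`, `p_not`, `p_or`, `p_and`, `keep_if`) are hypothetical and chosen only for illustration.

```python
def as_pred(selector):
    """Interpret a selector as a truth-function: true iff it yields anything."""
    return lambda v: len(list(selector(v))) > 0

def p_not(pred):
    return lambda v: not pred(v)

def p_or(*preds):
    return lambda v: any(p(v) for p in preds)

def p_and(*preds):
    return lambda v: all(p(v) for p in preds)

def keep_if(pred):
    """A filter in the style of `[pred]`: keeps inputs yielding truth."""
    return lambda v: [v] if pred(v) else []

# Two hypothetical one-step selectors used as predicates.
is_int = lambda v: [v] if isinstance(v, int) and not isinstance(v, bool) else []
is_str = lambda v: [v] if isinstance(v, str) else []

values = [1, "a", 2.5, None]
int_or_str = keep_if(p_or(as_pred(is_int), as_pred(is_str)))
neither = keep_if(p_not(p_or(as_pred(is_int), as_pred(is_str))))

print([r for v in values for r in int_or_str(v)])  # [1, 'a']
print([r for v in values for r in neither(v)])     # [2.5, None]
```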
## Axes

Axes: move around, applying filters after moving

    /               ;; Moves into immediate children (values / fields)
    //              ;; Flattens children recursively
    . key           ;; Moves into named child
    .^              ;; Moves into record label
    .keys           ;; Moves into *keys* rather than values
    .length         ;; Moves into the number of keys
    .annotations    ;; Moves into any annotations that might be present
    .embedded       ;; Moves into the representation of an embedded value
    % name          ;; Moves into successful Preserves Schema parse of definition `name`
    %- name         ;; Moves into successful Preserves Schema unparse of definition `name`

Sets have children, but no keys/length; Strings, ByteStrings and
Symbols have no children, but have keys/length.

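The distinction between children, keys and length can be sketched in Python, with built-in types standing in for Preserves values. The function names are hypothetical, and treating the keys of a string or byte string as its integer indices is an assumption made for this sketch, not something the text above states.

```python
def children(v):
    """Sketch of `/`: the immediate children of a value."""
    if isinstance(v, (list, tuple, set, frozenset)):
        return list(v)
    if isinstance(v, dict):
        return list(v.values())
    return []  # strings, byte strings and other atoms have no children

def keys(v):
    """Sketch of `.keys`: keys rather than values."""
    if isinstance(v, (list, tuple, str, bytes)):
        return list(range(len(v)))  # assumption: keys are indices
    if isinstance(v, dict):
        return list(v.keys())
    return []  # sets have children but no keys

def length(v):
    """Sketch of `.length`: the number of keys, where keys exist."""
    if isinstance(v, (list, tuple, str, bytes, dict)):
        return [len(v)]
    return []  # sets have no length in this model

print(children("abc"), keys("abc"), length("abc"))  # [] [0, 1, 2] [3]
print(sorted(children({1, 2})), keys({1, 2}), length({1, 2}))  # [1, 2] [] []
print(children({"k": "v"}), keys({"k": "v"}))  # ['v'] ['k']
```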
## Filters

Filters: narrow down a selection without moving

    *               ;; Accepts all
    [!]             ;; Rejects all (just a use of `[pred]`)

    eq literal      ;; Matches values equal to (less than, greater than, etc.) the literal
    = literal
    ne literal
    != literal
    lt literal
    gt literal
    le literal
    ge literal

    re regex        ;; Matches strings and symbols by POSIX extended regular expression
    =r regex

    [pred]          ;; Applies predicate to each input; keeps inputs yielding truth

    ^ literal       ;; Matches a record having the literal as its label -- equivalent to [.^ = literal]

    ~real           ;; Promotes int and float to double, passes on double unchanged, rejects others
                    ;; Out-of-range ints (too big or too small) become various double infinities
                    ;; Converting high-magnitude ints causes loss of precision

    ~int            ;; Converts float and double to closest integer, where possible
                    ;; NaN and infinities are rejected

    bool            ;; Type filters
    float
    double
    int
    string
    bytes
    symbol
    rec
    seq
    set
    dict
    embedded

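The `~real` and `~int` conversions can be sketched in Python. This is a rough model under stated assumptions: Python has a single `float` type, so the float/double distinction is not represented; `to_real` and `to_int` are hypothetical names; passing integers through `~int` unchanged and the tie-breaking rule of Python's `round` are choices of this sketch, not of the text above.

```python
import math

def to_real(v):
    """Sketch of `~real`: zero or one double results."""
    if isinstance(v, bool):
        return []  # Python bools are ints, but Booleans are rejected here
    if isinstance(v, float):
        return [v]
    if isinstance(v, int):
        try:
            return [float(v)]  # may lose precision for large magnitudes
        except OverflowError:
            return [math.inf if v > 0 else -math.inf]
    return []

def to_int(v):
    """Sketch of `~int`: zero or one integer results."""
    if isinstance(v, bool):
        return []
    if isinstance(v, int):
        return [v]  # assumption: ints pass through unchanged
    if isinstance(v, float) and math.isfinite(v):
        return [round(v)]
    return []  # NaN, infinities and non-numbers are rejected

print(to_real(3), to_real(10**400), to_real("x"))  # [3.0] [inf] []
print(to_real(2**53 + 1) == [float(2**53)])        # True
print(to_int(2.6), to_int(math.nan), to_int(math.inf))  # [3] [] []
```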
Design choice: Which regular expression dialect to choose? [CDDL (RFC
8610) goes for XML Schema regular
expressions](https://www.rfc-editor.org/rfc/rfc8610.html#section-3.8.3),
which seems like a very sensible choice. The discussion in section
3.8.3 of RFC 8610 makes some good points. A couple of things that
occurred to me: (1) the dialect should be backreference-free, allowing
matching by "[text-directed
engines](https://www.regular-expressions.info/engine.html)"; (2) it
should be very widely implemented; (3) it should cover regular
languages and no more; (4) it should be easy to implement.

Design choice: How should comparison work? Should `lt 1.0f` accept not only `0.9f` but also
`#t` and `#f` (since `Boolean` comes before `Float` in the Preserves total ordering)? Should
`lt 1.0f` accept `0.9` and `0` as well as `0.9f`?

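To make the comparison question concrete, here is a small Python sketch of two candidate readings of `lt`. It settles nothing. `KIND_RANK`, `lt_total_order` and `lt_same_kind` are hypothetical, Python's `float` stands in for `Float`, and the only ordering fact used is the one stated above, that `Boolean` comes before `Float`.

```python
# Hypothetical kind ranks; only "Boolean before Float" comes from the text.
KIND_RANK = {bool: 0, float: 1}

def lt_total_order(value, literal):
    """`lt` under a cross-kind total ordering: kind rank first, then value."""
    return (KIND_RANK[type(value)], value) < (KIND_RANK[type(literal)], literal)

def lt_same_kind(value, literal):
    """`lt` restricted to values of the literal's own kind."""
    return type(value) is type(literal) and value < literal

candidates = [True, False, 0.9, 1.5]
print([v for v in candidates if lt_total_order(v, 1.0)])  # [True, False, 0.9]
print([v for v in candidates if lt_same_kind(v, 1.0)])    # [0.9]
```

Under the first reading both Booleans pass `lt 1.0f`; under the second only values of the same kind are ever compared. A third option, numeric promotion across `Float`, `Double` and integers, is not modelled here.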
## Functions

    <count selector>    ;; Counts number of results of selector

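One possible reading of `count`, sketched in Python with selectors modelled as functions returning lists: it is itself a step that yields a single integer per input. The names `count` and `children` below are hypothetical.

```python
def count(selector):
    """Turn a selector into a step yielding the number of its results."""
    return lambda v: [len(list(selector(v)))]

def children(v):
    """Hypothetical axis: move into the elements of a list."""
    return list(v) if isinstance(v, list) else []

print(count(children)([10, 20, 30]))  # [3]
print(count(children)("atom"))        # [0]
```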
## Transformers

e.g. stringify results; sequenceify results (see "+" operator); setify
results (see "/" and "&" operators); join stringified results with a
separator

## Tool design

When processing multiple input documents sequentially, you will
sometimes want a list of results for each document, a set of results
for each document, or a list flattened into a sequence of outputs for
all input documents in the sequence. (A flattened set doesn't make
sense for streaming since the input documents come in a sequence; if
the inputs were treated as a set represented as a sequence, and
outputs were buffered in a single large set, that could work out...)

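The three output shapes can be sketched as Python generators over a stream of documents. The function names are hypothetical, and results must be hashable for the set variant to work in this model.

```python
def per_document_lists(selector, docs):
    """One list of results per input document."""
    for doc in docs:
        yield list(selector(doc))

def per_document_sets(selector, docs):
    """One set of results per input document (duplicates collapse)."""
    for doc in docs:
        yield set(selector(doc))

def flattened(selector, docs):
    """A single stream of results across all input documents."""
    for doc in docs:
        yield from selector(doc)

def children(v):
    """Hypothetical axis: move into the elements of a list."""
    return list(v) if isinstance(v, list) else []

docs = [[1, 1, 2], [2, 3]]
print(list(per_document_lists(children, docs)))  # [[1, 1, 2], [2, 3]]
print(list(per_document_sets(children, docs)))   # [{1, 2}, {2, 3}]
print(list(flattened(children, docs)))           # [1, 1, 2, 2, 3]
```

All three variants stream: each consumes one document at a time and never needs to buffer results across documents, which is why a flattened set is the odd one out.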
## Examples

Consider the following Preserves Path selectors, intended to run
against the [Preserves codec test suite document](tests/samples.pr):

 - `.annotations ^ Documentation . 0 /`

   This selects each of the elements (mostly text strings) in the list
   of the `Documentation` record annotating the test suite document
   itself.

   First, `.annotations` focuses on the annotations of the document.
   Then, `^ Documentation` selects only annotations that are records
   with label `Documentation`. Then, `. 0` selects the first field in
   each record. Finally, `/` replaces each selected value with a
   sequence of its children.

 - `// [.^ [= Test + = NondeterministicTest]] [. 1 rec]`

   This selects every deterministic or nondeterministic test case
   where the expected value is a record.

   First, `//` recursively selects *every* descendant subvalue of the
   root (inclusive). Then, two filters are applied, one after the
   other. The first, `[.^ [= Test + = NondeterministicTest]]`, selects
   record labels, and then filters out all but `Test` and
   `NondeterministicTest`. Then, the second, `[. 1 rec]`, filters out
   all but those where the second field is a record.
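The second selector can be traced by hand with a small Python model. Everything here is an assumption made for illustration: the `Record` class, the use of strings for symbol labels, and the tiny made-up document, which is not the real test suite file.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    label: str      # stands in for a symbol label
    fields: tuple

def descendants(v):
    """Sketch of `//`: the value itself, then all nested subvalues."""
    yield v
    if isinstance(v, Record):
        kids = v.fields
    elif isinstance(v, dict):
        kids = v.values()
    elif isinstance(v, (list, tuple)):
        kids = v
    else:
        kids = ()
    for kid in kids:
        yield from descendants(kid)

def select(doc):
    """Sketch of `// [.^ [= Test + = NondeterministicTest]] [. 1 rec]`."""
    for v in descendants(doc):
        if not isinstance(v, Record):
            continue  # `.^` yields nothing for non-records
        if v.label not in ("Test", "NondeterministicTest"):
            continue
        if len(v.fields) > 1 and isinstance(v.fields[1], Record):
            yield v

# Made-up document, loosely shaped like a suite of test cases.
doc = Record("TestCases", ({
    "a": Record("Test", (b"\x01", Record("point", (1, 2)))),
    "b": Record("Test", (b"\x02", 42)),
    "c": Record("NondeterministicTest", (b"\x03", Record("pair", ("x", "y")))),
    "d": Record("ParseError", ("oops",)),
},))

print([r.label for r in select(doc)])  # ['Test', 'NondeterministicTest']
```

Case `b` is dropped because its second field is an integer, and case `d` because its label matches neither alternative.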