TODO:
- consider a lead byte value used to wrap an encoded Value in a size-counted wrapper? That way parsers can quickly skip nested structure they're not interested in.
- http://erlang.org/doc/reference_manual/expressions.html#term-comparisons; in particular, see the non-lexicographic ordering on tuples (vs lists).
- should there be a built-in (i.e. recommended) reference type for external data??
  - if there were, it'd give IPLD-like characteristics to the thing from the get-go
  - IRIs and mime-typed things are already in there, so why not content-based addressing?
- It is becoming VERY CLEAR that on-the-wire efficiency is... a secondary concern. Perhaps revise the binary syntax to be less terse and better for simple encoding and for term ordering, canonicalization, quick indexing, etc.
  - the indexing thing clashes with the term ordering thing
  - maybe put the indexes at the end?? they could be optional
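The size-counted wrapper from the first TODO item (and the later notes that length-prefixing lets a reader reject or skip terms without scanning them) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the lead byte 0xFE is an arbitrary placeholder, the LEB128-style length and the names wrap and skip_wrapped are hypothetical, and none of it is part of an existing Preserves binary syntax.

```python
from typing import Tuple

# Arbitrary placeholder lead byte for the hypothetical size-counted wrapper.
WRAPPER_TAG = 0xFE

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, position just after it)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7

def wrap(encoded_value: bytes) -> bytes:
    """Prefix an already-encoded Value with the wrapper tag and its byte length."""
    return bytes([WRAPPER_TAG]) + encode_varint(len(encoded_value)) + encoded_value

def skip_wrapped(buf: bytes, pos: int) -> int:
    """Return the offset just past the wrapped Value at pos, without reading its body."""
    if buf[pos] != WRAPPER_TAG:
        raise ValueError("no size-counted wrapper at this position")
    length, body_start = decode_varint(buf, pos + 1)
    return body_start + length
```

A reader that doesn't care about a wrapped term jumps straight to the offset skip_wrapped returns. The price is a tag byte plus one or more length bytes per wrapped term, which fits the remark that on-the-wire efficiency is a secondary concern.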
It might be nice to define some kind of jsonpath/xpath-like means of
naming a subterm within a Preserves value. Record labels would be a kind of
assertion on the current node; indexes and keys would be steps. It'd
be a lot like xpath, I think; see also my racket-xe package.
- <child> - moves into direct children
- <descendant-or-self> - moves into direct and indirect children, including this node
- <descendant> - moves into direct and indirect children, excluding this node
- <where[P*]> - "where" clause, applies nested path, keeping nodes with submatches
- <or[P*]> - result of first non-empty P match
- <at K> - moves into direct children whose keys are K from dictionaries, sequences or records; K should be a number for the latter two
- <label> - moves into labels of records
- <equals V> - filters to only nodes that equal V
- <isa T> - filters to only nodes that are T ∈ {boolean float double signed-integer string byte-string symbol record sequence set dictionary}
Abbreviations:
/ = <child>
// = <descendant-or-self>
[P*] = <where[P*]>
Symbol = [<label> <equals Symbol>]
NonSymbolAtom = <at NonSymbolAtom>
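The steps and abbreviations above are close enough to xpath that a tiny evaluator can be sketched. This is a hypothetical Python sketch, not an existing API: steps are plain tuples such as ("child",) or ("at", K), a path is a list of steps run left to right over a list of nodes, and the Symbol/Record classes stand in for Preserves symbols and records (lists for sequences, frozensets for sets, dicts for dictionaries). <or> is applied to the whole node list, which is only one possible reading of "result of first non-empty P match".

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

def type_name(v: Any) -> str:
    """Classify v using the type names from the <isa T> step."""
    if isinstance(v, bool):  # before int: bool is a subclass of int in Python
        return "boolean"
    for py_type, name in ((float, "double"), (int, "signed-integer"), (str, "string"),
                          (bytes, "byte-string"), (Symbol, "symbol"), (Record, "record"),
                          ((list, tuple), "sequence"), (frozenset, "set"), (dict, "dictionary")):
        if isinstance(v, py_type):
            return name
    raise TypeError(f"not a Preserves-like value: {v!r}")

def children(v: Any) -> List[Any]:
    """Direct children: record fields, sequence/set elements, dictionary values."""
    if isinstance(v, Record):
        return list(v.fields)
    if isinstance(v, (list, tuple, frozenset)):
        return list(v)
    if isinstance(v, dict):
        return list(v.values())
    return []

def self_and_descendants(v: Any) -> List[Any]:
    out = [v]
    for c in children(v):
        out.extend(self_and_descendants(c))
    return out

def at(v: Any, key: Any) -> List[Any]:
    """<at K>: dictionary lookup, or a numeric index into a sequence or record."""
    if isinstance(v, dict):
        return [v[key]] if key in v else []
    if isinstance(v, Record):
        items = v.fields
    elif isinstance(v, (list, tuple)):
        items = v
    else:
        return []
    if isinstance(key, int) and not isinstance(key, bool) and 0 <= key < len(items):
        return [items[key]]
    return []

def step(nodes: List[Any], s: tuple) -> List[Any]:
    kind = s[0]
    if kind == "child":
        return [c for n in nodes for c in children(n)]
    if kind == "descendant-or-self":
        return [d for n in nodes for d in self_and_descendants(n)]
    if kind == "descendant":
        return [d for n in nodes for c in children(n) for d in self_and_descendants(c)]
    if kind == "where":  # keep nodes for which the nested path yields something
        return [n for n in nodes if run([n], s[1])]
    if kind == "or":  # first alternative path with a non-empty result
        for path in s[1]:
            result = run(nodes, path)
            if result:
                return result
        return []
    if kind == "at":
        return [c for n in nodes for c in at(n, s[1])]
    if kind == "label":
        return [n.label for n in nodes if isinstance(n, Record)]
    if kind == "equals":
        return [n for n in nodes if n == s[1]]
    if kind == "isa":
        return [n for n in nodes if type_name(n) == s[1]]
    raise ValueError(f"unknown step {kind!r}")

def run(nodes: List[Any], path: List[tuple]) -> List[Any]:
    for s in path:
        nodes = step(nodes, s)
    return nodes
```

With these encodings the abbreviated path "// worker" followed by an index step (descend, keep records labelled worker, take field 1) becomes [("descendant-or-self",), ("where", [("label",), ("equals", Symbol("worker"))]), ("at", 1)].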
TODO
- explain why total order / comparison of values is important and/or useful
  - what does having a total order unlock?
- explain why records are good (see below on yaml tags etc)
- hashability: comes from equivalence
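One way to make the total-order and hashability items concrete is a comparison function over all values. The Python sketch below is hypothetical: it ranks values by type first, assuming the rank follows the type list in the <isa T> step (minus "float", since Python only has doubles), then compares within a type. Sequences and records compare lexicographically, and sets and dictionaries compare as their sorted elements or sorted (key, value) pairs; those within-type rules are assumptions here, not a specification.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Any, Sequence, Tuple

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

# Assumed type ranks, following the <isa T> list without single-precision float.
RANK = ["boolean", "double", "signed-integer", "string", "byte-string",
        "symbol", "record", "sequence", "set", "dictionary"]

def type_name(v: Any) -> str:
    if isinstance(v, bool):  # before int: bool is a subclass of int in Python
        return "boolean"
    for py_type, name in ((float, "double"), (int, "signed-integer"), (str, "string"),
                          (bytes, "byte-string"), (Symbol, "symbol"), (Record, "record"),
                          ((list, tuple), "sequence"), (frozenset, "set"), (dict, "dictionary")):
        if isinstance(v, py_type):
            return name
    raise TypeError(f"not a Preserves-like value: {v!r}")

def cmp_scalar(a: Any, b: Any) -> int:
    return (a > b) - (a < b)

def cmp_seq(xs: Sequence[Any], ys: Sequence[Any]) -> int:
    """Lexicographic: the first differing element decides, else the shorter is smaller."""
    for x, y in zip(xs, ys):
        c = compare(x, y)
        if c != 0:
            return c
    return cmp_scalar(len(xs), len(ys))

def sort_values(vs) -> list:
    return sorted(vs, key=cmp_to_key(compare))

def compare(a: Any, b: Any) -> int:
    """Negative, zero or positive as a sorts before, equal to, or after b."""
    ta, tb = type_name(a), type_name(b)
    if ta != tb:
        return cmp_scalar(RANK.index(ta), RANK.index(tb))
    if ta == "symbol":
        return cmp_scalar(a.name, b.name)
    if ta == "record":  # label first, then fields
        return cmp_seq((a.label,) + tuple(a.fields), (b.label,) + tuple(b.fields))
    if ta == "sequence":
        return cmp_seq(a, b)
    if ta == "set":
        return cmp_seq(sort_values(a), sort_values(b))
    if ta == "dictionary":  # keys are unique, so sorting pairs by key is enough
        by_key = lambda kv: cmp_to_key(compare)(kv[0])
        return cmp_seq(sorted(a.items(), key=by_key), sorted(b.items(), key=by_key))
    return cmp_scalar(a, b)  # booleans, numbers, strings, byte strings
```

Equivalence falls out as compare(a, b) == 0, and any hash that agrees with that relation gives hashability. Erlang's rule for tuples (size first, then contents) would be a one-line change to cmp_seq for sequences.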
- more examples
  - over-8000er mountains
  - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
- having records with ANONYMOUS but ordered fields is good for easy parsing in languages like C, where you don't want to explicitly search dictionaries of key/value mappings
- labels vs. yaml tags vs. annotations
  - yaml tags are complex. They're relative URIs, for the most part
    anyway, except the local ones; they force interpretation rather
    than being data. E.g. a bare ! forces a node to be interpreted as a
    string, sequence, or map, and ? forces "tag resolution", aka dwimming
    of scalar syntax. Labels here don't change how their fields resolve at all.
  - they're also used to specify particular host-language classes and other objects:
      !!python/none !!python/bool !!python/bytes !!python/str !!python/unicode
      !!python/int !!python/long !!python/float !!python/complex
      !!python/list !!python/tuple !!python/dict
      !!python/name:module.name !!python/module:package.module
      !!python/object:module.Cls !!python/object/new:module.Cls !!python/object/apply:module.f
      !ruby/symbol !ruby/sym (alias of the previous!) !ruby/range !ruby/regexp
      !ruby/struct:StructTypeName !ruby/object:Module::ClassName
      !ruby/array:Module::ClassName (subtyping arrays! objects, not data)
      !ruby/hash:Module::ClassName (subtyping hashes! objects, not data)
      !perl/regexp
  - yaml tag meanings are per-document or global. Labels aren't really specified. Is this good or bad? Once there's a type system, labels will become meaningful in a per-type context.
  - yaml tags basically are meant to mean the type of the object following. Labels are not: they are for distinguishing among variants within a type. (In a unityped setting, this boils down to the same thing at a different level; object-level vs meta-level variants.)
  - in some cases (ruby) a tag indicates a subclass: a behavioural refinement of some object rather than a structural extension of some data.
  - yaml tags don't have intrinsic meaning: implementations are allowed to complain if they don't recognise a tag. They also affect how and whether an object can be used as a dict key; labels, otoh, have intrinsic (trivial) meaning, and any Preserves value is allowed to be used as a dict key. YAML documents then have implementation-specific meaning, but Preserves values have intrinsic meaning.
  - yaml has schemas, holy shit, and there the tags really do direct interpretation of values to a significant extent. Preserves forces the application to do such interpretations: the parser/reader won't do them for you.
    - TODO: be clearer in the bit on "validity"
  - yaml tags are URIs, and cannot be structured data
- annotations
  - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
  - comments are a good example: out-of-domain description about the value, not part of the value itself
  - uses:
    - roundtripping config, cf. the approach taken by http://augeas.net/
    - embedding trace information in messages
    - provenance information
    - stack information / distributed trace/continuation record
- remove comments once annotations are in!
- binary syntax: length-prefixing is good for pattern-matching, because it allows you to reject terms based on arity without having to scan the contents.
- hey so what about protobufs? the optional fields / forward-and-backwards-compatibility thing is interesting.
- what about skipping e.g. lists? would need byte-length prefix
- When thinking about extensibility and forward/backward compatibility, consider this: https://eighty-twenty.org/2016/09/18/gnome-flashback-patch
- types, type-directed whitespace-sensitive parsing (oh hey it might also lead to optimized binary parsers based on type?)
- Zephyr (here * is postfix Kleene star and ? marks zero-or-one):

      asdl_ty = Sum(identifier, field*, ctor, ctor*)  ;; typename, common fields, at least one ctor, more ctors
              | Product(identifier, field, field*)    ;; ?? i guess a degenerate kind of sum??
      ctor    = Con(identifier, field*)               ;; most like Preserves' record
      field   = Id(identifier, identifier?)           ;; basic typename reference (?)
              | Option(identifier, identifier?)       ;; postfix `?`
              | Sequence(identifier, identifier?)     ;; postfix `*`
      value   = SumVal(identifier, value*, value*)    ;; there are common fields
              | ProductVal(value, value*)
              | SequenceVal(value*)
              | NoneVal
              | SomeVal(value)
              | PrimVal(prim)
      prim    = IntVal(int)
              | IdentifierVal(identifier)
              | StringVal(string)
- So then for us, where we have kind of union types more than labelled sums:
  - <equals Value>, <lessthan Value>, <greaterthan Value> - must be equal to / less than / greater than this value
    - maybe take <range lo hi> as primitive? - no, because of infinitesimals
  - <regexp string> ... etc? (Perhaps <pattern regexpstring> is better) (Be sure to specify ECMA-262 dialect, with restrictions a la JSON-schema https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
  - identifier naming a type definition
    - some type definitions are builtin: Boolean = <union <equals #true> <equals #false>>
    - some have to be primitive rather than builtin, like SignedInteger or Double, because they have unboundedly (or awkwardly) many inhabitants and the class above or below them doesn't have a limit ordinal in the right place
  - parameters/forall?
  - some type definitions are builtin:
    - <record Type Type ...> - first one is the label type
    - <list Type ...> - heterogeneous list of specific types
    - <listof Type> - homogeneous list
    - <setof Type>
    - { keyType: valueType, ... } - heterogeneous dict
      - wait, { keyLiteral: valueType, ... } might be better - sugar for <dict [<equals keyLiteral> valueType] ...>
      - <dict+ ...> for when extra members are allowed
      - what about optional members?
    - wait, <dictof keyType valueType> - homogeneous dict
    - <union Type ...>
      - empty union is uninhabited type(!)
      - a kind of or
    - <and Type ...>
      - simultaneous constraints on type, for range, or for range-and-type
      - a kind of intersection; parallel reduction
    - <interleave Type ...> ?? maybe, if sequences are a thing? Could be good for organizing key-value mappings in dictionary-brackets, because unordered... and sets...
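The constructors above are concrete enough to try out as a checker. A minimal Python sketch, assuming types are written as tuples like ("equals", v) or ("listof", T), and values use Python lists for sequences, frozensets for sets and dicts for dictionaries; the Record class, the tuple encoding and check itself are hypothetical. <lessthan>/<greaterthan> only compare numbers here (a full version would need a total order over all values), and the {keyLiteral: valueType} form, <dict+> and <interleave> are left out.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

def is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def check(v: Any, ty: tuple) -> bool:
    """Does value v inhabit type ty? Types are tuples such as ("listof", T)."""
    kind = ty[0]
    if kind == "equals":
        # Compare Python types too, so that 1 is not mistaken for #true.
        return type(v) is type(ty[1]) and v == ty[1]
    if kind in ("lessthan", "greaterthan"):
        # Numbers only in this sketch.
        if not (is_number(v) and is_number(ty[1])):
            return False
        return v < ty[1] if kind == "lessthan" else v > ty[1]
    if kind == "union":  # <union> with no members is uninhabited
        return any(check(v, t) for t in ty[1])
    if kind == "and":  # <and> with no members accepts everything
        return all(check(v, t) for t in ty[1])
    if kind == "record":  # ("record", labelType, [fieldType, ...])
        return (isinstance(v, Record) and check(v.label, ty[1])
                and len(v.fields) == len(ty[2])
                and all(check(f, t) for f, t in zip(v.fields, ty[2])))
    if kind == "list":  # heterogeneous: one type per position
        return (isinstance(v, list) and len(v) == len(ty[1])
                and all(check(x, t) for x, t in zip(v, ty[1])))
    if kind == "listof":
        return isinstance(v, list) and all(check(x, ty[1]) for x in v)
    if kind == "setof":
        return isinstance(v, frozenset) and all(check(x, ty[1]) for x in v)
    if kind == "dictof":
        return isinstance(v, dict) and all(
            check(k, ty[1]) and check(x, ty[2]) for k, x in v.items())
    raise ValueError(f"unknown type constructor {kind!r}")

# Boolean = <union <equals #true> <equals #false>>
BOOLEAN = ("union", [("equals", True), ("equals", False)])
ANY = ("and", [])
# A port number as an <and> of two range constraints.
PORT = ("and", [("greaterthan", -1), ("lessthan", 65536)])
# Record labels are plain strings here, for brevity.
WORKER = ("record", ("equals", "Worker"), [ANY, PORT])
```

Writing Boolean as a union of two <equals> types, and a port range as an <and> of two bounds, shows how far the small set of constructors goes before anything needs to be primitive.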
Sketching it out:
preserves_ty =
- Oh dear, actually this is very close to being just a pattern language without the captures.

      a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b)
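Reading & as interleaving (as in RELAX NG's compact syntax), + as choice and a1.a as "head a1 followed by rest a", the law above says that an interleaving of two non-empty sequences starts with the head of one operand and continues with an interleaving of what is left. That reading is an interpretation, not something the notes spell out. A direct Python transcription, with the hypothetical name interleavings, enumerates every result:

```python
from typing import List, Sequence

def interleavings(xs: Sequence, ys: Sequence) -> List[tuple]:
    """All merges of xs and ys that keep each operand's own order.

    Follows a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b): either the head of
    the left operand comes first or the head of the right one does.
    """
    if not xs:  # empty & b = b
        return [tuple(ys)]
    if not ys:  # a & empty = a
        return [tuple(xs)]
    left = [(xs[0],) + rest for rest in interleavings(xs[1:], ys)]
    right = [(ys[0],) + rest for rest in interleavings(xs, ys[1:])]
    return left + right  # "+" (choice) modelled as concatenating the alternatives
```

A matcher for <*interleave> would use the same expansion to decide whether a given sequence is one of these interleavings, consuming one item at a time rather than enumerating them all.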
- Take two.
  - <== Value>, <|<| value>, <|>| value>, |<=|, |>=|, *eq, *lt, *gt, *le, *ge
  - _ for discard, <*discard>
  - scalar values other than symbols beginning with * match themselves as if they were ==-wrapped
  - all the special things are records, possibly 0-ary, with labels that are symbols starting with *, except for == etc. and _ and...
  - if you have to match a label like *foo it might clash, so match <== *foo> instead: <*foo 1 2 3> ==> <<== *foo> 1 2 3>
  - <*int> for SignedInteger, <*string>, <*symbol>, <*bytestring>/<*binary>, <*float>, <*double>, <*bool>
  - <*and Pattern ⋯>
  - <*or Pattern ⋯>
  - <*not Pattern>?
  - <Pattern Pattern ⋯> - match record
  - [Pattern ⋯] - match sequence
  - #set{Pattern} - match set
  - don't know how to match dictionaries yet
    - view it as an interleave of its keyvalues: <*interleave Pattern ⋯>?
    - somehow allow specification of a keyvalue that is repeating, that is optional, etc.: {Keypat:Valpat ⋯ <... Keypat>:<... Valpat>} ??? eww?
  - <*group Pattern ⋯> - sequence of values spliced into wider sequence?
  - use literal ... symbol (!) to mark repetition in a sequence: [<*string> ...]
  - could use literal ? to mark optionality; or better perhaps <*optional Pattern>, equivalent to <*biased-choice Pattern <*group>>; hmm, biased choice!
  - could use <*repeat lo hi> or similar for counted repetition
  - don't know how to write refs to other types yet! def labels starting with *?

        <*def <*foo> <*or <*int> <*string>>>
        <*foo>
        <*def <*maybe a> <*or <nothing> <just a>>>
        <*maybe <*int>>
  - should those be relative URLs, or jsonpointer or something, so we can drag in types from the web?
  - NOTE: No schema for indicating attachment of annotations?!?!?!
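The "Take two" rules are nearly enough to write a matcher. The Python sketch below is hypothetical: Symbol and Record stand in for Preserves symbols and records, lists for sequences, and it handles only _, <*discard>, <== V>, the atom classes (minus <*float>, since Python has no single-precision float), <*and>/<*or>/<*not>, record and sequence patterns, and a trailing ... for repetition. Sets, dictionaries, <*group>, <*optional>, <*repeat> and <*def> are left out, matching the open questions above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

DISCARD = Symbol("_")
ELLIPSIS = Symbol("...")

ATOM_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "*int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "*string": lambda v: isinstance(v, str),
    "*symbol": lambda v: isinstance(v, Symbol),
    "*bytestring": lambda v: isinstance(v, bytes),
    "*binary": lambda v: isinstance(v, bytes),
    "*double": lambda v: isinstance(v, float),
    "*bool": lambda v: isinstance(v, bool),
}

def same(p: Any, v: Any) -> bool:
    """Literal equality that also compares Python types (so 1 does not match #true)."""
    return type(p) is type(v) and p == v

def special_op(p: Any) -> Optional[str]:
    """The operator name if p is a record labelled *something or ==, else None."""
    if isinstance(p, Record) and isinstance(p.label, Symbol):
        name = p.label.name
        if name.startswith("*") or name == "==":
            return name
    return None

def match(p: Any, v: Any) -> bool:
    if p == DISCARD:
        return True
    op = special_op(p)
    if op is not None:
        args = p.fields
        if op == "==":
            return same(args[0], v)
        if op == "*discard":
            return True
        if op in ATOM_CHECKS:
            return ATOM_CHECKS[op](v)
        if op == "*and":
            return all(match(q, v) for q in args)
        if op == "*or":
            return any(match(q, v) for q in args)
        if op == "*not":
            return not match(args[0], v)
        raise ValueError(f"unsupported pattern {op}")
    if isinstance(p, Record):  # <Pattern Pattern ...>: label pattern, then fields
        return (isinstance(v, Record) and match(p.label, v.label)
                and match_seq(list(p.fields), list(v.fields)))
    if isinstance(p, list):  # [Pattern ...]
        return isinstance(v, list) and match_seq(p, v)
    return same(p, v)  # any other value matches itself, as if ==-wrapped

def match_seq(ps: List[Any], vs: List[Any]) -> bool:
    """Match items in order; a pattern followed by ... matches zero or more items."""
    if not ps:
        return not vs
    if len(ps) > 1 and ps[1] == ELLIPSIS:
        if match_seq(ps[2:], vs):  # stop repeating here
            return True
        return bool(vs) and match(ps[0], vs[0]) and match_seq(ps, vs[1:])
    return bool(vs) and match(ps[0], vs[0]) and match_seq(ps[1:], vs[1:])
```

The workers entry of the YAML schema sketch, [ <Worker <*ipv4> <*u16>> ... ], can be checked this way if <*string> and <*int> stand in for the undefined <*ipv4> and <*u16>. The escaped form <<== *foo> 1 2 3> also works, because its label is a record rather than a symbol.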
The YAML example:
database:
  username: admin
  password: foobar # TODO get prod passwords out of config
  socket: /var/tmp/database.sock
  options: {use_utf8: true}
memcached:
  host: 10.0.0.99
workers:
  - host: 10.0.0.101
    port: 2301
  - host: 10.0.0.102
    port: 2302
Could be:
[ <Database [<Username "admin">
             @<TODO "get prod passwords out of config"> <Password "foobar">
             <Socket "/var/tmp/database.sock">
             <Options [<UseUTF8>]>]>
  <Memcached [<Host "10.0.0.99">]>
  <Workers [<Worker "10.0.0.101" 2301>
            <Worker "10.0.0.102" 2302>]> ]
Or
{
  database: {
    username: "admin",
    @<TODO "get prod passwords out of config">
    password: "foobar",
    socket: "/var/tmp/database.sock",
    options: #set{use_utf8}
  },
  memcached: {
    host: "10.0.0.99"
  },
  workers: [ <Worker "10.0.0.101" 2301>
             <Worker "10.0.0.102" 2302> ]
}
Its schema-sketch could be
[ <*interleave <Database [ <*interleave <Username <*string>>
                                        <Password <*string>>
                                        <*optional <Socket <*string>>>
                                        <*optional <Options [<*option> ...]>>> ]>
               <Memcached [ <Host <*ipv4>> ... ]>
               <Workers [ <Worker <*ipv4> <*u16>> ... ]>> ]
(for the first variant) or
{
  database: {
    username: <*string>,
    password: <*string>,
    <*optional socket>: <*string>,
    <*optional options>: #set{<*option>}
  },
  memcached: {
    host: <*ipv4>
  },
  workers: [ <Worker <*ipv4> <*u16>> ... ]
}
Annotations will be allowed on any value; but also perhaps on a key-value mapping pair?
{
  @"I label the key" key: value
  key @"I label the mapping": value
  key: @"I label the value" value
}
??
Perhaps not.
The schema for the second YAML config sketch would allow the instance to be written:
database:
  username: admin
  @<TODO "get prod passwords out of config">
  password: foobar
  socket: /var/tmp/database.sock
  options: use_utf8
memcached:
  host: 10.0.0.99
workers:
  <Worker 10.0.0.101 2301>
  <Worker 10.0.0.102 2302>