---
---

TODO:

- consider a lead byte value used to wrap an encoded `Value` in a
  size-counted wrapper? That way parsers can quickly skip nested
  structure they're not interested in...

- https://github.com/uwiger/sext

- http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
  in particular, see the non-lexicographic ordering on tuples (vs
  lists).

- should there be a built-in (i.e. recommended) reference type for external data??
  - if there were, it'd give IPLD-like characteristics to the thing from the get-go
  - IRIs and mime-typed things are already in there, so why not content-based addressing?

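
A minimal sketch of the size-counted wrapper from the first TODO item, showing how a parser could skip a nested `Value` without decoding it. The lead byte `0xB9` and the unsigned LEB128 length are hypothetical choices made for illustration; neither comes from the Preserves spec.

```python
# Sketch of a size-counted wrapper. WRAPPER_TAG and the LEB128 length
# format are hypothetical choices for illustration, not Preserves syntax.
WRAPPER_TAG = 0xB9


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple:
    """Return (value, position just after the varint)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def wrap(encoded_value: bytes) -> bytes:
    """Prefix an already-encoded Value with the wrapper tag and its byte count."""
    return bytes([WRAPPER_TAG]) + encode_varint(len(encoded_value)) + encoded_value


def skip_wrapped(buf: bytes, pos: int) -> int:
    """Return the offset just past the wrapped Value at `pos`, without decoding it."""
    if buf[pos] != WRAPPER_TAG:
        raise ValueError("no size-counted wrapper at this position")
    length, body_start = decode_varint(buf, pos + 1)
    return body_start + length
```

For example, `skip_wrapped(wrap(bytes(300)) + b"\x01", 0)` returns `303` (one tag byte, two length bytes, 300 body bytes), landing exactly on the byte after the wrapped value. The cost is on the writing side: the encoder must know the nested value's encoded size before emitting the prefix, so it has to buffer or pre-measure it.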
It is becoming VERY CLEAR that on-the-wire efficiency is... a
secondary concern. Perhaps revise the binary syntax to be less terse
and better for simple encoding and for term ordering,
canonicalization, quick indexing, etc.

- the indexing thing clashes with the term ordering thing
  - maybe put the indexes at the end?? they could be optional

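
The Erlang term-comparison link in the TODO list is about tuples being ordered by size first and only then element by element, while lists compare lexicographically. A small Python sketch of just that rule, with Python tuples and lists standing in for Erlang's and only the `number < tuple < list` slice of Erlang's cross-type order; it illustrates Erlang's behaviour and is not a proposal for Preserves ordering.

```python
from functools import cmp_to_key


def _rank(v) -> int:
    # A tiny slice of Erlang's cross-type order: number < tuple < list.
    if isinstance(v, tuple):
        return 1
    if isinstance(v, list):
        return 2
    return 0


def erlang_compare(a, b) -> int:
    """Compare numbers, tuples and lists the way Erlang does; returns -1, 0 or 1."""
    ra, rb = _rank(a), _rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra == 0:
        return (a > b) - (a < b)
    if ra == 1 and len(a) != len(b):
        # Tuples: size is compared first, so (9,) sorts before (1, 1).
        return -1 if len(a) < len(b) else 1
    # Same-size tuples, and all lists: element by element.
    for x, y in zip(a, b):
        c = erlang_compare(x, y)
        if c:
            return c
    # For lists, a proper prefix sorts first; same-size tuples end up equal here.
    return (len(a) > len(b)) - (len(a) < len(b))


erlang_key = cmp_to_key(erlang_compare)
```

`sorted([(1, 2), (9,), (1, 1)], key=erlang_key)` gives `[(9,), (1, 1), (1, 2)]`, whereas the same elements as lists sort as `[[1, 1], [1, 2], [9]]`. Python's built-in tuple ordering is lexicographic, so it would put `(9,)` last.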
It might be nice to define some kind of jsonpath/xpath-like means of
naming a subterm within a Preserves value. Record labels would be a kind of
assertion on the current node. Indexes and keys would be steps. It'd
be a lot like xpath I think; see also my `racket-xe` package.

- `<child>` - moves into direct children
- `<descendant-or-self>` - moves into direct and indirect children, including this node
- `<descendant>` - moves into direct and indirect children, excluding this node
- `<where[P*]>` - "where" clause: applies the nested path `P*`, keeping only nodes for which it has at least one match
- `<or[P*]>` - result of first non-empty `P` match
- `<at K>` - moves into direct children whose keys are `K` from
  dictionaries, sequences or records; `K` should be a number for the
  latter two
- `<label>` - moves into labels of records
- `<equals V>` - filters to only nodes that equal `V`
- `<isa T>` - filters to only nodes of type `T`, where
  `T ∈ {boolean float double signed-integer string byte-string symbol record sequence set dictionary}`

Abbreviations:

```
/ = <child>
// = <descendant-or-self>
[P*] = <where[P*]>
Symbol = [<label> <equals Symbol>]
NonSymbolAtom = <at NonSymbolAtom>
```

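
To make the path steps above concrete, here is a minimal Python evaluator for a subset of them (`<isa T>` is omitted). The Python representation is an assumption for illustration: `Record` and `Symbol` are hypothetical classes, sequences are lists, sets are frozensets, and the children of a dictionary are taken to be its values. `or_` carries a trailing underscore only because `or` is a Python keyword.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical Python stand-ins for Preserves values: Record and Symbol are
# small classes; sequences are lists, sets are frozensets, dictionaries are dicts.


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Record:
    label: Any
    fields: tuple


Step = Callable[[Any], List[Any]]  # a step maps one node to zero or more nodes


def child(node) -> List[Any]:
    """`<child>`: record fields, sequence/set elements, dictionary values."""
    if isinstance(node, Record):
        return list(node.fields)
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, (list, frozenset)):
        return list(node)
    return []


def descendant_or_self(node) -> List[Any]:
    """`<descendant-or-self>`: this node, then every descendant, depth-first."""
    out = [node]
    for c in child(node):
        out.extend(descendant_or_self(c))
    return out


def descendant(node) -> List[Any]:
    """`<descendant>`: as above, minus the node itself."""
    return descendant_or_self(node)[1:]


def label(node) -> List[Any]:
    """`<label>`: the label of a record, if this node is one."""
    return [node.label] if isinstance(node, Record) else []


def at(k) -> Step:
    """`<at K>`: dictionary lookup, or numeric index into a sequence or record."""
    def step(node):
        if isinstance(node, dict):
            return [node[k]] if k in node else []
        items = node.fields if isinstance(node, Record) else node
        if isinstance(items, (tuple, list)) and isinstance(k, int) and 0 <= k < len(items):
            return [items[k]]
        return []
    return step


def equals(v) -> Step:
    """`<equals V>`: keep the node only if it equals V."""
    return lambda node: [node] if node == v else []


def run(path, node) -> List[Any]:
    """Apply each step in turn to every node produced so far."""
    nodes = [node]
    for step in path:
        nodes = [out for n in nodes for out in step(n)]
    return nodes


def where(*path) -> Step:
    """`<where[P*]>`: keep the node if the nested path matches anything from it."""
    return lambda node: [node] if run(path, node) else []


def or_(*paths) -> Step:
    """`<or[P*]>`: the result of the first path that matches anything."""
    def step(node):
        for p in paths:
            found = run(p, node)
            if found:
                return found
        return []
    return step


def labelled(sym: Symbol) -> Step:
    """The `Symbol` abbreviation: `[<label> <equals Symbol>]`."""
    return where(label, equals(sym))
```

For example, with `workers = Record(Symbol("Workers"), ([Record(Symbol("Worker"), ("10.0.0.101", 2301)), Record(Symbol("Worker"), ("10.0.0.102", 2302))],))`, the abbreviated path `// Worker 1` expands to `run([descendant_or_self, labelled(Symbol("Worker")), at(1)], workers)` and yields `[2301, 2302]`.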
# TODO

- explain why total order / comparison of values is important and/or useful
  - what does having a total order unlock?
- explain why records are good (see below on yaml tags etc)
- hashability: comes from equivalence
- more examples
  - over-8000er mountains
  - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
- having records with ANONYMOUS but ordered fields is good for easy
  parsing in languages like C where you don't want to explicitly
  search dictionaries of key/value mappings

- labels vs. yaml tags vs. annotations
  - yaml tags are complex. They're relative URIs, for the most part
    anyway, except the local ones; they force interpretation rather
    than being data, e.g. `!` forces a node to be interpreted as a
    string, sequence, or map and `?` forces "tag resolution" aka
    dwimming of scalar syntax. Labels here don't change how their
    fields resolve at all.
    - they're also used to specify particular host-language classes
      and other objects.

      ```
      !!python/none
      !!python/bool
      !!python/bytes
      !!python/str
      !!python/unicode
      !!python/int
      !!python/long
      !!python/float
      !!python/complex
      !!python/list
      !!python/tuple
      !!python/dict
      !!python/name:module.name
      !!python/module:package.module
      !!python/object:module.Cls
      !!python/object/new:module.Cls
      !!python/object/apply:module.f

      !ruby/symbol
      !ruby/sym (alias of the previous!)
      !ruby/range
      !ruby/regexp
      !ruby/struct:StructTypeName
      !ruby/object:Module::ClassName
      !ruby/array:Module::ClassName (subtyping arrays! objects, not data)
      !ruby/hash:Module::ClassName (subtyping hashes! objects, not data)

      !perl/regexp
      ```

  - yaml tag meanings are per-document or global. Label meanings aren't
    really specified. Is this good or bad? Once there's a type
    system, labels will become meaningful in a per-type context.

  - yaml tags basically are meant to mean the type of the object
    following. Labels are not: they are for distinguishing among
    variants *within* a type. (In a unityped setting, this boils
    down to the same thing at a different level; object-level vs
    meta-level variants.)

  - in some cases (ruby) a tag indicates a subclass: a
    behavioural refinement of some *object* rather than a
    structural extension of some *data*.

  - yaml tags don't have intrinsic meaning: implementations are
    allowed to complain if they don't recognise a tag. They also
    affect how and whether an object can be used as a dict key;
    labels, otoh, have intrinsic (trivial) meaning, and *any*
    Preserves value is allowed to be used as a dict key. YAML
    documents then have implementation-specific meaning, but
    Preserves values have intrinsic meaning.

  - yaml has schemas, holy shit, and there the tags really do
    direct interpretation of values to a significant extent.
    Preserves forces the application to do such interpretations:
    the parser/reader won't do them for you.
    - TODO: be clearer in the bit on "validity"

  - yaml tags are URIs, and cannot be structured data

  - annotations
    - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
    - comments are a good example: out-of-domain description about the
      value, not part of the value itself
    - uses:
      - roundtripping config cf. the approach taken by http://augeas.net/
      - embedding trace information in messages
        - provenance information
        - stack information / distributed trace/continuation record

    - remove comments once annotations are in!

- binary syntax: length-prefixing is good for pattern-matching,
  because it allows you to reject terms based on arity without having
  to scan the contents.

- hey so what about protobufs? the optional fields /
  forward-and-backwards-compatibility thing is interesting.

  - what about skipping e.g. lists? would need a byte-length prefix, since
    an element count alone doesn't tell you where the list ends

- When thinking about extensibility and forward/backward
  compatibility, consider this:
  <https://eighty-twenty.org/2016/09/18/gnome-flashback-patch>

- types, type-directed whitespace-sensitive parsing (oh hey it might
  also lead to optimized binary parsers based on type?)

  - Zephyr (here `*` is postfix Kleene star and `?` marks zero-or-one):

    ```
    asdl_ty = Sum(identifier, field*, ctor, ctor*)   ;; typename, common fields, at least one ctor, more ctors
            | Product(identifier, field, field*)     ;; ?? i guess a degenerate kind of sum??
    ctor    = Con(identifier, field*)                ;; most like Preserves' record
    field   = Id(identifier, identifier?)            ;; basic typename reference (?)
            | Option(identifier, identifier?)        ;; postfix `?`
            | Sequence(identifier, identifier?)      ;; postfix `*`

    value   = SumVal(identifier, value*, value*)     ;; there are common fields
            | ProductVal(value, value*)
            | SequenceVal(value*)
            | NoneVal
            | SomeVal(value)
            | PrimVal(prim)
    prim    = IntVal(int)
            | IdentifierVal(identifier)
            | StringVal(string)
    ```

  - So then for us, where we have kind of union types more than
    labelled sums:
    - `<equals Value>`, `<lessthan Value>`, `<greaterthan Value>`
      - must be equal to / less than / greater than this value
      - maybe take `<range lo hi>` as primitive?
        - no, because of infinitesimals
    - `<regexp string>` ... etc? (Perhaps `<pattern regexpstring>`
      is better) (Be sure to specify the ECMA-262 dialect, with
      restrictions à la JSON-schema
      https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
    - identifier naming a type definition
      - some type definitions are builtin: `Boolean = <union <equals #true> <equals #false>>`
      - some have to be primitive rather than builtin, like
        `SignedInteger` or `Double`, because they have unboundedly
        (or awkwardly) many inhabitants and the class above or
        below them doesn't have a limit ordinal in the right place
      - parameters/`forall`?
    - `<record Type Type ...>` - first one is the label type
    - `<list Type ...>` - heterogeneous list of specific types
    - `<listof Type>` - homogeneous list
    - `<setof Type>`
    - `{ keyType: valueType, ... }` - heterogeneous dict
      - wait, `{ keyLiteral: valueType, ... }` might be better - sugar for
        `<dict [<equals keyLiteral> valueType] ...>`
      - `<dict+ ...>` for when extra members are allowed
      - what about optional members?
    - `<dictof keyType valueType>` - homogeneous dict
    - `<union Type ...>`
      - the empty union is the uninhabited type(!)
      - a kind of or
    - `<and Type ...>`
      - simultaneous constraints on type, for range, or for range-and-type
      - a kind of intersection; parallel reduction
    - `<interleave Type ...>` ?? maybe, if sequences are a thing?
      Could be good for organizing key-value mappings in
      dictionary-brackets, because unordered... and sets...

    Sketching it out:

    ```
    preserves_ty =
    ```

  - Oh dear, actually this is very close to being just a pattern
    language without the captures.

    ```
    a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b)
    ```

    (Reading `&` as interleaving, `.` as prefixing and `+` as choice,
    this is the expansion law for interleaving: either side may take
    its first step.)

  - Take two.
    - `<== value>`, `<|<| value>`, `<|>| value>`, `|<=|`, `|>=|`, `*eq` `*lt` `*gt` `*le` `*ge`
    - `_` for discard, `<*discard>`
    - scalar values other than symbols beginning with `*` match themselves, as if they were `==`-wrapped
    - all the special things are records, possibly 0-ary, whose labels are symbols starting with `*`,
      except for `==` etc and `_` and `...`
    - if you have to match a label like `*foo` it might clash, so match `<== *foo>` instead:
      `<*foo 1 2 3>` ==> `<<== *foo> 1 2 3>`
    - `<*int>` for `SignedInteger`, `<*string>`, `<*symbol>`,
      `<*bytestring>`/`<*binary>`, `<*float>`, `<*double>`,
      `<*bool>`
    - `<*and Pattern ⋯>`
    - `<*or Pattern ⋯>`
    - `<*not Pattern>` ?
    - `<Pattern Pattern ⋯>` - match record
    - `[Pattern ⋯]` - match sequence
    - `#set{Pattern}` - match set
    - don't know how to match dictionaries yet
      - view it as an interleave of its keyvalues
        - `<*interleave Pattern ⋯>`?
      - somehow allow specification of a keyvalue that is repeating, that is optional, etc
        - `{Keypat:Valpat ⋯ <... Keypat>:<... Valpat>}` ??? eww?
    - `<*group Pattern ⋯>` - sequence of values spliced into wider sequence?
    - use literal `...` symbol (!) to mark repetition in a sequence:
      `[<*string> ...]`
      - could use literal `?` to mark optionality; or better perhaps `<*optional Pattern>`,
        equivalent to `<*biased-choice Pattern <*group>>`; hmm, biased choice!
      - could use `<*repeat lo hi>` or similar for counted repetition
    - don't know how to write refs to other types yet! def labels starting with `*`?

      ```
      <*def <*foo> <*or <*int> <*string>>>
      <*foo>

      <*def <*maybe a> <*or <nothing> <just a>>>
      <*maybe <*int>>
      ```

      - should those be relative URLs, or jsonpointer or something,
        so we can drag in types from the web?
    - NOTE: No schema for indicating attachment of annotations?!?!?!

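
The "take two" pattern notation maps naturally onto a small recursive matcher. This sketch covers `_`, `<*discard>`, `<== V>`, a few `<*int>`-style primitives, `<*and>`, `<*or>`, `<*not>`, record and sequence patterns, and a trailing `...` for repetition; the comparison patterns, sets, dictionaries, `<*group>` and `<*def>` are left out, as they are still open questions in the notes. The Python encoding of values (`Record`, `Symbol`, lists for sequences) and the exact meaning given to each primitive are assumptions. Record patterns compare arity before looking at any field, which is the check a length-prefixed binary syntax makes cheap.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical Python stand-ins for Preserves values (sequences are lists).


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Record:
    label: Any
    fields: tuple


DISCARD = Symbol("_")
REPEAT = Symbol("...")

# Assumed meanings for a few of the `<*int>`-style primitive patterns.
PRIMITIVES = {
    "*discard": lambda v: True,
    "*int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "*string": lambda v: isinstance(v, str),
    "*symbol": lambda v: isinstance(v, Symbol),
    "*bool": lambda v: isinstance(v, bool),
    "*double": lambda v: isinstance(v, float),
}


def is_special(p) -> bool:
    """Special patterns are records labelled `==` or a symbol starting with `*`."""
    return (isinstance(p, Record) and isinstance(p.label, Symbol)
            and (p.label.name == "==" or p.label.name.startswith("*")))


def match(p, v) -> bool:
    if p == DISCARD:
        return True
    if is_special(p):
        name, args = p.label.name, p.fields
        if name == "==":
            return args[0] == v
        if name in PRIMITIVES:
            return PRIMITIVES[name](v)
        if name == "*and":
            return all(match(q, v) for q in args)
        if name == "*or":
            return any(match(q, v) for q in args)
        if name == "*not":
            return not match(args[0], v)
        raise ValueError(f"pattern {name} is not handled by this sketch")
    if isinstance(p, Record):
        # Arity is checked before any field is examined.
        return (isinstance(v, Record) and len(p.fields) == len(v.fields)
                and match(p.label, v.label)
                and all(match(q, x) for q, x in zip(p.fields, v.fields)))
    if isinstance(p, list):
        return isinstance(v, list) and match_sequence(p, v)
    # Any other pattern (numbers, strings, ordinary symbols) matches itself.
    return p == v


def match_sequence(ps, vs) -> bool:
    if ps and ps[-1] == REPEAT:
        # `[P1 ... Pn ...]`: fixed prefix P1..Pn-1, then Pn zero or more times.
        # Only a trailing `...` is supported here.
        if len(ps) < 2:
            raise ValueError("`...` must follow a pattern")
        *fixed, rep = ps[:-1]
        return (len(vs) >= len(fixed)
                and all(match(q, x) for q, x in zip(fixed, vs))
                and all(match(rep, x) for x in vs[len(fixed):]))
    return len(ps) == len(vs) and all(match(q, x) for q, x in zip(ps, vs))
```

For instance, with `S = lambda n: Record(Symbol(n), ())`, the pattern `<Worker <*string> <*int>>` is `Record(Symbol("Worker"), (S("*string"), S("*int")))`; it matches `Record(Symbol("Worker"), ("10.0.0.101", 2301))` but not a one-field `Worker`. The escaped pattern `<<== *foo> 1 2 3>` matches the value `<*foo 1 2 3>`, because its label position holds a `==` pattern instead of being read as a special form.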
The YAML example:

```yaml
database:
  username: admin
  password: foobar # TODO get prod passwords out of config
  socket: /var/tmp/database.sock
  options: {use_utf8: true}
memcached:
  host: 10.0.0.99
workers:
  - host: 10.0.0.101
    port: 2301
  - host: 10.0.0.102
    port: 2302
```

Could be:

```
[ <Database [<Username "admin">
             @<TODO "get prod passwords out of config"> <Password "foobar">
             <Socket "/var/tmp/database.sock">
             <Options [<UseUTF8>]>]>
  <Memcached [<Host "10.0.0.99">]>
  <Workers [<Worker "10.0.0.101" 2301>
            <Worker "10.0.0.102" 2302>]> ]
```

Or

```
{
  database: {
    username: "admin",
    @<TODO "get prod passwords out of config">
    password: "foobar",
    socket: "/var/tmp/database.sock",
    options: #set{use_utf8}
  },
  memcached: {
    host: "10.0.0.99"
  },
  workers: [ <Worker "10.0.0.101" 2301>
             <Worker "10.0.0.102" 2302> ]
}
```

Its schema-sketch could be

```
[ <*interleave <Database [ <*interleave <Username <*string>>
                                        <Password <*string>>
                                        <*optional <Socket <*string>>>
                                        <*optional <Options [<*option> ...]>>> ]>
               <Memcached [ <Host <*ipv4>> ... ]>
               <Workers [ <Worker <*ipv4> <*u16>> ... ]>> ]
```

(for the first variant) or

```
{
  database: {
    username: <*string>,
    password: <*string>,
    <*optional socket>: <*string>,
    <*optional options>: #set{<*option>}
  },
  memcached: {
    host: <*ipv4>
  },
  workers: [ <Worker <*ipv4> <*u16>> ... ]
}
```

(for the second variant).

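
Checking the dictionary variant against its schema sketch mostly comes down to required versus `<*optional ...>` keys. A minimal sketch under several assumptions: `Opt`, `ListOf` and `SetOf` are hypothetical stand-ins for `<*optional key>`, `[Pattern ...]` and `#set{Pattern}`; records are modelled as `(label, field, ...)` tuples; `<*ipv4>` is checked with Python's `ipaddress` module; and `<*option>`, which the notes leave undefined, is taken to be any string. Dictionaries are treated as closed, in the spirit of reserving `<dict+ ...>` for when extra members are allowed.

```python
import ipaddress
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Opt:
    """Stand-in for `<*optional key>` in key position."""
    key: str


@dataclass(frozen=True)
class ListOf:
    """Stand-in for `[Pattern ...]`."""
    item: Any


@dataclass(frozen=True)
class SetOf:
    """Stand-in for `#set{Pattern}`."""
    item: Any


def is_string(v) -> bool:
    return isinstance(v, str)


def is_ipv4(v) -> bool:
    if not isinstance(v, str):
        return False
    try:
        ipaddress.IPv4Address(v)
        return True
    except ValueError:
        return False


def is_u16(v) -> bool:
    return isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 0xFFFF


def is_worker(v) -> bool:
    # `<Worker <*ipv4> <*u16>>`, with records modelled as (label, field...) tuples
    return (isinstance(v, tuple) and len(v) == 3 and v[0] == "Worker"
            and is_ipv4(v[1]) and is_u16(v[2]))


def check(schema, value) -> bool:
    if isinstance(schema, ListOf):
        return isinstance(value, list) and all(check(schema.item, v) for v in value)
    if isinstance(schema, SetOf):
        return (isinstance(value, (set, frozenset))
                and all(check(schema.item, v) for v in value))
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return False
        allowed = set()
        for key, sub in schema.items():
            name = key.key if isinstance(key, Opt) else key
            allowed.add(name)
            if name in value:
                if not check(sub, value[name]):
                    return False
            elif not isinstance(key, Opt):
                return False  # a required key is missing
        return set(value) <= allowed  # closed dictionary: no unexpected keys
    return schema(value)  # a predicate such as is_string


CONFIG_SCHEMA = {
    "database": {
        "username": is_string,
        "password": is_string,
        Opt("socket"): is_string,
        Opt("options"): SetOf(is_string),  # `<*option>` is unspecified; strings assumed
    },
    "memcached": {"host": is_ipv4},
    "workers": ListOf(is_worker),
}
```

With the second variant written as a Python dict (`options` as `frozenset({"use_utf8"})`, workers as `("Worker", "10.0.0.101", 2301)` tuples), `check(CONFIG_SCHEMA, config)` returns `True`; dropping `socket` still passes, while dropping `password` or using port `70000` fails.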
Annotations will be allowed on any value; but also perhaps on a
key-value mapping pair?

```
{
  @"I label the key" key: value
  key @"I label the mapping": value
  key: @"I label the value" value
}
```

??

Perhaps not.

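
Whatever syntax is chosen, the annotation notes earlier describe annotations as out-of-domain metadata, "not part of the value itself". A minimal sketch of one way a reader could honour that, using a hypothetical `Annotated` wrapper: annotations are carried along but excluded from equality and hashing, and `strip` recovers the plain value. Annotating a key or a value both fit this model; a plain dictionary offers no slot for an annotation on the pair itself.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Annotated:
    """A value carrying annotations that take no part in equality or hashing."""
    value: Any
    annotations: tuple = field(default=(), compare=False)


def strip(v):
    """Drop annotations everywhere (dicts and lists only, in this sketch)."""
    if isinstance(v, Annotated):
        return strip(v.value)
    if isinstance(v, dict):
        return {strip(k): strip(x) for k, x in v.items()}
    if isinstance(v, list):
        return [strip(x) for x in v]
    return v
```

Given `{Annotated("password", ('<TODO "get prod passwords out of config">',)): "foobar"}`, `strip` returns `{"password": "foobar"}`, and `Annotated(1, ("a",)) == Annotated(1, ("b",))` holds because only `value` is compared.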
The schema for the second (dictionary-based) config sketch would allow the instance
to be written in a whitespace-sensitive, YAML-like form:

```
database:
  username: admin
  @<TODO "get prod passwords out of config">
  password: foobar
  socket: /var/tmp/database.sock
  options: use_utf8
memcached:
  host: 10.0.0.99
workers:
  <Worker 10.0.0.101 2301>
  <Worker 10.0.0.102 2302>
```

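
The bare `host: 10.0.0.99` and `<Worker 10.0.0.101 2301>` above are only unambiguous because the schema says what each position holds; this is the type-directed parsing idea from the TODO list. A minimal sketch, assuming a hypothetical table from schema leaf types to parsers: `*string` keeps the text, `*ipv4` uses Python's `ipaddress`, `*u16` range-checks an integer, and a set-of-options field splits on whitespace.

```python
import ipaddress


def parse_u16(text: str) -> int:
    n = int(text)
    if not 0 <= n <= 0xFFFF:
        raise ValueError(f"{n} does not fit in 16 bits")
    return n


# Hypothetical mapping from schema leaf types to parsers for bare scalar text.
LEAF_PARSERS = {
    "*string": str,
    "*ipv4": ipaddress.IPv4Address,
    "*u16": parse_u16,
    "#set{*option}": lambda text: frozenset(text.split()),
}


def read_scalar(expected_type: str, text: str):
    """Interpret bare text according to the type the schema expects here."""
    return LEAF_PARSERS[expected_type](text.strip())


def read_record(schema: tuple, text: str) -> tuple:
    """Read `<Label f1 f2 ...>` using (label, type1, type2, ...) from the schema."""
    expected_label, *field_types = schema
    label, *raw = text.strip().strip("<>").split()
    if label != expected_label or len(raw) != len(field_types):
        raise ValueError(f"{text!r} does not match <{expected_label} ...>")
    return (label, *(read_scalar(t, r) for t, r in zip(field_types, raw)))
```

`read_record(("Worker", "*ipv4", "*u16"), "<Worker 10.0.0.101 2301>")` returns `("Worker", IPv4Address('10.0.0.101'), 2301)`, and `read_scalar("#set{*option}", "use_utf8")` returns `frozenset({'use_utf8'})`. The record reader splits on whitespace, so it cannot cope with quoted strings containing spaces.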