---
---

TODO:

- https://github.com/uwiger/sext
- http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
  in particular, see the non-lexicographic ordering on tuples (vs lists).
- should there be a built-in (i.e. recommended) reference type for external data??
  - if there were, it'd give IPLD-like characteristics to the thing from the get-go
  - IRIs and mime-typed things are already in there so why not content-based addressing

It is becoming VERY CLEAR that on-the-wire efficiency is... a secondary concern. Perhaps revise the binary syntax to be less terse and better for simple encoding and for term ordering, canonicalization, quick indexing, etc.

- the indexing thing clashes with the term ordering thing
  - maybe put the indexes at the end?? they could be optional

It might be nice to define some kind of jsonpath/xpath-like means of naming a subterm within a Preserve. Record labels would be a kind of assertion on the current node. Indexes and keys would be steps. It'd be a lot like xpath I think; see also my `racket-xe` package.
- `child()` - moves into direct children
- `descendant-or-self()` - moves into direct and indirect children, including this node
- `descendant()` - moves into direct and indirect children, excluding this node
- `where[P*]` - "where" clause, applies nested path, keeping nodes with submatches
- `or[P*]` - result of first non-empty `P` match
- `at(K)` - moves into direct children whose keys are `K` from dictionaries, sequences or records; `K` should be a number for the latter two
- `label()` - moves into labels of records
- `equals(V)` - filters to only nodes that equal `V`
- `isa(T)` - filters to only nodes that are `T ∈ [boolean float double signed-integer string byte-string symbol record sequence set dictionary]`

Abbreviations:

    / = child()
    // = descendant-or-self()
    [P*] = where[P*]
    Symbol = [label() equals(Symbol)]
    NonSymbolAtom = at(NonSymbolAtom)

# TODO

- [DONE] allow `label[1,2,3]` and `label{a:b, c:d}`, meaning `label([1,2,3])` and `label({a:b, c:d})`.
- explain why total order / comparison of values is important and/or useful
  - what does having a total order unlock?
- explain why records are good (see below on yaml tags etc)
- hashability: comes from equivalence
- more examples
  - over-8000er mountains
  - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
- having records with ANONYMOUS but ordered fields is good for easy parsing in languages like C where you don't want to explicitly search dictionaries of key/value mappings
- labels vs. yaml tags vs. annotations
  - yaml tags are complex. they're relative uris, for the most part anyway, except the local ones; they force interpretation rather than being data, e.g. `!` forces a node to be interpreted as a string, sequence, or map and `?` forces "tag resolution" aka dwimming of scalar syntax. Labels here don't change how their fields resolve at all.
  - they're also used to specify particular host-language classes and other objects.

        !!python/none
        !!python/bool
        !!python/bytes
        !!python/str
        !!python/unicode
        !!python/int
        !!python/long
        !!python/float
        !!python/complex
        !!python/list
        !!python/tuple
        !!python/dict
        !!python/name:module.name
        !!python/module:package.module
        !!python/object:module.Cls
        !!python/object/new:module.Cls
        !!python/object/apply:module.f
        !ruby/symbol
        !ruby/sym (alias of the previous!)
        !ruby/range
        !ruby/regexp
        !ruby/struct:StructTypeName
        !ruby/object:Module::ClassName
        !ruby/array:Module::ClassName (subtyping arrays! objects, not data)
        !ruby/hash:Module::ClassName (subtyping hashes! objects, not data)
        !perl/regexp

  - yaml tag meanings are per-document or global. Labels aren't really specified. Is this good or bad? Once there's a type system, labels will become meaningful in a per-type context.
  - yaml tags basically are meant to mean the type of the object following. Labels are not: they are for distinguishing among variants *within* a type. (In a unityped setting, this boils down to the same thing at a different level; object-level vs meta-level variants.)
  - in some cases (ruby) a tag indicates a subclass: a behavioural refinement of some *object* rather than a structural extension of some *data*.
  - yaml tags don't have intrinsic meaning: implementations are allowed to complain if they don't recognise a tag. They also affect how and whether an object can be used as a dict key; labels, otoh, have intrinsic (trivial) meaning, and *any* preserves value is allowed to be used as a dict key. YAML documents then have implementation-specific meaning, but Preserves have intrinsic meaning.
  - yaml has schemas, holy shit, and there the tags really do direct interpretation of values to a significant extent. Preserves forces the application to do such interpretations: the parser/reader won't do them for you.
  - TODO: be clearer in the bit on "validity"
  - yaml tags are URIs, and cannot be structured data
- annotations
  - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
    - comments are a good example: out-of-domain description about the value, not part of the value itself
  - uses:
    - roundtripping config cf the approach taken by http://augeas.net/
    - embedding trace information in messages
      - provenance information
      - stack information / distributed trace/continuation record
  - remove comments once annotations are in!
- binary syntax: length-prefixing is good for pattern-matching, because it allows you to reject terms based on arity without having to scan the contents.
- hey so what about protobufs? the optional fields / forward-and-backwards-compatibility thing is interesting.
  - what about skipping e.g. lists? would need byte-length prefix
- When thinking about extensibility and forward/backward compatibility, consider this:
  - types, type-directed whitespace-sensitive parsing (oh hey it might also lead to optimized binary parsers based on type?)
  - Zephyr (here `*` is postfix Kleene star and `?` marks zero-or-one):

        asdl_ty = Sum(identifier, field*, ctor, ctor*)  ;; typename, common fields, at least one ctor, more ctors
                | Product(identifier, field, field*)    ;; ?? i guess a degenerate kind of sum??
        ctor    = Con(identifier, field*)               ;; most like Preserves' record
        field   = Id(identifier, identifier?)           ;; basic typename reference (?)
                | Option(identifier, identifier?)       ;; postfix `?`
                | Sequence(identifier, identifier?)     ;; postfix `*`

        value   = SumVal(identifier, value*, value*)    ;; there are common fields
                | ProductVal(value, value*)
                | SequenceVal(value*)
                | NoneVal
                | SomeVal(value)
                | PrimVal(prim)
        prim    = IntVal(int)
                | IdentifierVal(identifier)
                | StringVal(string)

  - So then for us, where we have kind of union types more than labelled sums:
    - `equals(value)`, `lessthan(value)`, `greaterthan(value)` - must be equal to / less than / greater than this value
      - maybe take `range(lo,hi)` as primitive?
        - no, because of infinitesimals
    - `regexp(string)` ... etc? (Perhaps `pattern(regexpstring)` is better) (Be sure to specify ECMA-262 dialect, with restrictions a la JSON-schema https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
    - identifier naming a type definition
      - some type definitions are builtin: `Boolean = union(equals(true), equals(false))`
      - some have to be primitive rather than builtin, like `SignedInteger` or `Double`, because they have unboundedly (or awkwardly) many inhabitants and the class above or below them doesn't have a limit ordinal in the right place
    - parameters/`forall`?
    - `record(type, type, ...)` - first one is the label type
    - `list(type, ...)` - heterogeneous list of specific types
    - `listof(type)` - homogeneous list
    - `setof(type)`
    - `{ keytype: valuetype, ... }` - heterogeneous dict
      - wait, `{ keyliteral: valuetype, ... }` might be better
        - sugar for `dict([equals(keyliteral), valuetype], ...)`
        - `dict*(...)` for when extra members are allowed
        - what about optional members?
    - `dictof(keytype, valuetype)` - homogeneous dict
    - `union(type, ...)`
      - empty union is uninhabited type(!)
      - a kind of or
    - `and(type, ...)` - simultaneous constraints on type, for range, or for range-and-type
      - a kind of intersection; parallel reduction
    - `interleave(type, ...)` ?? maybe, if sequences are a thing? Could be good for organizing key-value mappings in dictionary-brackets, because unordered... and sets...
Sketching it out:

    preserves_ty =

- Oh dear, actually this is very close to being just a pattern language without the captures.

      a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b)

- Take two.
  - `=(value)`, `<(value)`, `>(value)`, `<=`, `>=`, *eq *lt *gt *le *ge
  - `_` for discard, `*discard()`
  - scalar values not symbols beginning with `*` match themselves as if they were `=`-wrapped
  - all the special things are records, possibly 0-ary, with labels symbols starting with `*` except for `=` etc and `_` and `...`
    - if you have to match a label like `*foo` it might clash, so match `=(*foo)` instead: `*foo(1 2 3)` ==> `=(*foo)(1 2 3)`
  - `*int()` for `SignedInteger`, `*string()`, `*symbol()`, `*bytestring()`/`*binary()`, `*float()`, `*double()`, `*bool()`
  - `*and[pattern ⋯]`
  - `*or[pattern ⋯]`
  - `*not(pattern)` ?
  - `pattern(pattern ⋯)` - match record
  - `[pattern ⋯]` - match sequence
  - `#set{pattern}` - match set
  - don't know how to match dictionaries yet
    - view it as an interleave of its keyvalues
    - `*interleave[pattern ⋯]`?
    - somehow allow specification of a keyvalue that is repeating, that is optional, etc
    - `{keypat:valpat ⋯ ...(keypat):...(valpat)}` ??? eww?
  - `*group[pattern ⋯]` - sequence of values spliced into wider sequence?
  - use literal `...` symbol (!) to mark repetition in a sequence: `[*string() ...]`
  - could use literal `?` to mark optionality; or better perhaps `*optional(pattern)`, equivalent to `*biased-choice[pattern *group[]]`; hmm, biased choice!
  - could use `*repeat(lo,hi)` or similar for counted repetition
  - don't know how to write refs to other types yet! def labels starting with `*`?

        *def(*foo() *or[*int() *string()]) ?
        *foo()
        *def(*maybe(a) *or[nothing() just(a)])
        *maybe(*int())

    - should those be relative URLs, or jsonpointer or something, so can drag in types from the web?
- NOTE: No schema for indicating attachment of annotations?!?!?!
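A rough feel for how the "Take two" pattern language might behave, as a minimal Python sketch. Everything here is an assumption made for illustration: the `Symbol`/`Record` encodings, the helper names `rec`, `matches` and `match_seq`, and the reading of `p ...` as "zero or more repetitions of `p`". It covers only `_`, `=(value)`, a few atom predicates, `*and`, `*or`, `*not`, record patterns and sequence patterns; sets, dictionaries, `*group`, `*optional` and `*def` are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple

def rec(label, *fields):
    """Build a record whose label is the symbol named `label`."""
    return Record(Symbol(label), tuple(fields))

DISCARD = Symbol("_")    # `_`
REPEAT = Symbol("...")   # literal `...` marking repetition in a sequence

# 0-ary records with `*`-labels standing for atom classes (a partial list).
ATOM_TESTS = {
    "*int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "*string": lambda v: isinstance(v, str),
    "*symbol": lambda v: isinstance(v, Symbol),
    "*bool": lambda v: isinstance(v, bool),
}

def literal_eq(a, b):
    # Python's == treats 1 and True as equal; checking the type avoids that.
    return type(a) is type(b) and a == b

def matches(pat, v):
    if pat == DISCARD:
        return True
    if isinstance(pat, Record) and isinstance(pat.label, Symbol):
        name = pat.label.name
        if name == "=":                            # =(value)
            return literal_eq(pat.fields[0], v)
        if name in ATOM_TESTS and not pat.fields:  # *int(), *string(), ...
            return ATOM_TESTS[name](v)
        if name == "*and":                         # *and[pattern ...]
            return all(matches(p, v) for p in pat.fields[0])
        if name == "*or":                          # *or[pattern ...]
            return any(matches(p, v) for p in pat.fields[0])
        if name == "*not":                         # *not(pattern)
            return not matches(pat.fields[0], v)
    if isinstance(pat, Record):                    # pattern(pattern ...)
        return (isinstance(v, Record)
                and matches(pat.label, v.label)
                and match_seq(list(pat.fields), list(v.fields)))
    if isinstance(pat, list):                      # [pattern ...]
        return isinstance(v, list) and match_seq(pat, v)
    return literal_eq(pat, v)                      # scalars match themselves

def match_seq(pats, vals):
    if not pats:
        return not vals
    head, rest = pats[0], pats[1:]
    if rest and rest[0] == REPEAT:
        # `head ...`: either stop repeating here, or consume one more value.
        if match_seq(rest[1:], vals):
            return True
        return bool(vals) and matches(head, vals[0]) and match_seq(pats, vals[1:])
    return bool(vals) and matches(head, vals[0]) and match_seq(rest, vals[1:])

worker_pat = rec("Worker", rec("*string"), rec("*int"))  # Worker(*string() *int())
workers_pat = [worker_pat, REPEAT]                       # [Worker(...) ...]
workers = [rec("Worker", "10.0.0.101", 2301), rec("Worker", "10.0.0.102", 2302)]
print(matches(workers_pat, workers))
print(matches(workers_pat, [rec("Worker", "10.0.0.101", "2301")]))
```

Because patterns are ordinary values in this encoding, the `=(*foo)(1 2 3)` escape falls out for free: the record's label is itself the pattern `=(*foo)`, which is matched against the value's label like any other subpattern.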
The YAML example:

    database:
      username: admin
      password: foobar  # TODO get prod passwords out of config
      socket: /var/tmp/database.sock
      options: {use_utf8: true}
    memcached:
      host: 10.0.0.99
    workers:
      - host: 10.0.0.101
        port: 2301
      - host: 10.0.0.102
        port: 2302

Could be:

    [
      Database[Username("admin"),
               @TODO("get prod passwords out of config") Password("foobar"),
               Socket("/var/tmp/database.sock"),
               Options[UseUTF8()]],
      Memcached[Host("10.0.0.99")],
      Workers[Worker("10.0.0.101", 2301),
              Worker("10.0.0.102", 2302)]
    ]

Or

    {
      database: {
        username: "admin",
        @TODO("get prod passwords out of config") password: "foobar",
        socket: "/var/tmp/database.sock",
        options: #set{use_utf8}
      },
      memcached: { host: "10.0.0.99" },
      workers: [ Worker("10.0.0.101", 2301), Worker("10.0.0.102", 2302) ]
    }

Its schema-sketch could be

    [
      *interleave[
        Database[
          *interleave[
            Username(*string())
            Password(*string())
            *optional(Socket(*string()))
            *optional(Options[*option() ...])
          ]
        ]
        Memcached[ Host(*ipv4()) ... ]
        Workers[ Worker(*ipv4() *u16()) ... ]
      ]
    ]

(for the first variant) or

    {
      database: {
        username: *string(),
        password: *string(),
        *optional(socket): *string(),
        *optional(options): #set{*option()}
      },
      memcached: { host: *ipv4() },
      workers: [ Worker(*ipv4() *u16()) ... ]
    }

Annotations will be allowed on any value; but also perhaps on a key-value mapping pair?

    {
      @"I label the key" key: value
      key @"I label the mapping": value
      key: @"I label the value" value
    }

?? Perhaps not.

The schema for the second YAML config sketch would allow the instance to be written:

    database:
      username: admin
      @TODO("get prod passwords out of config")
      password: foobar
      socket: /var/tmp/database.sock
      options: use_utf8
    memcached:
      host: 10.0.0.99
    workers:
      Worker(10.0.0.101, 2301)
      Worker(10.0.0.102, 2302)
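To connect the jsonpath/xpath-like idea from the top of these notes to the dictionary variant of the config, here is a minimal Python sketch of a few path steps. All of it is assumed for illustration: steps are modelled as functions from a list of nodes to a list of nodes, symbols are stood in for by plain Python strings, the children of a dictionary are taken to be its values, and the names `children`, `select` and `descendant_or_self` are made up. `descendant()`, `or[P*]` and `isa(T)` are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple

def children(node):
    """Direct children: record fields, dictionary values, sequence/set members."""
    if isinstance(node, Record):
        return list(node.fields)
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, (list, set, frozenset)):
        return list(node)
    return []

def select(nodes, *steps):
    """Run a path (a series of steps) starting from the given nodes."""
    for step in steps:
        nodes = step(nodes)
    return nodes

def child():
    return lambda nodes: [c for n in nodes for c in children(n)]

def descendant_or_self():
    def step(nodes):
        out = []
        def walk(n):
            out.append(n)
            for c in children(n):
                walk(c)
        for n in nodes:
            walk(n)
        return out
    return step

def at(key):
    def step(nodes):
        out = []
        for n in nodes:
            if isinstance(n, dict):
                if key in n:
                    out.append(n[key])
            elif isinstance(n, (Record, list)):
                seq = n.fields if isinstance(n, Record) else n
                # For sequences and records, the key should be a number.
                if isinstance(key, int) and not isinstance(key, bool) and 0 <= key < len(seq):
                    out.append(seq[key])
        return out
    return step

def label():
    return lambda nodes: [n.label for n in nodes if isinstance(n, Record)]

def equals(value):
    return lambda nodes: [n for n in nodes if n == value]

def where(*steps):
    # Keep only the nodes for which the nested path has at least one match.
    return lambda nodes: [n for n in nodes if select([n], *steps)]

config = {
    "database": {"username": "admin", "password": "foobar",
                 "socket": "/var/tmp/database.sock",
                 "options": frozenset({"use_utf8"})},
    "memcached": {"host": "10.0.0.99"},
    "workers": [Record("Worker", ("10.0.0.101", 2301)),
                Record("Worker", ("10.0.0.102", 2302))],
}

# Roughly: descendant-or-self(), where[label() equals(Worker)], at(0)
hosts = select([config], descendant_or_self(),
               where(label(), equals("Worker")), at(0))
print(hosts)  # → ['10.0.0.101', '10.0.0.102']
```

Treating every step as a node-list transformer keeps `where[P*]` trivial to express, at the cost of materialising intermediate lists; a lazy (generator-based) variant would be an equally plausible choice.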