preserves/TODO.md

TODO:

  • consider reserving a lead byte value that wraps an encoded Value in a size-counted wrapper? That way parsers can quickly skip nested structure they're not interested in.

  • https://github.com/uwiger/sext

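  • http://erlang.org/doc/reference_manual/expressions.html#term-comparisons; in particular, see the non-lexicographic ordering on tuples (vs lists): tuples are ordered by size first and only then element by element, whereas lists are compared element by element.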

  • should there be a built-in (i.e. recommended) reference type for external data?

    • if there were, it'd give IPLD-like characteristics to the thing from the get-go
    • IRIs and MIME-typed things are already in there, so why not content-based addressing?
  • Check out https://hitchdev.com/strictyaml/, in particular the "Why StrictYAML?" and "Design justifications" sections; perhaps borrow elements of that structure for writing a comparison of Preserves with other things
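
The size-counted wrapper idea from the first item can be sketched as follows. This is only an illustration under stated assumptions: the lead byte `0xFF`, the fixed 4-byte big-endian length, and the helper names `wrap`, `skip` and `unwrap` are all hypothetical and are not taken from any Preserves binary syntax.

```python
import struct
from typing import Tuple

# Hypothetical lead byte marking a size-counted wrapper (an assumption made
# for this sketch, not a value from any Preserves specification).
WRAP_TAG = 0xFF

def wrap(encoded: bytes) -> bytes:
    """Prefix an already-encoded value with the tag and a 4-byte big-endian size."""
    return bytes([WRAP_TAG]) + struct.pack(">I", len(encoded)) + encoded

def skip(buf: bytes, offset: int) -> int:
    """Return the offset just past the wrapped value, without parsing its contents."""
    if buf[offset] != WRAP_TAG:
        raise ValueError("not a size-counted wrapper")
    (size,) = struct.unpack_from(">I", buf, offset + 1)
    return offset + 1 + 4 + size

def unwrap(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (payload, next_offset) for the wrapped value at `offset`."""
    end = skip(buf, offset)
    return buf[offset + 5:end], end

# Two wrapped values back to back; the reader only cares about the second.
boring = b"uninteresting nested structure"
stream = wrap(boring) + wrap(b"wanted")
second = skip(stream, 0)              # jump over the first value in O(1)
payload, end = unwrap(stream, second)
```

The cost is five extra bytes per wrapped value in this particular layout, which fits the later remark that on-the-wire efficiency is a secondary concern.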

It is becoming VERY CLEAR that on-the-wire efficiency is... a secondary concern. Perhaps revise the binary syntax to be less terse and better for simple encoding and for term ordering, canonicalization, quick indexing, etc.

  • the indexing thing clashes with the term ordering thing
  • maybe put the indexes at the end?? they could be optional
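
One possible reading of the clash, sketched under assumptions: suppose the goal is for bytewise comparison of encodings to agree with a lexicographic order on sequences. Putting a count (or any index) in front of the items then makes the encodings sort by size first, much like Erlang tuples, while a terminator-delimited encoding keeps the lexicographic order. Both toy encodings below are hypothetical and only handle items in the range 0..254.

```python
from typing import List

def encode_prefixed(items: List[int]) -> bytes:
    """Count-prefixed: one byte for the count, then one byte per item."""
    return bytes([len(items)]) + bytes(items)

def encode_terminated(items: List[int]) -> bytes:
    """Terminator-delimited: items are shifted up by one so 0x00 can end the sequence."""
    return bytes(i + 1 for i in items) + b"\x00"

values = [[1, 2, 3], [2], [1, 2], [1, 9]]

by_value = sorted(values)                              # lexicographic order on the lists
by_prefixed = sorted(values, key=encode_prefixed)      # bytewise order, count first
by_terminated = sorted(values, key=encode_terminated)  # bytewise order, terminator last

print(by_value)       # [[1, 2], [1, 2, 3], [1, 9], [2]]
print(by_prefixed)    # [[2], [1, 2], [1, 9], [1, 2, 3]]
print(by_terminated)  # [[1, 2], [1, 2, 3], [1, 9], [2]]
```

Moving the index to the end, as suggested above, would leave the leading bytes free to determine the order.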

It might be nice to define some kind of jsonpath/xpath-like means of naming a subterm within a Preserve. Record labels would be a kind of assertion on the current node. Indexes and keys would be steps. It'd be a lot like xpath I think; see also my racket-xe package.

  • <child> - moves into direct children
  • <descendant-or-self> - moves into direct and indirect children, including this node
  • <descendant> - moves into direct and indirect children, excluding this node
  • <where[P*]> - "where" clause, applies nested path, keeping nodes with submatches
  • <or[P*]> - result of first non-empty P match
  • <at K> - moves into direct children whose keys are K from dictionaries, sequences or records; K should be a number for the latter two
  • <label> - moves into labels of records
  • <equals V> - filters to only nodes that equal V
  • <isa T> - filters to only nodes that are T ∈ {boolean float double signed-integer string byte-string symbol record sequence set dictionary}

Abbreviations:

/ = <child>
// = <descendant-or-self>
[P*] = <where[P*]>
Symbol = [<label> <equals Symbol>]
NonSymbolAtom = <at NonSymbolAtom>
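
A minimal sketch of how a few of these steps might evaluate, assuming a made-up Python representation: `Record` is a hypothetical dataclass, sequences are lists, dictionaries are dicts, and symbols and strings are both plain `str` (a simplification). It also assumes `<child>` visits dictionary values but not keys, and record fields but not the label; the notes above don't settle either point. Only `<child>`, `<descendant-or-self>`, `<where>`, `<at>`, `<label>` and `<equals>` are covered.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Record:
    label: Any
    fields: tuple

# Each step maps one node to zero or more result nodes.
def child(node):
    if isinstance(node, Record):
        yield from node.fields
    elif isinstance(node, (list, tuple, set, frozenset)):
        yield from node
    elif isinstance(node, dict):
        yield from node.values()

def descendant_or_self(node):
    yield node
    for c in child(node):
        yield from descendant_or_self(c)

def label(node):
    if isinstance(node, Record):
        yield node.label

def at(key):
    def step(node):
        if isinstance(node, dict):
            if key in node:
                yield node[key]
        else:
            items = node.fields if isinstance(node, Record) else node
            if isinstance(items, (list, tuple)) and isinstance(key, int) \
                    and 0 <= key < len(items):
                yield items[key]
    return step

def equals(value):
    def step(node):
        if node == value:
            yield node
    return step

def where(*steps):
    # Keep the node if the nested path has at least one match starting from it.
    def step(node):
        if run(steps, [node]):
            yield node
    return step

def run(steps, nodes):
    """Apply each step in turn to every current node, collecting the results."""
    for s in steps:
        nodes = [out for n in nodes for out in s(n)]
    return nodes

config = [
    Record("Database", ([Record("Username", ("admin",))],)),
    Record("Workers", ([Record("Worker", ("10.0.0.101", 2301)),
                        Record("Worker", ("10.0.0.102", 2302))],)),
]

# Roughly the abbreviated path `// Worker 1`, expanded by hand.
ports = run([descendant_or_self, where(label, equals("Worker")), at(1)], [config])
print(ports)  # [2301, 2302]
```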

TODO

  • explain why total order / comparison of values is important and/or useful

    • what does having a total order unlock?
  • explain why records are good (see below on yaml tags etc)

  • hashability: comes from equivalence

  • more examples

  • having records with ANONYMOUS but ordered fields is good for easy parsing in languages like C where you don't want to explicitly search dictionaries of key/value mappings

  • labels vs. yaml tags vs. annotations

    • yaml tags are complex. they're relative uris, for the most part anyway, except the local ones; they force interpretation rather than being data, e.g. ! forces a node to be interpreted as a string, sequence, or map and ? forces "tag resolution" aka dwimming of scalar syntax. Labels here don't change how their fields resolve at all.
      • they're also used to specify particular host-language classes and other objects.

        !!python/none
        !!python/bool
        !!python/bytes
        !!python/str
        !!python/unicode
        !!python/int
        !!python/long
        !!python/float
        !!python/complex
        !!python/list
        !!python/tuple
        !!python/dict
        !!python/name:module.name
        !!python/module:package.module
        !!python/object:module.Cls
        !!python/object/new:module.Cls
        !!python/object/apply:module.f
        
        !ruby/symbol
        !ruby/sym    (alias of the previous!)
        !ruby/range
        !ruby/regexp
        !ruby/struct:StructTypeName
        !ruby/object:Module::ClassName
        !ruby/array:Module::ClassName   (subtyping arrays! objects, not data)
        !ruby/hash:Module::ClassName    (subtyping hashes! objects, not data)
        
        !perl/regexp
        
      • yaml tag meanings are per-document or global. Labels aren't really specified. Is this good or bad? Once there's a type system, labels will become meaningful in a per-type context.

      • yaml tags basically are meant to mean the type of the object following. Labels are not: they are for distinguishing among variants within a type. (In a unityped setting, this boils down to the same thing at a different level; object-level vs meta-level variants.)

      • in some cases (ruby) a tag indicates a subclass: a behavioural refinement of some object rather than a structural extension of some data.

      • yaml tags don't have intrinsic meaning: implementations are allowed to complain if they don't recognise a tag. They also affect how and whether an object can be used as a dict key; labels, otoh, have intrinsic (trivial) meaning, and any preserves value is allowed to be used as a dict key. YAML documents then have implementation-specific meaning, but Preserves have intrinsic meaning.

      • yaml has schemas, holy shit, and there the tags really do direct interpretation of values to a significant extent. Preserves forces the application to do such interpretations: the parser/reader won't do them for you.

        • TODO: be clearer in the bit on "validity"
      • yaml tags are URIs, and cannot be structured data

  • annotations

    • in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
    • comments are a good example: out-of-domain description about the value, not part of the value itself
    • uses:
      • round-tripping config files; cf. the approach taken by http://augeas.net/
      • embedding trace information in messages
        • provenance information
        • stack information / distributed trace/continuation record
  • remove comments once annotations are in!

  • binary syntax: length-prefixing is good for pattern-matching, because it allows you to reject terms based on arity without having to scan the contents.

  • hey so what about protobufs? the optional fields / forward-and-backwards-compatibility thing is interesting.

  • what about skipping e.g. lists? would need byte-length prefix

  • When thinking about extensibility and forward/backward compatibility, consider this: https://eighty-twenty.org/2016/09/18/gnome-flashback-patch

  • types, type-directed whitespace-sensitive parsing (oh hey it might also lead to optimized binary parsers based on type?)

    • Zephyr ASDL (here * is postfix Kleene star and ? marks zero-or-one):

      asdl_ty = Sum(identifier, field*, ctor, ctor*) ;; typename, common fields, at least one ctor, more ctors
              | Product(identifier, field, field*)   ;; ?? i guess a degenerate kind of sum??
         ctor = Con(identifier, field*)              ;; most like Preserves' record
        field = Id(identifier, identifier?)          ;; basic typename reference (?)
              | Option(identifier, identifier?)      ;; postfix `?`
              | Sequence(identifier, identifier?)    ;; postfix `*`
      
        value = SumVal(identifier, value*, value*)   ;; there are common fields
              | ProductVal(value, value*)
              | SequenceVal(value*)
              | NoneVal
              | SomeVal(value)
              | PrimVal(prim)
         prim = IntVal(int)
              | IdentifierVal(identifier)
              | StringVal(string)
      
    • So then for us, where we have kind of union types more than labelled sums:

      • <equals Value>, <lessthan Value>, <greaterthan Value>
      • identifier naming a type definition
        • some type definitions are builtin: Boolean = <union <equals #true> <equals #false>>
        • some have to be primitive rather than builtin, like SignedInteger or Double, because they have unboundedly (or awkwardly) many inhabitants and the class above or below them doesn't have a limit ordinal in the right place
        • parameters/forall?
      • <record Type Type ...> - first one is the label type
      • <list Type ...> - heterogeneous list of specific types
      • <listof Type> - homogeneous list
      • <setof Type>
      • { keyType: valueType, ... } - heterogeneous dict
        • wait, { keyLiteral: valueType, ... } might be better - sugar for <dict [<equals keyLiteral> valueType] ...>
        • <dict+ ...> for when extra members are allowed
        • what about optional members?
      • <dictof keyType valueType> - homogeneous dict
      • <union Type ...>
        • empty union is uninhabited type(!)
        • a kind of or
      • <and Type ...>
        • simultaneous constraints on type, for range, or for range-and-type
        • a kind of intersection; parallel reduction
      • <interleave Type ...> ?? maybe, if sequences are a thing? Could be good for organizing key-value mappings in dictionary-brackets, because unordered... and sets...

      Sketching it out:

      preserves_ty = 
      
    • Oh dear, actually this is very close to being just a pattern language without the captures.

      a1.a & b1.b  =  a1.(a & b1.b) + b1.(a1.a & b)
      
    • Take two.

      • <== Value>, <|<| Value>, <|>| Value>, |<=|, |>=|, *eq *lt *gt *le *ge

      • _ for discard, <*discard>

      • scalar values, other than symbols beginning with *, match themselves as if they were ==-wrapped

      • all the special forms are records, possibly 0-ary, whose labels are symbols starting with *; the exceptions are == etc., _, and the literal ... symbol

      • if you have to match a label like *foo it might clash, so match <== *foo> instead: <*foo 1 2 3> ==> <<== *foo> 1 2 3>

      • <*int> for SignedInteger, <*string>, <*symbol>, <*bytestring>/<*binary>, <*float>, <*double>, <*bool>

      • <*and Pattern ⋯>

      • <*or Pattern ⋯>

      • <*not Pattern> ?

      • <Pattern Pattern ⋯> - match record

      • [Pattern ⋯] - match sequence

      • #set{Pattern} - match set

      • don't know how to match dictionaries yet

        • view it as an interleave of its keyvalues
        • <*interleave Pattern ⋯>?
        • somehow allow specification of a keyvalue that is repeating, that is optional, etc
        • {Keypat:Valpat ⋯ <... Keypat>:<... Valpat>} ??? eww?
      • <*group Pattern ⋯> - sequence of values spliced into wider sequence?

      • use literal ... symbol (!) to mark repetition in a sequence: [<*string> ...]

      • could use literal ? to mark optionality; or better perhaps <*optional Pattern>, equivalent to <*biased-choice Pattern <*group>>; hmm, biased choice!

      • could use <*repeat lo hi> or similar for counted repetition

      • don't know how to write refs to other types yet! def labels starting with *?

        <*def <*foo> <*or <*int> <*string>>>
        <*foo>
        
        <*def <*maybe a> <*or <nothing> <just a>>>
        <*maybe <*int>>
        
      • should those be relative URLs, or jsonpointer or something, so that types can be dragged in from the web?

      • NOTE: No schema for indicating attachment of annotations?!?!?!
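
A rough sketch of how a matcher for part of the "Take two" pattern language might work, using a hypothetical Python representation (`Symbol` and `Record` dataclasses, lists for sequences, and the helpers `S` and `R`). It covers `_`, `<*discard>`, `<== V>`, `<*int>`, `<*string>`, `<*symbol>`, `<*and>`, `<*or>`, `<*not>`, record patterns, and sequences with a trailing `...`, which is assumed here to mean "zero or more of the preceding pattern". Sets, dictionaries, `<*group>`, `<*optional>` and `<*def>` are left out.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any
    fields: tuple

def S(name):
    return Symbol(name)

def R(label, *fields):
    """Build a record; a plain str label is shorthand for a symbol label."""
    return Record(S(label) if isinstance(label, str) else label, fields)

DISCARD = S("_")
REPEAT = S("...")

def matches(pat, val) -> bool:
    if pat == DISCARD:
        return True
    if isinstance(pat, Record) and isinstance(pat.label, Symbol):
        name, args = pat.label.name, pat.fields
        if name == "==":
            return val == args[0]
        if name == "*discard":
            return True
        if name == "*int":
            return isinstance(val, int) and not isinstance(val, bool)
        if name == "*string":
            return isinstance(val, str)
        if name == "*symbol":
            return isinstance(val, Symbol)
        if name == "*and":
            return all(matches(p, val) for p in args)
        if name == "*or":
            return any(matches(p, val) for p in args)
        if name == "*not":
            return not matches(args[0], val)
    if isinstance(pat, Record):
        # Ordinary record pattern: same arity, matching label and fields.
        return (isinstance(val, Record)
                and len(pat.fields) == len(val.fields)
                and matches(pat.label, val.label)
                and all(matches(p, v) for p, v in zip(pat.fields, val.fields)))
    if isinstance(pat, list):
        if not isinstance(val, list):
            return False
        if len(pat) >= 2 and pat[-1] == REPEAT:
            fixed, repeated = pat[:-2], pat[-2]
            return (len(val) >= len(fixed)
                    and all(matches(p, v) for p, v in zip(fixed, val))
                    and all(matches(repeated, v) for v in val[len(fixed):]))
        return len(pat) == len(val) and all(matches(p, v) for p, v in zip(pat, val))
    return pat == val  # everything else matches itself, as if ==-wrapped

# <Workers [<Worker <*string> <*int>> ...]>
workers = R("Workers", [R("Worker", R("*string"), R("*int")), REPEAT])
good = R("Workers", [R("Worker", "10.0.0.101", 2301), R("Worker", "10.0.0.102", 2302)])
bad = R("Workers", [R("Worker", "10.0.0.101", "2301")])

print(matches(workers, good))  # True
print(matches(workers, bad))   # False
# The label clash workaround: <<== *foo> 1 2 3> matches the value <*foo 1 2 3>.
print(matches(R(R("==", S("*foo")), 1, 2, 3), R("*foo", 1, 2, 3)))  # True
```

Because a pattern here is itself just a value, the matcher dispatches on the pattern's label first and only then falls back to structural matching, which is why the `<== *foo>` escape is needed for labels that collide with the special forms.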

The YAML example:

database:
    username: admin
    password: foobar  # TODO get prod passwords out of config
    socket: /var/tmp/database.sock
    options: {use_utf8: true}
memcached:
    host: 10.0.0.99
workers:
  - host: 10.0.0.101
    port: 2301
  - host: 10.0.0.102
    port: 2302

Could be:

[ <Database [<Username "admin">
             @<TODO "get prod passwords out of config"> <Password "foobar">
             <Socket "/var/tmp/database.sock">
             <Options [<UseUTF8>]>]>
  <Memcached [<Host "10.0.0.99">]>
  <Workers [<Worker "10.0.0.101" 2301>
            <Worker "10.0.0.102" 2302>]> ]

Or

{
  database: {
    username: "admin",
    @<TODO "get prod passwords out of config">
    password: "foobar",
    socket: "/var/tmp/database.sock",
    options: #set{use_utf8}
  },
  memcached: {
    host: "10.0.0.99"
  },
  workers: [ <Worker "10.0.0.101" 2301>
             <Worker "10.0.0.102" 2302> ]
}

Its schema-sketch could be

[ <*interleave <Database [ <*interleave <Username <*string>>
                                        <Password <*string>>
                                        <*optional <Socket <*string>>>
                                        <*optional <Options [<*option> ...]>>> ]>
               <Memcached [ <Host <*ipv4>> ... ]>
               <Workers [ <Worker <*ipv4> <*u16>> ... ]>> ]

(for the first variant) or

{
  database: {
    username: <*string>,
    password: <*string>,
    <*optional socket>: <*string>,
    <*optional options>: #set{<*option>}
  },
  memcached: {
    host: <*ipv4>
  },
  workers: [ <Worker <*ipv4> <*u16>> ... ]
}

Annotations will be allowed on any value; but also perhaps on a key-value mapping pair?

{
  @"I label the key" key: value
  key @"I label the mapping": value
  key: @"I label the value" value
}

??

Perhaps not.

The schema for the second YAML config sketch would allow the instance to be written:

database:
  username: admin
  @<TODO "get prod passwords out of config">
  password: foobar
  socket: /var/tmp/database.sock
  options: use_utf8
memcached:
  host: 10.0.0.99
workers:
  <Worker 10.0.0.101 2301>
  <Worker 10.0.0.102 2302>