---
---

TODO:

 - consider a lead byte value used to wrap an encoded `Value` in a
   size-counted wrapper? That way parsers can quickly skip nested
   structure they're not interested in...

 - https://github.com/uwiger/sext

 - http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
   in particular, see the non-lexicographic ordering on tuples (vs
   lists).

 - should there be a built-in (i.e. recommended) reference type for external data?
    - if there were, it'd give IPLD-like characteristics to the thing from the get-go
    - IRIs and mime-typed things are already in there, so why not content-based addressing?

 - Check out https://hitchdev.com/strictyaml/, in particular the "Why
   StrictYAML?" and "Design justifications" sections; perhaps borrow
   elements of that structure for writing a comparison of Preserves
   with other things
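
The size-counted wrapper from the first item above could look roughly like the following Python sketch. The lead byte `0xB7` and the fixed four-byte big-endian length are assumptions made up for illustration; neither comes from any Preserves binary syntax.

```python
import struct

WRAPPER_TAG = 0xB7  # hypothetical lead byte, chosen only for this sketch


def wrap(encoded_value: bytes) -> bytes:
    """Prefix an already-encoded value with the tag and its byte length."""
    return bytes([WRAPPER_TAG]) + struct.pack(">I", len(encoded_value)) + encoded_value


def skip(buf: bytes, offset: int) -> int:
    """Return the offset just past the wrapped value that starts at `offset`."""
    if buf[offset] != WRAPPER_TAG:
        raise ValueError("not a size-counted wrapper")
    (length,) = struct.unpack_from(">I", buf, offset + 1)
    return offset + 1 + 4 + length


# Two wrapped values back to back; the reader only wants the second one.
stream = wrap(b"uninteresting nested structure") + wrap(b"wanted")
second = skip(stream, 0)        # jump over the first value without parsing it
wanted = stream[second + 5:]    # 1 tag byte + 4 length bytes precede the payload
```

The reader never looks inside the first payload, which is the whole point of the wrapper; the cost is five extra bytes per wrapped value in this particular layout.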

It is becoming VERY CLEAR that on-the-wire efficiency is... a
secondary concern. Perhaps revise the binary syntax to be less terse
and better for simple encoding and for term ordering,
canonicalization, quick indexing, etc.

 - the indexing thing clashes with the term ordering thing
 - maybe put the indexes at the end?? they could be optional
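
One way to read the clash between indexing and term ordering: if a count comes first, comparing encodings byte by byte orders sequences by size before content (much like Erlang's ordering on tuples), whereas a terminator-delimited form keeps bytewise order equal to lexicographic order. The two toy encodings below are assumptions for illustration only, restricted to short sequences of integers in the range 0 to 254.

```python
def encode_counted(items):
    """Count-prefixed: one byte of length, then one byte per item."""
    return bytes([len(items)]) + bytes(items)


def encode_terminated(items):
    """Terminator-delimited: items are shifted up by one so 0x00 can end the sequence."""
    return bytes(i + 1 for i in items) + b"\x00"


values = [[2], [1, 1], [1]]

# Sorting by the raw bytes of each encoding gives different orders.
by_counted = sorted(values, key=encode_counted)        # size first, then content
by_terminated = sorted(values, key=encode_terminated)  # plain lexicographic order
```

Here `by_counted` places `[2]` before `[1, 1]` because the shorter sequence wins on the count byte, while `by_terminated` agrees with Python's own lexicographic list comparison. Putting counts or indexes at the end of the encoding, as suggested above, would leave the leading bytes free to determine the order.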

It might be nice to define some kind of jsonpath/xpath-like means of
naming a subterm within a Preserves value. Record labels would be a kind of
assertion on the current node. Indexes and keys would be steps. It'd
be a lot like xpath I think; see also my `racket-xe` package.

 - `<child>` - moves into direct children
 - `<descendant-or-self>` - moves into direct and indirect children, including this node
 - `<descendant>` - moves into direct and indirect children, excluding this node
 - `<where[P*]>` - "where" clause, applies nested path, keeping nodes with submatches
 - `<or[P*]>` - result of first non-empty `P` match
 - `<at K>` - moves into direct children whose keys are `K` from
   dictionaries, sequences or records; `K` should be a number for the
   latter two
 - `<label>` - moves into labels of records
 - `<equals V>` - filters to only nodes that equal `V`
 - `<isa T>` - filters to only nodes that are `T ∈
   {boolean float double signed-integer string byte-string symbol record sequence set dictionary}`

Abbreviations:

    / = <child>
    // = <descendant-or-self>
    [P*] = <where[P*]>
    Symbol = [<label> <equals Symbol>]
    NonSymbolAtom = <at NonSymbolAtom>
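
A rough Python sketch of how a few of these steps might evaluate. Everything here is an assumption for illustration: values are modelled with Python dicts, lists and sets plus a hypothetical `Record` class, symbols are plain strings, each step maps a list of nodes to a list of nodes, and `<or[P*]>` and `<isa T>` are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple


def children(node):
    """Direct children: record fields, dictionary values, sequence or set members."""
    if isinstance(node, Record):
        return list(node.fields)
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, (list, set, frozenset)):
        return list(node)
    return []


def child(nodes):
    return [c for n in nodes for c in children(n)]


def descendant_or_self(nodes):
    out = []
    for n in nodes:
        out.append(n)
        out.extend(descendant_or_self(children(n)))
    return out


def descendant(nodes):
    return descendant_or_self(child(nodes))


def label(nodes):
    return [n.label for n in nodes if isinstance(n, Record)]


def equals(value):
    return lambda nodes: [n for n in nodes if n == value]


def at(key):
    def step(nodes):
        out = []
        for n in nodes:
            if isinstance(n, dict) and key in n:
                out.append(n[key])
            elif isinstance(n, (Record, list)) and isinstance(key, int):
                seq = n.fields if isinstance(n, Record) else n
                if 0 <= key < len(seq):
                    out.append(seq[key])
        return out
    return step


def where(*path):
    """Keep the nodes for which the nested path yields at least one match."""
    return lambda nodes: [n for n in nodes if run(path, n)]


def run(path, root):
    nodes = [root]
    for step in path:
        nodes = step(nodes)
    return nodes


config = {"workers": [Record("Worker", ("10.0.0.101", 2301)),
                      Record("Worker", ("10.0.0.102", 2302))]}

# Roughly `workers / Worker 1` and `// Worker 0` in the abbreviated notation.
ports = run([at("workers"), child, where(label, equals("Worker")), at(1)], config)
hosts = run([descendant_or_self, where(label, equals("Worker")), at(0)], config)
```

The `Symbol` abbreviation shows up here as `where(label, equals("Worker"))`: an assertion on the current node rather than a move to a different one.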

# TODO

 - explain why total order / comparison of values is important and/or useful
    - what does having a total order unlock?
 - explain why records are good (see below on yaml tags etc)
 - hashability: comes from equivalence
 - more examples
    - over-8000er mountains
    - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
 - having records with ANONYMOUS but ordered fields is good for easy
   parsing in languages like C where you don't want to explicitly
   search dictionaries of key/value mappings
 - labels vs. yaml tags vs. annotations
    - yaml tags are complex. they're relative uris, for the most part
      anyway, except the local ones; they force interpretation rather
      than being data, e.g. `!` forces a node to be interpreted as a
      string, sequence, or map and `?` forces "tag resolution" aka
      dwimming of scalar syntax. Labels here don't change how their
      fields resolve at all.
       - they're also used to specify particular host-language classes
         and other objects.

             !!python/none
             !!python/bool
             !!python/bytes
             !!python/str
             !!python/unicode
             !!python/int
             !!python/long
             !!python/float
             !!python/complex
             !!python/list
             !!python/tuple
             !!python/dict
             !!python/name:module.name
             !!python/module:package.module
             !!python/object:module.Cls
             !!python/object/new:module.Cls
             !!python/object/apply:module.f

             !ruby/symbol
             !ruby/sym    (alias of the previous!)
             !ruby/range
             !ruby/regexp
             !ruby/struct:StructTypeName
             !ruby/object:Module::ClassName
             !ruby/array:Module::ClassName   (subtyping arrays! objects, not data)
             !ruby/hash:Module::ClassName    (subtyping hashes! objects, not data)

             !perl/regexp

       - yaml tag meanings are per-document or global. Label meanings
         aren't really specified. Is this good or bad? Once there's a type
         system, labels will become meaningful in a per-type context.

       - yaml tags are basically meant to denote the type of the object
         that follows. Labels are not: they are for distinguishing among
         variants *within* a type. (In a unityped setting, this boils
         down to the same thing at a different level; object-level vs
         meta-level variants.)

       - in some cases (ruby) a tag indicates a subclass: a
         behavioural refinement of some *object* rather than a
         structural extension of some *data*.

       - yaml tags don't have intrinsic meaning: implementations are
         allowed to complain if they don't recognise a tag. They also
         affect how and whether an object can be used as a dict key;
         labels, otoh, have intrinsic (trivial) meaning, and *any*
         preserves value is allowed to be used as a dict key. YAML
         documents then have implementation-specific meaning, but
         Preserves have intrinsic meaning.

       - yaml has schemas, holy shit, and there the tags really do
         direct interpretation of values to a significant extent.
         Preserves forces the application to do such interpretations:
         the parser/reader won't do them for you.
          - TODO: be clearer in the bit on "validity"

       - yaml tags are URIs, and cannot be structured data

 - annotations
    - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
    - comments are a good example: out-of-domain description about the
      value, not part of the value itself
    - uses:
       - roundtripping config cf the approach taken by http://augeas.net/
       - embedding trace information in messages
          - provenance information
          - stack information / distributed trace/continuation record

 - remove comments once annotations are in!

 - binary syntax: length-prefixing is good for pattern-matching,
   because it allows you to reject terms based on arity without having
   to scan the contents.

 - hey so what about protobufs? the optional fields /
   forward-and-backwards-compatibility thing is interesting.

 - what about skipping e.g. lists? would need byte-length prefix

 - When thinking about extensibility and forward/backward
   compatibility, consider this:
   <https://eighty-twenty.org/2016/09/18/gnome-flashback-patch>

 - types, type-directed whitespace-sensitive parsing (oh hey it might
   also lead to optimized binary parsers based on type?)

    - Zephyr (here `*` is postfix Kleene star and `?` marks zero-or-one):

          asdl_ty = Sum(identifier, field*, ctor, ctor*) ;; typename, common fields, at least one ctor, more ctors
                  | Product(identifier, field, field*)   ;; ?? i guess a degenerate kind of sum??
             ctor = Con(identifier, field*)              ;; most like Preserves' record
            field = Id(identifier, identifier?)          ;; basic typename reference (?)
                  | Option(identifier, identifier?)      ;; postfix `?`
                  | Sequence(identifier, identifier?)    ;; postfix `*`

            value = SumVal(identifier, value*, value*)   ;; there are common fields
                  | ProductVal(value, value*)
                  | SequenceVal(value*)
                  | NoneVal
                  | SomeVal(value)
                  | PrimVal(prim)
             prim = IntVal(int)
                  | IdentifierVal(identifier)
                  | StringVal(string)

    - So then for us, where we have kind of union types more than
      labelled sums:
       - `<equals Value>`, `<lessthan Value>`, `<greaterthan Value>`
          - must be equal to / less than / greater than this value
          - maybe take `<range lo hi>` as primitive?
             - no, because of infinitesimals
          - `<regexp string>` ... etc? (Perhaps `<pattern regexpstring>`
            is better.) Be sure to specify the ECMA-262 dialect, with
            restrictions a la JSON-schema:
            https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3
       - identifier naming a type definition
          - some type definitions are builtin: `Boolean = <union <equals #true> <equals #false>>`
          - some have to be primitive rather than builtin, like
            `SignedInteger` or `Double`, because they have unboundedly
            (or awkwardly) many inhabitants and the class above or
            below them doesn't have a limit ordinal in the right place
          - parameters/`forall`?
       - `<record Type Type ...>` - first one is the label type
       - `<list Type ...>` - heterogeneous list of specific types
       - `<listof Type>` - homogeneous list
       - `<setof Type>`
       - `{ keyType: valueType, ... }` - heterogeneous dict
           - wait, `{ keyLiteral: valueType, ... }` might be better - sugar for
             `<dict [<equals keyLiteral> valueType] ...>`
           - `<dict+ ...>` for when extra members are allowed
           - what about optional members?
       - `<dictof keyType valueType>` - homogeneous dict
       - `<union Type ...>`
          - empty union is uninhabited type(!)
          - a kind of or
       - `<and Type ...>`
          - simultaneous constraints on type, for range, or for range-and-type
          - a kind of intersection; parallel reduction
       - `<interleave Type ...>` ?? maybe, if sequences are a thing?
         Could be good for organizing key-value mappings in
         dictionary-brackets, because unordered... and sets...

      Sketching it out:

          preserves_ty =

    - Oh dear, actually this is very close to being just a pattern
      language without the captures.

          a1.a & b1.b  =  a1.(a & b1.b) + b1.(a1.a & b)

    - Take two.
       - `<== Value >`, `<|<| value>`, `<|>| value>`, `|<=|`, `|>=|`, `*eq` `*lt` `*gt` `*le` `*ge`
       - `_` for discard, `<*discard>`
       - scalar values other than symbols beginning with `*` match themselves, as if they were `==`-wrapped
       - all the special things are records, possibly 0-ary, whose labels are symbols starting with `*`,
         except for `==` etc and `_` and `...`
       - if you have to match a label like `*foo` it might clash, so match `<== *foo>` instead:
         `<*foo 1 2 3>` ==> `<<== *foo> 1 2 3>`
       - `<*int>` for `SignedInteger`, `<*string>`, `<*symbol>`,
         `<*bytestring>`/`<*binary>`, `<*float>`, `<*double>`,
         `<*bool>`
       - `<*and Pattern ⋯>`
       - `<*or Pattern ⋯>`
       - `<*not Pattern>` ?
       - `<Pattern Pattern ⋯>` - match record
       - `[Pattern ⋯]` - match sequence
       - `#set{Pattern}` - match set
       - don't know how to match dictionaries yet
          - view it as an interleave of its keyvalues
          - `<*interleave Pattern ⋯>`?
          - somehow allow specification of a keyvalue that is repeating, that is optional, etc
          - `{Keypat:Valpat ⋯ <... Keypat>:<... Valpat>}` ??? eww?
       - `<*group Pattern ⋯>` - sequence of values spliced into wider sequence?
       - use literal `...` symbol (!) to mark repetition in a sequence:
         `[<*string> ...]`
       - could use literal `?` to mark optionality; or better perhaps `<*optional Pattern>`,
         equivalent to `<*biased-choice Pattern <*group>>`; hmm, biased choice!
       - could use `<*repeat lo hi>` or similar for counted repetition
       - don't know how to write refs to other types yet! def labels starting with `*`?

             <*def <*foo> <*or <*int> <*string>>>
             <*foo>

             <*def <*maybe a> <*or <nothing> <just a>>>
             <*maybe <*int>>

       - should those be relative URLs, or jsonpointer or something,
         so can drag in types from the web?
       - NOTE: No schema for indicating attachment of annotations?!?!?!
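
A small Python sketch of a matcher for part of the "Take two" vocabulary: `_`, `<== V>`, a few atom types, `<*and ⋯>`, `<*or ⋯>`, record patterns, and sequences with a literal `...` for repetition. The `Sym` and `Record` classes and the `R` helper are hypothetical stand-ins; dictionaries, sets, `*group`, `*optional` and `*def` are not covered, and an unrecognised `*`-label is simply treated as an ordinary record label.

```python
from dataclasses import dataclass


class Sym(str):
    """Stand-in for a symbol, kept distinct from plain strings."""


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple = ()


def R(label, *fields):
    """Shorthand for a record whose label is a symbol."""
    return Record(Sym(label), tuple(fields))


ATOMS = {
    "*int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "*string": lambda v: isinstance(v, str) and not isinstance(v, Sym),
    "*symbol": lambda v: isinstance(v, Sym),
    "*bool": lambda v: isinstance(v, bool),
}


def same(a, b):
    # Shallow type check so that the string "x" and the symbol x stay different.
    return type(a) is type(b) and a == b


def matches(p, v):
    if isinstance(p, Sym) and p == "_":
        return True  # discard
    if isinstance(p, Record) and isinstance(p.label, Sym):
        if p.label == "==":
            return same(p.fields[0], v)
        if p.label in ATOMS:
            return ATOMS[p.label](v)
        if p.label == "*and":
            return all(matches(q, v) for q in p.fields)
        if p.label == "*or":
            return any(matches(q, v) for q in p.fields)
    if isinstance(p, Record):
        return (isinstance(v, Record) and matches(p.label, v.label)
                and match_seq(list(p.fields), list(v.fields)))
    if isinstance(p, list):
        return isinstance(v, list) and match_seq(p, v)
    return same(p, v)  # other scalars match themselves


def match_seq(ps, vs):
    if not ps:
        return not vs
    if len(ps) >= 2 and isinstance(ps[1], Sym) and ps[1] == "...":
        # Either zero repetitions, or one match followed by the same repeating pattern.
        return match_seq(ps[2:], vs) or (
            bool(vs) and matches(ps[0], vs[0]) and match_seq(ps, vs[1:]))
    return bool(vs) and matches(ps[0], vs[0]) and match_seq(ps[1:], vs[1:])


# [<Worker <*string> <*int>> ...]
workers_pattern = [R("Worker", R("*string"), R("*int")), Sym("...")]
workers = [R("Worker", "10.0.0.101", 2301), R("Worker", "10.0.0.102", 2302)]

# <<== *foo> 1 2 3> matches a record whose label really is the symbol *foo.
escaped = Record(R("==", Sym("*foo")), (1, 2, 3))
```

Because `==`-wrapping is itself just a record in label position, the escape for labels such as `*foo` needs no special case: the label pattern is matched like any other pattern.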

The YAML example:

    database:
        username: admin
        password: foobar  # TODO get prod passwords out of config
        socket: /var/tmp/database.sock
        options: {use_utf8: true}
    memcached:
        host: 10.0.0.99
    workers:
      - host: 10.0.0.101
        port: 2301
      - host: 10.0.0.102
        port: 2302

Could be:

    [ <Database [<Username "admin">
                 @<TODO "get prod passwords out of config"> <Password "foobar">
                 <Socket "/var/tmp/database.sock">
                 <Options [<UseUTF8>]>]>
      <Memcached [<Host "10.0.0.99">]>
      <Workers [<Worker "10.0.0.101" 2301>
                <Worker "10.0.0.102" 2302>]> ]

Or

    {
      database: {
        username: "admin",
        @<TODO "get prod passwords out of config">
        password: "foobar",
        socket: "/var/tmp/database.sock",
        options: #set{use_utf8}
      },
      memcached: {
        host: "10.0.0.99"
      },
      workers: [ <Worker "10.0.0.101" 2301>
                 <Worker "10.0.0.102" 2302> ]
    }

Its schema-sketch could be

    [ <*interleave <Database [ <*interleave <Username <*string>>
                                            <Password <*string>>
                                            <*optional <Socket <*string>>>
                                            <*optional <Options [<*option> ...]>>> ]>
                   <Memcached [ <Host <*ipv4>> ... ]>
                   <Workers [ <Worker <*ipv4> <*u16>> ... ]>> ]

(for the first variant) or

    {
      database: {
        username: <*string>,
        password: <*string>,
        <*optional socket>: <*string>,
        <*optional options>: #set{<*option>}
      },
      memcached: {
        host: <*ipv4>
      },
      workers: [ <Worker <*ipv4> <*u16>> ... ]
    }
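
For the dictionary-style variant, a checker might look like the Python sketch below. The source does not define `<*ipv4>`, `<*u16>` or `<*option>`, so their meanings here are guesses; `OptionalKey`, `check` and the other helper names are hypothetical, and workers are modelled as plain tuples for brevity. Dictionaries are treated as closed, with no extra members allowed.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionalKey:
    """Stands in for `<*optional key>` in key position."""
    name: str


def is_string(v):
    return isinstance(v, str)


def is_u16(v):
    return isinstance(v, int) and not isinstance(v, bool) and 0 <= v < 2**16


def is_ipv4(v):
    if not isinstance(v, str):
        return False
    try:
        ipaddress.IPv4Address(v)
        return True
    except ValueError:
        return False


def list_of(schema):
    return lambda v: isinstance(v, list) and all(check(schema, x) for x in v)


def set_of(schema):
    return lambda v: isinstance(v, (set, frozenset)) and all(check(schema, x) for x in v)


def worker(v):
    # Assumed shape for <Worker <*ipv4> <*u16>>: ("Worker", host, port).
    return (isinstance(v, tuple) and len(v) == 3 and v[0] == "Worker"
            and is_ipv4(v[1]) and is_u16(v[2]))


def check(schema, value):
    if callable(schema):
        return schema(value)
    if not isinstance(value, dict):
        return False
    allowed = set()
    for key, sub in schema.items():
        required = not isinstance(key, OptionalKey)
        name = key if required else key.name
        allowed.add(name)
        if name in value:
            if not check(sub, value[name]):
                return False
        elif required:
            return False
    return set(value) <= allowed  # closed dictionary: reject unknown keys


CONFIG_SCHEMA = {
    "database": {
        "username": is_string,
        "password": is_string,
        OptionalKey("socket"): is_string,
        OptionalKey("options"): set_of(is_string),
    },
    "memcached": {"host": is_ipv4},
    "workers": list_of(worker),
}

config = {
    "database": {"username": "admin", "password": "foobar",
                 "socket": "/var/tmp/database.sock", "options": {"use_utf8"}},
    "memcached": {"host": "10.0.0.99"},
    "workers": [("Worker", "10.0.0.101", 2301), ("Worker", "10.0.0.102", 2302)],
}
ok = check(CONFIG_SCHEMA, config)
```

Treating optionality as a property of the key keeps value patterns unchanged, at the price of making key position a place where non-literal patterns can appear.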

Annotations will be allowed on any value; but also perhaps on a
key-value mapping pair?

    {
      @"I label the key" key: value
      key @"I label the mapping": value
      key: @"I label the value" value
    }

??

Perhaps not.
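
If annotations attach only to values (including values used as keys), they can be modelled as a wrapper that carries metadata but takes no part in equality or hashing. The `Annotated` class and `strip` function below are hypothetical, and ignoring annotations for equality is an assumption in line with treating them as out-of-domain metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotated:
    value: object
    # compare=False keeps annotations out of both __eq__ and __hash__.
    annotations: tuple = field(default=(), compare=False)


def strip(v):
    """Remove every annotation, leaving only the domain-level value."""
    if isinstance(v, Annotated):
        return strip(v.value)
    if isinstance(v, dict):
        return {strip(k): strip(x) for k, x in v.items()}
    if isinstance(v, list):
        return [strip(x) for x in v]
    return v


plain = {"username": "admin", "password": "foobar"}
annotated = {
    "username": "admin",
    "password": Annotated("foobar", ("TODO get prod passwords out of config",)),
}

same_value = strip(annotated) == plain
todo = annotated["password"].annotations[0]  # the metadata is still reachable
```

An annotation on the mapping pair itself has no obvious home in this model, which is consistent with the "Perhaps not" above.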

The schema for the second YAML config sketch would allow the instance
to be written:

    database:
      username: admin
      @<TODO "get prod passwords out of config">
      password: foobar
      socket: /var/tmp/database.sock
      options: use_utf8
    memcached:
      host: 10.0.0.99
    workers:
      <Worker 10.0.0.101 2301>
      <Worker 10.0.0.102 2302>