---
---
<link rel="stylesheet" href="preserves.css">

TODO:

 - https://github.com/uwiger/sext

 - http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
   in particular, see the non-lexicographic ordering on tuples (vs
   lists).

 - should there be a built-in (i.e. recommended) reference type for external data??
    - if there were, it'd give IPLD-like characteristics to the thing from the get-go
    - IRIs and mime-typed things are already in there so why not content-based addressing
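
To make the Erlang term-comparison note above concrete: per the Erlang reference manual, tuples are ordered by size first and only then element by element, whereas lists are compared element by element. A minimal Python sketch, assuming Erlang tuples are modelled as Python tuples and Erlang lists as Python lists, and covering only integers, tuples and lists. The name `erlang_compare` is hypothetical.

```python
def _rank(term):
    # Cross-type order, restricted to the three kinds modelled here:
    # number < tuple < list.
    if isinstance(term, bool):
        raise TypeError("booleans are not modelled in this sketch")
    if isinstance(term, int):
        return 0
    if isinstance(term, tuple):
        return 1
    if isinstance(term, list):
        return 2
    raise TypeError(f"unsupported term: {term!r}")


def erlang_compare(a, b):
    """Return -1, 0 or 1, following Erlang's term order for ints, tuples, lists."""
    ra, rb = _rank(a), _rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra == 0:
        return (a > b) - (a < b)
    if ra == 1 and len(a) != len(b):
        # Tuples: size decides before any element is inspected.
        return -1 if len(a) < len(b) else 1
    for x, y in zip(a, b):
        c = erlang_compare(x, y)
        if c != 0:
            return c
    # Lists: a proper prefix sorts first. Tuples reaching here have equal size.
    return (len(a) > len(b)) - (len(a) < len(b))
```

So `{9}` sorts before `{1,1}` (size wins), while `[9]` sorts after `[1,1]` (first element wins).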

It is becoming VERY CLEAR that on-the-wire efficiency is... a
secondary concern. Perhaps revise the binary syntax to be less terse
and better for simple encoding and for term ordering,
canonicalization, quick indexing, etc.

 - the indexing thing clashes with the term ordering thing
 - maybe put the indexes at the end?? they could be optional

It might be nice to define some kind of jsonpath/xpath-like means of
naming a subterm within a Preserve. Record labels would be a kind of
assertion on the current node. Indexes and keys would be steps. It'd
be a lot like xpath I think; see also my `racket-xe` package.

 - `child()` - moves into direct children
 - `descendant-or-self()` - moves into direct and indirect children, including this node
 - `descendant()` - moves into direct and indirect children, excluding this node
 - `where[P*]` - "where" clause, applies nested path, keeping nodes with submatches
 - `or[P*]` - result of first non-empty `P` match
 - `at(K)` - moves into direct children whose keys are `K` from
   dictionaries, sequences or records; `K` should be a number for the
   latter two
 - `label()` - moves into labels of records
 - `equals(V)` - filters to only nodes that equal `V`
 - `isa(T)` - filters to only nodes that are `T ∈
   [boolean float double signed-integer string byte-string symbol record sequence set dictionary]`

Abbreviations:

    / = child()
    // = descendant-or-self()
    [P*] = where[P*]
    Symbol = [label() equals(Symbol)]
    NonSymbolAtom = at(NonSymbolAtom)
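
A minimal Python sketch of how the steps above might evaluate, assuming dictionaries are Python dicts, sequences are Python lists, and records are a hypothetical `Record` class. It also assumes that the "direct children" of a dictionary are its values only, and it models symbols and strings alike as Python `str`; none of this is settled by the notes above. Each step maps a list of nodes to a list of nodes, and a path is a list of steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple


def children(node):
    # Assumption: a dictionary's children are its values, not its keys.
    if isinstance(node, Record):
        return list(node.fields)
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def run(path, nodes):
    for step in path:
        nodes = step(nodes)
    return nodes


def child():
    return lambda nodes: [c for n in nodes for c in children(n)]


def descendant_or_self():
    def step(nodes):
        out = []

        def walk(n):
            out.append(n)
            for c in children(n):
                walk(c)

        for n in nodes:
            walk(n)
        return out

    return step


def at(key):
    def step(nodes):
        out = []
        for n in nodes:
            if isinstance(n, dict):
                if key in n:
                    out.append(n[key])
            elif isinstance(n, (list, Record)) and isinstance(key, int):
                seq = n.fields if isinstance(n, Record) else n
                if 0 <= key < len(seq):
                    out.append(seq[key])
        return out

    return step


def label():
    return lambda nodes: [n.label for n in nodes if isinstance(n, Record)]


def equals(v):
    return lambda nodes: [n for n in nodes if n == v]


def where(*path):
    # Keep a node if the nested path, started at that node, matches anything.
    return lambda nodes: [n for n in nodes if run(path, [n])]


data = {
    "workers": [Record("Worker", ("10.0.0.101", 2301)),
                Record("Worker", ("10.0.0.102", 2302))],
    "memcached": {"host": "10.0.0.99"},
}
# Roughly `// Worker 1` in the abbreviated syntax.
ports = run([descendant_or_self(), where(label(), equals("Worker")), at(1)], [data])
```

Here `ports` collects field 1 of every `Worker` record anywhere in `data`.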

# TODO

 - [DONE] allow `label[1,2,3]` and `label{a:b, c:d}`, meaning
   `label([1,2,3])` and `label({a:b, c:d})`.

 - explain why total order / comparison of values is important and/or useful
    - what does having a total order unlock?
 - explain why records are good (see below on yaml tags etc)
 - hashability: comes from equivalence
 - more examples
    - over-8000er mountains
    - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
 - having records with ANONYMOUS but ordered fields is good for easy
   parsing in languages like C where you don't want to explicitly
   search dictionaries of key/value mappings
 - labels vs. yaml tags vs. annotations
    - yaml tags are complex. they're relative uris, for the most part
      anyway, except the local ones; they force interpretation rather
      than being data, e.g. `!` forces a node to be interpreted as a
      string, sequence, or map and `?` forces "tag resolution" aka
      dwimming of scalar syntax. Labels here don't change how their
      fields resolve at all.
       - they're also used to specify particular host-language classes
         and other objects.

             !!python/none
             !!python/bool
             !!python/bytes
             !!python/str
             !!python/unicode
             !!python/int
             !!python/long
             !!python/float
             !!python/complex
             !!python/list
             !!python/tuple
             !!python/dict
             !!python/name:module.name
             !!python/module:package.module
             !!python/object:module.Cls
             !!python/object/new:module.Cls
             !!python/object/apply:module.f

             !ruby/symbol
             !ruby/sym    (alias of the previous!)
             !ruby/range
             !ruby/regexp
             !ruby/struct:StructTypeName
             !ruby/object:Module::ClassName
             !ruby/array:Module::ClassName   (subtyping arrays! objects, not data)
             !ruby/hash:Module::ClassName    (subtyping hashes! objects, not data)

             !perl/regexp

       - yaml tag meanings are per-document or global. Labels aren't
         really specified. Is this good or bad? Once there's a type
         system, labels will become meaningful in a per-type context.

       - yaml tags basically are meant to mean the type of the object
         following. Labels are not: they are for distinguishing among
         variants *within* a type. (In a unityped setting, this boils
         down to the same thing at a different level; object-level vs
         meta-level variants.)

       - in some cases (ruby) a tag indicates a subclass: a
         behavioural refinement of some *object* rather than a
         structural extension of some *data*.

       - yaml tags don't have intrinsic meaning: implementations are
         allowed to complain if they don't recognise a tag. They also
         affect how and whether an object can be used as a dict key;
         labels, otoh, have intrinsic (trivial) meaning, and *any*
         preserves value is allowed to be used as a dict key. YAML
         documents then have implementation-specific meaning, but
         Preserves have intrinsic meaning.

       - yaml has schemas, holy shit, and there the tags really do
         direct interpretation of values to a significant extent.
         Preserves forces the application to do such interpretations:
         the parser/reader won't do them for you.
          - TODO: be clearer in the bit on "validity"

       - yaml tags are URIs, and cannot be structured data

 - annotations
    - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
    - comments are a good example: out-of-domain description about the
      value, not part of the value itself
    - uses:
       - roundtripping config cf the approach taken by http://augeas.net/
       - embedding trace information in messages
          - provenance information
          - stack information / distributed trace/continuation record

 - remove comments once annotations are in!

 - binary syntax: length-prefixing is good for pattern-matching,
   because it allows you to reject terms based on arity without having
   to scan the contents.

 - hey so what about protobufs? the optional fields /
   forward-and-backwards-compatibility thing is interesting.

 - what about skipping e.g. lists? would need byte-length prefix

 - When thinking about extensibility and forward/backward
   compatibility, consider this:
   <https://eighty-twenty.org/2016/09/18/gnome-flashback-patch>

 - types, type-directed whitespace-sensitive parsing (oh hey it might
   also lead to optimized binary parsers based on type?)

    - Zephyr ASDL (here `*` is postfix Kleene star and `?` marks zero-or-one):

          asdl_ty = Sum(identifier, field*, ctor, ctor*) ;; typename, common fields, at least one ctor, more ctors
                  | Product(identifier, field, field*)   ;; ?? i guess a degenerate kind of sum??
             ctor = Con(identifier, field*)              ;; most like Preserves' record
            field = Id(identifier, identifier?)          ;; basic typename reference (?)
                  | Option(identifier, identifier?)      ;; postfix `?`
                  | Sequence(identifier, identifier?)    ;; postfix `*`

            value = SumVal(identifier, value*, value*)   ;; there are common fields
                  | ProductVal(value, value*)
                  | SequenceVal(value*)
                  | NoneVal
                  | SomeVal(value)
                  | PrimVal(prim)
             prim = IntVal(int)
                  | IdentifierVal(identifier)
                  | StringVal(string)

    - So then for us, where we have kind of union types more than
      labelled sums:
       - `equals(value)`, `lessthan(value)`, `greaterthan(value)`
          - must be equal to / less than / greater than this value
          - maybe take `range(lo,hi)` as primitive?
             - no, because of infinitesimals
          - `regexp(string)` ... etc? (Perhaps `pattern(regexpstring)`
            is better) (Be sure to specify ECMA-262 dialect, with
            restrictions a la JSON-schema
            https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
       - identifier naming a type definition
          - some type definitions are builtin: `Boolean = union(equals(true), equals(false))`
          - some have to be primitive rather than builtin, like
            `SignedInteger` or `Double`, because they have unboundedly
            (or awkwardly) many inhabitants and the class above or
            below them doesn't have a limit ordinal in the right place
          - parameters/`forall`?
       - `record(type, type, ...)` - first one is the label type
       - `list(type, ...)` - heterogeneous list of specific types
       - `listof(type)` - homogeneous list
       - `setof(type)`
       - `{ keytype: valuetype, ... }` - heterogeneous dict
           - wait, `{ keyliteral: valuetype, ... }` might be better - sugar for
             `dict([equals(keyliteral), valuetype], ...)`
           - `dict*(...)` for when extra members are allowed
           - what about optional members?
       - `dictof(keytype, valuetype)` - homogeneous dict
       - `union(type, ...)`
          - empty union is uninhabited type(!)
          - a kind of or
       - `and(type, ...)`
          - simultaneous constraints on type, for range, or for range-and-type
          - a kind of intersection; parallel reduction
       - `interleave(type, ...)` ?? maybe, if sequences are a thing?
         Could be good for organizing key-value mappings in
         dictionary-brackets, because unordered... and sets...

      Sketching it out:

          preserves_ty =

    - Oh dear, actually this is very close to being just a pattern
      language without the captures.

          a1.a & b1.b  =  a1.(a & b1.b) + b1.(a1.a & b)

    - Take two.
       - `=(value)`, `<(value)`, `>(value)`, `<=(value)`, `>=(value)`, `*eq`, `*lt`, `*gt`, `*le`, `*ge`
       - `_` for discard, `*discard()`
       - scalar values, other than symbols beginning with `*`, match themselves as if they were `=`-wrapped
       - all the special things are records, possibly 0-ary, whose labels are symbols starting with `*`,
         except for `=` etc and `_` and `...`
       - if you have to match a label like `*foo` it might clash, so match `=(*foo)` instead:
         `*foo(1 2 3)` ==> `=(*foo)(1 2 3)`
       - `*int()` for `SignedInteger`, `*string()`, `*symbol()`,
         `*bytestring()`/`*binary()`, `*float()`, `*double()`,
         `*bool()`
       - `*and[pattern ⋯]`
       - `*or[pattern ⋯]`
       - `*not(pattern)` ?
       - `pattern(pattern ⋯)` - match record
       - `[pattern ⋯]` - match sequence
       - `#set{pattern}` - match set
       - don't know how to match dictionaries yet
          - view it as an interleave of its keyvalues
          - `*interleave[pattern ⋯]`?
          - somehow allow specification of a keyvalue that is repeating, that is optional, etc
          - `{keypat:valpat ⋯ ...(keypat):...(valpat)}` ??? eww?
       - `*group[pattern ⋯]` - sequence of values spliced into wider sequence?
       - use literal `...` symbol (!) to mark repetition in a sequence:
         `[*string() ...]`
       - could use literal `?` to mark optionality; or better perhaps `*optional(pattern)`,
         equivalent to `*biased-choice[pattern *group[]]`; hmm, biased choice!
       - could use `*repeat(lo,hi)` or similar for counted repetition
       - don't know how to write refs to other types yet! def labels starting with `*`?

             *def(*foo() *or[*int() *string()]) ?
             *foo()

             *def(*maybe(a) *or[nothing() just(a)])
             *maybe(*int())

       - should those be relative URLs, or jsonpointer or something,
         so can drag in types from the web?
       - NOTE: No schema for indicating attachment of annotations?!?!?!
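
A small Python sketch of a matcher for a subset of the "take two" pattern language: `_`, `=(value)`, `*int()`, `*string()`, `*symbol()`, `*and[⋯]`, `*or[⋯]`, `*not(⋯)`, record patterns, and sequence patterns with a literal `...` marking zero-or-more repetition of the preceding pattern. The `Sym` and `Record` classes and the function names are hypothetical; dictionaries, sets, `*group`, `*optional` and `*def` are left out because the notes above leave them open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple


DISCARD = Sym("_")
ELLIPSIS = Sym("...")


def match_seq(pats, vals):
    """Match a list of patterns against a list of values, with backtracking."""
    if not pats:
        return not vals
    if len(pats) >= 2 and pats[1] == ELLIPSIS:
        # Either zero repetitions, or one repetition and try the same again.
        if match_seq(pats[2:], vals):
            return True
        return bool(vals) and matches(pats[0], vals[0]) and match_seq(pats, vals[1:])
    return bool(vals) and matches(pats[0], vals[0]) and match_seq(pats[1:], vals[1:])


def matches(pat, val):
    if pat == DISCARD:
        return True
    if isinstance(pat, Record) and pat.label == Sym("="):
        return val == pat.fields[0]
    if isinstance(pat, Record) and isinstance(pat.label, Sym) and pat.label.name.startswith("*"):
        name = pat.label.name
        if name == "*int":
            return isinstance(val, int) and not isinstance(val, bool)
        if name == "*string":
            return isinstance(val, str)
        if name == "*symbol":
            return isinstance(val, Sym)
        if name == "*and":  # `*and[p ⋯]` means `*and([p ⋯])`
            return all(matches(p, val) for p in pat.fields[0])
        if name == "*or":
            return any(matches(p, val) for p in pat.fields[0])
        if name == "*not":
            return not matches(pat.fields[0], val)
        raise ValueError(f"unknown special pattern {name}")
    if isinstance(pat, Record):
        return (isinstance(val, Record)
                and matches(pat.label, val.label)
                and match_seq(list(pat.fields), list(val.fields)))
    if isinstance(pat, list):
        return isinstance(val, list) and match_seq(pat, val)
    if isinstance(pat, Sym) and pat.name.startswith("*"):
        raise ValueError("bare `*` symbols are reserved; use =(...) to match them")
    return type(pat) is type(val) and pat == val


INT = Record(Sym("*int"), ())
STRING = Record(Sym("*string"), ())
worker_pat = Record(Sym("Worker"), (STRING, INT))          # Worker(*string() *int())
workers_pat = [worker_pat, ELLIPSIS]                       # [Worker(...) ...]
int_or_string = Record(Sym("*or"), ([INT, STRING],))       # *or[*int() *string()]
escaped = Record(Record(Sym("="), (Sym("*foo"),)), (1, 2, 3))  # =(*foo)(1 2 3)
```

The last pattern shows the `=(*foo)` escape: the record's label position holds an `=`-wrapped symbol, so a value labelled `*foo` can be matched without clashing with the special forms.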

The YAML example:

    database:
        username: admin
        password: foobar  # TODO get prod passwords out of config
        socket: /var/tmp/database.sock
        options: {use_utf8: true}
    memcached:
        host: 10.0.0.99
    workers:
      - host: 10.0.0.101
        port: 2301
      - host: 10.0.0.102
        port: 2302

Could be:

    [ Database[Username("admin"),
               @TODO("get prod passwords out of config") Password("foobar"),
               Socket("/var/tmp/database.sock"),
               Options[UseUTF8()]],
      Memcached[Host("10.0.0.99")],
      Workers[Worker("10.0.0.101", 2301),
              Worker("10.0.0.102", 2302)] ]

Or

    {
      database: {
        username: "admin",
        @TODO("get prod passwords out of config")
        password: "foobar",
        socket: "/var/tmp/database.sock",
        options: #set{use_utf8}
      },
      memcached: {
        host: "10.0.0.99"
      },
      workers: [ Worker("10.0.0.101", 2301),
                 Worker("10.0.0.102", 2302) ]
    }

Its schema-sketch could be

    [ *interleave[ Database[ *interleave[ Username(*string())
                                          Password(*string())
                                          *optional(Socket(*string()))
                                          *optional(Options[*option() ...]) ] ]
                   Memcached[ Host(*ipv4()) ... ]
                   Workers[ Worker(*ipv4() *u16()) ... ] ] ]

(for the first variant) or

    {
      database: {
        username: *string(),
        password: *string(),
        *optional(socket): *string(),
        *optional(options): #set{*option()}
      },
      memcached: {
        host: *ipv4()
      },
      workers: [ Worker(*ipv4() *u16()) ... ]
    }
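
To see how the dictionary-shaped schema sketch might be checked, here is a hedged Python sketch. It assumes closed dictionaries (extra keys rejected, since `dict*(...)` is the proposed form for open ones), models `*optional(key)` with a hypothetical `OptionalKey` wrapper, and models records as plain tuples whose first element is the label. The meanings given to `*ipv4()` (dotted-quad string) and `*u16()` (integer in 0..65535) are guesses, and `*option()` is simply treated as a string check.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionalKey:
    key: str


@dataclass(frozen=True)
class ListOf:
    item: object


@dataclass(frozen=True)
class SetOf:
    item: object


def is_string(v):
    return isinstance(v, str)


def is_u16(v):
    return isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 0xFFFF


def is_ipv4(v):
    if not isinstance(v, str):
        return False
    try:
        ipaddress.IPv4Address(v)
        return True
    except ValueError:
        return False


def conforms(schema, value):
    if callable(schema):  # leaf predicate such as is_string
        return schema(value)
    if isinstance(schema, ListOf):
        return isinstance(value, list) and all(conforms(schema.item, v) for v in value)
    if isinstance(schema, SetOf):
        return isinstance(value, (set, frozenset)) and all(conforms(schema.item, v) for v in value)
    if isinstance(schema, tuple):  # fixed-arity record, label first
        return (isinstance(value, tuple) and len(schema) == len(value)
                and all(conforms(s, v) for s, v in zip(schema, value)))
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return False
        allowed = set()
        for k, sub in schema.items():
            name = k.key if isinstance(k, OptionalKey) else k
            allowed.add(name)
            if name in value:
                if not conforms(sub, value[name]):
                    return False
            elif not isinstance(k, OptionalKey):
                return False  # required key is missing
        return set(value) <= allowed  # closed: no unexpected keys
    return schema == value  # literal, e.g. the "Worker" label


schema = {
    "database": {
        "username": is_string,
        "password": is_string,
        OptionalKey("socket"): is_string,
        OptionalKey("options"): SetOf(is_string),
    },
    "memcached": {"host": is_ipv4},
    "workers": ListOf(("Worker", is_ipv4, is_u16)),
}

config = {
    "database": {
        "username": "admin",
        "password": "foobar",
        "socket": "/var/tmp/database.sock",
        "options": {"use_utf8"},
    },
    "memcached": {"host": "10.0.0.99"},
    "workers": [("Worker", "10.0.0.101", 2301), ("Worker", "10.0.0.102", 2302)],
}
```

With these definitions `conforms(schema, config)` holds, and dropping `socket` or `options` keeps it valid while dropping `password` does not.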

Annotations will be allowed on any value; but also perhaps on a
key-value mapping pair?

    {
      @"I label the key" key: value
      key @"I label the mapping": value
      key: @"I label the value" value
    }

??

Perhaps not.
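
One possible reading of "annotations are out-of-domain metadata, not part of the value itself", sketched in Python: an `Annotated` wrapper can sit around any value, and comparing two values means comparing them with all annotations stripped. The `Annotated` class and the `strip` and `same_value` helpers are hypothetical, and whether equality should ignore annotations is an assumption here rather than something the notes decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotated:
    annotations: tuple
    value: object


def strip(v):
    """Return v with every annotation removed, recursively."""
    if isinstance(v, Annotated):
        return strip(v.value)
    if isinstance(v, list):
        return [strip(x) for x in v]
    if isinstance(v, tuple):
        return tuple(strip(x) for x in v)
    if isinstance(v, dict):
        return {strip(k): strip(x) for k, x in v.items()}
    return v


def same_value(a, b):
    """Compare two values while ignoring annotations on either side."""
    return strip(a) == strip(b)


plain = {"username": "admin", "password": "foobar"}
annotated = {
    "username": "admin",
    "password": Annotated((("TODO", "get prod passwords out of config"),), "foobar"),
}
```

Here `annotated` and `plain` differ as Python objects, but `same_value(annotated, plain)` treats them as the same domain value.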

The schema for the second YAML config sketch would allow the instance
to be written:

    database:
      username: admin
      @TODO("get prod passwords out of config")
      password: foobar
      socket: /var/tmp/database.sock
      options: use_utf8
    memcached:
      host: 10.0.0.99
    workers:
      Worker(10.0.0.101, 2301)
      Worker(10.0.0.102, 2302)