From e6e3057de38cb387f65ddc16664a86684318bbfb Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones <tonyg@leastfixedpoint.com>
Date: Tue, 4 Dec 2018 11:01:20 +0000
Subject: [PATCH] Notes and TODOs

---
 TODO.md | 360 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 360 insertions(+)
 create mode 100644 TODO.md
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..c818f68
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,360 @@
+---
+---
+<link rel="stylesheet" href="preserves.css">
+
+TODO:
+
+ - https://github.com/uwiger/sext
+
+ - http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
+   in particular, see the non-lexicographic ordering on tuples (vs
+   lists).
+
+ - should there be a built-in (i.e. recommended) reference type for external data??
+    - if there were, it'd give IPLD-like characteristics to the thing from the get-go
+    - IRIs and mime-typed things are already in there, so why not content-based addressing?
+
+It is becoming VERY CLEAR that on-the-wire efficiency is... a
+secondary concern. Perhaps revise the binary syntax to be less terse
+and better for simple encoding and for term ordering,
+canonicalization, quick indexing, etc.
+
+ - the indexing thing clashes with the term ordering thing
+ - maybe put the indexes at the end?? they could be optional
+
+It might be nice to define some kind of jsonpath/xpath-like means of
+naming a subterm within a Preserve. Record labels would be a kind of
+assertion on the current node. Indexes and keys would be steps. It'd
+be a lot like xpath I think; see also my `racket-xe` package.
+
+ - `child()` - moves into direct children
+ - `descendant-or-self()` - moves into direct and indirect children, including this node
+ - `descendant()` - moves into direct and indirect children, excluding this node
+ - `where[P*]` - "where" clause, applies nested path, keeping nodes with submatches
+ - `or[P*]` - result of first non-empty `P` match
+ - `at(K)` - moves into direct children whose keys are `K` from
+   dictionaries, sequences or records; `K` should be a number for the
+   latter two
+ - `label()` - moves into labels of records
+ - `equals(V)` - filters to only nodes that equal `V`
+ - `isa(T)` - filters to only nodes that are `T ∈
+   [boolean float double signed-integer string byte-string symbol record sequence set dictionary]`
+
+Abbreviations:
+
+    / = child()
+    // = descendant-or-self()
+    [P*] = where[P*]
+    Symbol = [label() equals(Symbol)]
+    NonSymbolAtom = at(NonSymbolAtom)
+
+# TODO
+
+ - [DONE] allow `label[1,2,3]` and `label{a:b, c:d}`, meaning
+   `label([1,2,3])` and `label({a:b, c:d})`.
+
+ - explain why total order / comparison of values is important and/or useful
+    - what does having a total order unlock?
+ - explain why records are good (see below on yaml tags etc)
+ - hashability: comes from equivalence
+ - more examples
+    - over-8000er mountains
+    - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
+ - having records with ANONYMOUS but ordered fields is good for easy
+   parsing in languages like C where you don't want to explicitly
+   search dictionaries of key/value mappings
+ - labels vs. yaml tags vs. annotations
+    - yaml tags are complex. they're relative uris, for the most part
+      anyway, except the local ones; they force interpretation rather
+      than being data, e.g. `!` forces a node to be interpreted as a
+      string, sequence, or map and `?` forces "tag resolution" aka
+      dwimming of scalar syntax. Labels here don't change how their
+      fields resolve at all.
+       - they're also used to specify particular host-language classes
+         and other objects.
+
+             !!python/none
+             !!python/bool
+             !!python/bytes
+             !!python/str
+             !!python/unicode
+             !!python/int
+             !!python/long
+             !!python/float
+             !!python/complex
+             !!python/list
+             !!python/tuple
+             !!python/dict
+             !!python/name:module.name
+             !!python/module:package.module
+             !!python/object:module.Cls
+             !!python/object/new:module.Cls
+             !!python/object/apply:module.f
+
+             !ruby/symbol
+             !ruby/sym    (alias of the previous!)
+             !ruby/range
+             !ruby/regexp
+             !ruby/struct:StructTypeName
+             !ruby/object:Module::ClassName
+             !ruby/array:Module::ClassName   (subtyping arrays! objects, not data)
+             !ruby/hash:Module::ClassName    (subtyping hashes! objects, not data)
+
+             !perl/regexp
+
+       - yaml tag meanings are per-document or global. Label meanings
+         aren't really specified. Is this good or bad? Once there's a type
+         system, labels will become meaningful in a per-type context.
+
+       - yaml tags basically are meant to mean the type of the object
+         following. Labels are not: they are for distinguishing among
+         variants *within* a type. (In a unityped setting, this boils
+         down to the same thing at a different level; object-level vs
+         meta-level variants.)
+
+       - in some cases (ruby) a tag indicates a subclass: a
+         behavioural refinement of some *object* rather than a
+         structural extension of some *data*.
+
+       - yaml tags don't have intrinsic meaning: implementations are
+         allowed to complain if they don't recognise a tag. They also
+         affect how and whether an object can be used as a dict key;
+         labels, otoh, have intrinsic (trivial) meaning, and *any*
+         preserves value is allowed to be used as a dict key. YAML
+         documents then have implementation-specific meaning, but
+         Preserves have intrinsic meaning.
+
+       - yaml has schemas, holy shit, and there the tags really do
+         direct interpretation of values to a significant extent.
+         Preserves forces the application to do such interpretations:
+         the parser/reader won't do them for you.
+          - TODO: be clearer in the bit on "validity"
+
+       - yaml tags are URIs, and cannot be structured data
+
+ - annotations
+    - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
+    - comments are a good example: out-of-domain description about the
+      value, not part of the value itself
+    - uses:
+       - round-tripping config, cf. the approach taken by http://augeas.net/
+       - embedding trace information in messages
+          - provenance information
+          - stack information / distributed trace/continuation record
+
+ - remove comments once annotations are in!
+
+ - binary syntax: length-prefixing is good for pattern-matching,
+   because it allows you to reject terms based on arity without having
+   to scan the contents.
+
+ - hey so what about protobufs? the optional fields /
+   forward-and-backwards-compatibility thing is interesting.
+
+ - what about skipping e.g. lists? would need byte-length prefix
+
+ - When thinking about extensibility and forward/backward
+   compatibility, consider this:
+   <https://eighty-twenty.org/2016/09/18/gnome-flashback-patch>
+
+ - types, type-directed whitespace-sensitive parsing (oh hey it might
+   also lead to optimized binary parsers based on type?)
+
+    - Zephyr ASDL (here `*` is postfix Kleene star and `?` marks zero-or-one):
+
+          asdl_ty = Sum(identifier, field*, ctor, ctor*) ;; typename, common fields, at least one ctor, more ctors
+                  | Product(identifier, field, field*)   ;; ?? i guess a degenerate kind of sum??
+             ctor = Con(identifier, field*)              ;; most like Preserves' record
+            field = Id(identifier, identifier?)          ;; basic typename reference (?)
+                  | Option(identifier, identifier?)      ;; postfix `?`
+                  | Sequence(identifier, identifier?)    ;; postfix `*`
+
+            value = SumVal(identifier, value*, value*)   ;; there are common fields
+                  | ProductVal(value, value*)
+                  | SequenceVal(value*)
+                  | NoneVal
+                  | SomeVal(value)
+                  | PrimVal(prim)
+             prim = IntVal(int)
+                  | IdentifierVal(identifier)
+                  | StringVal(string)
+
+    - So then for us, where we have kind of union types more than
+      labelled sums:
+       - `equals(value)`, `lessthan(value)`, `greaterthan(value)`
+          - must be equal to / less than / greater than this value
+          - maybe take `range(lo,hi)` as primitive?
+             - no, because of infinitesimals
+          - `regexp(string)` ... etc? (Perhaps `pattern(regexpstring)`
+            is better) (Be sure to specify ECMA-262 dialect, with
+            restrictions a la JSON-schema
+            https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
+       - identifier naming a type definition
+          - some type definitions are builtin: `Boolean = union(equals(true), equals(false))`
+          - some have to be primitive rather than builtin, like
+            `SignedInteger` or `Double`, because they have unboundedly
+            (or awkwardly) many inhabitants and the class above or
+            below them doesn't have a limit ordinal in the right place
+          - parameters/`forall`?
+       - `record(type, type, ...)` - first one is the label type
+       - `list(type, ...)` - heterogeneous list of specific types
+       - `listof(type)` - homogeneous list
+       - `setof(type)`
+       - `{ keytype: valuetype, ... }` - heterogeneous dict
+           - wait, `{ keyliteral: valuetype, ... }` might be better - sugar for
+             `dict([equals(keyliteral), valuetype], ...)`
+           - `dict*(...)` for when extra members are allowed
+           - what about optional members?
+       - `dictof(keytype, valuetype)` - homogeneous dict
+       - `union(type, ...)`
+          - empty union is uninhabited type(!)
+          - a kind of or
+       - `and(type, ...)`
+          - simultaneous constraints on type, for range, or for range-and-type
+          - a kind of intersection; parallel reduction
+       - `interleave(type, ...)` ?? maybe, if sequences are a thing?
+         Could be good for organizing key-value mappings in
+         dictionary-brackets, because unordered... and sets...
+
+      Sketching it out:
+
+          preserves_ty =
+
+    - Oh dear, actually this is very close to being just a pattern
+      language without the captures.
+
+          a1.a & b1.b  =  a1.(a & b1.b) + b1.(a1.a & b)
+
+    - Take two.
+       - `=(value)`, `<(value)`, `>(value)`, `<=`, `>=`, `*eq` `*lt` `*gt` `*le` `*ge`
+       - `_` for discard, `*discard()`
+       - scalar values, other than symbols beginning with `*`, match themselves as if they were `=`-wrapped
+       - all the special things are records, possibly 0-ary, whose labels are symbols starting with `*`,
+         except for `=` etc and `_` and `...`
+       - if you have to match a label like `*foo` it might clash, so match `=(*foo)` instead:
+         `*foo(1 2 3)` ==> `=(*foo)(1 2 3)`
+       - `*int()` for `SignedInteger`, `*string()`, `*symbol()`,
+         `*bytestring()`/`*binary()`, `*float()`, `*double()`,
+         `*bool()`
+       - `*and[pattern ⋯]`
+       - `*or[pattern ⋯]`
+       - `*not(pattern)` ?
+       - `pattern(pattern ⋯)` - match record
+       - `[pattern ⋯]` - match sequence
+       - `#set{pattern}` - match set
+       - don't know how to match dictionaries yet
+          - view it as an interleave of its keyvalues
+          - `*interleave[pattern ⋯]`?
+          - somehow allow specification of a keyvalue that is repeating, that is optional, etc
+          - `{keypat:valpat ⋯ ...(keypat):...(valpat)}` ??? eww?
+       - `*group[pattern ⋯]` - sequence of values spliced into wider sequence?
+       - use literal `...` symbol (!) to mark repetition in a sequence:
+         `[*string() ...]`
+       - could use literal `?` to mark optionality; or better perhaps `*optional(pattern)`,
+         equivalent to `*biased-choice[pattern *group[]]`; hmm, biased choice!
+       - could use `*repeat(lo,hi)` or similar for counted repetition
+       - don't know how to write refs to other types yet! def labels starting with `*`?
+
+             *def(*foo() *or[*int() *string()]) ?
+             *foo()
+
+             *def(*maybe(a) *or[nothing() just(a)])
+             *maybe(*int())
+
+       - should those be relative URLs, or jsonpointer or something,
+         so can drag in types from the web?
+       - NOTE: No schema for indicating attachment of annotations?!?!?!
+
+The YAML example:
+
+    database:
+        username: admin
+        password: foobar  # TODO get prod passwords out of config
+        socket: /var/tmp/database.sock
+        options: {use_utf8: true}
+    memcached:
+        host: 10.0.0.99
+    workers:
+      - host: 10.0.0.101
+        port: 2301
+      - host: 10.0.0.102
+        port: 2302
+
+Could be:
+
+    [ Database[Username("admin"),
+               @TODO("get prod passwords out of config") Password("foobar"),
+               Socket("/var/tmp/database.sock"),
+               Options[UseUTF8()]],
+      Memcached[Host("10.0.0.99")],
+      Workers[Worker("10.0.0.101", 2301),
+              Worker("10.0.0.102", 2302)] ]
+
+Or
+
+    {
+      database: {
+        username: "admin",
+        @TODO("get prod passwords out of config")
+        password: "foobar",
+        socket: "/var/tmp/database.sock",
+        options: #set{use_utf8}
+      },
+      memcached: {
+        host: "10.0.0.99"
+      },
+      workers: [ Worker("10.0.0.101", 2301),
+                 Worker("10.0.0.102", 2302) ]
+    }
+
+Its schema-sketch could be
+
+    [ *interleave[ Database[ *interleave[ Username(*string())
+                                          Password(*string())
+                                          *optional(Socket(*string()))
+                                          *optional(Options[*option() ...]) ] ]
+                   Memcached[ Host(*ipv4()) ... ]
+                   Workers[ Worker(*ipv4() *u16()) ... ] ] ]
+
+(for the first variant) or
+
+    {
+      database: {
+        username: *string(),
+        password: *string(),
+        *optional(socket): *string(),
+        *optional(options): #set{*option()}
+      },
+      memcached: {
+        host: *ipv4()
+      },
+      workers: [ Worker(*ipv4() *u16()) ... ]
+    }
+
+Annotations will be allowed on any value; but also perhaps on a
+key-value mapping pair?
+
+    {
+      @"I label the key" key: value
+      key @"I label the mapping": value
+      key: @"I label the value" value
+    }
+
+??
+
+Perhaps not.
+
+The schema for the second YAML config sketch would allow the instance
+to be written:
+
+    database:
+      username: admin
+      @TODO("get prod passwords out of config")
+      password: foobar
+      socket: /var/tmp/database.sock
+      options: use_utf8
+    memcached:
+      host: 10.0.0.99
+    workers:
+      Worker(10.0.0.101, 2301)
+      Worker(10.0.0.102, 2302)