From e6e3057de38cb387f65ddc16664a86684318bbfb Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Tue, 4 Dec 2018 11:01:20 +0000
Subject: [PATCH] Notes and TODOs

---
 TODO.md | 360 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 360 insertions(+)
 create mode 100644 TODO.md

diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..c818f68
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,360 @@
+---
+---
+
+
+TODO:
+
+ - https://github.com/uwiger/sext
+
+ - http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
+   in particular, see the non-lexicographic ordering on tuples (vs
+   lists).
+
+ - should there be a built-in (i.e. recommended) reference type for external data??
+   - if there were, it'd give IPLD-like characteristics to the thing from the get-go
+   - IRIs and mime-typed things are already in there so why not content-based addressing
+
+It is becoming VERY CLEAR that on-the-wire efficiency is... a
+secondary concern. Perhaps revise the binary syntax to be less terse
+and better for simple encoding and for term ordering,
+canonicalization, quick indexing, etc.
+
+ - the indexing thing clashes with the term ordering thing
+   - maybe put the indexes at the end?? they could be optional
+
+It might be nice to define some kind of jsonpath/xpath-like means of
+naming a subterm within a Preserve. Record labels would be a kind of
+assertion on the current node. Indexes and keys would be steps. It'd
+be a lot like xpath I think; see also my `racket-xe` package.
+
+ - `child()` - moves into direct children
+ - `descendant-or-self()` - moves into direct and indirect children, including this node
+ - `descendant()` - moves into direct and indirect children, excluding this node
+ - `where[P*]` - "where" clause, applies nested path, keeping nodes with submatches
+ - `or[P*]` - result of first non-empty `P` match
+ - `at(K)` - moves into direct children whose keys are `K` from
+   dictionaries, sequences or records; `K` should be a number for the
+   latter two
+ - `label()` - moves into labels of records
+ - `equals(V)` - filters to only nodes that equal `V`
+ - `isa(T)` - filters to only nodes that are `T ∈
+   [boolean float double signed-integer string byte-string symbol record sequence set dictionary]`
+
+Abbreviations:
+
+    / = child()
+    // = descendant-or-self()
+    [P*] = where[P*]
+    Symbol = [label() equals(Symbol)]
+    NonSymbolAtom = at(NonSymbolAtom)
+
+# TODO
+
+ - [DONE] allow `label[1,2,3]` and `label{a:b, c:d}`, meaning
+   `label([1,2,3])` and `label({a:b, c:d})`.
+
+ - explain why total order / comparison of values is important and/or useful
+   - what does having a total order unlock?
+ - explain why records are good (see below on yaml tags etc)
+ - hashability: comes from equivalence
+ - more examples
+   - over-8000er mountains
+   - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
+ - having records with ANONYMOUS but ordered fields is good for easy
+   parsing in languages like C where you don't want to explicitly
+   search dictionaries of key/value mappings
+ - labels vs. yaml tags vs. annotations
+   - yaml tags are complex. they're relative uris, for the most part
+     anyway, except the local ones; they force interpretation rather
+     than being data, e.g. `!` forces a node to be interpreted as a
+     string, sequence, or map and `?` forces "tag resolution" aka
+     dwimming of scalar syntax. Labels here don't change how their
+     fields resolve at all.
+     - they're also used to specify particular host-language classes
+       and other objects.
+
+           !!python/none
+           !!python/bool
+           !!python/bytes
+           !!python/str
+           !!python/unicode
+           !!python/int
+           !!python/long
+           !!python/float
+           !!python/complex
+           !!python/list
+           !!python/tuple
+           !!python/dict
+           !!python/name:module.name
+           !!python/module:package.module
+           !!python/object:module.Cls
+           !!python/object/new:module.Cls
+           !!python/object/apply:module.f
+
+           !ruby/symbol
+           !ruby/sym (alias of the previous!)
+           !ruby/range
+           !ruby/regexp
+           !ruby/struct:StructTypeName
+           !ruby/object:Module::ClassName
+           !ruby/array:Module::ClassName (subtyping arrays! objects, not data)
+           !ruby/hash:Module::ClassName (subtyping hashes! objects, not data)
+
+           !perl/regexp
+
+     - yaml tag meanings are per-document or global. Labels aren't
+       really specified. Is this good or bad? Once there's a type
+       system, labels will become meaningful in a per-type context.
+
+     - yaml tags basically are meant to mean the type of the object
+       following. Labels are not: they are for distinguishing among
+       variants *within* a type. (In a unityped setting, this boils
+       down to the same thing at a different level; object-level vs
+       meta-level variants.)
+
+       - in some cases (ruby) a tag indicates a subclass: a
+         behavioural refinement of some *object* rather than a
+         structural extension of some *data*.
+
+     - yaml tags don't have intrinsic meaning: implementations are
+       allowed to complain if they don't recognise a tag. They also
+       affect how and whether an object can be used as a dict key;
+       labels, otoh, have intrinsic (trivial) meaning, and *any*
+       preserves value is allowed to be used as a dict key. YAML
+       documents then have implementation-specific meaning, but
+       Preserves have intrinsic meaning.
+
+     - yaml has schemas, holy shit, and there the tags really do
+       direct interpretation of values to a significant extent.
+       Preserves forces the application to do such interpretations:
+       the parser/reader won't do them for you.
+       - TODO: be clearer in the bit on "validity"
+
+     - yaml tags are URIs, and cannot be structured data
+
+ - annotations
+   - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
+   - comments are a good example: out-of-domain description about the
+     value, not part of the value itself
+   - uses:
+     - roundtripping config cf the approach taken by http://augeas.net/
+     - embedding trace information in messages
+     - provenance information
+     - stack information / distributed trace/continuation record
+
+   - remove comments once annotations are in!
+
+ - binary syntax: length-prefixing is good for pattern-matching,
+   because it allows you to reject terms based on arity without having
+   to scan the contents.
+
+ - hey so what about protobufs? the optional fields /
+   forward-and-backwards-compatibility thing is interesting.
+
+   - what about skipping e.g. lists? would need byte-length prefix
+
+   - When thinking about extensibility and forward/backward
+     compatibility, consider this:
+
+
+ - types, type-directed whitespace-sensitive parsing (oh hey it might
+   also lead to optimized binary parsers based on type?)
+
+   - Zephyr (here `*` is postfix Kleene star and `?` marks zero-or-one):
+
+         asdl_ty = Sum(identifier, field*, ctor, ctor*) ;; typename, common fields, at least one ctor, more ctors
+                 | Product(identifier, field, field*) ;; ?? i guess a degenerate kind of sum??
+         ctor = Con(identifier, field*) ;; most like Preserves' record
+         field = Id(identifier, identifier?) ;; basic typename reference (?)
+               | Option(identifier, identifier?) ;; postfix `?`
+               | Sequence(identifier, identifier?)
+                   ;; postfix `*`
+
+         value = SumVal(identifier, value*, value*) ;; there are common fields
+               | ProductVal(value, value*)
+               | SequenceVal(value*)
+               | NoneVal
+               | SomeVal(value)
+               | PrimVal(prim)
+         prim = IntVal(int)
+              | IdentifierVal(identifier)
+              | StringVal(string)
+
+   - So then for us, where we have kind of union types more than
+     labelled sums:
+     - `equals(value)`, `lessthan(value)`, `greaterthan(value)`
+       - must be equal to / less than / greater than this value
+       - maybe take `range(lo,hi)` as primitive?
+         - no, because of infinitesimals
+     - `regexp(string)` ... etc? (Perhaps `pattern(regexpstring)`
+       is better) (Be sure to specify ECMA-262 dialect, with
+       restrictions a la JSON-schema
+       https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
+     - identifier naming a type definition
+       - some type definitions are builtin: `Boolean = union(equals(true), equals(false))`
+       - some have to be primitive rather than builtin, like
+         `SignedInteger` or `Double`, because they have unboundedly
+         (or awkwardly) many inhabitants and the class above or
+         below them doesn't have a limit ordinal in the right place
+       - parameters/`forall`?
+     - `record(type, type, ...)` - first one is the label type
+     - `list(type, ...)` - heterogeneous list of specific types
+     - `listof(type)` - homogeneous list
+     - `setof(type)`
+     - `{ keytype: valuetype, ... }` - heterogeneous dict
+       - wait, `{ keyliteral: valuetype, ... }` might be better - sugar for
+         `dict([equals(keyliteral), valuetype], ...)`
+       - `dict*(...)` for when extra members are allowed
+       - what about optional members?
+     - `dictof(keytype, valuetype)` - homogeneous dict
+     - `union(type, ...)`
+       - empty union is uninhabited type(!)
+       - a kind of or
+     - `and(type, ...)`
+       - simultaneous constraints on type, for range, or for range-and-type
+       - a kind of intersection; parallel reduction
+     - `interleave(type, ...)` ?? maybe, if sequences are a thing?
+       Could be good for organizing key-value mappings in
+       dictionary-brackets, because unordered... and sets...
+
+     Sketching it out:
+
+         preserves_ty =
+
+   - Oh dear, actually this is very close to being just a pattern
+     language without the captures.
+
+         a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b)
+
+   - Take two.
+     - `=(value)`, `<(value)`, `>(value)`, `<=`, `>=`, *eq *lt *gt *le *ge
+     - `_` for discard, `*discard()`
+     - scalar values not symbols beginning with `*` match themselves as if they were `=`-wrapped
+     - all the special things are records, possibly 0-ary, with labels symbols starting with `*`
+       except for `=` etc and `_` and `...`
+       - if you have to match a label like `*foo` it might clash, so match `=(*foo)` instead:
+         `*foo(1 2 3)` ==> `=(*foo)(1 2 3)`
+     - `*int()` for `SignedInteger`, `*string()`, `*symbol()`,
+       `*bytestring()`/`*binary()`, `*float()`, `*double()`,
+       `*bool()`
+     - `*and[pattern ⋯]`
+     - `*or[pattern ⋯]`
+     - `*not(pattern)` ?
+     - `pattern(pattern ⋯)` - match record
+     - `[pattern ⋯]` - match sequence
+     - `#set{pattern}` - match set
+     - don't know how to match dictionaries yet
+       - view it as an interleave of its keyvalues
+       - `*interleave[pattern ⋯]`?
+       - somehow allow specification of a keyvalue that is repeating, that is optional, etc
+       - `{keypat:valpat ⋯ ...(keypat):...(valpat)}` ??? eww?
+     - `*group[pattern ⋯]` - sequence of values spliced into wider sequence?
+     - use literal `...` symbol (!) to mark repetition in a sequence:
+       `[*string() ...]`
+     - could use literal `?` to mark optionality; or better perhaps `*optional(pattern)`,
+       equivalent to `*biased-choice[pattern *group[]]`; hmm, biased choice!
+     - could use `*repeat(lo,hi)` or similar for counted repetition
+     - don't know how to write refs to other types yet! def labels starting with `*`?
+
+           *def(*foo() *or[*int() *string()]) ?
+           *foo()
+
+           *def(*maybe(a) *or[nothing() just(a)])
+           *maybe(*int())
+
+       - should those be relative URLs, or jsonpointer or something,
+         so can drag in types from the web?
+     - NOTE: No schema for indicating attachment of annotations?!?!?!
+
+The YAML example:
+
+    database:
+        username: admin
+        password: foobar  # TODO get prod passwords out of config
+        socket: /var/tmp/database.sock
+        options: {use_utf8: true}
+    memcached:
+        host: 10.0.0.99
+    workers:
+      - host: 10.0.0.101
+        port: 2301
+      - host: 10.0.0.102
+        port: 2302
+
+Could be:
+
+    [ Database[Username("admin"),
+               @TODO("get prod passwords out of config") Password("foobar"),
+               Socket("/var/tmp/database.sock"),
+               Options[UseUTF8()]],
+      Memcached[Host("10.0.0.99")],
+      Workers[Worker("10.0.0.101", 2301),
+              Worker("10.0.0.102", 2302)] ]
+
+Or
+
+    {
+      database: {
+        username: "admin",
+        @TODO("get prod passwords out of config")
+        password: "foobar",
+        socket: "/var/tmp/database.sock",
+        options: #set{use_utf8}
+      },
+      memcached: {
+        host: "10.0.0.99"
+      },
+      workers: [ Worker("10.0.0.101", 2301),
+                 Worker("10.0.0.102", 2302) ]
+    }
+
+Its schema-sketch could be
+
+    [ *interleave[ Database[ *interleave[ Username(*string())
+                                          Password(*string())
+                                          *optional(Socket(*string()))
+                                          *optional(Options[*option() ...]) ] ]
+                   Memcached[ Host(*ipv4()) ... ]
+                   Workers[ Worker(*ipv4() *u16()) ... ] ] ]
+
+(for the first variant) or
+
+    {
+      database: {
+        username: *string(),
+        password: *string(),
+        *optional(socket): *string(),
+        *optional(options): #set{*option()}
+      },
+      memcached: {
+        host: *ipv4()
+      },
+      workers: [ Worker(*ipv4() *u16()) ... ]
+    }
+
+Annotations will be allowed on any value; but also perhaps on a
+key-value mapping pair?
+
+    {
+      @"I label the key" key: value
+      key @"I label the mapping": value
+      key: @"I label the value" value
+    }
+
+??
+
+Perhaps not.
+
+The schema for the second YAML config sketch would allow the instance
+to be written:
+
+    database:
+      username: admin
+      @TODO("get prod passwords out of config")
+      password: foobar
+      socket: /var/tmp/database.sock
+      options: use_utf8
+    memcached:
+      host: 10.0.0.99
+    workers:
+      Worker(10.0.0.101, 2301)
+      Worker(10.0.0.102, 2302)
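
A note on the path language from the top of these notes: the steps `child()`, `descendant-or-self()`, `at(K)`, `label()` and `equals(V)` could be prototyped roughly as below. This is only a sketch in Python and not part of the patch. The `Record` class, the `run` helper, modelling symbols as plain strings, and having `child()` skip dictionary keys and record labels are all assumptions made for illustration; the notes do not settle any of them.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Preserves record: a label plus ordered fields."""
    label: Any
    fields: tuple


def child(node):
    """child(): move into direct children (assumed: dict values, not keys or labels)."""
    if isinstance(node, Record):
        yield from node.fields
    elif isinstance(node, dict):
        yield from node.values()
    elif isinstance(node, (list, tuple, set, frozenset)):
        yield from node


def descendant_or_self(node):
    """descendant-or-self(): this node, then all direct and indirect children."""
    yield node
    for c in child(node):
        yield from descendant_or_self(c)


def at(key):
    """at(K): move into the child stored under key K (a number for sequences/records)."""
    def step(node):
        if isinstance(node, dict):
            if key in node:
                yield node[key]
        elif isinstance(node, (Record, list, tuple)):
            items = node.fields if isinstance(node, Record) else node
            if isinstance(key, int) and 0 <= key < len(items):
                yield items[key]
    return step


def label(node):
    """label(): move into the label of a record; other nodes produce nothing."""
    if isinstance(node, Record):
        yield node.label


def equals(value):
    """equals(V): keep only nodes equal to V."""
    def step(node):
        if node == value:
            yield node
    return step


def run(path, root):
    """Apply each step in turn to every node selected so far."""
    nodes = [root]
    for step in path:
        nodes = [out for n in nodes for out in step(n)]
    return nodes


# The second config sketch, with symbols modelled as plain strings.
config = {
    "database": {"username": "admin", "password": "foobar"},
    "workers": [Record("Worker", ("10.0.0.101", 2301)),
                Record("Worker", ("10.0.0.102", 2302))],
}

# Roughly `workers / 1` in the abbreviated syntax: the port of every worker.
print(run([at("workers"), child, at(1)], config))  # [2301, 2302]
```

Each step maps one node to zero or more nodes, so a path is just a list of steps applied left to right; `where[P*]` and `or[P*]` would take nested paths and are left out here.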