TODO:
- https://github.com/uwiger/sext
- http://erlang.org/doc/reference_manual/expressions.html#term-comparisons;
  in particular, see the non-lexicographic ordering on tuples (vs
  lists). (A small sketch of the difference follows this list.)
- should there be a built-in (i.e. recommended) reference type for external data??
  - if there were, it'd give IPLD-like characteristics to the thing from the get-go
  - IRIs and mime-typed things are already in there, so why not content-based addressing
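
A tiny sketch of that difference (plain Python, nothing to do with the
Preserves implementation): Erlang compares tuples by arity first and only
then element-by-element, while lists compare lexicographically.

    from functools import cmp_to_key

    def compare(a, b):
        """Three-way comparison for ints/tuples/lists, Erlang-style."""
        if isinstance(a, tuple) and isinstance(b, tuple):
            if len(a) != len(b):                      # arity decides first...
                return -1 if len(a) < len(b) else 1
            return compare_elements(a, b)             # ...then the contents
        if isinstance(a, list) and isinstance(b, list):
            return (compare_elements(a, b)
                    or (len(a) > len(b)) - (len(a) < len(b)))  # purely lexicographic
        return (a > b) - (a < b)

    def compare_elements(a, b):
        for x, y in zip(a, b):
            c = compare(x, y)
            if c != 0:
                return c
        return 0

    # (2, 99) sorts before (1, 1, 1): the shorter tuple wins, whatever its contents.
    print(sorted([(1, 1, 1), (2, 99)], key=cmp_to_key(compare)))  # [(2, 99), (1, 1, 1)]
    # [2, 99] sorts after [1, 1, 1]: lists are compared element-by-element.
    print(sorted([[1, 1, 1], [2, 99]], key=cmp_to_key(compare)))  # [[1, 1, 1], [2, 99]]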
It is becoming VERY CLEAR that on-the-wire efficiency is... a
secondary concern. Perhaps revise the binary syntax to be less terse
and better for simple encoding and for term ordering,
canonicalization, quick indexing, etc.
- the indexing thing clashes with the term ordering thing
- maybe put the indexes at the end?? they could be optional
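
One way to see that clash (a toy sketch, not the actual binary syntax):
length prefixes make skipping and indexing cheap, but they break the
property that comparing encoded bytes agrees with comparing the terms
themselves, which is exactly what an order-preserving encoding like sext
buys back.

    def length_prefixed(s: str) -> bytes:
        body = s.encode("utf-8")
        return bytes([len(body)]) + body   # 1-byte length prefix, then the text

    words = ["ab", "b"]
    print(sorted(words))                       # term order:    ['ab', 'b']
    print(sorted(words, key=length_prefixed))  # encoded order: ['b', 'ab']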
It might be nice to define some kind of jsonpath/xpath-like means of
naming a subterm within a Preserve. Record labels would be a kind of
assertion on the current node. Indexes and keys would be steps. It'd
be a lot like xpath I think; see also my `racket-xe` package.
- `child()` - moves into direct children
- `descendant-or-self()` - moves into direct and indirect children, including this node
- `descendant()` - moves into direct and indirect children, excluding this node
- `where[P*]` - "where" clause, applies nested path, keeping nodes with submatches
- `or[P*]` - result of first non-empty `P` match
- `at(K)` - moves into direct children whose keys are `K` from
  dictionaries, sequences or records; `K` should be a number for the
  latter two
- `label()` - moves into labels of records
- `equals(V)` - filters to only nodes that equal `V`
- `isa(T)` - filters to only nodes that are `T ∈
  [boolean float double signed-integer string byte-string symbol record sequence set dictionary]`
Abbreviations:

    /             = child()
    //            = descendant-or-self()
    [P*]          = where[P*]
    Symbol        = [label() equals(Symbol)]
    NonSymbolAtom = at(NonSymbolAtom)
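
A rough, non-normative sketch of how those steps might compose, over a toy
term model (records as a namedtuple, plus lists/dicts/atoms); the names and
behaviour here are illustrative guesses, not a spec.

    from collections import namedtuple

    Record = namedtuple("Record", "label fields")

    def child(node):
        """Direct children: record fields, sequence elements, dictionary values."""
        if isinstance(node, Record):
            return list(node.fields)
        if isinstance(node, list):
            return list(node)
        if isinstance(node, dict):
            return list(node.values())
        return []

    def descendant_or_self(node):
        out = [node]
        for c in child(node):
            out.extend(descendant_or_self(c))
        return out

    def at(node, key):
        """Child under `key`: dictionary key, or numeric index for records/sequences."""
        if isinstance(node, dict):
            return [node[key]] if key in node else []
        if isinstance(node, (list, Record)):
            items = node if isinstance(node, list) else node.fields
            return [items[key]] if isinstance(key, int) and 0 <= key < len(items) else []
        return []

    def label(node):
        return [node.label] if isinstance(node, Record) else []

    def run(path, nodes):
        """A path is a list of steps, each mapping one node to a list of nodes."""
        for step in path:
            nodes = [result for node in nodes for result in step(node)]
        return nodes

    # `// Worker at(1)` -- every port number in this little config.
    config = Record("Workers", [Record("Worker", ["10.0.0.101", 2301]),
                                Record("Worker", ["10.0.0.102", 2302])])
    print(run([descendant_or_self,
               lambda n: [n] if label(n) == ["Worker"] else [],  # the bare-Symbol abbreviation
               lambda n: at(n, 1)],
              [config]))   # [2301, 2302]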
# TODO
- [DONE] allow `label[1,2,3]` and `label{a:b, c:d}`, meaning
  `label([1,2,3])` and `label({a:b, c:d})`.
- explain why total order / comparison of values is important and/or useful
  - what does having a total order unlock?
- explain why records are good (see below on yaml tags etc)
- hashability: comes from equivalence
- more examples
  - over-8000er mountains
  - yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
- having records with ANONYMOUS but ordered fields is good for easy
  parsing in languages like C where you don't want to explicitly
  search dictionaries of key/value mappings
- labels vs. yaml tags vs. annotations
  - yaml tags are complex. they're relative uris, for the most part
    anyway, except the local ones; they force interpretation rather
    than being data, e.g. `!` forces a node to be interpreted as a
    string, sequence, or map and `?` forces "tag resolution" aka
    dwimming of scalar syntax. Labels here don't change how their
    fields resolve at all.
  - they're also used to specify particular host-language classes
    and other objects.

        !!python/none
        !!python/bool
        !!python/bytes
        !!python/str
        !!python/unicode
        !!python/int
        !!python/long
        !!python/float
        !!python/complex
        !!python/list
        !!python/tuple
        !!python/dict
        !!python/name:module.name
        !!python/module:package.module
        !!python/object:module.Cls
        !!python/object/new:module.Cls
        !!python/object/apply:module.f
        !ruby/symbol
        !ruby/sym (alias of the previous!)
        !ruby/range
        !ruby/regexp
        !ruby/struct:StructTypeName
        !ruby/object:Module::ClassName
        !ruby/array:Module::ClassName (subtyping arrays! objects, not data)
        !ruby/hash:Module::ClassName (subtyping hashes! objects, not data)
        !perl/regexp
  - yaml tag meanings are per-document or global. Labels aren't
    really specified. Is this good or bad? Once there's a type
    system, labels will become meaningful in a per-type context.
  - yaml tags basically are meant to mean the type of the object
    following. Labels are not: they are for distinguishing among
    variants *within* a type. (In a unityped setting, this boils
    down to the same thing at a different level; object-level vs
    meta-level variants.)
    - in some cases (ruby) a tag indicates a subclass: a
      behavioural refinement of some *object* rather than a
      structural extension of some *data*.
  - yaml tags don't have intrinsic meaning: implementations are
    allowed to complain if they don't recognise a tag. They also
    affect how and whether an object can be used as a dict key;
    labels, otoh, have intrinsic (trivial) meaning, and *any*
    preserves value is allowed to be used as a dict key. YAML
    documents then have implementation-specific meaning, but
    Preserves have intrinsic meaning.
  - yaml has schemas, holy shit, and there the tags really do
    direct interpretation of values to a significant extent.
    Preserves forces the application to do such interpretations:
    the parser/reader won't do them for you.
    - TODO: be clearer in the bit on "validity"
  - yaml tags are URIs, and cannot be structured data
  - annotations
    - in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
    - comments are a good example: out-of-domain description about the
      value, not part of the value itself
    - uses:
      - roundtripping config cf the approach taken by http://augeas.net/
      - embedding trace information in messages
      - provenance information
      - stack information / distributed trace/continuation record
    - remove comments once annotations are in!
- binary syntax: length-prefixing is good for pattern-matching,
  because it allows you to reject terms based on arity without having
  to scan the contents.
  - hey so what about protobufs? the optional fields /
    forward-and-backwards-compatibility thing is interesting.
  - what about skipping e.g. lists? would need byte-length prefix
- When thinking about extensibility and forward/backward
  compatibility, consider this:
  <https://eighty-twenty.org/2016/09/18/gnome-flashback-patch>
- types, type-directed whitespace-sensitive parsing (oh hey it might
  also lead to optimized binary parsers based on type?)
- Zephyr (here `*` is postfix Kleene star and `?` marks zero-or-one):

      asdl_ty = Sum(identifier, field*, ctor, ctor*)  ;; typename, common fields, at least one ctor, more ctors
              | Product(identifier, field, field*)    ;; ?? i guess a degenerate kind of sum??
      ctor = Con(identifier, field*)                  ;; most like Preserves' record
      field = Id(identifier, identifier?)             ;; basic typename reference (?)
            | Option(identifier, identifier?)         ;; postfix `?`
            | Sequence(identifier, identifier?)       ;; postfix `*`
      value = SumVal(identifier, value*, value*)      ;; there are common fields
            | ProductVal(value, value*)
            | SequenceVal(value*)
            | NoneVal
            | SomeVal(value)
            | PrimVal(prim)
      prim = IntVal(int)
           | IdentifierVal(identifier)
           | StringVal(string)
- So then for us, where we have kind of union types more than
  labelled sums:
  - `equals(value)`, `lessthan(value)`, `greaterthan(value)`
    - must be equal to / less than / greater than this value
    - maybe take `range(lo,hi)` as primitive?
      - no, because of infinitesimals
  - `regexp(string)` ... etc? (Perhaps `pattern(regexpstring)`
    is better) (Be sure to specify ECMA-262 dialect, with
    restrictions a la JSON-schema
    https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
  - identifier naming a type definition
    - some type definitions are builtin: `Boolean = union(equals(true), equals(false))`
    - some have to be primitive rather than builtin, like
      `SignedInteger` or `Double`, because they have unboundedly
      (or awkwardly) many inhabitants and the class above or
      below them doesn't have a limit ordinal in the right place
  - parameters/`forall`?
  - `record(type, type, ...)` - first one is the label type
  - `list(type, ...)` - heterogeneous list of specific types
  - `listof(type)` - homogeneous list
  - `setof(type)`
  - `{ keytype: valuetype, ... }` - heterogeneous dict
    - wait, `{ keyliteral: valuetype, ... }` might be better - sugar for
      `dict([equals(keyliteral), valuetype], ...)`
    - `dict*(...)` for when extra members are allowed
    - what about optional members?
  - `dictof(keytype, valuetype)` - homogeneous dict
  - `union(type, ...)`
    - empty union is uninhabited type(!)
    - a kind of or
  - `and(type, ...)`
    - simultaneous constraints on type, for range, or for range-and-type
    - a kind of intersection; parallel reduction
  - `interleave(type, ...)` ?? maybe, if sequences are a thing?
    Could be good for organizing key-value mappings in
    dictionary-brackets, because unordered... and sets...
Sketching it out:

    preserves_ty =

- Oh dear, actually this is very close to being just a pattern
  language without the captures.

      a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b)
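
Reading `&` there as the interleave from the previous list and `+` as
alternation, that identity is just the usual shuffle expansion; a tiny
sketch over plain Python lists:

    def interleavings(xs, ys):
        """All interleavings of two sequences, preserving each one's internal order."""
        if not xs:
            return [list(ys)]
        if not ys:
            return [list(xs)]
        first_from_xs = [[xs[0]] + rest for rest in interleavings(xs[1:], ys)]
        first_from_ys = [[ys[0]] + rest for rest in interleavings(xs, ys[1:])]
        return first_from_xs + first_from_ys

    print(interleavings(["a1", "a2"], ["b1"]))
    # [['a1', 'a2', 'b1'], ['a1', 'b1', 'a2'], ['b1', 'a1', 'a2']]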
- Take two. (A toy matcher sketch follows this list.)
  - `=(value)`, `<(value)`, `>(value)`, `<=`, `>=`, *eq *lt *gt *le *ge
  - `_` for discard, `*discard()`
  - scalar values that are not symbols beginning with `*` match themselves, as if they were `=`-wrapped
  - all the special things are records, possibly 0-ary, whose labels are symbols starting with `*`,
    except for `=` etc. and `_` and `...`
    - if you have to match a literal label like `*foo`, it might clash with the special forms,
      so match `=(*foo)` instead: `*foo(1 2 3)` ==> `=(*foo)(1 2 3)`
  - `*int()` for `SignedInteger`, `*string()`, `*symbol()`,
    `*bytestring()`/`*binary()`, `*float()`, `*double()`,
    `*bool()`
  - `*and[pattern ⋯]`
  - `*or[pattern ⋯]`
  - `*not(pattern)` ?
  - `pattern(pattern ⋯)` - match record
  - `[pattern ⋯]` - match sequence
  - `#set{pattern}` - match set
  - don't know how to match dictionaries yet
    - view it as an interleave of its keyvalues
      - `*interleave[pattern ⋯]`?
    - somehow allow specification of a keyvalue that is repeating, that is optional, etc
      - `{keypat:valpat ⋯ ...(keypat):...(valpat)}` ??? eww?
  - `*group[pattern ⋯]` - sequence of values spliced into wider sequence?
  - use literal `...` symbol (!) to mark repetition in a sequence:
    `[*string() ...]`
    - could use literal `?` to mark optionality; or better perhaps `*optional(pattern)`,
      equivalent to `*biased-choice[pattern *group[]]`; hmm, biased choice!
    - could use `*repeat(lo,hi)` or similar for counted repetition
  - don't know how to write refs to other types yet! def labels starting with `*`?

        *def(*foo() *or[*int() *string()]) ?
        *foo()
        *def(*maybe(a) *or[nothing() just(a)])
        *maybe(*int())

    - should those be relative URLs, or jsonpointer or something,
      so can drag in types from the web?
  - NOTE: No schema for indicating attachment of annotations?!?!?!
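
A toy matcher for a fragment of that "take two" language, with the patterns
built directly as Python data (a hypothetical encoding, not the textual
syntax): scalars are self-matching literals, "_" is discard, ("*int",) and
("*string",) stand in for the primitive type patterns, Record(labelpat,
fieldpats) matches records, and a sequence pattern ending `p, "..."` means
zero or more `p`.

    from collections import namedtuple

    Record = namedtuple("Record", "label fields")

    def matches(pat, val):
        if pat == "_":                                           # discard
            return True
        if isinstance(pat, tuple) and pat[:1] == ("*int",):
            return isinstance(val, int) and not isinstance(val, bool)
        if isinstance(pat, tuple) and pat[:1] == ("*string",):
            return isinstance(val, str)
        if isinstance(pat, Record):                              # record pattern
            return (isinstance(val, Record)
                    and matches(pat.label, val.label)
                    and match_seq(pat.fields, val.fields))
        if isinstance(pat, list):                                # sequence pattern
            return isinstance(val, list) and match_seq(pat, val)
        return pat == val                                        # literal scalar

    def match_seq(pats, vals):
        if len(pats) >= 2 and pats[-1] == "...":                 # trailing repetition
            head, repeated = pats[:-2], pats[-2]
            return (len(vals) >= len(head)
                    and all(matches(p, v) for p, v in zip(head, vals))
                    and all(matches(repeated, v) for v in vals[len(head):]))
        return len(pats) == len(vals) and all(matches(p, v) for p, v in zip(pats, vals))

    # Roughly `[Worker(*string() *int()) ...]`, a simplified cousin of the schema sketches below.
    worker_pat = Record("Worker", [("*string",), ("*int",)])
    print(matches([worker_pat, "..."],
                  [Record("Worker", ["10.0.0.101", 2301]),
                   Record("Worker", ["10.0.0.102", 2302])]))                    # True
    print(matches(worker_pat, Record("Worker", ["10.0.0.101", "not-a-port"])))  # False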
The YAML example:

    database:
      username: admin
      password: foobar          # TODO get prod passwords out of config
      socket: /var/tmp/database.sock
      options: {use_utf8: true}
    memcached:
      host: 10.0.0.99
    workers:
      - host: 10.0.0.101
        port: 2301
      - host: 10.0.0.102
        port: 2302
Could be:

    [ Database[Username("admin"),
               @TODO("get prod passwords out of config") Password("foobar"),
               Socket("/var/tmp/database.sock"),
               Options[UseUTF8()]],
      Memcached[Host("10.0.0.99")],
      Workers[Worker("10.0.0.101", 2301),
              Worker("10.0.0.102", 2302)] ]
Or

    {
      database: {
        username: "admin",
        @TODO("get prod passwords out of config")
        password: "foobar",
        socket: "/var/tmp/database.sock",
        options: #set{use_utf8}
      },
      memcached: {
        host: "10.0.0.99"
      },
      workers: [ Worker("10.0.0.101", 2301),
                 Worker("10.0.0.102", 2302) ]
    }
Its schema-sketch could be

    [ *interleave[ Database[ *interleave[ Username(*string())
                                          Password(*string())
                                          *optional(Socket(*string()))
                                          *optional(Options[*option() ...]) ] ]
                   Memcached[ Host(*ipv4()) ... ]
                   Workers[ Worker(*ipv4() *u16()) ... ] ] ]

(for the first variant) or

    {
      database: {
        username: *string(),
        password: *string(),
        *optional(socket): *string(),
        *optional(options): #set{*option()}
      },
      memcached: {
        host: *ipv4()
      },
      workers: [ Worker(*ipv4() *u16()) ... ]
    }
Annotations will be allowed on any value; but also perhaps on a
key-value mapping pair?

    {
      @"I label the key" key: value
      key @"I label the mapping": value
      key: @"I label the value" value
    }

??
Perhaps not.
The schema for the second YAML config sketch would allow the instance
to be written:

    database:
      username: admin
      @TODO("get prod passwords out of config")
      password: foobar
      socket: /var/tmp/database.sock
      options: use_utf8
    memcached:
      host: 10.0.0.99
    workers:
      Worker(10.0.0.101, 2301)
      Worker(10.0.0.102, 2302)
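
For comparison, one way the dictionary-shaped variant above might land in a
host language (toy classes, not an actual Preserves library API; symbols are
rendered as plain strings here), with the `@TODO` annotation carried as
out-of-band metadata per the notes on annotations:

    from dataclasses import dataclass
    from typing import Any, Tuple

    @dataclass(frozen=True)
    class Record:
        label: Any
        fields: Tuple[Any, ...]

    @dataclass(frozen=True)
    class Annotated:
        value: Any
        annotations: Tuple[Any, ...]   # metadata about the value, not part of it

    config = {
        "database": {
            "username": "admin",
            "password": Annotated("foobar",
                                  (Record("TODO", ("get prod passwords out of config",)),)),
            "socket": "/var/tmp/database.sock",
            "options": frozenset(["use_utf8"]),
        },
        "memcached": {"host": "10.0.0.99"},
        "workers": [Record("Worker", ("10.0.0.101", 2301)),
                    Record("Worker", ("10.0.0.102", 2302))],
    }

    print(config["workers"][0].fields[1])   # 2301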