TODO:

consider a lead byte value used to wrap an encoded Value in a size-counted wrapper? That way parsers can quickly skip nested structure they're not interested in...
https://github.com/uwiger/sext
http://erlang.org/doc/reference_manual/expressions.html#term-comparisons; in particular, see the non-lexicographic ordering on tuples (vs lists).
should there be a built-in (i.e. recommended) reference type for external data??
- if there were, it'd give IPLD-like characteristics to the thing from the get-go
- IRIs and mime-typed things are already in there so why not content-based addressing
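
A minimal sketch of the size-counted wrapper idea from the first item above, in Python. The lead byte `0xFF` and the 4-byte big-endian length are assumptions made up for illustration; nothing here is part of the actual binary syntax.

```python
import struct

WRAP = 0xFF  # hypothetical lead byte marking a size-counted wrapper


def wrap(encoded: bytes) -> bytes:
    """Prefix an already-encoded value with WRAP and a 4-byte big-endian length."""
    return bytes([WRAP]) + struct.pack(">I", len(encoded)) + encoded


def skip(buf: bytes, pos: int) -> int:
    """Return the offset just past the wrapped value that starts at pos."""
    if buf[pos] != WRAP:
        raise ValueError("not a wrapped value")
    (size,) = struct.unpack_from(">I", buf, pos + 1)
    return pos + 5 + size  # 1 lead byte + 4 length bytes + payload


# Two wrapped values back to back; jump over the first without parsing it.
stream = wrap(b"nested structure we do not care about") + wrap(b"wanted")
second = skip(stream, 0)
payload = stream[second + 5 : skip(stream, second)]
```

The reader never looks inside the first payload, which is the point: the cost of skipping is independent of how deeply nested the skipped value is.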

It is becoming VERY CLEAR that on-the-wire efficiency is... a secondary concern. Perhaps revise the binary syntax to be less terse and better for simple encoding and for term ordering, canonicalization, quick indexing, etc.

the indexing thing clashes with the term ordering thing
maybe put the indexes at the end?? they could be optional
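
One way to read the clash, sketched in Python under the assumption that "term ordering" means bytewise comparison of encodings should agree with lexicographic comparison of sequences. Both toy encodings below are invented for this note and restrict items to single bytes.

```python
def enc_prefixed(items):
    """Count-prefixed: one count byte, then one byte per item."""
    return bytes([len(items)]) + bytes(items)


def enc_terminated(items):
    """Terminated: one byte per item (1..255), then a 0x00 end marker."""
    if not all(1 <= i <= 255 for i in items):
        raise ValueError("items must be in 1..255 so 0x00 can end the sequence")
    return bytes(items) + b"\x00"


a, b = [2], [1, 5]
lexicographic = a > b  # Python lists compare lexicographically

# Does bytewise comparison of the encodings give the same answer?
prefixed_agrees = (enc_prefixed(a) > enc_prefixed(b)) == lexicographic
terminated_agrees = (enc_terminated(a) > enc_terminated(b)) == lexicographic
```

With the count up front, the shorter sequence always sorts first, which is the size-first ordering Erlang uses for tuples rather than the element-by-element ordering it uses for lists. Moving counts or indexes to the end keeps the leading bytes comparable.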

It might be nice to define some kind of jsonpath/xpath-like means of naming a subterm within a Preserve. Record labels would be a kind of assertion on the current node. Indexes and keys would be steps. It'd be a lot like xpath I think; see also my racket-xe package.

<child> - moves into direct children
<descendant-or-self> - moves into direct and indirect children, including this node
<descendant> - moves into direct and indirect children, excluding this node
<where[P*]> - "where" clause, applies nested path, keeping nodes with submatches
<or[P*]> - result of first non-empty P match
<at K> - moves into direct children whose keys are K from dictionaries, sequences or records; K should be a number for the latter two
<label> - moves into labels of records
<equals V> - filters to only nodes that equal V
<isa T> - filters to only nodes that are T ∈ {boolean float double signed-integer string byte-string symbol record sequence set dictionary}

Abbreviations:

/ = <child>
// = <descendant-or-self>
[P*] = <where[P*]>
Symbol = [<label> <equals Symbol>]
NonSymbolAtom = <at NonSymbolAtom>
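
A rough Python sketch of how these steps might compose, assuming each step is a function from a list of nodes to a list of nodes. The `Record` class, the use of Python `dict`/`list` for dictionaries and sequences, and plain strings standing in for symbols are all stand-ins chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple


def children(node):
    """Direct children: record fields, sequence items, or dictionary values."""
    if isinstance(node, Record):
        return list(node.fields)
    if isinstance(node, (list, tuple)):
        return list(node)
    if isinstance(node, dict):
        return list(node.values())
    return []


def child(nodes):
    return [c for n in nodes for c in children(n)]


def descendant_or_self(nodes):
    out = []
    for n in nodes:
        out.append(n)
        out.extend(descendant_or_self(children(n)))
    return out


def at(key):
    def step(nodes):
        out = []
        for n in nodes:
            if isinstance(n, dict):
                if key in n:
                    out.append(n[key])
            elif isinstance(key, int):
                seq = n.fields if isinstance(n, Record) else n
                if isinstance(seq, (list, tuple)) and 0 <= key < len(seq):
                    out.append(seq[key])
        return out
    return step


def label(nodes):
    return [n.label for n in nodes if isinstance(n, Record)]


def equals(value):
    return lambda nodes: [n for n in nodes if n == value]


def where(*path):
    """Keep nodes for which the nested path yields at least one match."""
    return lambda nodes: [n for n in nodes if run(path, [n])]


def run(path, nodes):
    for step in path:
        nodes = step(nodes)
    return nodes


doc = {"workers": [Record("Worker", ("10.0.0.101", 2301)),
                   Record("Worker", ("10.0.0.102", 2302))]}
# Roughly `// Worker 1`: every Worker record anywhere, then its field 1.
ports = run([descendant_or_self, where(label, equals("Worker")), at(1)], [doc])
```

Here `where(label, equals("Worker"))` plays the role of the `Symbol` abbreviation and `at(1)` the role of the `NonSymbolAtom` abbreviation.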

TODO

explain why total order / comparison of values is important and/or useful
- what does having a total order unlock?
explain why records are good (see below on yaml tags etc)
hashability: comes from equivalence
more examples
- over-8000er mountains
- yaml example from the top of https://camel.readthedocs.io/en/latest/yamlref.html
having records with ANONYMOUS but ordered fields is good for easy parsing in languages like C where you don't want to explicitly search dictionaries of key/value mappings
labels vs. yaml tags vs. annotations
- yaml tags are complex. For the most part they're relative URIs, except the local ones. They force interpretation rather than being data: e.g. ! forces a node to be interpreted as a string, sequence, or map, and ? forces "tag resolution" (aka dwimming) of scalar syntax. Labels here don't change how their fields resolve at all.
  - they're also used to specify particular host-language classes and other objects.
```
!!python/none
!!python/bool
!!python/bytes
!!python/str
!!python/unicode
!!python/int
!!python/long
!!python/float
!!python/complex
!!python/list
!!python/tuple
!!python/dict
!!python/name:module.name
!!python/module:package.module
!!python/object:module.Cls
!!python/object/new:module.Cls
!!python/object/apply:module.f

!ruby/symbol
!ruby/sym    (alias of the previous!)
!ruby/range
!ruby/regexp
!ruby/struct:StructTypeName
!ruby/object:Module::ClassName
!ruby/array:Module::ClassName   (subtyping arrays! objects, not data)
!ruby/hash:Module::ClassName    (subtyping hashes! objects, not data)

!perl/regexp
```
  - yaml tag meanings are per-document or global. Labels aren't really specified. Is this good or bad? Once there's a type system, labels will become meaningful in a per-type context.
  - yaml tags basically are meant to mean the type of the object following. Labels are not: they are for distinguishing among variants within a type. (In a unityped setting, this boils down to the same thing at a different level; object-level vs meta-level variants.)
  - in some cases (ruby) a tag indicates a subclass: a behavioural refinement of some object rather than a structural extension of some data.
  - yaml tags don't have intrinsic meaning: implementations are allowed to complain if they don't recognise a tag. They also affect how and whether an object can be used as a dict key; labels, otoh, have intrinsic (trivial) meaning, and any preserves value is allowed to be used as a dict key. YAML documents then have implementation-specific meaning, but Preserves have intrinsic meaning.
  - yaml has schemas, holy shit, and there the tags really do direct interpretation of values to a significant extent. Preserves forces the application to do such interpretations: the parser/reader won't do them for you.
    - TODO: be clearer in the bit on "validity"
  - yaml tags are URIs, and cannot be structured data
annotations
- in brief: out-of-domain METADATA; implementation/metalevel, not domain/objectlevel
- comments are a good example: out-of-domain description about the value, not part of the value itself
- uses:
  - roundtripping config cf the approach taken by http://augeas.net/
  - embedding trace information in messages
    - provenance information
    - stack information / distributed trace/continuation record
remove comments once annotations are in!
binary syntax: length-prefixing is good for pattern-matching, because it allows you to reject terms based on arity without having to scan the contents.
hey so what about protobufs? the optional fields / forward-and-backwards-compatibility thing is interesting.
what about skipping e.g. lists? would need byte-length prefix
When thinking about extensibility and forward/backward compatibility, consider this: https://eighty-twenty.org/2016/09/18/gnome-flashback-patch
types, type-directed whitespace-sensitive parsing (oh hey it might also lead to optimized binary parsers based on type?)
- Zephyr ASDL (here * is postfix Kleene star and ? marks zero-or-one):
```
asdl_ty = Sum(identifier, field*, ctor, ctor*) ;; typename, common fields, at least one ctor, more ctors
        | Product(identifier, field, field*)   ;; ?? i guess a degenerate kind of sum??
   ctor = Con(identifier, field*)              ;; most like Preserves' record
  field = Id(identifier, identifier?)          ;; basic typename reference (?)
        | Option(identifier, identifier?)      ;; postfix `?`
        | Sequence(identifier, identifier?)    ;; postfix `*`

  value = SumVal(identifier, value*, value*)   ;; there are common fields
        | ProductVal(value, value*)
        | SequenceVal(value*)
        | NoneVal
        | SomeVal(value)
        | PrimVal(prim)
   prim = IntVal(int)
        | IdentifierVal(identifier)
        | StringVal(string)
```
- So then for us, where we have kind of union types more than labelled sums:
  - <equals Value>, <lessthan Value>, <greaterthan Value>
    - must be equal to / less than / greater than this value
    - maybe take <range lo hi> as primitive?
      - no, because of infinitesimals
    - <regexp string> ... etc? (Perhaps <pattern regexpstring> is better) (Be sure to specify ECMA-262 dialect, with restrictions a la JSON-schema https://json-schema.org/latest/json-schema-validation.html#rfc.section.4.3)
  - identifier naming a type definition
    - some type definitions are builtin: Boolean = <union <equals #true> <equals #false>>
    - some have to be primitive rather than builtin, like SignedInteger or Double, because they have unboundedly (or awkwardly) many inhabitants and the class above or below them doesn't have a limit ordinal in the right place
    - parameters/forall?
  - <record Type Type ...> - first one is the label type
  - <list Type ...> - heterogeneous list of specific types
  - <listof Type> - homogeneous list
  - <setof Type>
  - { keyType: valueType, ... } - heterogeneous dict
    - wait, { keyLiteral: valueType, ... } might be better - sugar for <dict [<equals keyLiteral> valueType] ...>
    - <dict+ ...> for when extra members are allowed
    - what about optional members?
  - <dictof keyType valueType> - homogeneous dict
  - <union Type ...>
    - empty union is uninhabited type(!)
    - a kind of or
  - <and Type ...>
    - simultaneous constraints on type, for range, or for range-and-type
    - a kind of intersection; parallel reduction
  - <interleave Type ...> ?? maybe, if sequences are a thing? Could be good for organizing key-value mappings in dictionary-brackets, because unordered... and sets...
  Sketching it out:
```
preserves_ty = 
```
- Oh dear, actually this is very close to being just a pattern language without the captures.
```
a1.a & b1.b  =  a1.(a & b1.b) + b1.(a1.a & b)
```
- Take two.
  - <== Value >, <|<| value>, <|>| value>, |<=|, |>=|, *eq *lt *gt *le *ge
  - _ for discard, <*discard>
  - scalar values not symbols beginning with * match themselves as if they were ==-wrapped
  - all the special things are records, possibly 0-ary, with labels symbols starting with * except for == etc and _ and ...
  - if you have to match a label like *foo it might clash, so match <== *foo> instead: <*foo 1 2 3> ==> <<== *foo> 1 2 3>
  - <*int> for SignedInteger, <*string>, <*symbol>, <*bytestring>/<*binary>, <*float>, <*double>, <*bool>
  - <*and Pattern ⋯>
  - <*or Pattern ⋯>
  - <*not Pattern> ?
  - <Pattern Pattern ⋯> - match record
  - [Pattern ⋯] - match sequence
  - #set{Pattern} - match set
  - don't know how to match dictionaries yet
    - view it as an interleave of its keyvalues
    - <*interleave Pattern ⋯>?
    - somehow allow specification of a keyvalue that is repeating, that is optional, etc
    - {Keypat:Valpat ⋯ <... Keypat>:<... Valpat>} ??? eww?
  - <*group Pattern ⋯> - sequence of values spliced into wider sequence?
  - use literal ... symbol (!) to mark repetition in a sequence: [<*string> ...]
  - could use literal ? to mark optionality; or better perhaps <*optional Pattern>, equivalent to <*biased-choice Pattern <*group>>; hmm, biased choice!
  - could use <*repeat lo hi> or similar for counted repetition
  - don't know how to write refs to other types yet! def labels starting with *?
```
<*def <*foo> <*or <*int> <*string>>>
<*foo>

<*def <*maybe a> <*or <nothing> <just a>>>
<*maybe <*int>>
```
  - should those be relative URLs, or jsonpointer or something, so can drag in types from the web?
  - NOTE: No schema for indicating attachment of annotations?!?!?!
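
Since "take two" is close to a pattern language without captures, a matcher for a small subset can be sketched in Python. The modelling is an assumption for illustration only: records are tuples whose first element is the label, sequences are lists, symbols and strings are both Python `str`, and only `_`, `==`, `*int`, `*string`, `*or`, `*and`, `*not` and trailing `...` are handled.

```python
def match(pat, val):
    """Return True if val matches pat (no captures)."""
    if pat == "_":                       # discard matches anything
        return True
    if isinstance(pat, tuple) and pat:
        head, args = pat[0], pat[1:]
        if head == "==":
            return val == args[0]
        if head == "*int":
            return isinstance(val, int) and not isinstance(val, bool)
        if head == "*string":
            return isinstance(val, str)
        if head == "*or":
            return any(match(p, val) for p in args)
        if head == "*and":
            return all(match(p, val) for p in args)
        if head == "*not":
            return not match(args[0], val)
        # Otherwise a record pattern: label and fields match positionally,
        # and a record of the wrong arity is rejected before any field is examined.
        return (isinstance(val, tuple) and len(val) == len(pat)
                and all(match(p, v) for p, v in zip(pat, val)))
    if isinstance(pat, list):
        if not isinstance(val, list):
            return False
        if len(pat) >= 2 and pat[-1] == "...":   # [P ...]: P repeats
            fixed, rep = pat[:-2], pat[-2]
            return (len(val) >= len(fixed)
                    and all(match(p, v) for p, v in zip(fixed, val))
                    and all(match(rep, v) for v in val[len(fixed):]))
        return (len(val) == len(pat)
                and all(match(p, v) for p, v in zip(pat, val)))
    return pat == val                    # scalars match themselves


worker = ("Worker", ("*string",), ("*int",))
ok = match([worker, "..."], [("Worker", "10.0.0.101", 2301),
                             ("Worker", "10.0.0.102", 2302)])
bad = match(worker, ("Worker", "10.0.0.101", "2301"))
# A label that would clash with a special form is wrapped in ==.
clash = match((("==", "*foo"), 1, 2, 3), ("*foo", 1, 2, 3))
```

Dictionaries, `*group`, `*def` and counted repetition are left out, matching the open questions in the list above.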

The YAML example:

```
database:
    username: admin
    password: foobar  # TODO get prod passwords out of config
    socket: /var/tmp/database.sock
    options: {use_utf8: true}
memcached:
    host: 10.0.0.99
workers:
  - host: 10.0.0.101
    port: 2301
  - host: 10.0.0.102
    port: 2302
```

Could be:

```
[ <Database [<Username "admin">
             @<TODO "get prod passwords out of config"> <Password "foobar">
             <Socket "/var/tmp/database.sock">
             <Options [<UseUTF8>]>]>
  <Memcached [<Host "10.0.0.99">]>
  <Workers [<Worker "10.0.0.101" 2301>
            <Worker "10.0.0.102" 2302>]> ]
```

```
{
  database: {
    username: "admin",
    @<TODO "get prod passwords out of config">
    password: "foobar",
    socket: "/var/tmp/database.sock",
    options: #set{use_utf8}
  },
  memcached: {
    host: "10.0.0.99"
  },
  workers: [ <Worker "10.0.0.101" 2301>
             <Worker "10.0.0.102" 2302> ]
}
```

Its schema-sketch could be

```
[ <*interleave <Database [ <*interleave <Username <*string>>
                                        <Password <*string>>
                                        <*optional <Socket <*string>>>
                                        <*optional <Options [<*option> ...]>>> ]>
               <Memcached [ <Host <*ipv4>> ... ]>
               <Workers [ <Worker <*ipv4> <*u16>> ... ]>> ]
```

(for the first variant) or

```
{
  database: {
    username: <*string>,
    password: <*string>,
    <*optional socket>: <*string>,
    <*optional options>: #set{<*option>}
  },
  memcached: {
    host: <*ipv4>
  },
  workers: [ <Worker <*ipv4> <*u16>> ... ]
}
```
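
The `<*interleave ...>` in the first schema sketch could be checked roughly as follows. This is a Python sketch under assumptions: each sub-pattern claims at most one item, `<*optional ...>` is modelled as a `required=False` flag, and one-field records are modelled as `(label, field)` tuples. The recursion follows the shape of `a1.a & b1.b = a1.(a & b1.b) + b1.(a1.a & b)`: try each remaining pattern against the head item, then interleave the rest.

```python
def interleave(patterns, items):
    """patterns: list of (predicate, required) pairs; items: list of values.

    True if every item is claimed by a distinct pattern and every required
    pattern claims exactly one item, regardless of item order.
    """
    if not items:
        return all(not required for _, required in patterns)
    first, rest = items[0], items[1:]
    for i, (pred, _) in enumerate(patterns):
        if pred(first) and interleave(patterns[:i] + patterns[i + 1:], rest):
            return True
    return False


def is_str(v):
    return isinstance(v, str)


def rec(label_name, field_pred):
    """Predicate for a 1-field record modelled as the tuple (label, field)."""
    return lambda v: (isinstance(v, tuple) and len(v) == 2
                      and v[0] == label_name and field_pred(v[1]))


database = [
    (rec("Username", is_str), True),
    (rec("Password", is_str), True),
    (rec("Socket", is_str), False),   # <*optional <Socket <*string>>>
]

ok = interleave(database, [("Password", "foobar"), ("Username", "admin")])
missing = interleave(database, [("Socket", "/var/tmp/database.sock"),
                                ("Username", "admin")])
```

Order of the fields does not matter, but a missing required field or an unclaimed extra item makes the match fail.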

Annotations will be allowed on any value; but also perhaps on a key-value mapping pair?

```
{
  @"I label the key" key: value
  key @"I label the mapping": value
  key: @"I label the value" value
}
```

Perhaps not.

The schema for the second YAML config sketch would allow the instance to be written:

```
database:
  username: admin
  @<TODO "get prod passwords out of config">
  password: foobar
  socket: /var/tmp/database.sock
  options: use_utf8
memcached:
  host: 10.0.0.99
workers:
  <Worker 10.0.0.101 2301>
  <Worker 10.0.0.102 2302>
```
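
What makes the unquoted form readable is that the schema says which type to expect at each position. A small Python sketch of that type-directed step, assuming whitespace-separated tokens and treating `*string`, `*u16` and `*ipv4` from the schema sketch as type names; `read_scalar` and `read_worker` are hypothetical helpers, not part of any implementation.

```python
import ipaddress


def read_scalar(token: str, ty: str):
    """Interpret a bare token according to the schema type expected at its position."""
    if ty == "*string":
        return token
    if ty == "*u16":
        n = int(token)
        if not 0 <= n <= 0xFFFF:
            raise ValueError(f"{token} out of range for *u16")
        return n
    if ty == "*ipv4":
        # IPv4Address raises a ValueError subclass if the token is malformed.
        return str(ipaddress.IPv4Address(token))
    raise ValueError(f"unknown type {ty}")


def read_worker(line: str):
    """Read `<Worker HOST PORT>` against the schema <Worker <*ipv4> <*u16>>."""
    label, host, port = line.strip().strip("<>").split()
    if label != "Worker":
        raise ValueError("expected a Worker record")
    return (label, read_scalar(host, "*ipv4"), read_scalar(port, "*u16"))


worker = read_worker("  <Worker 10.0.0.101 2301>")
as_string = read_scalar("2301", "*string")   # same token, different expected type
as_u16 = read_scalar("2301", "*u16")
```

The same token `2301` reads as a string or an integer depending only on the expected type, so the reader does no dwimming of scalar syntax on its own.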
