---
title: "Open questions"
---

Q. Should "symbols" instead be URIs? Relative, usually; relative to
what? Some domain-specific base URI?

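To make the "relative to what?" part concrete, here is a minimal sketch of resolving a symbol as a relative URI reference. The base URI and the `symbol_to_uri` helper are hypothetical, made up for illustration; nothing here is defined by the spec.

```python
from urllib.parse import urljoin

# Hypothetical domain-specific base URI (an assumption for illustration).
BASE = "https://example.org/vocab/"


def symbol_to_uri(symbol: str, base: str = BASE) -> str:
    """Treat a symbol as a URI reference and resolve it against a base URI."""
    return urljoin(base, symbol)


relative = symbol_to_uri("point")                    # resolved under BASE
absolute = symbol_to_uri("https://other.example/x")  # already absolute, kept
```

A symbol that is already an absolute URI passes through unchanged, so the base only matters for the "usually relative" case.
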
Q. Literal small integers: are they pulling their weight? They're not
absolutely necessary.

Q. Should we try to make the data ordering line up with the encoding
ordering? We'd have to use only streaming forms, avoid the small
integer encoding, not store record arities, sort sets and
dictionaries, mask floats and doubles (perhaps
[like this](https://stackoverflow.com/questions/43299299/sorting-floating-point-values-using-their-byte-representation)),
and perhaps pick a specific `NaN`. I don't know what to do about
SignedIntegers. Perhaps make them more like float formats, with the
byte count acting as a kind of exponent underneath the sign bit.

- Perhaps define separate additional canonicalization restrictions?
  Doesn't help the ordering, but does help the equivalence.

- Canonicalization and early-bailout-equivalence-checking are in
  tension with support for streaming values.

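As a rough sketch of what "masking" floats and giving SignedIntegers a float-like shape could look like: `double_key` uses the common sign-flip trick on the big-endian IEEE 754 bits, and `int_key` is one possible reading of "byte count as a kind of exponent underneath the sign bit". Both function names and the exact byte layout are assumptions for illustration, not a proposed encoding.

```python
import struct


def double_key(x: float) -> bytes:
    """Map a double to 8 bytes whose bytewise order follows numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
    else:
        bits ^= 1 << 63             # non-negative: flip only the sign bit
    return struct.pack(">Q", bits)


def int_key(n: int) -> bytes:
    """Header byte = sign bit over a 7-bit byte count, then the magnitude.

    Negative numbers complement both the count and the magnitude bytes, so
    that larger magnitudes sort earlier. (Hypothetical layout.)
    """
    magnitude = abs(n)
    length = (magnitude.bit_length() + 7) // 8  # zero has length 0
    if length > 127:
        raise ValueError("magnitude too large for a 7-bit byte count")
    body = magnitude.to_bytes(length, "big")
    if n >= 0:
        return bytes([0x80 | length]) + body
    return bytes([0x7F - length]) + bytes(b ^ 0xFF for b in body)
```

With keys like these, a plain bytewise comparison orders the encoded values the same way as the numbers themselves. `NaN` still needs a decision: under `double_key`, different `NaN` bit patterns land in different places, which is presumably why picking one specific `NaN` comes up.
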
Q. To remain compatible with JSON, portions of the text syntax have to
remain case-insensitive (`%i"..."`). However, non-JSON extensions need
not be. There's only one (?) at the moment, the `%i"f"` in `Float`;
should it be changed to case-sensitive?

Q. Should `IOList`s be wrapped in an identifying unary record constructor?

TODO: Examples of the ordering. `"bzz" < "c" < "caa"`; `#true < 3 < "3" < |3|`

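Until real examples are written up, the two chains above can be checked with a small sketch. The rank order (boolean, then integer, then string, then symbol) is inferred only from the `#true < 3 < "3" < |3|` example; other kinds of value are left out, and the `Symbol` class and `sort_key` function are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a symbol such as |3|."""
    name: str


def sort_key(value):
    """Order first by kind of value, then by the value within that kind."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return (0, value)
    if isinstance(value, int):
        return (1, value)
    if isinstance(value, str):
        return (2, value)        # compared code point by code point
    if isinstance(value, Symbol):
        return (3, value.name)
    raise TypeError(f"unsupported value: {value!r}")


mixed = sorted([Symbol("3"), "3", 3, True], key=sort_key)
strings = sorted(["caa", "c", "bzz"], key=sort_key)
```

Here `mixed` comes out in the order of the second chain and `strings` in the order of the first.
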
TODO: Probably should add a canonicalized subset. Consider adding an
explicit "I promise this is canonical" marker, like a BOM, which
identifies a binary value as (first) binary and (second, optionally)
as canonical. UTF-8 disallows byte `0xFF` from appearing anywhere in a
text; this might be a good candidate for a marker sequence.
((Actually, perhaps `0x10` would be good! It corresponds to DLE, "data
link escape"; it is not a printable ASCII character, and is disallowed
in the textual Preserves grammar; and it is also mnemonic for "version
0", since it is the Preserves binary encoding of the small integer
zero.))

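A quick sketch of how the two marker candidates differ. The helper names are hypothetical, and how the optional "canonical" promise would be signalled is left open here, as it is above.

```python
MARKER_FF = b"\xff"   # can never occur in UTF-8 text
MARKER_DLE = b"\x10"  # DLE; valid UTF-8, but disallowed by the text grammar


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def split_marker(data: bytes, marker: bytes):
    """Return (has_marker, payload); the payload excludes the marker."""
    if data.startswith(marker):
        return True, data[len(marker):]
    return False, data


ff_is_text = is_valid_utf8(MARKER_FF + b"payload")    # False
dle_is_text = is_valid_utf8(MARKER_DLE + b"payload")  # True
```

So `0xFF` is ruled out at the UTF-8 level before any parsing happens, while `0x10` only works as a marker because the textual grammar rejects it.
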