forked from syndicate-lang/preserves
Simplify, repair, and regularise embedded binary values in textual syntax
parent 6feb320aad
commit db5c890e1c
@@ -466,8 +466,10 @@
 [#\# (match i
 [(px #px#"^#set\\{" (list _))
 (sequence-fold (set) (lambda (acc) (set-add acc (read-value))) values #\})]
-[(px #px#"^#hexvalue\\{" (list _))
-(decode (read-hex-binary '()) (lambda () (parse-error "Invalid #hexvalue encoding")))]
+[(px #px#"^#value" (list _))
+(define bs (read-value))
+(when (not (bytes? bs)) (parse-error "ByteString must follow #value"))
+(decode bs)]
 [(px #px#"^#true" (list _))
 #t]
 [(px #px#"^#false" (list _))
@@ -631,6 +633,13 @@
 (cross-check "#base64{SGk}" #"Hi" (#x62 "Hi"))
 (cross-check "#base64{ S G k }" #"Hi" (#x62 "Hi"))
 
+(cross-check "#value#\"fcorymb\"" #"corymb" (#x66 "corymb"))
+(cross-check "#value#\"\x01\"" #t (#x01))
+(cross-check "#value#base64{AQ}" #t (#x01))
+(cross-check "#value#base64{AQ==}" #t (#x01))
+(cross-check "#value #base64{AQ==}" #t (#x01))
+(cross-check "#value ;;comment\n #base64{AQ==}" #t (#x01))
+
 (check-equal? (string->preserve "[]") '())
 (check-equal? (string->preserve "{}") (hash))
 (check-equal? (string->preserve "\"\"") "")
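The new `cross-check` vectors work because the bytes inside the `ByteString` are themselves a compact binary `Repr`. In `#value#"fcorymb"` the first byte is ASCII `f`, which is `#x66`; the expected encoding `(#x66 "corymb")` shows that this is the lead byte of a six-byte `ByteString`. Likewise `#"\x01"`, `#base64{AQ}` and `#base64{AQ==}` all denote the single byte `#x01`, which the tests map to `#t`. The Racket sketch below decodes just those cases. It is hypothetical (`decode-sketch` is not the library's `decode`) and assumes only what the vectors show: `#x01` is `#t`, and a lead byte `#x6N` with N below 15 starts a `ByteString` of N bytes, as inferred from `(#x62 "Hi")` and `(#x66 "corymb")`. Every other lead byte is rejected.

```racket
#lang racket/base
;; HYPOTHETICAL mini-decoder: covers only the lead bytes exercised by the
;; #value test vectors. It is not the library's `decode`.
(require net/base64) ; base64-decode, used in the usage lines below

(define (decode-sketch bs)
  (when (zero? (bytes-length bs))
    (error 'decode-sketch "empty input"))
  (define lead (bytes-ref bs 0))
  (cond
    ;; #x01 alone decodes to #t, per (cross-check "#value#\"\x01\"" #t (#x01)).
    [(and (= lead #x01) (= (bytes-length bs) 1)) #t]
    ;; ASSUMPTION: #x60..#x6E is a short ByteString whose length is the low
    ;; nibble (inferred from #x62 "Hi" and #x66 "corymb").
    [(and (= (bitwise-and lead #xF0) #x60) (< (bitwise-and lead #x0F) 15))
     (define len (bitwise-and lead #x0F))
     (unless (= (bytes-length bs) (+ 1 len))
       (error 'decode-sketch "length mismatch"))
     (subbytes bs 1)]
    [else (error 'decode-sketch "lead byte not covered by this sketch: ~a" lead)]))

(decode-sketch #"fcorymb")              ; => #"corymb" (ASCII f is #x66)
(decode-sketch #"\x01")                 ; => #t
(decode-sketch (base64-decode #"AQ==")) ; => #t
```

Since `b` is `#x62`, `(decode-sketch #"bHi")` likewise yields `#"Hi"`. Bytes that are not printable can be written with `\x` escapes or with `#base64{...}`, as the other vectors do.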
51
preserves.md
@@ -317,21 +317,6 @@ tokens `#set{` and `}`.[^printing-collections]
 commas separating, and commas terminating elements or key/value
 pairs within a collection.
 
-Any `Value` may be represented using the
-[compact binary syntax](#compact-binary-syntax) by directly prefixing
-the binary form of the `Value` with ASCII `SOH` (`%x01`), or by
-enclosing a hexadecimal representation of the binary form of the
-`Value` in the tokens `#hexvalue{` and `}`.[^rationale-switch-to-binary]
-
-Compact = %x01 <binary data> / %s"#hexvalue{" *(ws / HEXDIG) ws "}"
-
-[^rationale-switch-to-binary]: **Rationale.** The textual syntax
-cannot express every `Value`: specifically, it cannot express the
-several million floating-point NaNs, or the two floating-point
-Infinities. Since the compact binary format for `Value`s expresses
-each `Value` with precision, embedding binary `Value`s solves the
-problem.
-
 `Boolean`s are the simple literal strings `#true` and `#false`.
 
 Boolean = %s"#true" / %s"#false"
@@ -456,6 +441,29 @@ double quote mark.
 definition of "token representation", and with the
 [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
 
+Finally, any `Value` may be represented by escaping from the textual
+syntax to the [compact binary syntax](#compact-binary-syntax) by
+prefixing a `ByteString` containing the binary representation of the
+`Value` with `#value`.[^rationale-switch-to-binary] [^no-literal-binary-in-text]
+
+Compact = %s"#value" ws ByteString
+
+[^rationale-switch-to-binary]: **Rationale.** The textual syntax
+cannot express every `Value`: specifically, it cannot express the
+several million floating-point NaNs, or the two floating-point
+Infinities. Since the compact binary format for `Value`s expresses
+each `Value` with precision, embedding binary `Value`s solves the
+problem.
+
+[^no-literal-binary-in-text]: Every text is ultimately physically
+stored as bytes; therefore, it might seem possible to escape to
+the raw binary form of compact binary encoding from within a
+pieces of textual syntax. However, while bytes must be involved in
+any *representation* of text, the text *itself* is logically a
+sequence of *code points* and is not *intrinsically* a binary
+structure at all. It would be incoherent to expect to be able to
+access the representation of the text from within the text itself.
+
 ## Compact Binary Syntax
 
 A `Repr` is an encoding, or representation, of a specific `Value`.
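Read operationally, the `Compact` rule says: match the literal `#value`, skip whitespace (the new tests show that a `;` comment is also accepted there), read one value, and reject it unless it is a `ByteString`, which is what the reader change in this commit does. A minimal Racket sketch of that control flow follows. It assumes Racket's built-in `read` as a stand-in for the Preserves reader: `read` skips whitespace and `;` comments and parses `#"..."` byte strings, but it knows nothing about Preserves-only syntax such as `#base64{...}`. `read-compact` is a hypothetical name, and the decoder is passed in as an argument rather than implemented.

```racket
#lang racket/base
;; Sketch of: Compact = %s"#value" ws ByteString
;; ASSUMPTION: Racket's `read` stands in for the Preserves reader. It skips
;; whitespace and `;` comments and parses #"..." byte strings, but does not
;; understand #base64{...} or other Preserves-only syntax.
;; `read-compact` is a hypothetical name; `decode` is supplied by the caller.
(define (read-compact in decode)
  ;; The literal prefix must come first; nothing is consumed if it is absent.
  (unless (regexp-try-match #rx"^#value" in)
    (error 'read-compact "expected #value"))
  ;; `read` skips whitespace and comments before the next datum.
  (define bs (read in))
  (unless (bytes? bs)
    (error 'read-compact "ByteString must follow #value"))
  (decode bs))

;; With `values` as a placeholder decoder, the raw embedded bytes come back.
(read-compact (open-input-string "#value ;;comment\n #\"fcorymb\"") values)
; => #"fcorymb"
```

Passing the decoder in keeps the sketch focused on the grammar: the `bytes?` check is the point at which `#value 123` or `#value "text"` is rejected, before any binary decoding is attempted.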
@@ -1395,17 +1403,4 @@ tell whether it is an open-parenthesis or not! For this reason, I've
 disallowed whitespace between a label `Value` and the open-parenthesis
 of the fields. Is this reasonable??
 
-Q. Should SOH-prefixed binary values embedded in a textual representation
-be length-prefixed, too - byte strings, essentially? Also, why not
-base64 embedded binary values? The length-prefixing might help with
-being able to avoid having to care whether the embedded value is well-
-formed or not; on the other hand, it means streaming-format embeddings
-aren't possible.
-
-TODO. The SOH-prefixed embedded binary idea is probably incoherent.
-Textual form is *text*, not binary, and since it's code-points, we
-cannot rely on having access to a hypothetical underlying bytestream.
-Remove it, and consider generalizing `#hexvalue{}` to include
-`#base64value{}` or similar.
-
 ## Notes