Simplify, repair, and regularise embedded binary values in textual syntax
parent 6feb320aad
commit db5c890e1c
@@ -466,8 +466,10 @@
 [#\# (match i
   [(px #px#"^#set\\{" (list _))
    (sequence-fold (set) (lambda (acc) (set-add acc (read-value))) values #\})]
-  [(px #px#"^#hexvalue\\{" (list _))
-   (decode (read-hex-binary '()) (lambda () (parse-error "Invalid #hexvalue encoding")))]
+  [(px #px#"^#value" (list _))
+   (define bs (read-value))
+   (when (not (bytes? bs)) (parse-error "ByteString must follow #value"))
+   (decode bs)]
   [(px #px#"^#true" (list _))
    #t]
   [(px #px#"^#false" (list _))
@@ -631,6 +633,13 @@
 (cross-check "#base64{SGk}" #"Hi" (#x62 "Hi"))
 (cross-check "#base64{ S G k }" #"Hi" (#x62 "Hi"))
 
+(cross-check "#value#\"fcorymb\"" #"corymb" (#x66 "corymb"))
+(cross-check "#value#\"\x01\"" #t (#x01))
+(cross-check "#value#base64{AQ}" #t (#x01))
+(cross-check "#value#base64{AQ==}" #t (#x01))
+(cross-check "#value #base64{AQ==}" #t (#x01))
+(cross-check "#value ;;comment\n #base64{AQ==}" #t (#x01))
+
 (check-equal? (string->preserve "[]") '())
 (check-equal? (string->preserve "{}") (hash))
 (check-equal? (string->preserve "\"\"") "")
51
preserves.md
@@ -317,21 +317,6 @@ tokens `#set{` and `}`.[^printing-collections]
 commas separating, and commas terminating elements or key/value
 pairs within a collection.
 
-Any `Value` may be represented using the
-[compact binary syntax](#compact-binary-syntax) by directly prefixing
-the binary form of the `Value` with ASCII `SOH` (`%x01`), or by
-enclosing a hexadecimal representation of the binary form of the
-`Value` in the tokens `#hexvalue{` and `}`.[^rationale-switch-to-binary]
-
-Compact = %x01 <binary data> / %s"#hexvalue{" *(ws / HEXDIG) ws "}"
-
-[^rationale-switch-to-binary]: **Rationale.** The textual syntax
-cannot express every `Value`: specifically, it cannot express the
-several million floating-point NaNs, or the two floating-point
-Infinities. Since the compact binary format for `Value`s expresses
-each `Value` with precision, embedding binary `Value`s solves the
-problem.
-
 `Boolean`s are the simple literal strings `#true` and `#false`.
 
 Boolean = %s"#true" / %s"#false"
@@ -456,6 +441,29 @@ double quote mark.
 definition of "token representation", and with the
 [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
 
+Finally, any `Value` may be represented by escaping from the textual
+syntax to the [compact binary syntax](#compact-binary-syntax) by
+prefixing a `ByteString` containing the binary representation of the
+`Value` with `#value`.[^rationale-switch-to-binary] [^no-literal-binary-in-text]
+
+Compact = %s"#value" ws ByteString
+
+[^rationale-switch-to-binary]: **Rationale.** The textual syntax
+cannot express every `Value`: specifically, it cannot express the
+several million floating-point NaNs, or the two floating-point
+Infinities. Since the compact binary format for `Value`s expresses
+each `Value` with precision, embedding binary `Value`s solves the
+problem.
+
+[^no-literal-binary-in-text]: Every text is ultimately physically
+stored as bytes; therefore, it might seem possible to escape to
+the raw binary form of compact binary encoding from within a
+piece of textual syntax. However, while bytes must be involved in
+any *representation* of text, the text *itself* is logically a
+sequence of *code points* and is not *intrinsically* a binary
+structure at all. It would be incoherent to expect to be able to
+access the representation of the text from within the text itself.
+
 ## Compact Binary Syntax
 
 A `Repr` is an encoding, or representation, of a specific `Value`.
@@ -1395,17 +1403,4 @@ tell whether it is an open-parenthesis or not! For this reason, I've
 disallowed whitespace between a label `Value` and the open-parenthesis
 of the fields. Is this reasonable??
 
-Q. Should SOH-prefixed binary values embedded in a textual representation
-be length-prefixed, too - byte strings, essentially? Also, why not
-base64 embedded binary values? The length-prefixing might help with
-being able to avoid having to care whether the embedded value is well-
-formed or not; on the other hand, it means streaming-format embeddings
-aren't possible.
-
-TODO. The SOH-prefixed embedded binary idea is probably incoherent.
-Textual form is *text*, not binary, and since it's code-points, we
-cannot rely on having access to a hypothetical underlying bytestream.
-Remove it, and consider generalizing `#hexvalue{}` to include
-`#base64value{}` or similar.
-
 ## Notes
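
As a rough illustration of the `#value` form this commit introduces, the following Python sketch pulls the embedded bytes out of a `#value` followed by a `#base64{...}` `ByteString`. It is not the Racket implementation shown in the diff: `embedded_bytes` and `_EMBEDDED` are hypothetical names, only the base64 `ByteString` spelling is handled, and the treatment of whitespace, `;` comments, and optional `=` padding is inferred from the test cases in the diff rather than from the full grammar. Decoding the resulting bytes as a compact binary `Value` is left out.

```python
import base64
import re

# Hypothetical helper for illustration; not part of any Preserves library.
_EMBEDDED = re.compile(
    r'#value'                    # the escape token
    r'(?:\s|;[^\n]*(?:\n|$))*'   # whitespace and ;-comments (assumed from the tests)
    r'#base64\{([^}]*)\}'        # a base64 ByteString literal
)

def embedded_bytes(text):
    """Return the raw bytes embedded in a '#value #base64{...}' text."""
    m = _EMBEDDED.fullmatch(text)
    if m is None:
        raise ValueError("ByteString must follow #value")
    # Whitespace inside the braces is ignored; padding is optional,
    # so restore it before handing the digits to the decoder.
    digits = re.sub(r'\s+', '', m.group(1))
    digits += '=' * (-len(digits) % 4)
    return base64.b64decode(digits)

if __name__ == "__main__":
    print(embedded_bytes("#value ;;comment\n #base64{AQ==}"))
```

In the test cases above, the single byte `0x01` obtained this way is the compact binary form of `#true`, which is why each `#value#base64{AQ}` variant is expected to read as `#t`.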