NUL terminator on String Reprs
parent 3baade843d
commit ade682a5f6
@@ -12,12 +12,12 @@ For a value `v`, we write `«v»` for the binary encoding of `v`.
 
 «#f» = [0xA0]
 «#t» = [0xA1]
-«F» = [0xA2] ++ binary32(F) if F ∈ Float
-«D» = [0xA2] ++ binary64(D) if D ∈ Double
-«x» = [0xA3] ++ intbytes(x) if x ∈ SignedInteger
-«S» = [0xA4] ++ utf8(S) if S ∈ String
-      [0xA5] ++ S if S ∈ ByteString
-      [0xA6] ++ utf8(S) if S ∈ Symbol
+«F» = [0xA2] ++ binary32(F) if F ∈ Float
+«D» = [0xA2] ++ binary64(D) if D ∈ Double
+«x» = [0xA3] ++ intbytes(x) if x ∈ SignedInteger
+«S» = [0xA4] ++ utf8(S) ++ [0] if S ∈ String
+      [0xA5] ++ S if S ∈ ByteString
+      [0xA6] ++ utf8(S) if S ∈ Symbol
 
 «<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
 «[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
@@ -153,14 +153,22 @@ For example,
 
 ### Strings, ByteStrings and Symbols.
 
-Syntax for these three types varies only in the tag used. For `String`
-and `Symbol`, the data following the tag is a UTF-8 encoding of the
-`Value`'s code points, while for `ByteString` it is the raw data
-contained within the `Value` unmodified.
+«S» = [0xA4] ++ utf8(S) ++ [0] if S ∈ String
+      [0xA5] ++ S if S ∈ ByteString
+      [0xA6] ++ utf8(S) if S ∈ Symbol
 
-«S» = [0xA4] ++ utf8(S) if S ∈ String
-      [0xA5] ++ S if S ∈ ByteString
-      [0xA6] ++ utf8(S) if S ∈ Symbol
+For `String` and `Symbol`, the data following the tag is a UTF-8
+encoding of the `Value`'s code points, while for `ByteString` it is the
+raw data contained within the `Value` unmodified.
+
+Each `String` has a trailing zero byte appended. This extra byte *MUST
+NOT* be treated as part of the `Value`: it exists to permit zero-copy C
+interoperability.[^zero-copy-c-string-interop]
+
+[^zero-copy-c-string-interop]: Some care must still be taken when
+passing `String` `Repr`s directly to a C-style ABI, since `String`s
+may contain the zero Unicode code point, which C library routines
+will usually misinterpret as an end-of-string marker.
 
 ### Booleans.
 
@@ -221,7 +229,8 @@ the same `Value` to yield different binary `Repr`s.
 ## Acknowledgements
 
 The exclusion of lengths from `Repr`s, placing lengths instead ahead of
-contained values in sequences, is inspired by [argdata][].
+contained values in sequences, is inspired by [argdata][], as is the
+inclusion of a `NUL` byte in `String` `Repr`s for C interoperability.
 
 ## Appendix. Autodetection of textual or binary syntax
 
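To make the effect of this change concrete, here is a minimal Python sketch of the three encoding rules as they read after the commit, together with the caveat from the footnote. It is an illustration only, not code from the repository: the function names (`encode_string`, `decode_string`, `c_view`, and so on) are hypothetical, and the decoder assumes it is handed exactly one complete `String` `Repr`, since the `Repr` itself carries no length.

```python
# Tags taken from the encoding rules in the diff.
TAG_STRING = 0xA4
TAG_BYTESTRING = 0xA5
TAG_SYMBOL = 0xA6


def encode_string(s: str) -> bytes:
    """«S» = [0xA4] ++ utf8(S) ++ [0] for a String."""
    return bytes([TAG_STRING]) + s.encode("utf-8") + b"\x00"


def encode_bytestring(b: bytes) -> bytes:
    """[0xA5] ++ S for a ByteString: raw data, no terminator."""
    return bytes([TAG_BYTESTRING]) + b


def encode_symbol(name: str) -> bytes:
    """[0xA6] ++ utf8(S) for a Symbol: no terminator."""
    return bytes([TAG_SYMBOL]) + name.encode("utf-8")


def decode_string(repr_: bytes) -> str:
    """Recover the String Value, dropping the tag and the trailing zero byte.

    Assumes repr_ is exactly one complete String Repr (hypothetical helper).
    """
    if len(repr_) < 2 or repr_[0] != TAG_STRING or repr_[-1] != 0:
        raise ValueError("not a NUL-terminated String Repr")
    # The final zero byte is framing only; it is not part of the Value.
    return repr_[1:-1].decode("utf-8")


def c_view(repr_: bytes) -> bytes:
    """Bytes a C routine such as strlen would see starting after the tag:
    everything up to, but not including, the first zero byte."""
    payload = repr_[1:]
    return payload[: payload.index(0)]
```

For a `String` without embedded zero code points, `c_view` returns the whole UTF-8 payload, which is the zero-copy case the commit is aiming for. For a `String` such as `"a\x00b"`, `decode_string` still round-trips the full `Value`, but `c_view` stops at `b"a"`, which is the misinterpretation the footnote warns about.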