From 3eeaab375a6a271b51149562592d2ab1f9aa0d54 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Mon, 18 May 2020 09:55:57 +0200
Subject: [PATCH] More on autodetection

---
 preserves.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/preserves.md b/preserves.md
index 0f107a5..25c5089 100644
--- a/preserves.md
+++ b/preserves.md
@@ -945,6 +945,26 @@ syntax being used. In a network protocol supporting this kind of
 autodetection, clients may transmit LF or `0xFF` to select text or
 binary syntax, respectively.
 
+Furthermore, if an application consistently uses `Record`s for its
+top-level messages,[^records-and-nonatoms] eschewing `Atom`s in
+particular, then autodetection of the encoding used for a given input
+can be done as follows:
+
+| First byte of encoded input | Encoding | Other conclusions |
+| --- | --- | --- |
+| `0x80`--`0x8F` | binary | `Record` (format B) |
+| `0x28` | binary | `Record` (format C) |
+| `0x05` | binary | annotated value (presumably a `Record`) |
+| `0xFF` | binary | no-op; value will follow |
+| --- | --- | --- |
+| `0x3C` ("<") | text | `Record` |
+| `0x40` ("@") | text | annotated value (presumably a `Record`) |
+| `0x09`, `0x0A`, `0x0D`, `0x20` or `0x2C` | text | whitespace; value will follow |
+
+  [^records-and-nonatoms]: Similar reasoning can be used to permit
+    unambiguous detection of encoding when `Collection`s are allowed
+    as top-level messages as well as `Record`s.
+
 ## Appendix. Table of lead byte values
 
 00 - False
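The detection rule in the patch's table amounts to a single lookup on the first byte of input. Below is a minimal Python sketch, assuming the application sends only (possibly annotated) `Record`s at top level, as the patch requires. The names `BINARY_LEAD`, `TEXT_LEAD` and `detect_encoding` are hypothetical and not part of any Preserves library. The sketch uses `0x3C`, the ASCII code for `<`, as the lead byte of a text-syntax `Record`.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables mirroring the autodetection table.
# Binary lead bytes: Record (format B), Record (format C), annotation, no-op.
BINARY_LEAD: Dict[int, str] = {b: "Record (format B)" for b in range(0x80, 0x90)}
BINARY_LEAD[0x28] = "Record (format C)"
BINARY_LEAD[0x05] = "annotated value (presumably a Record)"
BINARY_LEAD[0xFF] = "no-op; value will follow"

# Text lead bytes: "<" opens a Record, "@" opens an annotation,
# and tab, LF, CR, space and comma all count as whitespace.
TEXT_LEAD: Dict[int, str] = {b: "whitespace; value will follow"
                             for b in (0x09, 0x0A, 0x0D, 0x20, 0x2C)}
TEXT_LEAD[0x3C] = "Record"                                 # "<"
TEXT_LEAD[0x40] = "annotated value (presumably a Record)"  # "@"


def detect_encoding(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (encoding, conclusion) for the first byte of data.

    Returns None for empty input, or for a first byte that an
    all-Records application would never produce.
    """
    if not data:
        return None
    first = data[0]
    if first in BINARY_LEAD:
        return ("binary", BINARY_LEAD[first])
    if first in TEXT_LEAD:
        return ("text", TEXT_LEAD[first])
    return None


if __name__ == "__main__":
    for sample in (b"\x85", b"<ping 1>", b"\xff\x85", b" <ping 1>", b"\x01"):
        print(sample, detect_encoding(sample))
```

The two tables are disjoint, which is what makes the check unambiguous: every binary lead byte listed is either a control character or not a valid first byte of UTF-8, while every text lead byte is printable ASCII or ordinary whitespace. When the result says "value will follow", a reader knows the encoding but would still need to skip the no-op or whitespace bytes before parsing the actual `Record`.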