From 3eeaab375a6a271b51149562592d2ab1f9aa0d54 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones <tonyg@leastfixedpoint.com>
Date: Mon, 18 May 2020 09:55:57 +0200
Subject: [PATCH] More on autodetection

---
 preserves.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/preserves.md b/preserves.md
index 0f107a5..25c5089 100644
--- a/preserves.md
+++ b/preserves.md
@@ -945,6 +945,26 @@ syntax being used. In a network protocol supporting this kind of
 autodetection, clients may transmit LF or `0xFF` to select text or
 binary syntax, respectively.
 
+Furthermore, if an application consistently uses `Record`s for its
+top-level messages,[^records-and-nonatoms] eschewing `Atom`s in
+particular, then autodetection of the encoding used for a given input
+can be done as follows:
+
+| First byte of encoded input              | Encoding | Other conclusions                       |
+| ---                                      | ---      | ---                                     |
+| `0x80`--`0x8F`                           | binary   | `Record` (format B)                     |
+| `0x28`                                   | binary   | `Record` (format C)                     |
+| `0x05`                                   | binary   | annotated value (presumably a `Record`) |
+| `0xFF`                                   | binary   | no-op; value will follow                |
+| ---                                      | ---      | ---                                     |
+| `0x3C` ("<")                             | text     | `Record`                                |
+| `0x40` ("@")                             | text     | annotated value (presumably a `Record`) |
+| `0x09`, `0x0A`, `0x0D`, `0x20` or `0x2C` | text     | whitespace; value will follow           |
+
+  [^records-and-nonatoms]: Similar reasoning can be used to permit
+    unambiguous detection of encoding when `Collection`s are allowed
+    as top-level messages as well as `Record`s.
+
 ## Appendix. Table of lead byte values
 
      00 - False