---
title: "Why not Just Use JSON?"
---

Tony Garnock-Jones
September 2018.

JSON offers *syntax* for numbers, strings, booleans, null, arrays and string-keyed maps. However, it suffers from two major problems. First, it offers no *semantics* for the syntax: it is left to each implementation to determine how to treat each JSON term. This causes [interoperability](http://seriot.ch/parsing_json.php) [{% include ia-logo.md %}](http://web.archive.org/web/20231016132727/https://seriot.ch/projects/parsing_json.html) and even [security](https://docs.couchdb.org/en/stable/cve/2017-12635.html) [{% include ia-logo.md %}](http://web.archive.org/web/20180906202559/http://docs.couchdb.org/en/stable/cve/2017-12635.html) issues. Second, JSON's lack of support for type tags leads to awkward and incompatible *encodings* of type information in terms of the fixed suite of constructors on offer.

There are other minor problems with JSON having to do with its syntax. Examples include its relative verbosity and its lack of support for binary data.

## JSON syntax doesn't *mean* anything

When are two JSON values the same? When are they different? The specifications are largely silent on these questions. Different JSON implementations give different answers.
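
As one illustration, the sketch below records the answers that a single implementation, the `json` module in Python's standard library, happens to give. The variable names are made up for this example, and nothing here is mandated by the JSON specifications; other implementations are free to behave differently.

```python
import json

# Numbers: `1` decodes to an int and `1.0` to a float,
# yet Python compares the two results as equal.
one_int = json.loads("1")
one_float = json.loads("1.0")
number_types = (type(one_int).__name__, type(one_float).__name__)
numbers_equal = one_int == one_float

# Precision: both literals round to the same IEEE 754 double.
doubles_equal = json.loads("1.0") == json.loads("1.0000000000000001")

# Strings: precomposed U+00E4 versus "a" followed by combining U+0308.
# No Unicode normalization is applied, so the decoded strings differ.
strings_equal = json.loads('"p\\u00e4ron"') == json.loads('"pa\\u0308ron"')

# Key order: the decoded dicts compare equal, but iterating over
# them still reveals the order in which the keys were written.
ab = json.loads('{"a":1, "b":2}')
ba = json.loads('{"b":2, "a":1}')
objects_equal = ab == ba
key_orders = (list(ab), list(ba))

# Duplicate keys: accepted without complaint; the last value wins.
duplicate = json.loads('{"a":1, "a":2}')

print(number_types, numbers_equal, doubles_equal)
print(strings_equal, objects_equal, key_orders, duplicate)
```

Each of these outcomes is a choice made by this particular library and language, which is exactly the difficulty: a peer written against a different library may make the opposite choice for the same text.
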
Specifically, JSON does not:

 - assign any meaning to numbers,[^meaning-ieee-double]
 - determine how strings are to be compared,[^string-key-comparison]
 - determine whether object key ordering is significant,[^json-member-ordering] or
 - determine whether duplicate object keys are permitted, what it would mean if they were, or how to determine a duplicate in the first place.[^json-key-uniqueness]

In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]

[^meaning-ieee-double]: [Section 6 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-6) does go so far as to indicate “good interoperability can be achieved” by imagining that parsers are able reliably to understand the syntax of numbers as denoting an IEEE 754 double-precision floating-point value.

[^string-key-comparison]: [Section 8.3 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-8.3) suggests that *if* an implementation compares strings used as object keys “code unit by code unit”, then it will interoperate with *other such implementations*, but neither requires this behaviour nor discusses comparisons of strings used in other contexts.

[^json-member-ordering]: [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4) remarks that “[implementations] differ as to whether or not they make the ordering of object members visible to calling software.”

[^json-key-uniqueness]: [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4) is the only place in the specification that mentions the issue. It explicitly sanctions implementations supporting duplicate keys, noting only that “when the names within an object are not unique, the behavior of software that receives such an object is unpredictable.” Implementations are free to choose any behaviour at all in this situation, including signalling an error, or discarding all but one of a set of duplicates.

[^xml-infoset]: The XML world has the concept of [XML infoset](https://www.w3.org/TR/xml-infoset/).
Loosely speaking, XML infoset is the *denotation* of an XML document; the *meaning* of the document.

[^other-formats]: Most other recent data languages are like JSON in specifying only a syntax with no associated semantics. While some do make a sketch of a semantics, the result is often underspecified (e.g. in terms of how strings are to be compared), overly machine-oriented (e.g. treating 32-bit integers as fundamentally distinct from 64-bit integers and from floating-point numbers), overly fine (e.g. giving visibility to the order in which map entries are written), or all three.

Some examples:

 - are the JSON values `1`, `1.0`, and `1e0` the same or different?
 - are the JSON values `1.0` and `1.0000000000000001` the same or different?
 - are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"` (UTF-8 `7061cc88726f6e`) the same or different?
 - are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same or different?
 - which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the same? Are all three legal?
 - are `{"päron":1}` and `{"päron":1}` the same or different?
 - is `"\uD834"` a legal string? Is `"\uDD1E"`? If so, is either one the same as `""`?[^unpaired-surrogates]

[^unpaired-surrogates]: [Section 8.2 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-8.2) discusses *unpaired UTF-16 surrogate* code points such as these, and remarks that implementations differ in their treatment of them. Some reject unpaired surrogates, some discard them, and some retain them.

## JSON can multiply nicely, but it can't add very well

JSON includes a fixed set of types: numbers, strings, booleans, null, arrays and string-keyed maps. Domain-specific data must be *encoded* into these types. For example, dates and email addresses are often represented as strings with an implicit internal structure.

There is no convention for *labelling* a value as belonging to a particular category. Instead, JSON-encoded data are often labelled in an ad-hoc way.
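
To make the absence of labels concrete, here is a small sketch using Python's standard `json` and `datetime` modules. The field names `due` and `note` are invented for illustration. Once decoded, a date and an ordinary piece of text that happens to look like a date are indistinguishable; the knowledge of which is which lives only in the consuming program.

```python
import json
from datetime import date

# Two fields carrying identical text: one is meant as a date,
# the other is free-form text (hypothetical field names).
doc = json.loads('{"due": "2018-09-30", "note": "2018-09-30"}')

# Both decode to plain strings; nothing in the data marks `due` as a date.
same_type = type(doc["due"]) is type(doc["note"])
same_value = doc["due"] == doc["note"]

# The consumer must know, out of band, that `due` should be parsed.
due = date.fromisoformat(doc["due"])

print(same_type, same_value, due)
```

A sender and a receiver that disagree about which fields carry this implicit structure have no way to detect the disagreement from the JSON text alone.
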
Multiple incompatible approaches exist. For example, a “money” structure containing a `currency` field and an `amount` may be represented in any number of ways:

    { "_type": "money", "currency": "EUR", "amount": 10 }
    { "type": "money", "value": { "currency": "EUR", "amount": 10 } }
    [ "money", { "currency": "EUR", "amount": 10 } ]
    { "@money": { "currency": "EUR", "amount": 10 } }

This causes particular problems when JSON is used to represent *sum* or *union* types, such as “either a value or an error, but not both”. Again, multiple incompatible approaches exist.

For example, imagine an API for depositing money in an account. The response might be either a “success” response indicating the new balance, or one of a set of possible errors.

Sometimes, a *pair* of values is used, with `null` marking the option not taken.[^interesting-failure-mode]

    { "ok": { "balance": 210 }, "error": null }
    { "ok": null, "error": "Unauthorized" }

[^interesting-failure-mode]: What is the meaning of a document where both `ok` and `error` are non-null? What might happen when a program is presented with such a document?

The branch not chosen is sometimes present, sometimes omitted as if it were an optional field:

    { "ok": { "balance": 210 } }
    { "error": "Unauthorized" }

Sometimes, an array of a label and a value is used:

    [ "ok", { "balance": 210 } ]
    [ "error", "Unauthorized" ]

Sometimes, the shape of the data is sufficient to distinguish among the alternatives, and the label is left implicit:

    { "balance": 210 }
    "Unauthorized"

JSON itself does not offer any guidance for which of these options to choose. In many real cases on the web, poor choices have led to encodings that are irrecoverably ambiguous.

**Update 20230123**.
[This article](https://gist.github.com/FeepingCreature/d2fd982f485973a154abcaf0ccb4003c) [{% include ia-logo.md %}](http://web.archive.org/web/20231016125428/https://gist.github.com/FeepingCreature/d2fd982f485973a154abcaf0ccb4003c) discusses another subtle aspect of the problems caused by the lack of tagging in JSON.

**Update 20231016**. Lack of tagging sometimes [causes implementors to rely on specific key-value orderings in JSON objects](https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/stand-alone-json-serialization#type-hint-position-in-json-objects) [{% include ia-logo.md %}](http://web.archive.org/web/20231016124858/https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/stand-alone-json-serialization#type-hint-position-in-json-objects) to make sure their `"type"` tag appears first in the text, to allow use of streaming parsers in deserialization.

## Notes