---
title: "Why not Just Use JSON?"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>
September 2018.

<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->

JSON offers *syntax* for numbers, strings, booleans, null, arrays and
string-keyed maps. However, it suffers from two major problems. First,
it offers no *semantics* for the syntax: it is left to each
implementation to determine how to treat each JSON term. This causes
[interoperability](http://seriot.ch/parsing_json.php) and even
[security](http://web.archive.org/web/20180906202559/http://docs.couchdb.org/en/stable/cve/2017-12635.html)
issues. Second, JSON's lack of support for type tags leads to awkward
and incompatible *encodings* of type information in terms of the fixed
suite of constructors on offer.

There are other minor problems with JSON having to do with its syntax.
Examples include its relative verbosity and its lack of support for
binary data.

## JSON syntax doesn't *mean* anything

When are two JSON values the same? When are they different?
<!-- When is one JSON value “less than” another? -->

The specifications are largely silent on these questions. Different
JSON implementations give different answers.

Specifically, JSON does not:

- assign any meaning to numbers,[^meaning-ieee-double]
- determine how strings are to be compared,[^string-key-comparison]
- determine whether object key ordering is significant,[^json-member-ordering] or
- determine whether duplicate object keys are permitted, what it
  would mean if they were, or how to determine a duplicate in the
  first place.[^json-key-uniqueness]

In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]

[^meaning-ieee-double]:
    [Section 6 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-6)
    does go so far as to indicate that “good interoperability can be
    achieved” by imagining that parsers can reliably understand the
    syntax of numbers as denoting an IEEE 754 double-precision
    floating-point value.

[^string-key-comparison]:
    [Section 8.3 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-8.3)
    suggests that *if* an implementation compares strings used as
    object keys “code unit by code unit”, then it will interoperate
    with *other such implementations*, but neither requires this
    behaviour nor discusses comparisons of strings used in other
    contexts.

[^json-member-ordering]:
    [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4)
    remarks that “[implementations] differ as to whether or not they
    make the ordering of object members visible to calling software.”

[^json-key-uniqueness]:
    [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4)
    is the only place in the specification that mentions the issue. It
    explicitly sanctions implementations supporting duplicate keys,
    noting only that “when the names within an object are not unique,
    the behavior of software that receives such an object is
    unpredictable.” Implementations are free to choose any behaviour
    at all in this situation, including signalling an error, or
    discarding all but one of a set of duplicates.

[^xml-infoset]: The XML world has the concept of
    [XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
    speaking, XML infoset is the *denotation* of an XML document; the
    *meaning* of the document.

[^other-formats]: Most other recent data languages are like JSON in
    specifying only a syntax with no associated semantics. While some
    do make a sketch of a semantics, the result is often
    underspecified (e.g. in terms of how strings are to be compared),
    overly machine-oriented (e.g. treating 32-bit integers as
    fundamentally distinct from 64-bit integers and from
    floating-point numbers), overly fine (e.g. giving visibility to
    the order in which map entries are written), or all three.

Some examples:

- are the JSON values `1`, `1.0`, and `1e0` the same or different?
- are the JSON values `1.0` and `1.0000000000000001` the same or different?
- are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
  (UTF-8 `7061cc88726f6e`) the same or different?
- are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
  or different?
- which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
  same? Are all three legal?
- are `{"päron":1}` and `{"päron":1}` (whose keys differ only in the
  way shown in the string example above) the same or different?

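
To make these questions concrete, the sketch below records how one particular implementation, the `json` module in Python's standard library, happens to answer them. It illustrates one parser's choices only; nothing in the JSON specifications requires these answers, and other implementations may differ. All variable names are illustrative.

```python
import json
import unicodedata

# Numbers: this parser maps "1" to int, but "1.0" and "1e0" to float.
number_types = [type(json.loads(text)).__name__ for text in ("1", "1.0", "1e0")]
numbers_equal = json.loads("1") == json.loads("1.0") == json.loads("1e0")
# Both literals round to the same IEEE 754 double, so they compare equal here.
same_double = json.loads("1.0") == json.loads("1.0000000000000001")

# Strings: precomposed U+00E4 versus "a" followed by combining U+0308.
precomposed = json.loads('"p\\u00e4ron"')
decomposed = json.loads('"pa\\u0308ron"')
strings_equal = precomposed == decomposed
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# Objects: decoding to dict hides member order; decoding to pairs exposes it.
unordered_equal = json.loads('{"a":1, "b":2}') == json.loads('{"b":2, "a":1}')
ordered_equal = (json.loads('{"a":1, "b":2}', object_pairs_hook=list)
                 == json.loads('{"b":2, "a":1}', object_pairs_hook=list))

# Duplicate keys: accepted silently; with dict decoding the last one wins.
duplicates = json.loads('{"a":1, "a":2}')
duplicate_pairs = json.loads('{"a":1, "a":2}', object_pairs_hook=list)

print(number_types, numbers_equal, same_double)  # ['int', 'float', 'float'] True True
print(strings_equal, equal_after_nfc)            # False True
print(unordered_equal, ordered_equal)            # True False
print(duplicates, duplicate_pairs)               # {'a': 2} [('a', 1), ('a', 2)]
```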
## JSON can multiply nicely, but it can't add very well

JSON includes a fixed set of types: numbers, strings, booleans, null,
arrays and string-keyed maps. Domain-specific data must be *encoded*
into these types. For example, dates and email addresses are often
represented as strings with an implicit internal structure.

There is no convention for *labelling* a value as belonging to a
particular category. Instead, JSON-encoded data are often labelled in
an ad-hoc way. Multiple incompatible approaches exist. For example, a
“money” structure containing a `currency` field and an `amount` may be
represented in any number of ways:

    { "_type": "money", "currency": "EUR", "amount": 10 }
    { "type": "money", "value": { "currency": "EUR", "amount": 10 } }
    [ "money", { "currency": "EUR", "amount": 10 } ]
    { "@money": { "currency": "EUR", "amount": 10 } }

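As a rough sketch of what this costs a consumer, the hypothetical decoder below has to recognize each of the four labellings separately before it can recover the same `Money` value. The `Money` class and `decode_money` function are illustrative names, not part of any library, and a real decoder would need one more branch for every further convention it meets.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Money:
    currency: str
    amount: Union[int, float]


def decode_money(value: Any) -> Money:
    """Hypothetical decoder: accept any of the four ad-hoc labellings."""
    fields = None
    if isinstance(value, dict):
        if value.get("_type") == "money":
            fields = value                      # label stored beside the fields
        elif value.get("type") == "money" and isinstance(value.get("value"), dict):
            fields = value["value"]             # label wraps a nested "value"
        elif isinstance(value.get("@money"), dict):
            fields = value["@money"]            # label used as the only key
    elif isinstance(value, list) and len(value) == 2 and value[0] == "money":
        if isinstance(value[1], dict):
            fields = value[1]                   # label as first array element
    if fields is None:
        raise ValueError("not a recognized money encoding")
    return Money(fields["currency"], fields["amount"])


encodings = [
    {"_type": "money", "currency": "EUR", "amount": 10},
    {"type": "money", "value": {"currency": "EUR", "amount": 10}},
    ["money", {"currency": "EUR", "amount": 10}],
    {"@money": {"currency": "EUR", "amount": 10}},
]
decoded = [decode_money(e) for e in encodings]
print(all(m == Money("EUR", 10) for m in decoded))  # True
```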
This causes particular problems when JSON is used to represent *sum*
or *union* types, such as “either a value or an error, but not both”.
Again, multiple incompatible approaches exist.

For example, imagine an API for depositing money in an account. The
response might be either a “success” response indicating the new
balance, or one of a set of possible errors.

Sometimes, a *pair* of values is used, with `null` marking the option
not taken.[^interesting-failure-mode]

    { "ok": { "balance": 210 }, "error": null }
    { "ok": null, "error": "Unauthorized" }

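One way the pair encoding can go wrong is sketched below, under the assumption of two hypothetical clients that each inspect only one of the two fields. They agree on both well-formed responses but reach opposite conclusions about a document in which `ok` and `error` are both non-null. The function names are invented for illustration.

```python
import json


def client_checks_error_first(doc):
    # Hypothetical client: treats any non-null "error" as failure.
    return "failed" if doc.get("error") is not None else "succeeded"


def client_checks_ok_first(doc):
    # Hypothetical client: treats any non-null "ok" as success.
    return "succeeded" if doc.get("ok") is not None else "failed"


success = json.loads('{ "ok": { "balance": 210 }, "error": null }')
failure = json.loads('{ "ok": null, "error": "Unauthorized" }')
both = json.loads('{ "ok": { "balance": 210 }, "error": "Unauthorized" }')

for doc in (success, failure, both):
    print(client_checks_error_first(doc), client_checks_ok_first(doc))
# succeeded succeeded
# failed failed
# failed succeeded
```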
[^interesting-failure-mode]: What is the meaning of a document where
    both `ok` and `error` are non-null? What might happen when a
    program is presented with such a document?

Sometimes the branch not chosen is present, as above; sometimes it is
omitted, as if it were an optional field:

    { "ok": { "balance": 210 } }
    { "error": "Unauthorized" }

Sometimes, an array of a label and a value is used:

    [ "ok", { "balance": 210 } ]
    [ "error", "Unauthorized" ]

Sometimes, the shape of the data is sufficient to distinguish among
the alternatives, and the label is left implicit:

    { "balance": 210 }
    "Unauthorized"

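The hypothetical `decode_response` function below sketches what a client would need in order to accept all of the response encodings shown above and map them onto a single tagged result. It is an illustration, not a recommended design: the rules for the implicit-label form in particular are guesses (a bare string is assumed to be an error, an object with a `balance` key a success), and they would stop working if a success payload could itself be a string.

```python
from typing import Any, Tuple


def decode_response(value: Any) -> Tuple[str, Any]:
    """Return ("ok", payload) or ("error", message); raise if unclear."""
    # Array of a label and a value.
    if isinstance(value, list):
        if len(value) == 2 and value[0] in ("ok", "error"):
            return (value[0], value[1])
        raise ValueError("unrecognized array encoding")
    # Implicit label (assumption): a bare string is an error message.
    if isinstance(value, str):
        return ("error", value)
    if isinstance(value, dict):
        # Pair encoding, with the unused branch either null or omitted.
        if "ok" in value or "error" in value:
            ok, error = value.get("ok"), value.get("error")
            if (ok is None) == (error is None):
                raise ValueError("exactly one of 'ok' and 'error' must be non-null")
            return ("ok", ok) if ok is not None else ("error", error)
        # Implicit label (assumption): an object with "balance" is a success.
        if "balance" in value:
            return ("ok", value)
    raise ValueError("unrecognized response encoding")


samples = [
    {"ok": {"balance": 210}, "error": None},
    {"ok": {"balance": 210}},
    ["ok", {"balance": 210}],
    {"balance": 210},
]
print([decode_response(s) for s in samples] == [("ok", {"balance": 210})] * 4)  # True
```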
JSON itself does not offer any guidance for which of these options to
choose. In many real cases on the web, poor choices have led to
encodings that are irrecoverably ambiguous.

<!-- Heading to visually offset the footnotes from the main document: -->
## Notes