Merge branch 'main' into comment-syntax-hash-space

Tony Garnock-Jones 2023-10-29 16:44:32 +01:00
commit 2445ab4a5a
73 changed files with 1512 additions and 139 deletions

View File

@ -36,23 +36,30 @@ automatic, perfect-fidelity conversion between syntaxes.
## Implementations
Implementations of the data model, plus the textual and/or binary transfer syntaxes:
#### Implementations of the data model, plus Preserves textual and binary transfer syntax
- [Preserves for Nim](https://git.syndicate-lang.org/ehmry/preserves-nim)
- [Preserves for Python]({{page.projecttree}}/implementations/python/) ([`pip install preserves`](https://pypi.org/project/preserves/); [documentation available online](python/latest/))
- [Preserves for Racket]({{page.projecttree}}/implementations/racket/preserves/) ([`raco pkg install preserves`](https://pkgs.racket-lang.org/package/preserves))
- [Preserves for Rust]({{page.projecttree}}/implementations/rust/) ([crates.io package](https://crates.io/crates/preserves))
- [Preserves for Squeak Smalltalk](https://squeaksource.com/Preserves.html) (`Installer ss project: 'Preserves'; install: 'Preserves'`)
- [Preserves for TypeScript and JavaScript]({{page.projecttree}}/implementations/javascript/) ([`yarn add @preserves/core`](https://www.npmjs.com/package/@preserves/core))
- (Pre-alpha) Preserves for [C]({{page.projecttree}}/implementations/c/) and [C++]({{page.projecttree}}/implementations/cpp/)
| Language[^pre-alpha-implementations] | Code | Package | Docs |
|-----------------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------------|-------------------------------------------|
| Nim | [git.syndicate-lang.org](https://git.syndicate-lang.org/ehmry/preserves-nim) | | |
| Python | [preserves.dev]({{page.projecttree}}/implementations/python/) | [`pip install preserves`](https://pypi.org/project/preserves/) | [docs](python/latest/) |
| Racket | [preserves.dev]({{page.projecttree}}/implementations/racket/preserves/) | [`raco pkg install preserves`](https://pkgs.racket-lang.org/package/preserves) | |
| Rust | [preserves.dev]({{page.projecttree}}/implementations/rust/) | [`cargo add preserves`](https://crates.io/crates/preserves) | [docs](https://docs.rs/preserves/latest/) |
| Squeak Smalltalk | [SqueakSource](https://squeaksource.com/Preserves.html) | `Installer ss project: 'Preserves';`<br>`  install: 'Preserves'` | |
| TypeScript/JavaScript | [preserves.dev]({{page.projecttree}}/implementations/javascript/) | [`yarn add @preserves/core`](https://www.npmjs.com/package/@preserves/core) | |
Implementations of the data model, plus Syrup transfer syntax:
[^pre-alpha-implementations]: Pre-alpha implementations also exist for
[C]({{page.projecttree}}/implementations/c/) and
[C++]({{page.projecttree}}/implementations/cpp/).
- [Syrup for Racket](https://github.com/ocapn/syrup/blob/master/impls/racket/syrup/syrup.rkt)
- [Syrup for Guile](https://github.com/ocapn/syrup/blob/master/impls/guile/syrup.scm)
- [Syrup for Python](https://github.com/ocapn/syrup/blob/master/impls/python/syrup.py)
- [Syrup for JavaScript](https://github.com/zarutian/agoric-sdk/blob/zarutian/captp_variant/packages/captp/lib/syrup.js)
- [Syrup for Haskell](https://github.com/zenhack/haskell-preserves)
#### Implementations of the data model, plus Syrup transfer syntax
| Language | Code |
|------------|----------------------------------------------------------------------------------------------------------------------------------|
| Guile | [github.com/ocapn/syrup](https://github.com/ocapn/syrup/blob/master/impls/guile/syrup.scm) |
| Haskell | [github.com/zenhack/haskell-preserves](https://github.com/zenhack/haskell-preserves) |
| JavaScript | [github.com/zarutian/agoric-sdk](https://github.com/zarutian/agoric-sdk/blob/zarutian/captp_variant/packages/captp/lib/syrup.js) |
| Python | [github.com/ocapn/syrup](https://github.com/ocapn/syrup/blob/master/impls/python/syrup.py) |
| Racket | [github.com/ocapn/syrup](https://github.com/ocapn/syrup/blob/master/impls/racket/syrup/syrup.rkt) |
## Tools
@ -81,3 +88,5 @@ The contents of this repository are made available to you under the
[Apache License, version 2.0](LICENSE)
(<http://www.apache.org/licenses/LICENSE-2.0>), and are Copyright
2018-2022 Tony Garnock-Jones.
## Notes

View File

@ -14,4 +14,4 @@ defaults:
title: "Preserves"
version_date: "October 2023"
version: "0.990.0"
version: "0.990.1"

View File

@ -0,0 +1,33 @@
For a value `V`, we write `«V»` for the binary encoding of `V`.
```text
«#f» = [0x80]
«#t» = [0x81]
«@W V» = [0x85] ++ «W» ++ «V»
«#!V» = [0x86] ++ «V»
«V» if V ∈ Float = [0x87, 0x04] ++ binary32(V)
«V» if V ∈ Double = [0x87, 0x08] ++ binary64(V)
«V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V)
«V» if V ∈ String = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V)
«V» if V ∈ ByteString = [0xB2] ++ varint(|V|) ++ V
«V» if V ∈ Symbol = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V)
«<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
«[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
varint(n) = [n] if n < 128
[(n & 127) | 128] ++ varint(n >> 7) if n ≥ 128
intbytes(n) = the empty sequence if n = 0, otherwise signedBigEndian(n)
signedBigEndian(n) = [n & 255] if -128 ≤ n ≤ 127
signedBigEndian(n >> 8) ++ [n & 255] otherwise
```
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
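
The `varint`, `intbytes` and `signedBigEndian` rules above can be sketched in Rust as follows. This is a minimal illustration, not code from any Preserves implementation: the function names are chosen here to mirror the rules, and integers are assumed to fit in `i64` for simplicity, which a real implementation would likely not want to assume.

```rust
/// varint(n): little-endian base-128, with the high bit set on every byte but the last.
pub fn varint(mut n: u64) -> Vec<u8> {
    let mut out = Vec::new();
    while n >= 128 {
        out.push(((n & 127) | 128) as u8);
        n >>= 7;
    }
    out.push(n as u8);
    out
}

/// signedBigEndian(n): minimal big-endian two's-complement bytes.
/// Assumption: `i64` stands in for the integer type; `>>` on `i64` is an arithmetic shift.
pub fn signed_big_endian(n: i64) -> Vec<u8> {
    if (-128..=127).contains(&n) {
        vec![(n & 255) as u8]
    } else {
        let mut out = signed_big_endian(n >> 8);
        out.push((n & 255) as u8);
        out
    }
}

/// intbytes(n): the empty sequence for zero, otherwise signedBigEndian(n).
pub fn intbytes(n: i64) -> Vec<u8> {
    if n == 0 {
        Vec::new()
    } else {
        signed_big_endian(n)
    }
}

/// «V» for V ∈ SignedInteger: tag 0xB0, then the length as a varint, then the bytes.
pub fn encode_signed_integer(n: i64) -> Vec<u8> {
    let body = intbytes(n);
    let mut out = vec![0xB0];
    out.extend(varint(body.len() as u64));
    out.extend(body);
    out
}
```

For instance, `varint(300)` yields `[0xAC, 0x02]`, `intbytes(-129)` yields `[0xFF, 0x7F]`, and `encode_signed_integer(0)` yields `[0xB0, 0x00]`, since zero has an empty body.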

View File

@ -51,36 +51,3 @@ division](https://en.wikipedia.org/wiki/Euclidean_division); that is, if
<span class="postcard-grammar binarysyntax">*n* = *dq* + *r*</span> and
<span class="postcard-grammar binarysyntax">0 ≤ *r* &lt; |d|</span>.
-->
<!--
For a value `V`, we write `«V»` for the binary encoding of `V`.
«#f» = [0x80]
«#t» = [0x81]
«@W V» = [0x85] ++ «W» ++ «V»
«#!V» = [0x86] ++ «V»
«V» if V ∈ Float = [0x87, 0x04] ++ binary32(V)
«V» if V ∈ Double = [0x87, 0x08] ++ binary64(V)
«V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V)
«V» if V ∈ String = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V)
«V» if V ∈ ByteString = [0xB2] ++ varint(|V|) ++ V
«V» if V ∈ Symbol = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V)
«<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
«[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
varint(v) = [v] if v < 128
[(v & 0x7F) + 128] ++ varint(v >> 7) if v ≥ 128
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
The function `intbytes(x)` is a big-endian two's-complement signed binary representation of
`x`, taking exactly as many whole bytes as needed to unambiguously identify the value and its
sign. In particular, `intbytes(0)` is the empty byte sequence.
-->

View File

@ -0,0 +1,44 @@
```text
Document := Value ws
Value := ws (Record | Collection | Atom | Embedded | Annotated)
Collection := Sequence | Dictionary | Set
Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
ws := (space | tab | cr | lf | `,`)*
Record := `<` Value+ ws `>`
Sequence := `[` Value* ws `]`
Dictionary := `{` (Value ws `:` Value)* ws `}`
Set := `#{` Value* ws `}`
Boolean := `#t` | `#f`
ByteString := `#"` binchar* `"`
| `#x"` (ws hex hex)* ws `"`
| `#[` (ws base64char)* ws `]`
String := `"` («any unicode scalar except `\` or `"`» | escaped | `\"`)* `"`
QuotedSymbol := `|` («any unicode scalar except `\` or `|`» | escaped | `\|`)* `|`
Symbol := (`A`..`Z` | `a`..`z` | `0`..`9` | sympunct | symuchar)+
Number := Float | Double | SignedInteger
Float := flt (`f`|`F`) | `#xf"` (ws hex hex)⁴ ws `"`
Double := flt | `#xd"` (ws hex hex)⁸ ws `"`
SignedInteger := int
Embedded := `#!` Value
Annotated := Annotation Value
Annotation := `@` Value | `;` «any unicode scalar except cr or lf»* (cr | lf)
escaped := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\u` hex hex hex hex
binescaped := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\x` hex hex
binchar := «any scalar ≥32 and ≤126, except `\` or `"`» | binescaped | `\"`
base64char := `A`..`Z` | `a`..`z` | `0`..`9` | `+` | `/` | `-` | `_` | `=`
sympunct := `~` | `!` | `$` | `%` | `^` | `&` | `*` | `?`
| `_` | `=` | `+` | `-` | `/` | `.`
symuchar := «any scalar value ≥128 whose Unicode category is
Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pc,
Pd, Po, Sc, Sm, Sk, So, or Co»
flt := int ( frac exp | frac | exp )
int := (`-`|`+`) (`0`..`9`)+
frac := `.` (`0`..`9`)+
exp := (`e`|`E`) (`-`|`+`) (`0`..`9`)+
hex := `A`..`F` | `a`..`f` | `0`..`9`
```
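
As an illustration of how the `ws` and `hex` productions combine, the following Rust sketch decodes the text found between the quotes of the `#x"` form of `ByteString`, that is, `(ws hex hex)* ws`. The names `is_ws` and `parse_hex_body` are hypothetical and are not taken from any Preserves library.

```rust
/// ws := (space | tab | cr | lf | `,`)*
fn is_ws(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\r' | '\n' | ',')
}

/// Decodes `(ws hex hex)* ws`, returning `None` on any malformed input.
/// Whitespace may separate pairs of hex digits but may not split a pair.
pub fn parse_hex_body(body: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut chars = body.chars().peekable();
    loop {
        // Skip any run of whitespace (including commas) before the next pair.
        while let Some(&c) = chars.peek() {
            if is_ws(c) {
                chars.next();
            } else {
                break;
            }
        }
        // End of input after whitespace is a successful parse.
        let hi = match chars.next() {
            None => return Some(out),
            Some(c) => c.to_digit(16)?,
        };
        // The second digit of a pair must follow immediately.
        let lo = chars.next()?.to_digit(16)?;
        out.push((hi * 16 + lo) as u8);
    }
}
```

With this sketch, `parse_hex_body("01 fF,7f")` gives `Some(vec![1, 255, 127])`, while `parse_hex_body("0 1")` gives `None` because whitespace splits a digit pair.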

View File

@ -1,17 +1,18 @@
Value = Atom
| Compound
| Embedded
```text
Value = Atom
| Compound
| Embedded
Atom = Boolean
| Float
| Double
| SignedInteger
| String
| ByteString
| Symbol
Compound = Record
| Sequence
| Set
| Dictionary
Atom = Boolean
| Float
| Double
| SignedInteger
| String
| ByteString
| Symbol
Compound = Record
| Sequence
| Set
| Dictionary
```
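
The grammar above maps naturally onto algebraic data types. The following Rust sketch is one possible rendering and is not the representation used by the `preserves` crate: the type parameter `D` for embedded values, the use of `i64` for `SignedInteger`, and the `Vec`-backed `Set` and `Dictionary` are simplifying assumptions, and `symbol` and `record` are hypothetical helpers.

```rust
/// Atom = Boolean | Float | Double | SignedInteger | String | ByteString | Symbol
#[derive(Debug, Clone, PartialEq)]
pub enum Atom {
    Boolean(bool),
    Float(f32),
    Double(f64),
    SignedInteger(i64), // assumption: i64 stands in for the integer type
    String(String),
    ByteString(Vec<u8>),
    Symbol(String),
}

/// Compound = Record | Sequence | Set | Dictionary
#[derive(Debug, Clone, PartialEq)]
pub enum Compound<D> {
    Record { label: Box<Value<D>>, fields: Vec<Value<D>> },
    Sequence(Vec<Value<D>>),
    // Assumption: plain vectors here; a real implementation would want
    // proper set and map semantics (no duplicates, order-insensitive equality).
    Set(Vec<Value<D>>),
    Dictionary(Vec<(Value<D>, Value<D>)>),
}

/// Value = Atom | Compound | Embedded
#[derive(Debug, Clone, PartialEq)]
pub enum Value<D> {
    Atom(Atom),
    Compound(Compound<D>),
    Embedded(D),
}

/// Hypothetical helper: builds a Symbol value.
pub fn symbol<D>(name: &str) -> Value<D> {
    Value::Atom(Atom::Symbol(name.to_owned()))
}

/// Hypothetical helper: builds a Record value from a label and its fields.
pub fn record<D>(label: Value<D>, fields: Vec<Value<D>>) -> Value<D> {
    Value::Compound(Compound::Record { label: Box::new(label), fields })
}
```

A record such as `<Says "alice" "hello">` could then be built as `record(symbol("Says"), vec![...])` with two `Atom::String` fields.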

cheatsheet-plaintext.md Normal file
View File

@ -0,0 +1,11 @@
---
no_site_title: true
title: "Preserves Quick Reference (Plaintext)"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
{{ site.version_date }}. Version {{ site.version }}.
{% include cheatsheet-binary-plaintext.md %}
{% include cheatsheet-text-plaintext.md %}

View File

@ -14,7 +14,7 @@ inputs.
You will usually not need to use the `preserves-schema-rs`
command-line program. Instead, access the preserves-schema compiler
API from your `build.rs`. The following example is taken from
[`build.rs` for the `preserves-path` crate](https://gitlab.com/preserves/preserves/-/blob/18ac9168996026073ee16164fce108054b2a0ed7/implementations/rust/preserves-path/build.rs):
[`build.rs` for the `preserves-path` crate](https://gitlab.com/preserves/preserves/-/blob/af5de5b836ffc51999db93797d1995ff677cf6f8/implementations/rust/preserves-path/build.rs):
use preserves_schema::compiler::*;
@ -30,14 +30,14 @@ API from your `build.rs`. The following example is taken from
let mut c = CompilerConfig::new(gen_dir, "crate::schemas".to_owned());
let inputs = expand_inputs(&vec!["path.bin".to_owned()])?;
c.load_schemas_and_bundles(&inputs)?;
c.load_schemas_and_bundles(&inputs, &vec![])?;
compile(&c)
}
This approach also requires an `include!` from your main, hand-written
source tree. The following is a snippet from
[`preserves-path/src/lib.rs`](https://gitlab.com/preserves/preserves/-/blob/18ac9168996026073ee16164fce108054b2a0ed7/implementations/rust/preserves-path/src/lib.rs):
[`preserves-path/src/lib.rs`](https://gitlab.com/preserves/preserves/-/blob/af5de5b836ffc51999db93797d1995ff677cf6f8/implementations/rust/preserves-path/src/lib.rs):
pub mod schemas {
include!(concat!(env!("OUT_DIR"), "/src/schemas/mod.rs"));
@ -52,20 +52,23 @@ Then, `cargo install preserves-schema`.
## Usage
preserves-schema 1.0.0
preserves-schema 3.990.2
USAGE:
preserves-schema-rs [OPTIONS] --output-dir <output-dir> --prefix <prefix> [--] [input-glob]...
preserves-schema-rs [FLAGS] [OPTIONS] --output-dir <output-dir> --prefix <prefix>
[--] [input-glob]...
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
-h, --help Prints help information
--rustfmt-skip
-V, --version Prints version information
OPTIONS:
--module <module>...
-o, --output-dir <output-dir>
-p, --prefix <prefix>
--support-crate <support-crate>
--xref <xref>...
ARGS:
<input-glob>...

View File

@ -3,6 +3,12 @@
set -e
exec 1>&2
COMMAND=cmp
if [ "$1" = "--fix" ];
then
COMMAND=cp
fi
# https://gitlab.com/preserves/preserves/-/issues/30
#
# So it turns out that Racket's git-checkout mechanism pays attention
@ -16,10 +22,19 @@ exec 1>&2
# Ensure that various copies of schema.prs, schema.bin, path.bin,
# samples.pr and samples.bin are in fact identical.
cmp path/path.bin implementations/python/preserves/path.prb
cmp path/path.bin implementations/rust/preserves-path/path.bin
cmp schema/schema.bin implementations/python/preserves/schema.prb
cmp schema/schema.prs implementations/racket/preserves/preserves-schema/schema.prs
cmp tests/samples.bin implementations/python/tests/samples.bin
cmp tests/samples.pr implementations/python/tests/samples.pr
cmp tests/samples.pr implementations/racket/preserves/preserves/tests/samples.pr
${COMMAND} path/path.bin implementations/python/preserves/path.prb
${COMMAND} path/path.bin implementations/rust/preserves-path/path.bin
${COMMAND} schema/schema.bin implementations/python/preserves/schema.prb
${COMMAND} schema/schema.prs implementations/racket/preserves/preserves-schema/schema.prs
${COMMAND} tests/samples.bin implementations/python/tests/samples.bin
${COMMAND} tests/samples.pr implementations/python/tests/samples.pr
${COMMAND} tests/samples.pr implementations/racket/preserves/preserves/tests/samples.pr
${COMMAND} _includes/what-is-preserves.md implementations/rust/preserves/doc/what-is-preserves.md
${COMMAND} _includes/cheatsheet-binary-plaintext.md implementations/rust/preserves/doc/cheatsheet-binary-plaintext.md
${COMMAND} _includes/cheatsheet-text-plaintext.md implementations/rust/preserves/doc/cheatsheet-text-plaintext.md
${COMMAND} _includes/value-grammar.md implementations/rust/preserves/doc/value-grammar.md
${COMMAND} _includes/what-is-preserves-schema.md implementations/rust/preserves-schema/doc/what-is-preserves-schema.md

View File

@ -0,0 +1,2 @@
dist/
lib/

View File

@ -0,0 +1 @@
version-tag-prefix javascript-@preserves/schema-cli@

View File

@ -0,0 +1 @@
# Preserves Schema for TypeScript/JavaScript: Command-line tools

View File

@ -0,0 +1,39 @@
{
"name": "@preserves/schema-cli",
"version": "0.990.1",
"description": "Command-line tools for Preserves Schema",
"homepage": "https://gitlab.com/preserves/preserves",
"license": "Apache-2.0",
"publishConfig": {
"access": "public"
},
"repository": "gitlab:preserves/preserves",
"author": "Tony Garnock-Jones <tonyg@leastfixedpoint.com>",
"scripts": {
"clean": "rm -rf lib dist",
"prepare": "yarn compile && yarn rollup",
"compile": "tsc",
"compile:watch": "yarn compile -w",
"rollup": "rollup -c",
"rollup:watch": "yarn rollup -w",
"test": "true",
"veryclean": "yarn run clean && rm -rf node_modules"
},
"bin": {
"preserves-schema-ts": "./bin/preserves-schema-ts.js",
"preserves-schemac": "./bin/preserves-schemac.js"
},
"devDependencies": {
"@types/glob": "^7.1",
"@types/minimatch": "^3.0"
},
"dependencies": {
"@preserves/core": "^0.990.0",
"@preserves/schema": "^0.990.1",
"chalk": "^4.1",
"chokidar": "^3.5",
"commander": "^7.2",
"glob": "^7.1",
"minimatch": "^3.0"
}
}

View File

@ -0,0 +1,17 @@
import terser from '@rollup/plugin-terser';
function cli(name) {
return {
input: `lib/bin/${name}.js`,
output: [{file: `dist/bin/${name}.js`, format: 'commonjs'}],
external: [
'@preserves/core',
'@preserves/schema',
],
};
}
export default [
cli('preserves-schema-ts'),
cli('preserves-schemac'),
];

View File

@ -1,10 +1,9 @@
import fs from 'fs';
import path from 'path';
import { glob } from 'glob';
import { IdentitySet, formatPosition, Position } from '@preserves/core';
import { readSchema } from '../reader';
import chalk from 'chalk';
import * as M from '../meta';
import { IdentitySet, formatPosition, Position } from '@preserves/core';
import { readSchema, Meta as M } from '@preserves/schema';
export interface Diagnostic {
type: 'warn' | 'error';

View File

@ -1,9 +1,8 @@
import { compile } from '../index';
import fs from 'fs';
import path from 'path';
import minimatch from 'minimatch';
import { Command } from 'commander';
import * as M from '../meta';
import { compile, Meta as M } from '@preserves/schema';
import chalk from 'chalk';
import { is, Position } from '@preserves/core';
import chokidar from 'chokidar';

View File

@ -2,7 +2,7 @@ import { Command } from 'commander';
import { canonicalEncode, KeyedDictionary, underlying } from '@preserves/core';
import fs from 'fs';
import path from 'path';
import * as M from '../meta';
import { Meta as M } from '@preserves/schema';
import { expandInputGlob, formatFailures } from './cli-utils';
export type CommandLineArguments = {

View File

@ -0,0 +1,16 @@
{
"compilerOptions": {
"target": "ES2017",
"lib": ["es2019", "DOM"],
"declaration": true,
"baseUrl": "./src",
"rootDir": "./src",
"outDir": "./lib",
"declarationDir": "./lib",
"esModuleInterop": true,
"moduleResolution": "node",
"sourceMap": true,
"strict": true
},
"include": ["src/**/*"]
}

View File

@ -2,3 +2,7 @@
This is an implementation of [Preserves Schema](https://preserves.dev/preserves-schema.html)
for TypeScript and JavaScript.
This package implements a Schema runtime and a Schema-to-TypeScript compiler, but offers no
command-line interfaces. See `@preserves/schema-cli` for command-line tools for working with
Schema and compiling from Schema to TypeScript.

View File

@ -1,6 +1,6 @@
{
"name": "@preserves/schema",
"version": "0.990.0",
"version": "0.990.1",
"description": "Schema support for Preserves data serialization format",
"homepage": "https://gitlab.com/preserves/preserves",
"license": "Apache-2.0",
@ -13,7 +13,7 @@
"types": "lib/index.d.ts",
"author": "Tony Garnock-Jones <tonyg@leastfixedpoint.com>",
"scripts": {
"regenerate": "rm -rf ./src/gen && yarn copy-schema && ./bin/preserves-schema-ts.js --output ./src/gen ./dist:schema.prs",
"regenerate": "rm -rf ./src/gen && yarn copy-schema && ../schema-cli/bin/preserves-schema-ts.js --output ./src/gen ./dist:schema.prs",
"clean": "rm -rf lib dist",
"prepare": "yarn compile && yarn rollup && yarn copy-schema",
"compile": "tsc",
@ -25,18 +25,7 @@
"test:watch": "jest --watch",
"veryclean": "yarn run clean && rm -rf node_modules"
},
"bin": {
"preserves-schema-ts": "./bin/preserves-schema-ts.js",
"preserves-schemac": "./bin/preserves-schemac.js"
},
"dependencies": {
"@preserves/core": "^0.990.0",
"@types/glob": "^7.1",
"@types/minimatch": "^3.0",
"chalk": "^4.1",
"chokidar": "^3.5",
"commander": "^7.2",
"glob": "^7.1",
"minimatch": "^3.0"
"@preserves/core": "^0.990.0"
}
}

View File

@ -31,13 +31,6 @@ function cli(name) {
output: [{file: `dist/bin/${name}.js`, format: 'commonjs'}],
external: [
'@preserves/core',
'chalk',
'chokidar',
'fs',
'glob',
'minimatch',
'path',
'commander',
],
};
}
@ -53,6 +46,4 @@ export default [
],
external: ['@preserves/core'],
},
cli('preserves-schema-ts'),
cli('preserves-schemac'),
];

View File

@ -20,5 +20,7 @@ open "cd packages/core; yarn run test:watch"
open "cd packages/schema; yarn run compile:watch"
open "cd packages/schema; yarn run rollup:watch"
open "cd packages/schema; yarn run test:watch"
open "cd packages/schema-cli; yarn run compile:watch"
open "cd packages/schema-cli; yarn run rollup:watch"
tmux select-layout even-vertical

View File

@ -5,6 +5,9 @@
all:
cargo build --all-targets
doc:
cargo doc --workspace
x86_64-binary: x86_64-binary-release
x86_64-binary-release:

View File

@ -1,6 +1,6 @@
[package]
name = "preserves-schema"
version = "3.990.0"
version = "3.990.3"
authors = ["Tony Garnock-Jones <tonyg@leastfixedpoint.com>"]
edition = "2018"
description = "Implementation of Preserves Schema code generation and support for Rust."

View File

@ -1,4 +1,6 @@
# Preserves Schema for Rust
```shell
cargo add preserves preserves-schema
```
This is an implementation of [Preserves Schema](https://preserves.dev/preserves-schema.html)
for Rust.
This crate ([`preserves-schema` on crates.io](https://crates.io/crates/preserves-schema)) is an
implementation of [Preserves Schema](https://preserves.dev/preserves-schema.html) for Rust.

View File

@ -0,0 +1,112 @@
# Example
[preserves-schemac]: https://preserves.dev/doc/preserves-schemac.html
[preserves-schema-rs]: https://preserves.dev/doc/preserves-schema-rs.html
Preserves schemas are written in a syntax that (ab)uses [Preserves text
syntax][preserves::value::text] as a kind of S-expression. Schema source code looks like this:
```preserves-schema
version 1 .
Present = <Present @username string> .
Says = <Says @who string @what string> .
UserStatus = <Status @username string @status Status> .
Status = =here / <away @since TimeStamp> .
TimeStamp = string .
```
Conventionally, schema source code is stored in `*.prs` files. In this example, the source code
above is placed in `simpleChatProtocol.prs`.
The Rust code generator for schemas requires not source code, but instances of the [Preserves
metaschema](https://preserves.dev/preserves-schema.html#appendix-metaschema). To compile schema
source code to metaschema instances, use [preserves-schemac][]:
```shell
yarn global add @preserves/schema-cli
preserves-schemac .:simpleChatProtocol.prs > simpleChatProtocol.prb
```
Binary-syntax metaschema instances are conventionally stored in `*.prb` files. If you have a
whole directory tree of `*.prs` files, you can supply just "`.`" without the "`:`"-prefixed
fileglob part.[^converting-metaschema-to-text] See the [preserves-schemac documentation][preserves-schemac].
[^converting-metaschema-to-text]:
Converting the `simpleChatProtocol.prb` file to Preserves text syntax lets us read the
metaschema instance corresponding to the source code:
```shell
cat simpleChatProtocol.prb | preserves-tool convert
```
The result:
```preserves
<bundle {
[
simpleChatProtocol
]: <schema {
definitions: {
Present: <rec <lit Present> <tuple [
<named username <atom String>>
]>>
Says: <rec <lit Says> <tuple [
<named who <atom String>>
<named what <atom String>>
]>>
Status: <or [
[
"here"
<lit here>
]
[
"away"
<rec <lit away> <tuple [
<named since <ref [] TimeStamp>>
]>>
]
]>
TimeStamp: <atom String>
UserStatus: <rec <lit Status> <tuple [
<named username <atom String>>
<named status <ref [] Status>>
]>>
}
embeddedType: #f
version: 1
}>
}>
```
#### Generating Rust code from a schema
Generate Rust definitions corresponding to a metaschema instance with [preserves-schema-rs][].
The best way to use it is to integrate it into your `build.rs` (see [the
docs][preserves-schema-rs]), but you can also use it as a standalone command-line tool.
The following command generates a directory `./rs/chat` containing Rust sources for a module
that expects to be called `chat` in Rust code:
```shell
preserves-schema-rs --output-dir rs/chat --prefix chat simpleChatProtocol.prb
```
Representative excerpts from one of the generated files, `./rs/chat/simple_chat_protocol.rs`:
```rust,noplayground
pub struct Present {
pub username: std::string::String
}
pub struct Says {
pub who: std::string::String,
pub what: std::string::String
}
pub struct UserStatus {
pub username: std::string::String,
pub status: Status
}
pub enum Status {
Here,
Away {
since: std::boxed::Box<TimeStamp>
}
}
pub struct TimeStamp(pub std::string::String);
```
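
To give a feel for working with types of this shape, the following self-contained sketch repeats three of the definitions from the excerpt and pattern-matches on them. The `describe` function is hypothetical and is not something the code generator produces. The real generated module also contains trait implementations for parsing and unparsing that are omitted here.

```rust
// Definitions repeated from the excerpt above so that this sketch stands alone.
pub struct TimeStamp(pub std::string::String);

pub enum Status {
    Here,
    Away {
        since: std::boxed::Box<TimeStamp>
    }
}

pub struct UserStatus {
    pub username: std::string::String,
    pub status: Status
}

/// Hypothetical helper: renders a `UserStatus` as a human-readable line.
pub fn describe(u: &UserStatus) -> String {
    match &u.status {
        Status::Here => format!("{} is here", u.username),
        Status::Away { since } => {
            format!("{} has been away since {}", u.username, since.0)
        }
    }
}
```

The `=here / <away @since TimeStamp>` alternation in the schema becomes an ordinary Rust `enum`, so the compiler checks that every variant is handled.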

View File

@ -0,0 +1,16 @@
A Preserves schema connects Preserves `Value`s to host-language data
structures. Each definition within a schema can be processed by a
compiler to produce
- a simple host-language *type definition*;
- a partial *parsing* function from `Value`s to instances of the
produced type; and
- a total *serialization* function from instances of the type to
`Value`s.
Every parsed `Value` retains enough information to always be able to
be serialized again, and every instance of a host-language data
structure contains, by construction, enough information to be
successfully serialized.
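
The three products listed above can be illustrated with a hand-written sketch. Everything here is an assumption made for illustration: the toy `Value` type covers only what the example needs, and `Present`, `parse` and `unparse` are written by hand rather than produced by a schema compiler. The point is the shape of the two functions: parsing is partial (it returns `Option`), while serialization is total.

```rust
/// Toy stand-in for Preserves `Value`s; covers only what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Symbol(String),
    String(String),
    Record(Box<Value>, Vec<Value>),
}

/// The host-language type definition, for records shaped like `<Present "alice">`.
#[derive(Debug, Clone, PartialEq)]
pub struct Present {
    pub username: String,
}

impl Present {
    /// Partial: yields `None` for any `Value` that is not a well-formed `Present` record.
    pub fn parse(v: &Value) -> Option<Present> {
        match v {
            Value::Record(label, fields) => match (&**label, fields.as_slice()) {
                (Value::Symbol(l), [Value::String(username)]) if l.as_str() == "Present" => {
                    Some(Present { username: username.clone() })
                }
                _ => None,
            },
            _ => None,
        }
    }

    /// Total: every `Present` can be turned back into a `Value`.
    pub fn unparse(&self) -> Value {
        Value::Record(
            Box::new(Value::Symbol("Present".to_owned())),
            vec![Value::String(self.username.clone())],
        )
    }
}
```

Round-tripping holds in both directions for this sketch: `Present::parse(&p.unparse())` returns `Some(p)`, and any `Value` that parses successfully unparses to an equal `Value`.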

View File

@ -1,3 +1,6 @@
//! Command-line Rust code generator for Preserves Schema. See the documentation at
//! <https://preserves.dev/doc/preserves-schema-rs.html>.
use std::io::Error;
use std::io::ErrorKind;
use std::path::PathBuf;

View File

@ -1,3 +1,39 @@
//! Implementation of the Schema-to-Rust compiler; this is the core of the
//! [preserves-schema-rs][] program.
//!
//! See the [documentation for preserves-schema-rs][preserves-schema-rs] for examples of how to
//! use the compiler programmatically from a `build.rs` script, but very briefly, use
//! [preserves-schemac](https://preserves.dev/doc/preserves-schemac.html) to generate a
//! metaschema instance `*.prb` file, and then put something like this in `build.rs`:
//!
//! ```rust,ignore
//! use preserves_schema::compiler::*;
//!
//! const PATH_TO_PRB_FILE: &'static str = "your-metaschema-instance-file.prb";
//!
//! fn main() -> Result<(), std::io::Error> {
//! let buildroot = std::path::PathBuf::from(std::env::var_os("OUT_DIR").unwrap());
//!
//! let mut gen_dir = buildroot.clone();
//! gen_dir.push("src/schemas");
//! let mut c = CompilerConfig::new(gen_dir, "crate::schemas".to_owned());
//!
//! let inputs = expand_inputs(&vec![PATH_TO_PRB_FILE.to_owned()])?;
//! c.load_schemas_and_bundles(&inputs, &vec![])?;
//! compile(&c)
//! }
//! ```
//!
//! plus something like this in your `lib.rs` or main program:
//!
//! ```rust,ignore
//! pub mod schemas {
//! include!(concat!(env!("OUT_DIR"), "/src/schemas/mod.rs"));
//! }
//! ```
//!
//! [preserves-schema-rs]: https://preserves.dev/doc/preserves-schema-rs.html
pub mod context;
pub mod cycles;
pub mod names;
@ -29,11 +65,18 @@ use std::io::Read;
use std::io::Write;
use std::path::PathBuf;
/// Names a Schema module within a (collection of) Schema bundle(s).
pub type ModulePath = Vec<String>;
/// Implement this trait to extend the compiler with custom code generation support. The main
/// code generators are also implemented as plugins.
///
/// For an example of its use outside the core compiler, see [`build.rs` for the `syndicate-rs` project](https://git.syndicate-lang.org/syndicate-lang/syndicate-rs/src/commit/60e6c6badfcbcbccc902994f4f32db6048f60d1f/syndicate/build.rs).
pub trait Plugin: std::fmt::Debug {
/// Use `_module_ctxt` to emit code at a per-module level.
fn generate_module(&self, _module_ctxt: &mut ModuleContext) {}
/// Use `module_ctxt` to emit code at a per-Schema-[Definition] level.
fn generate_definition(
&self,
module_ctxt: &mut ModuleContext,
@ -110,17 +153,30 @@ impl ExternalModule {
}
}
/// Main entry point to the compiler.
#[derive(Debug)]
pub struct CompilerConfig {
/// All known Schema modules, indexed by [ModulePath] and annotated with a [Purpose].
pub bundle: Map<ModulePath, (Schema, Purpose)>,
/// Where output Rust code files will be placed.
pub output_dir: PathBuf,
/// Fully-qualified Rust module prefix to use for each generated module.
pub fully_qualified_module_prefix: String,
/// Rust module path to the [preserves_schema::support][crate::support] module.
pub support_crate: String,
/// External modules for cross-referencing.
pub external_modules: Map<ModulePath, ExternalModule>,
/// Plugins active in this compiler instance.
pub plugins: Vec<Box<dyn Plugin>>,
/// If true, a directive is emitted in each module instructing
/// [rustfmt](https://github.com/rust-lang/rustfmt) to ignore it.
pub rustfmt_skip: bool,
}
/// Loads a [Schema] or [Bundle] from path `i` into `bundle` for the given `purpose`.
///
/// If `i` holds a [Schema], then the file stem of `i` is used as the module name when placing
/// the schema in `bundle`.
pub fn load_schema_or_bundle_with_purpose(
bundle: &mut Map<ModulePath, (Schema, Purpose)>,
i: &PathBuf,
@ -134,6 +190,11 @@ pub fn load_schema_or_bundle_with_purpose(
Ok(())
}
/// Loads a [Schema] or [Bundle] from raw binary encoded value `input` into `bundle` for the
/// given `purpose`.
///
/// If `input` corresponds to a [Schema], then `prefix` is used as its module name; otherwise,
/// it's a [Bundle], and `prefix` is ignored.
pub fn load_schema_or_bundle_bin_with_purpose(
bundle: &mut Map<ModulePath, (Schema, Purpose)>,
prefix: &str,
@ -165,6 +226,10 @@ fn bundle_prefix(i: &PathBuf) -> io::Result<&str> {
})
}
/// Loads a [Schema] or [Bundle] from path `i` into `bundle`.
///
/// If `i` holds a [Schema], then the file stem of `i` is used as the module name when placing
/// the schema in `bundle`.
pub fn load_schema_or_bundle(bundle: &mut Map<ModulePath, Schema>, i: &PathBuf) -> io::Result<()> {
let mut f = File::open(&i)?;
let mut bs = vec![];
@ -172,6 +237,10 @@ pub fn load_schema_or_bundle(bundle: &mut Map<ModulePath, Schema>, i: &PathBuf)
load_schema_or_bundle_bin(bundle, bundle_prefix(i)?, &bs[..])
}
/// Loads a [Schema] or [Bundle] from raw binary encoded value `input` into `bundle`.
///
/// If `input` corresponds to a [Schema], then `prefix` is used as its module name; otherwise,
/// it's a [Bundle], and `prefix` is ignored.
pub fn load_schema_or_bundle_bin(
bundle: &mut Map<ModulePath, Schema>,
prefix: &str,
@ -199,6 +268,8 @@ pub fn load_schema_or_bundle_bin(
}
impl CompilerConfig {
/// Construct a [CompilerConfig] configured to send output files to `output_dir`, and to
/// use `fully_qualified_module_prefix` as the Rust module prefix for generated code.
pub fn new(output_dir: PathBuf, fully_qualified_module_prefix: String) -> Self {
CompilerConfig {
bundle: Map::new(),
@ -277,6 +348,7 @@ impl CompilerConfig {
}
}
/// Expands a vector of [mod@glob]s to a vector of actual paths.
pub fn expand_inputs(globs: &Vec<String>) -> io::Result<Vec<PathBuf>> {
let mut result = Vec::new();
for g in globs.iter() {
@ -322,6 +394,7 @@ impl Schema {
}
}
/// Main entry point: runs the compilation process.
pub fn compile(config: &CompilerConfig) -> io::Result<()> {
let mut b = BundleContext::new(config);

View File

@ -1,4 +1,12 @@
#![doc = concat!(
include_str!("../README.md"),
"# What is Preserves Schema?\n\n",
include_str!("../doc/what-is-preserves-schema.md"),
include_str!("../doc/example.md"),
)]
pub mod compiler;
/// Auto-generated Preserves Schema Metaschema types, parsers, and unparsers.
pub mod gen;
pub mod support;
pub mod syntax;

View File

@ -1,3 +1,6 @@
//! Interpreter for instances of Preserves Schema Metaschema, for schema-directed dynamic
//! parsing and unparsing of terms.
use crate::gen::schema::*;
use preserves::value::merge::merge2;
@ -5,8 +8,10 @@ use preserves::value::Map;
use preserves::value::NestedValue;
use preserves::value::Value;
/// Represents an environment mapping schema module names to [Schema] instances.
pub type Env<V> = Map<Vec<String>, Schema<V>>;
/// Context for a given interpretation of a [Schema].
#[derive(Debug)]
pub struct Context<'a, V: NestedValue> {
pub env: &'a Env<V>,
@ -20,6 +25,7 @@ enum DynField<V: NestedValue> {
}
impl<'a, V: NestedValue> Context<'a, V> {
/// Construct a new [Context] with the given [Env].
pub fn new(env: &'a Env<V>) -> Self {
Context {
env,
@ -27,6 +33,8 @@ impl<'a, V: NestedValue> Context<'a, V> {
}
}
/// Parse `v` using the rule named `name` from the module at path `module` in `self.env`.
/// Yields `Some(...)` if the parse succeeds, and `None` otherwise.
pub fn dynamic_parse(&mut self, module: &Vec<String>, name: &str, v: &V) -> Option<V> {
let old_module =
(module.len() > 0).then(|| std::mem::replace(&mut self.module, module.clone()));
@ -39,6 +47,7 @@ impl<'a, V: NestedValue> Context<'a, V> {
result
}
#[doc(hidden)]
pub fn dynamic_unparse(&mut self, _module: &Vec<String>, _name: &str, _w: &V) -> Option<V> {
panic!("Not yet implemented");
}

View File

@ -1,3 +1,7 @@
//! The runtime support library for compiled Schemas.
#[doc(hidden)]
/// Reexport lazy_static for generated code to use.
pub use lazy_static::lazy_static;
pub use preserves;
@ -21,10 +25,16 @@ use std::sync::Arc;
use thiserror::Error;
/// Every [language][crate::define_language] implements [NestedValueCodec] as a marker trait.
pub trait NestedValueCodec {} // marker trait
impl NestedValueCodec for () {}
/// Implementors of [Parse] can produce instances of themselves from a [Value], given a
/// supporting [language][crate::define_language]. All Schema-compiler-produced types implement
/// [Parse].
pub trait Parse<L, Value: NestedValue>: Sized {
/// Decode the given `value` (using auxiliary structure from the `language` instance) to
/// produce an instance of [Self].
fn parse(language: L, value: &Value) -> Result<Self, ParseError>;
}
@ -34,7 +44,10 @@ impl<'a, T: NestedValueCodec, Value: NestedValue> Parse<&'a T, Value> for Value
}
}
/// Implementors of [Unparse] can convert themselves into a [Value], given a supporting
/// [language][crate::define_language]. All Schema-compiler-produced types implement [Unparse].
pub trait Unparse<L, Value: NestedValue> {
/// Encode `self` into a [Value] (using auxiliary structure from the `language` instance).
fn unparse(&self, language: L) -> Value;
}
@ -44,8 +57,13 @@ impl<'a, T: NestedValueCodec, Value: NestedValue> Unparse<&'a T, Value> for Valu
}
}
/// Every [language][crate::define_language] implements [Codec], which supplies convenient
/// shorthand for invoking [Parse::parse] and [Unparse::unparse].
pub trait Codec<N: NestedValue> {
/// Delegates to [`T::parse`][Parse::parse], using `self` as language and the given `value`
/// as input.
fn parse<'a, T: Parse<&'a Self, N>>(&'a self, value: &N) -> Result<T, ParseError>;
/// Delegates to [`value.unparse`][Unparse::unparse], using `self` as language.
fn unparse<'a, T: Unparse<&'a Self, N>>(&'a self, value: &T) -> N;
}
@ -59,6 +77,11 @@ impl<L, N: NestedValue> Codec<N> for L {
}
}
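The relationship between `Parse`, `Unparse` and the blanket `Codec` implementation can be illustrated with a stripped-down, std-only sketch. Everything here is a stand-in invented for illustration: the toy `Value` enum replaces `NestedValue`, `ParseError` is a simplified tuple struct, and `ToyLanguage` and `Age` are hypothetical types, not output of the Schema compiler.

```rust
// Toy stand-in for a Preserves value (assumption: the real traits are
// generic over `NestedValue`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

// Simplified error type for the sketch.
#[derive(Debug, PartialEq)]
struct ParseError(&'static str);

// A type that can be decoded from a `Value`, given a language `L`.
trait Parse<L>: Sized {
    fn parse(language: L, value: &Value) -> Result<Self, ParseError>;
}

// A type that can be encoded into a `Value`, given a language `L`.
trait Unparse<L> {
    fn unparse(&self, language: L) -> Value;
}

// Shorthand available on every language: delegates to `Parse` / `Unparse`.
trait Codec {
    fn parse<'a, T: Parse<&'a Self>>(&'a self, value: &Value) -> Result<T, ParseError>;
    fn unparse<'a, T: Unparse<&'a Self>>(&'a self, value: &T) -> Value;
}

// Blanket implementation, mirroring the shape of `impl Codec<N> for L`.
impl<L> Codec for L {
    fn parse<'a, T: Parse<&'a Self>>(&'a self, value: &Value) -> Result<T, ParseError> {
        T::parse(self, value)
    }
    fn unparse<'a, T: Unparse<&'a Self>>(&'a self, value: &T) -> Value {
        value.unparse(self)
    }
}

// Hypothetical language marker and a hypothetical schema-derived type.
struct ToyLanguage;

#[derive(Debug, PartialEq)]
struct Age(i64);

impl<'a> Parse<&'a ToyLanguage> for Age {
    fn parse(_language: &'a ToyLanguage, value: &Value) -> Result<Self, ParseError> {
        match value {
            Value::Int(n) if *n >= 0 => Ok(Age(*n)),
            _ => Err(ParseError("expected a non-negative integer")),
        }
    }
}

impl<'a> Unparse<&'a ToyLanguage> for Age {
    fn unparse(&self, _language: &'a ToyLanguage) -> Value {
        Value::Int(self.0)
    }
}
```

With these definitions, `ToyLanguage.parse::<Age>(&Value::Int(42))` goes through the blanket `Codec` implementation and ends up in `Age::parse`, which is the delegation described in the doc comments.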
/// Implementors of [Deserialize] can produce instances of themselves from a [Value]. All
/// Schema-compiler-produced types implement [Deserialize].
///
/// The difference between [Deserialize] and [Parse] is that implementors of [Deserialize] know
/// which [language][crate::define_language] to use.
pub trait Deserialize<N: NestedValue>
where
Self: Sized,
@ -66,10 +89,14 @@ where
fn deserialize<'de, R: Reader<'de, N>>(r: &mut R) -> Result<Self, ParseError>;
}
/// Extracts a simple literal term from a byte array using
/// [PackedReader][preserves::value::packed::PackedReader]. No embedded values are permitted.
pub fn decode_lit<N: NestedValue>(bs: &[u8]) -> io::Result<N> {
preserves::value::packed::from_bytes(bs, NoEmbeddedDomainCodec)
}
/// When `D` can parse itself from an [IOValue], this function parses all embedded [IOValue]s
/// into `D`s.
pub fn decode_embedded<D: Domain>(v: &IOValue) -> Result<ArcValue<Arc<D>>, ParseError>
where
for<'a> D: TryFrom<&'a IOValue, Error = ParseError>,
@ -77,6 +104,8 @@ where
v.copy_via(&mut |d| Ok(Value::Embedded(Arc::new(D::try_from(d)?))))
}
/// When `D` can unparse itself into an [IOValue], this function converts all embedded `D`s
/// into [IOValue]s.
pub fn encode_embedded<D: Domain>(v: &ArcValue<Arc<D>>) -> IOValue
where
for<'a> IOValue: From<&'a D>,
@ -85,10 +114,13 @@ where
.unwrap()
}
/// Error value yielded when parsing of an [IOValue] into a Schema-compiler-produced type fails.
#[derive(Error, Debug)]
pub enum ParseError {
/// Signalled when the input does not match the Preserves Schema associated with the type.
#[error("Input not conformant with Schema: {0}")]
ConformanceError(&'static str),
/// Signalled when the underlying Preserves library signals an error.
#[error(transparent)]
Preserves(preserves::error::Error),
}
@ -120,10 +152,12 @@ impl From<ParseError> for io::Error {
}
impl ParseError {
/// Constructs a [ParseError::ConformanceError].
pub fn conformance_error(context: &'static str) -> Self {
ParseError::ConformanceError(context)
}
/// True iff `self` is a [ParseError::ConformanceError].
pub fn is_conformance_error(&self) -> bool {
return if let ParseError::ConformanceError(_) = self {
true


@ -1,12 +1,21 @@
//! A library for emitting pretty-formatted structured source code.
//!
//! The main entry points are [Formatter::to_string] and [Formatter::write], plus the utilities
//! in the [macros] submodule.
use std::fmt::Write;
use std::str;
/// Default width for pretty-formatting, in columns.
pub const DEFAULT_WIDTH: usize = 80;
/// All pretty-formattable items must implement this trait.
pub trait Emittable: std::fmt::Debug {
/// Serializes `self`, as pretty-printed code, on `f`.
fn write_on(&self, f: &mut Formatter);
}
/// Tailoring of behaviour for [Vertical] groupings.
#[derive(Clone, PartialEq, Eq)]
pub enum VerticalMode {
Variable,
@ -14,13 +23,16 @@ pub enum VerticalMode {
ExtraNewline,
}
/// Vertical formatting for [Emittable]s.
pub trait Vertical {
fn set_vertical_mode(&mut self, mode: VerticalMode);
fn write_vertically_on(&self, f: &mut Formatter);
}
/// Polymorphic [Emittable], used consistently in the API.
pub type Item = std::rc::Rc<dyn Emittable>;
/// A possibly-vertical sequence of items with item-separating and -terminating text.
#[derive(Clone)]
pub struct Sequence {
pub items: Vec<Item>,
@ -29,6 +41,8 @@ pub struct Sequence {
pub terminator: &'static str,
}
/// A sequence of items, indented when formatted vertically, surrounded by opening and closing
/// text.
#[derive(Clone)]
pub struct Grouping {
pub sequence: Sequence,
@ -36,14 +50,18 @@ pub struct Grouping {
pub close: &'static str,
}
/// State needed for pretty-formatting of [Emittable]s.
pub struct Formatter {
/// Number of available columns. Used to decide between horizontal and vertical layouts.
pub width: usize,
indent_delta: String,
current_indent: String,
/// Mutable output buffer. Accumulates emitted text during writing.
pub buffer: String,
}
impl Formatter {
/// Construct a Formatter using [DEFAULT_WIDTH] and a four-space indent.
pub fn new() -> Self {
Formatter {
width: DEFAULT_WIDTH,
@ -53,6 +71,7 @@ impl Formatter {
}
}
/// Construct a Formatter just like `self` but with an empty `buffer`.
pub fn copy_empty(&self) -> Formatter {
Formatter {
width: self.width,
@ -62,28 +81,37 @@ impl Formatter {
}
}
/// Yields the indent size.
pub fn indent_size(self) -> usize {
self.indent_delta.len()
}
/// Updates the indent size.
pub fn set_indent_size(&mut self, n: usize) {
self.indent_delta = str::repeat(" ", n)
}
/// Accumulates a text serialization of `e` in `buffer`.
pub fn write<E: Emittable>(&mut self, e: E) {
e.write_on(self)
}
/// Emits a newline followed by indentation into `buffer`.
pub fn newline(&mut self) {
self.buffer.push_str(&self.current_indent)
}
/// Creates a default Formatter, uses it to [write][Formatter::write] `e`, and yields the
/// contents of its `buffer`.
pub fn to_string<E: Emittable>(e: E) -> String {
let mut f = Formatter::new();
f.write(e);
f.buffer
}
/// Calls `f` in a context where the indentation has been increased by
/// [Formatter::indent_size] spaces. Restores the indentation level after `f` returns.
/// Yields the result of the call to `f`.
pub fn with_indent<R, F: FnOnce(&mut Self) -> R>(&mut self, f: F) -> R {
let old_indent = self.current_indent.clone();
self.current_indent += &self.indent_delta;
@ -93,6 +121,12 @@ impl Formatter {
}
}
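The save-and-restore pattern behind `with_indent` can be shown with a small std-only sketch. `MiniFormatter` and `render_block` are hypothetical names made up for this example, and the sketch pushes the line break explicitly in `newline`, which may differ from how the real `Formatter` stores its indentation.

```rust
// Hypothetical, minimal stand-in for the formatter state.
struct MiniFormatter {
    indent_delta: String,
    current_indent: String,
    buffer: String,
}

impl MiniFormatter {
    fn new() -> Self {
        MiniFormatter {
            indent_delta: "    ".to_string(), // four-space indent
            current_indent: String::new(),
            buffer: String::new(),
        }
    }

    // Assumption: emit the line break here, then the current indentation.
    fn newline(&mut self) {
        self.buffer.push('\n');
        self.buffer.push_str(&self.current_indent);
    }

    // Run `f` one indentation level deeper, then restore the old level.
    fn with_indent<R, F: FnOnce(&mut Self) -> R>(&mut self, f: F) -> R {
        let old_indent = self.current_indent.clone();
        self.current_indent += &self.indent_delta;
        let result = f(self);
        self.current_indent = old_indent;
        result
    }
}

// Render `lines` as an indented, brace-delimited block.
fn render_block(lines: &[&str]) -> String {
    let mut f = MiniFormatter::new();
    f.buffer.push('{');
    f.with_indent(|f| {
        for line in lines {
            f.newline();
            f.buffer.push_str(line);
        }
    });
    f.newline();
    f.buffer.push('}');
    f.buffer
}
```

Because the old indentation is restored after the closure returns, nested calls to `with_indent` compose without the caller having to track depth.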
impl Default for Formatter {
fn default() -> Self {
Self::new()
}
}
impl Default for VerticalMode {
fn default() -> Self {
Self::Variable
@ -238,6 +272,12 @@ impl std::fmt::Debug for Grouping {
//---------------------------------------------------------------------------
/// Escapes `s` by substituting `\\` for `\`, `\"` for `"`, and `\u{...}` for characters
/// outside the range 32..126, inclusive.
///
/// This process is intended to generate literals compatible with `rustc`; see [the language
/// reference on "Character and string
/// literals"](https://doc.rust-lang.org/reference/tokens.html#character-and-string-literals).
pub fn escape_string(s: &str) -> String {
let mut buf = String::new();
buf.push('"');
@ -253,6 +293,13 @@ pub fn escape_string(s: &str) -> String {
buf
}
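As a rough illustration of the escaping rules in the doc comment on `escape_string`, here is a std-only sketch. The function name `escape_string_sketch` is hypothetical and the body is reconstructed from the documented behaviour only, since the diff elides the middle of the real function; treat it as an approximation.

```rust
use std::fmt::Write;

// Sketch: wrap `s` in double quotes, escaping `\` and `"` and writing
// every character outside 32..=126 as `\u{...}` (lower-case hex).
fn escape_string_sketch(s: &str) -> String {
    let mut buf = String::new();
    buf.push('"');
    for c in s.chars() {
        match c {
            '\\' => buf.push_str("\\\\"),
            '"' => buf.push_str("\\\""),
            _ if c >= ' ' && c <= '~' => buf.push(c),
            _ => write!(&mut buf, "\\u{{{:x}}}", c as u32)
                .expect("no IO errors building a string"),
        }
    }
    buf.push('"');
    buf
}
```

Note the doubled braces in the format string: `{{` and `}}` produce the literal braces that `\u{...}` requires, whereas byte string literals use the brace-free `\x..` form.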
/// Escapes `bs` into a Rust byte string literal, treating each byte as its ASCII equivalent
/// except producing `\\` for 0x5c, `\"` for 0x22, and `\x..` for bytes outside the range
/// 0x20..0x7e, inclusive.
///
/// This process is intended to generate literals compatible with `rustc`; see [the language
/// reference on "Byte string
/// literals"](https://doc.rust-lang.org/reference/tokens.html#byte-string-literals).
pub fn escape_bytes(bs: &[u8]) -> String {
let mut buf = String::new();
buf.push_str("b\"");
@ -262,7 +309,7 @@ pub fn escape_bytes(bs: &[u8]) -> String {
'\\' => buf.push_str("\\\\"),
'"' => buf.push_str("\\\""),
_ if c >= ' ' && c <= '~' => buf.push(c),
_ => write!(&mut buf, "\\x{{{:02x}}}", b).expect("no IO errors building a string"),
_ => write!(&mut buf, "\\x{:02x}", b).expect("no IO errors building a string"),
}
}
buf.push('"');
@ -271,6 +318,7 @@ pub fn escape_bytes(bs: &[u8]) -> String {
//---------------------------------------------------------------------------
/// Utilities for constructing many useful kinds of [Sequence] and [Grouping].
pub mod constructors {
use super::Emittable;
use super::Grouping;
@ -279,10 +327,12 @@ pub mod constructors {
use super::Vertical;
use super::VerticalMode;
/// Produces a polymorphic, reference-counted [Item] from some generic [Emittable].
pub fn item<E: 'static + Emittable>(i: E) -> Item {
std::rc::Rc::new(i)
}
/// *a*`::`*b*`::`*...*`::`*z*
pub fn name(pieces: Vec<Item>) -> Sequence {
Sequence {
items: pieces,
@ -292,6 +342,7 @@ pub mod constructors {
}
}
/// *ab...z* (directly adjacent, no separators or terminators)
pub fn seq(items: Vec<Item>) -> Sequence {
Sequence {
items: items,
@ -301,6 +352,7 @@ pub mod constructors {
}
}
/// *a*`, `*b*`, `*...*`, `*z*
pub fn commas(items: Vec<Item>) -> Sequence {
Sequence {
items: items,
@ -310,6 +362,7 @@ pub mod constructors {
}
}
/// `(`*a*`, `*b*`, `*...*`, `*z*`)`
pub fn parens(items: Vec<Item>) -> Grouping {
Grouping {
sequence: commas(items),
@ -318,6 +371,7 @@ pub mod constructors {
}
}
/// `[`*a*`, `*b*`, `*...*`, `*z*`]`
pub fn brackets(items: Vec<Item>) -> Grouping {
Grouping {
sequence: commas(items),
@ -326,6 +380,7 @@ pub mod constructors {
}
}
/// `<`*a*`, `*b*`, `*...*`, `*z*`>`
pub fn anglebrackets(items: Vec<Item>) -> Grouping {
Grouping {
sequence: commas(items),
@ -334,6 +389,7 @@ pub mod constructors {
}
}
/// `{`*a*`, `*b*`, `*...*`, `*z*`}`
pub fn braces(items: Vec<Item>) -> Grouping {
Grouping {
sequence: commas(items),
@ -342,6 +398,7 @@ pub mod constructors {
}
}
/// `{`*a*` `*b*` `*...*` `*z*`}`
pub fn block(items: Vec<Item>) -> Grouping {
Grouping {
sequence: Sequence {
@ -355,10 +412,12 @@ pub mod constructors {
}
}
/// As [block], but always vertical
pub fn codeblock(items: Vec<Item>) -> Grouping {
vertical(false, block(items))
}
/// `{`*a*`; `*b*`; `*...*`; `*z*`}`
pub fn semiblock(items: Vec<Item>) -> Grouping {
Grouping {
sequence: Sequence {
@ -372,6 +431,9 @@ pub mod constructors {
}
}
/// Overrides `v` to be always vertical.
///
/// If `spaced` is true, inserts an extra newline between items.
pub fn vertical<V: Vertical>(spaced: bool, mut v: V) -> V {
v.set_vertical_mode(if spaced {
VerticalMode::ExtraNewline
@ -381,6 +443,7 @@ pub mod constructors {
v
}
/// Adds a layer of indentation to the given [Sequence].
pub fn indented(sequence: Sequence) -> Grouping {
Grouping {
sequence,
@ -390,52 +453,84 @@ pub mod constructors {
}
}
/// Ergonomic syntax for using the constructors in submodule [constructors]; see the
/// documentation for the macros, which appears on the [page for the crate
/// itself][crate#macros].
pub mod macros {
/// `name!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ *a*`::`*b*`::`*...*`::`*z*
///
/// See [super::constructors::name].
#[macro_export]
macro_rules! name {
($($item:expr),*) => {$crate::syntax::block::constructors::name(vec![$(std::rc::Rc::new($item)),*])}
}
/// `seq!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ *ab...z*
///
/// See [super::constructors::seq].
#[macro_export]
macro_rules! seq {
($($item:expr),*) => {$crate::syntax::block::constructors::seq(vec![$(std::rc::Rc::new($item)),*])}
}
/// `commas!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ *a*`, `*b*`, `*...*`, `*z*
///
/// See [super::constructors::commas].
#[macro_export]
macro_rules! commas {
($($item:expr),*) => {$crate::syntax::block::constructors::commas(vec![$(std::rc::Rc::new($item)),*])}
}
/// `parens!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `(`*a*`, `*b*`, `*...*`, `*z*`)`
///
/// See [super::constructors::parens].
#[macro_export]
macro_rules! parens {
($($item:expr),*) => {$crate::syntax::block::constructors::parens(vec![$(std::rc::Rc::new($item)),*])}
}
/// `brackets!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `[`*a*`, `*b*`, `*...*`, `*z*`]`
///
/// See [super::constructors::brackets].
#[macro_export]
macro_rules! brackets {
($($item:expr),*) => {$crate::syntax::block::constructors::brackets(vec![$(std::rc::Rc::new($item)),*])}
}
/// `anglebrackets!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `<`*a*`, `*b*`, `*...*`, `*z*`>`
///
/// See [super::constructors::anglebrackets].
#[macro_export]
macro_rules! anglebrackets {
($($item:expr),*) => {$crate::syntax::block::constructors::anglebrackets(vec![$(std::rc::Rc::new($item)),*])}
}
/// `braces!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `{`*a*`, `*b*`, `*...*`, `*z*`}`
///
/// See [super::constructors::braces].
#[macro_export]
macro_rules! braces {
($($item:expr),*) => {$crate::syntax::block::constructors::braces(vec![$(std::rc::Rc::new($item)),*])}
}
/// `block!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `{`*a*` `*b*` `*...*` `*z*`}`
///
/// See [super::constructors::block].
#[macro_export]
macro_rules! block {
($($item:expr),*) => {$crate::syntax::block::constructors::block(vec![$(std::rc::Rc::new($item)),*])}
}
/// As [`block`]`!`, but always vertical. See
/// [constructors::codeblock][super::constructors::codeblock].
#[macro_export]
macro_rules! codeblock {
($($item:expr),*) => {$crate::syntax::block::constructors::codeblock(vec![$(std::rc::Rc::new($item)),*])}
}
/// `semiblock!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `{`*a*`; `*b*`; `*...*`; `*z*`}`
///
/// See [super::constructors::semiblock].
#[macro_export]
macro_rules! semiblock {
($($item:expr),*) => {$crate::syntax::block::constructors::semiblock(vec![$(std::rc::Rc::new($item)),*])}


@ -1 +1,3 @@
//! A library for emitting pretty-formatted structured source code.
pub mod block;


@ -1,6 +1,6 @@
[package]
name = "preserves"
version = "3.990.0"
version = "3.990.2"
authors = ["Tony Garnock-Jones <tonyg@leastfixedpoint.com>"]
edition = "2018"
description = "Implementation of the Preserves serialization format via serde."


@ -0,0 +1,23 @@
```shell
cargo add preserves
```
This crate ([`preserves` on crates.io](https://crates.io/crates/preserves)) implements
[Preserves](https://preserves.dev/) for Rust. It provides the core
[semantics](https://preserves.dev/preserves.html#semantics) as well as both the [human-readable
text syntax][crate::value::text] (a superset of JSON) and [machine-oriented binary
format][crate::value::packed] (including
[canonicalization](https://preserves.dev/canonical-binary.html)) for Preserves.
This crate is the foundation for others such as
- [`preserves-schema`](https://docs.rs/preserves-schema/), which implements [Preserves
Schema](https://preserves.dev/preserves-schema.html);
- [`preserves-path`](https://docs.rs/preserves-path/), which implements [Preserves
Path](https://preserves.dev/preserves-path.html); and
- [`preserves-tools`](https://crates.io/crates/preserves-tools), which provides command-line
utilities for working with Preserves, in particular
[`preserves-tool`](https://preserves.dev/doc/preserves-tool.html), a kind of Preserves
Swiss-army knife.
It also includes [Serde](https://serde.rs/) support (modules [de], [ser], [symbol], [set]).


@ -0,0 +1,33 @@
For a value `V`, we write `«V»` for the binary encoding of `V`.
```text
«#f» = [0x80]
«#t» = [0x81]
«@W V» = [0x85] ++ «W» ++ «V»
«#!V» = [0x86] ++ «V»
«V» if V ∈ Float = [0x87, 0x04] ++ binary32(V)
«V» if V ∈ Double = [0x87, 0x08] ++ binary64(V)
«V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V)
«V» if V ∈ String = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V)
«V» if V ∈ ByteString = [0xB2] ++ varint(|V|) ++ V
«V» if V ∈ Symbol = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V)
«<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
«[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
varint(n) = [n] if n < 128
[(n & 127) | 128] ++ varint(n >> 7) if n ≥ 128
intbytes(n) = the empty sequence if n = 0, otherwise signedBigEndian(n)
signedBigEndian(n) = [n & 255] if -128 ≤ n ≤ 127
signedBigEndian(n >> 8) ++ [n & 255] otherwise
```
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
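
The helper functions `varint`, `intbytes` and `signedBigEndian` from the table above can be transcribed into Rust fairly directly. The sketch below is std-only and restricts integers to `i64` for simplicity (Preserves integers are arbitrary-precision); the function names `encode_signed_integer` and `encode_string` are hypothetical and not part of the crate's API.

```rust
// varint(n): base-128 digits, least significant first; every byte
// except the last has its high bit set.
fn varint(mut n: usize) -> Vec<u8> {
    let mut out = Vec::new();
    while n >= 128 {
        out.push(((n & 127) | 128) as u8);
        n >>= 7;
    }
    out.push(n as u8);
    out
}

// signedBigEndian(n): shortest big-endian two's-complement form.
fn signed_big_endian(n: i64) -> Vec<u8> {
    if (-128..=127).contains(&n) {
        vec![(n & 255) as u8]
    } else {
        // `>>` on i64 is an arithmetic shift, so the sign is preserved.
        let mut out = signed_big_endian(n >> 8);
        out.push((n & 255) as u8);
        out
    }
}

// intbytes(n): empty for zero, otherwise signedBigEndian(n).
fn intbytes(n: i64) -> Vec<u8> {
    if n == 0 {
        Vec::new()
    } else {
        signed_big_endian(n)
    }
}

// «V» for V ∈ SignedInteger (restricted to i64 in this sketch).
fn encode_signed_integer(n: i64) -> Vec<u8> {
    let body = intbytes(n);
    let mut out = vec![0xB0];
    out.extend(varint(body.len()));
    out.extend(body);
    out
}

// «V» for V ∈ String.
fn encode_string(s: &str) -> Vec<u8> {
    let mut out = vec![0xB1];
    out.extend(varint(s.len()));
    out.extend_from_slice(s.as_bytes());
    out
}
```

For instance, `128` does not fit in one signed byte, so `intbytes(128)` needs a leading zero byte and yields `[0x00, 0x80]`, while `-129` yields `[0xFF, 0x7F]`.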


@ -0,0 +1,44 @@
```text
Document := Value ws
Value := ws (Record | Collection | Atom | Embedded | Annotated)
Collection := Sequence | Dictionary | Set
Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
ws := (space | tab | cr | lf | `,`)*
Record := `<` Value+ ws `>`
Sequence := `[` Value* ws `]`
Dictionary := `{` (Value ws `:` Value)* ws `}`
Set := `#{` Value* ws `}`
Boolean := `#t` | `#f`
ByteString := `#"` binchar* `"`
| `#x"` (ws hex hex)* ws `"`
| `#[` (ws base64char)* ws `]`
String := `"` («any unicode scalar except `\` or `"`» | escaped | `\"`)* `"`
QuotedSymbol := `|` («any unicode scalar except `\` or `|`» | escaped | `\|`)* `|`
Symbol := (`A`..`Z` | `a`..`z` | `0`..`9` | sympunct | symuchar)+
Number := Float | Double | SignedInteger
Float := flt (`f`|`F`) | `#xf"` (ws hex hex)⁴ ws `"`
Double := flt | `#xd"` (ws hex hex)⁸ ws `"`
SignedInteger := int
Embedded := `#!` Value
Annotated := Annotation Value
Annotation := `@` Value | `;` «any unicode scalar except cr or lf»* (cr | lf)
escaped := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\u` hex hex hex hex
binescaped := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\x` hex hex
binchar := «any scalar ≥32 and ≤126, except `\` or `"`» | binescaped | `\"`
base64char := `A`..`Z` | `a`..`z` | `0`..`9` | `+` | `/` | `-` | `_` | `=`
sympunct := `~` | `!` | `$` | `%` | `^` | `&` | `*` | `?`
| `_` | `=` | `+` | `-` | `/` | `.`
symuchar := «any scalar value ≥128 whose Unicode category is
Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pc,
Pd, Po, Sc, Sm, Sk, So, or Co»
flt := int ( frac exp | frac | exp )
int := (`-`|`+`)? (`0`..`9`)+
frac := `.` (`0`..`9`)+
exp := (`e`|`E`) (`-`|`+`)? (`0`..`9`)+
hex := `A`..`F` | `a`..`f` | `0`..`9`
```


@ -0,0 +1,18 @@
```text
Value = Atom
| Compound
| Embedded
Atom = Boolean
| Float
| Double
| SignedInteger
| String
| ByteString
| Symbol
Compound = Record
| Sequence
| Set
| Dictionary
```
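
One way to picture this grammar is as a family of Rust enums. The sketch below is only an illustration and not the representation used by the `preserves` crate: integers are narrowed to `i64`, sets and dictionaries are stored as plain vectors without enforcing uniqueness, and the type parameter `E` stands in for whatever embedded values an application uses. The `kind` helper is hypothetical.

```rust
// Atom = Boolean | Float | Double | SignedInteger | String | ByteString | Symbol
#[derive(Debug, Clone, PartialEq)]
enum Atom {
    Boolean(bool),
    Float(f32),
    Double(f64),
    SignedInteger(i64), // simplification: real integers are unbounded
    String(String),
    ByteString(Vec<u8>),
    Symbol(String),
}

// Compound = Record | Sequence | Set | Dictionary
#[derive(Debug, Clone, PartialEq)]
enum Compound<E> {
    Record { label: Box<Value<E>>, fields: Vec<Value<E>> },
    Sequence(Vec<Value<E>>),
    Set(Vec<Value<E>>), // simplification: uniqueness is not enforced
    Dictionary(Vec<(Value<E>, Value<E>)>),
}

// Value = Atom | Compound | Embedded
#[derive(Debug, Clone, PartialEq)]
enum Value<E> {
    Atom(Atom),
    Compound(Compound<E>),
    Embedded(E),
}

// Name the alternative of the grammar that `v` belongs to.
fn kind<E>(v: &Value<E>) -> &'static str {
    match v {
        Value::Atom(Atom::Boolean(_)) => "boolean",
        Value::Atom(Atom::Float(_)) => "float",
        Value::Atom(Atom::Double(_)) => "double",
        Value::Atom(Atom::SignedInteger(_)) => "signed integer",
        Value::Atom(Atom::String(_)) => "string",
        Value::Atom(Atom::ByteString(_)) => "byte string",
        Value::Atom(Atom::Symbol(_)) => "symbol",
        Value::Compound(Compound::Record { .. }) => "record",
        Value::Compound(Compound::Sequence(_)) => "sequence",
        Value::Compound(Compound::Set(_)) => "set",
        Value::Compound(Compound::Dictionary(_)) => "dictionary",
        Value::Embedded(_) => "embedded",
    }
}
```

A record such as `<point 1 2>` would then be a `Compound::Record` whose label is the symbol `point` and whose fields are two `SignedInteger` atoms.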


@ -0,0 +1,12 @@
*Preserves* is a data model, with associated serialization formats.
It supports *records* with user-defined *labels*, embedded
*references*, and the usual suite of atomic and compound data types,
including *binary* data as a distinct type from text strings. Its
*annotations* allow separation of data from metadata such as comments,
trace information, and provenance information.
Preserves departs from many other data languages in defining how to
*compare* two values. Comparison is based on the data model, not on
syntax or on data structures of any particular implementation
language.


@ -1,3 +1,5 @@
//! Support for Serde deserialization of Preserves terms described by Rust data types.
use serde::de::{DeserializeSeed, EnumAccess, MapAccess, SeqAccess, VariantAccess, Visitor};
use serde::Deserialize;
@ -11,13 +13,21 @@ use super::value::{IOValue, IOValueDomainCodec, PackedReader, TextReader, ViaCod
pub use super::error::Error;
/// A [std::result::Result] type including [Error], the Preserves Serde deserialization error
/// type, as its error.
pub type Result<T> = std::result::Result<T, Error>;
/// Serde deserializer for Preserves-encoded Rust data. Use [Deserializer::from_reader] to
/// construct instances, or [from_bytes]/[from_text]/[from_read]/[from_reader] etc. to
/// deserialize single terms directly.
pub struct Deserializer<'de, 'r, R: Reader<'de, IOValue>> {
/// The underlying Preserves [reader][crate::value::reader::Reader].
pub read: &'r mut R,
phantom: PhantomData<&'de ()>,
}
/// Deserialize a `T` from `bytes`, which must contain a Preserves [machine-oriented binary
/// syntax][crate::value::packed] term corresponding to the Serde serialization of a `T`.
pub fn from_bytes<'de, T>(bytes: &'de [u8]) -> Result<T>
where
T: Deserialize<'de>,
@ -28,6 +38,8 @@ where
))
}
/// Deserialize a `T` from `text`, which must contain a Preserves [text
/// syntax][crate::value::text] term corresponding to the Serde serialization of a `T`.
pub fn from_text<'de, T>(text: &'de str) -> Result<T>
where
T: Deserialize<'de>,
@ -38,6 +50,8 @@ where
))
}
/// Deserialize a `T` from `read`, which must yield a Preserves [machine-oriented binary
/// syntax][crate::value::packed] term corresponding to the Serde serialization of a `T`.
pub fn from_read<'de, 'r, IOR: io::Read + io::Seek, T>(read: &'r mut IOR) -> Result<T>
where
T: Deserialize<'de>,
@ -48,6 +62,8 @@ where
))
}
/// Deserialize a `T` from `read`, which must yield a Preserves term corresponding to the Serde
/// serialization of a `T`.
pub fn from_reader<'r, 'de, R: Reader<'de, IOValue>, T>(read: &'r mut R) -> Result<T>
where
T: Deserialize<'de>,
@ -58,6 +74,7 @@ where
}
impl<'r, 'de, R: Reader<'de, IOValue>> Deserializer<'de, 'r, R> {
/// Construct a Deserializer from `read`, a Preserves [reader][crate::value::Reader].
pub fn from_reader(read: &'r mut R) -> Self {
Deserializer {
read,
@ -344,6 +361,7 @@ impl<'r, 'de, 'a, R: Reader<'de, IOValue>> serde::de::Deserializer<'de>
}
}
#[doc(hidden)]
pub struct Seq<'de, 'r, 'a, R: Reader<'de, IOValue>> {
b: B::Type,
i: B::Item,


@ -1,27 +1,47 @@
//! Serde and plain-Preserves codec errors.
use num::bigint::BigInt;
use std::convert::From;
use std::io;
/// Representation of parse, deserialization, and other conversion errors.
#[derive(Debug)]
pub enum Error {
/// Generic IO error.
Io(io::Error),
/// Generic message for the user.
Message(String),
/// Invalid unicode scalar `n` found during interpretation of a `<UnicodeScalar n>` record
/// as a Rust `char`.
InvalidUnicodeScalar(u32),
/// Preserves supports arbitrary integers; when these are converted to specific Rust
/// machine word types, sometimes they exceed the available range.
NumberOutOfRange(BigInt),
/// Serde has limited support for deserializing free-form data; this error is signalled
/// when one of the limits is hit.
CannotDeserializeAny,
/// Syntax error: missing closing delimiter (`)`, `]`, `}`, `>` in text syntax; `0x84` in binary syntax; etc.)
MissingCloseDelimiter,
/// Signalled when an expected term is not present.
MissingItem,
/// Signalled when what was received did not match expectations.
Expected(ExpectedKind, Received),
#[doc(hidden)] // TODO remove this enum variant? It isn't used
StreamingSerializationUnsupported,
}
/// Used in [Error::Expected] to indicate what was received.
#[derive(Debug)]
pub enum Received {
#[doc(hidden)] // TODO remove this enum variant? It isn't used
ReceivedSomethingElse,
/// Received a record with the given label symbol text.
ReceivedRecordWithLabel(String),
/// Received some other value, described in the `String`
ReceivedOtherValue(String),
}
/// Used in [Error::Expected] to indicate what was expected.
#[derive(Debug, PartialEq)]
pub enum ExpectedKind {
Boolean,
@ -35,7 +55,9 @@ pub enum ExpectedKind {
ByteString,
Symbol,
/// Expected a record, either of a specific arity (length) or of no specific arity
Record(Option<usize>),
/// Expected a record with a symbol label with text `String`, perhaps of some specific arity
SimpleRecord(String, Option<usize>),
Sequence,
Set,
@ -87,14 +109,17 @@ impl std::fmt::Display for Error {
//---------------------------------------------------------------------------
/// True iff `e` is `Error::Io`
pub fn is_io_error(e: &Error) -> bool {
matches!(e, Error::Io(_))
}
/// Produce the generic "end of file" error, `Error::Io(`[io_eof]`())`
pub fn eof() -> Error {
Error::Io(io_eof())
}
/// True iff `e` is an "end of file" error; see [is_eof_io_error]
pub fn is_eof_error(e: &Error) -> bool {
if let Error::Io(ioe) = e {
is_eof_io_error(ioe)
@ -103,10 +128,12 @@ pub fn is_eof_error(e: &Error) -> bool {
}
}
/// Produce a syntax error bearing the message `s`
pub fn syntax_error(s: &str) -> Error {
Error::Io(io_syntax_error(s))
}
/// True iff `e` is a syntax error; see [is_syntax_io_error]
pub fn is_syntax_error(e: &Error) -> bool {
if let Error::Io(ioe) = e {
is_syntax_io_error(ioe)
@ -117,18 +144,22 @@ pub fn is_syntax_error(e: &Error) -> bool {
//---------------------------------------------------------------------------
/// Produce an [io::Error] of [io::ErrorKind::UnexpectedEof].
pub fn io_eof() -> io::Error {
io::Error::new(io::ErrorKind::UnexpectedEof, "EOF")
}
/// True iff `e` is [io::ErrorKind::UnexpectedEof]
pub fn is_eof_io_error(e: &io::Error) -> bool {
matches!(e.kind(), io::ErrorKind::UnexpectedEof)
}
/// Produce a syntax error ([io::ErrorKind::InvalidData]) bearing the message `s`
pub fn io_syntax_error(s: &str) -> io::Error {
io::Error::new(io::ErrorKind::InvalidData, s)
}
/// True iff `e` is an [io::ErrorKind::InvalidData] (a syntax error)
pub fn is_syntax_io_error(e: &io::Error) -> bool {
matches!(e.kind(), io::ErrorKind::InvalidData)
}
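Since end-of-file and syntax errors are both carried as `io::Error` values distinguished only by their `ErrorKind`, a caller can branch on the kind. The std-only sketch below repeats the two constructors shown above and adds a hypothetical `Failure` enum and `classify` function to show how a caller might tell "need more input" apart from "malformed input".

```rust
use std::io;

// Same construction as `io_eof` above.
fn io_eof() -> io::Error {
    io::Error::new(io::ErrorKind::UnexpectedEof, "EOF")
}

// Same construction as `io_syntax_error` above.
fn io_syntax_error(s: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, s)
}

// Hypothetical caller-side classification of a failed read.
#[derive(Debug, PartialEq)]
enum Failure {
    NeedMoreInput,
    Malformed(String),
    Other,
}

fn classify(e: &io::Error) -> Failure {
    match e.kind() {
        io::ErrorKind::UnexpectedEof => Failure::NeedMoreInput,
        io::ErrorKind::InvalidData => Failure::Malformed(e.to_string()),
        _ => Failure::Other,
    }
}
```

A streaming consumer could, for example, wait for more bytes on `NeedMoreInput` and abort on `Malformed`.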


@ -1,19 +1,38 @@
//! Utilities for producing and flexibly parsing strings containing hexadecimal binary data.
/// Utility for parsing hex binary data from strings.
pub enum HexParser {
/// "Liberal" parsing simply ignores characters that are not (case-insensitive) hex digits.
Liberal,
/// "Whitespace allowed" parsing ignores whitespace, but fails a parse on anything other
/// than hex or whitespace.
WhitespaceAllowed,
/// "Strict" parsing accepts only (case-insensitive) hex digits; no whitespace, no other
/// characters.
Strict,
}
/// Utility for formatting binary data as hex.
pub enum HexFormatter {
/// Produces LF-separated lines with a maximum of `usize` hex digits in each line.
Lines(usize),
/// Simply packs hex digits in as tightly as possible.
Packed,
}
/// Convert a number 0..15 to a hex digit [char].
///
/// # Panics
///
/// Panics if given `v` outside the range 0..15 inclusive.
///
pub fn hexdigit(v: u8) -> char {
char::from_digit(v as u32, 16).expect("hexadecimal digit value")
}
impl HexParser {
/// Decode `s` according to the given rules for `self`; see [HexParser].
/// If the parse fails, yield `None`.
pub fn decode(&self, s: &str) -> Option<Vec<u8>> {
let mut result = Vec::new();
let mut buf: u8 = 0;
@ -49,6 +68,7 @@ impl HexParser {
}
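The difference between the three `HexParser` modes can be sketched without the crate. The `Mode` enum and `decode_hex` function below are hypothetical stand-ins, and the handling of an odd number of hex digits (treated here as a failed parse) is an assumption, since the diff does not show that part of `decode`.

```rust
// Hypothetical mirror of the three parsing modes.
#[derive(Clone, Copy)]
enum Mode {
    Liberal,
    WhitespaceAllowed,
    Strict,
}

// Decode pairs of hex digits from `s` into bytes according to `mode`.
fn decode_hex(mode: Mode, s: &str) -> Option<Vec<u8>> {
    let mut result = Vec::new();
    let mut pending: Option<u8> = None; // high nibble awaiting its partner
    for c in s.chars() {
        match c.to_digit(16) {
            Some(d) => match pending.take() {
                None => pending = Some(d as u8),
                Some(hi) => result.push((hi << 4) | d as u8),
            },
            None => match mode {
                // Liberal: skip anything that is not a hex digit.
                Mode::Liberal => {}
                // WhitespaceAllowed: skip whitespace only.
                Mode::WhitespaceAllowed if c.is_whitespace() => {}
                // Strict, or a non-whitespace character: fail.
                _ => return None,
            },
        }
    }
    // Assumption: a dangling half-byte counts as a failed parse.
    if pending.is_some() {
        None
    } else {
        Some(result)
    }
}
```

One consequence of liberal parsing is that letters `a` to `f` inside separators are consumed as digits, so it suits inputs whose separators are known to be punctuation or whitespace.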
impl HexFormatter {
/// Encode `bs` according to the given rules for `self`; see [HexFormatter].
pub fn encode(&self, bs: &[u8]) -> String {
match self {
HexFormatter::Lines(max_line_length) => {


@ -1,3 +1,9 @@
#![doc = concat!(
include_str!("../README.md"),
"# What is Preserves?\n\n",
include_str!("../doc/what-is-preserves.md"),
)]
pub mod de;
pub mod error;
pub mod hex;


@ -1,3 +1,5 @@
//! Support for Serde serialization of Rust data types into Preserves terms.
use super::value::boundary as B;
use super::value::writer::{CompoundWriter, Writer};
use super::value::IOValueDomainCodec;
@ -7,11 +9,16 @@ pub use super::error::Error;
type Result<T> = std::result::Result<T, Error>;
#[derive(Debug)]
/// Serde serializer for Preserves-encoding Rust data. Construct via [Serializer::new], and use
/// with [serde::Serialize::serialize] methods.
pub struct Serializer<'w, W: Writer> {
/// The underlying Preserves [writer][crate::value::writer::Writer].
pub write: &'w mut W,
}
impl<'w, W: Writer> Serializer<'w, W> {
/// Construct a new [Serializer] targeting the given
/// [writer][crate::value::writer::Writer].
pub fn new(write: &'w mut W) -> Self {
Serializer { write }
}
@ -22,6 +29,7 @@ enum SequenceVariant<W: Writer> {
Record(W::RecWriter),
}
#[doc(hidden)]
pub struct SerializeCompound<'a, 'w, W: Writer> {
b: B::Type,
i: B::Item,
@ -29,6 +37,7 @@ pub struct SerializeCompound<'a, 'w, W: Writer> {
c: SequenceVariant<W>,
}
#[doc(hidden)]
pub struct SerializeDictionary<'a, 'w, W: Writer> {
b: B::Type,
ser: &'a mut Serializer<'w, W>,
@ -442,6 +451,8 @@ impl<'a, 'w, W: Writer> serde::ser::SerializeSeq for SerializeCompound<'a, 'w, W
}
}
/// Convenience function for directly serializing a Serde-serializable `T` to the given
/// `write`, a Preserves [writer][crate::value::writer::Writer].
pub fn to_writer<W: Writer, T: Serialize + ?Sized>(write: &mut W, value: &T) -> Result<()> {
Ok(value.serialize(&mut Serializer::new(write))?)
}


@ -1,7 +1,26 @@
//! Serde support for serializing Rust collections as Preserves sets.
//!
//! Serde doesn't include sets in its data model, so we do some somewhat awful tricks to force
//! things to come out the way we want them.
//!
//! # Example
//!
//! Annotate collection-valued fields that you want to (en|de)code as Preserves `Set`s with
//! `#[serde(with = "preserves::set")]`:
//!
//! ```rust
//! #[derive(serde::Serialize, serde::Deserialize)]
//! struct Example {
//! #[serde(with = "preserves::set")]
//! items: preserves::value::Set<String>,
//! }
//! ```
use crate::value::{self, to_value, IOValue, UnwrappedIOValue};
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::iter::IntoIterator;
#[doc(hidden)]
pub fn serialize<S, T, Item>(s: T, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
@ -12,6 +31,7 @@ where
UnwrappedIOValue::from(s).wrap().serialize(serializer)
}
#[doc(hidden)]
pub fn deserialize<'de, D, T>(deserializer: D) -> Result<T, D::Error>
where
D: Deserializer<'de>,


@ -1,5 +1,25 @@
//! Serde support for serializing Rust data as Preserves symbols.
//!
//! Serde doesn't include symbols in its data model, so we do some somewhat awful tricks to
//! force things to come out the way we want them.
//!
//! # Example
//!
//! Either use [Symbol] directly in your data types, or annotate [String]-valued fields that
//! you want to (en|de)code as Preserves `Symbol`s with `#[serde(with = "preserves::symbol")]`:
//!
//! ```rust
//! #[derive(serde::Serialize, serde::Deserialize)]
//! struct Example {
//! sym1: preserves::symbol::Symbol,
//! #[serde(with = "preserves::symbol")]
//! sym2: String,
//! }
//! ```
use crate::value::{IOValue, NestedValue};
/// Wrapper for a string to coerce its Preserves-serialization to `Symbol`.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)]
pub struct Symbol(pub String);
@ -26,6 +46,7 @@ impl<'de> serde::Deserialize<'de> for Symbol {
}
}
#[doc(hidden)]
pub fn serialize<S>(s: &str, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
@ -34,6 +55,7 @@ where
Symbol(s.to_string()).serialize(serializer)
}
#[doc(hidden)]
pub fn deserialize<'de, D>(deserializer: D) -> Result<String, D::Error>
where
D: serde::Deserializer<'de>,


@ -1,3 +1,5 @@
#![doc(hidden)]
#[derive(Default, Clone, Debug)]
pub struct Type {
pub closing: Option<Item>,


@ -1,3 +1,5 @@
//! Support for Serde deserialization of Rust data types from Preserves *values* (not syntax).
use crate::error::{Error, ExpectedKind, Received};
use crate::value::repr::{Double, Float};
use crate::value::{IOValue, Map, NestedValue, UnwrappedIOValue, Value};
@ -7,10 +9,14 @@ use std::iter::Iterator;
pub type Result<T> = std::result::Result<T, Error>;
/// Serde deserializer for constructing Rust data from an in-memory Preserves value. Use
/// [Deserializer::from_value] to construct instances, or [from_value] to deserialize single
/// values directly.
pub struct Deserializer<'de> {
input: &'de IOValue,
}
/// Deserialize a `T` from `v`, a Preserves [IOValue].
pub fn from_value<'a, T>(v: &'a IOValue) -> Result<T>
where
T: Deserialize<'a>,
@ -21,6 +27,7 @@ where
}
impl<'de> Deserializer<'de> {
/// Construct a Deserializer from `v`, an [IOValue].
pub fn from_value(v: &'de IOValue) -> Self {
Deserializer { input: v }
}
@ -331,6 +338,7 @@ impl<'de, 'a> serde::de::Deserializer<'de> for &'a mut Deserializer<'de> {
}
}
#[doc(hidden)]
pub struct VecSeq<'a, 'de: 'a, I: Iterator<Item = &'de IOValue>> {
iter: I,
de: &'a mut Deserializer<'de>,
@ -359,6 +367,7 @@ impl<'de, 'a, I: Iterator<Item = &'de IOValue>> SeqAccess<'de> for VecSeq<'a, 'd
}
}
#[doc(hidden)]
pub struct DictMap<'a, 'de: 'a> {
pending: Option<&'de IOValue>,
iter: Box<dyn Iterator<Item = (&'de IOValue, &'de IOValue)> + 'a>,


@ -1,3 +1,6 @@
//! Traits for working with Preserves [embedded
//! values](https://preserves.dev/preserves.html#embeddeds).
use std::io;
use super::packed;
@ -9,10 +12,12 @@ use super::NestedValue;
use super::Reader;
use super::Writer;
/// Implementations parse [IOValue]s to their own particular [Embeddable] values of type `D`.
pub trait DomainParse<D: Embeddable> {
fn parse_embedded(&mut self, v: &IOValue) -> io::Result<D>;
}
/// Implementations read and parse from `src` to produce [Embeddable] values of type `D`.
pub trait DomainDecode<D: Embeddable> {
fn decode_embedded<'de, 'src, S: BinarySource<'de>>(
&mut self,
@ -21,6 +26,7 @@ pub trait DomainDecode<D: Embeddable> {
) -> io::Result<D>;
}
/// Implementations unparse and write `D`s to `w`, a [writer][crate::value::writer::Writer].
pub trait DomainEncode<D: Embeddable> {
fn encode_embedded<W: Writer>(&mut self, w: &mut W, d: &D) -> io::Result<()>;
}
@ -41,6 +47,9 @@ impl<'a, D: Embeddable, T: DomainDecode<D>> DomainDecode<D> for &'a mut T {
}
}
/// Convenience codec: use this as embedded codec for encoding (only) when embedded values
/// should be serialized as Preserves `String`s holding their Rust [std::fmt::Debug]
/// representation.
pub struct DebugDomainEncode;
impl<D: Embeddable> DomainEncode<D> for DebugDomainEncode {
@ -49,6 +58,8 @@ impl<D: Embeddable> DomainEncode<D> for DebugDomainEncode {
}
}
/// Convenience codec: use this as embedded codec for decoding (only) when embedded values are
/// expected to conform to the syntax implicit in their [std::str::FromStr] implementation.
pub struct FromStrDomainParse;
impl<Err: Into<io::Error>, D: Embeddable + std::str::FromStr<Err = Err>> DomainParse<D>
@ -59,6 +70,8 @@ impl<Err: Into<io::Error>, D: Embeddable + std::str::FromStr<Err = Err>> DomainP
}
}
/// Use this as embedded codec when embedded data are already [IOValue]s that can be directly
/// serialized and deserialized without further transformation.
pub struct IOValueDomainCodec;
impl DomainDecode<IOValue> for IOValueDomainCodec {
@ -77,6 +90,7 @@ impl DomainEncode<IOValue> for IOValueDomainCodec {
}
}
/// Use this as embedded codec to forbid use of embedded values; an [io::Error] is signalled.
pub struct NoEmbeddedDomainCodec;
impl<D: Embeddable> DomainDecode<D> for NoEmbeddedDomainCodec {
@ -101,9 +115,12 @@ impl<D: Embeddable> DomainEncode<D> for NoEmbeddedDomainCodec {
}
}
/// If some `C` implements [DomainDecode] but not [DomainParse], or vice versa, use `ViaCodec`
/// to promote the one to the other. Construct instances with [ViaCodec::new].
pub struct ViaCodec<C>(C);
impl<C> ViaCodec<C> {
/// Constructs a `ViaCodec` wrapper around an underlying codec of type `C`.
pub fn new(c: C) -> Self {
ViaCodec(c)
}
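
The convenience codecs documented in this file can be pictured with a std-only sketch. The traits `EncodeEmbedded` and `ParseEmbedded`, the `Handle` type, and the `Forbidden` codec below are hypothetical stand-ins (the real `DomainEncode` and `DomainParse` work against a `Writer` and an `IOValue`, not a `String`), so this shows only the shape of the idea and not the crate's API.

```rust
use std::fmt::Debug;
use std::io;
use std::str::FromStr;

/// Hypothetical stand-in for `DomainEncode`: turns an embedded value into text.
trait EncodeEmbedded<D> {
    fn encode_embedded(&mut self, d: &D) -> io::Result<String>;
}

/// Hypothetical stand-in for `DomainParse`: recovers an embedded value from text.
trait ParseEmbedded<D> {
    fn parse_embedded(&mut self, s: &str) -> io::Result<D>;
}

/// Encode-only codec in the spirit of `DebugDomainEncode`.
struct DebugEncode;

impl<D: Debug> EncodeEmbedded<D> for DebugEncode {
    fn encode_embedded(&mut self, d: &D) -> io::Result<String> {
        Ok(format!("{:?}", d))
    }
}

/// Parse-only codec in the spirit of `FromStrDomainParse`.
struct FromStrParse;

impl<D> ParseEmbedded<D> for FromStrParse
where
    D: FromStr,
    D::Err: Into<io::Error>,
{
    fn parse_embedded(&mut self, s: &str) -> io::Result<D> {
        s.parse::<D>().map_err(|e| e.into())
    }
}

/// Codec in the spirit of `NoEmbeddedDomainCodec`: every use is an error.
struct Forbidden;

impl<D> EncodeEmbedded<D> for Forbidden {
    fn encode_embedded(&mut self, _d: &D) -> io::Result<String> {
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "embedded values not supported",
        ))
    }
}

/// Hypothetical embedded type whose `FromStr` error is already an `io::Error`.
#[derive(Debug, PartialEq)]
struct Handle(u32);

impl FromStr for Handle {
    type Err = io::Error;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<u32>()
            .map(Handle)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
    }
}
```

With these definitions, `DebugEncode` can print a `Handle` but cannot read one back, and `FromStrParse` can read one but not print it, which is the asymmetry the "(only)" remarks in the doc comments point at.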


@ -1,3 +1,12 @@
#![doc(hidden)]
//! A horrifying hack to Serde-serialize [IOValue] instances to Preserves *as themselves*.
//!
//! Frankly I think this portion of the codebase might not survive for long. I can't think of a
//! better way of achieving this, but the drawbacks of having this functionality are *severe*.
//!
//! See <https://gitlab.com/preserves/preserves/-/issues/42>.
use super::repr::IOValue;
pub static MAGIC: &str = "$____Preserves_Serde_Magic";


@ -1,8 +1,13 @@
//! Implements the Preserves
//! [merge](https://preserves.dev/preserves.html#appendix-merging-values) of values.
use super::Map;
use super::NestedValue;
use super::Record;
use super::Value;
/// Merge two sequences of values according to [the
/// specification](https://preserves.dev/preserves.html#appendix-merging-values).
pub fn merge_seqs<N: NestedValue>(mut a: Vec<N>, mut b: Vec<N>) -> Option<Vec<N>> {
if a.len() > b.len() {
std::mem::swap(&mut a, &mut b);
@ -16,6 +21,8 @@ pub fn merge_seqs<N: NestedValue>(mut a: Vec<N>, mut b: Vec<N>) -> Option<Vec<N>
Some(r)
}
/// Merge two values according to [the
/// specification](https://preserves.dev/preserves.html#appendix-merging-values).
pub fn merge2<N: NestedValue>(v: N, w: N) -> Option<N> {
let (mut v_anns, v_val) = v.pieces();
let (w_anns, w_val) = w.pieces();
@ -52,6 +59,8 @@ pub fn merge2<N: NestedValue>(v: N, w: N) -> Option<N> {
}
}
/// Merge several values into a single value according to [the
/// specification](https://preserves.dev/preserves.html#appendix-merging-values).
pub fn merge<N: NestedValue, I: IntoIterator<Item = N>>(vs: I) -> Option<N> {
let mut vs = vs.into_iter();
let mut v = vs.next().expect("at least one value in merge()");
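
The middle of `merge_seqs` is hidden by the hunk boundary, so the following is only a sketch of one plausible reading: merge the two sequences pairwise and keep the surplus items of the longer one. The element-level rule used here (equal items merge, unequal items fail) is a deliberate simplification of `merge2`, and the names `merge_items` and `merge_seqs_sketch` are made up for illustration.

```rust
/// Element-level merge for this sketch: equal items merge to themselves,
/// unequal items fail. (The real `merge2` also recurses into compound values.)
fn merge_items<T: PartialEq>(a: T, b: T) -> Option<T> {
    if a == b {
        Some(a)
    } else {
        None
    }
}

/// Pairwise-merge two sequences; surplus items of the longer one are kept.
fn merge_seqs_sketch<T: PartialEq>(mut a: Vec<T>, mut b: Vec<T>) -> Option<Vec<T>> {
    // Make `a` the shorter sequence, as the visible fragment does.
    if a.len() > b.len() {
        std::mem::swap(&mut a, &mut b);
    }
    let mut rest = b.into_iter();
    let mut merged = Vec::new();
    for item in a {
        // `rest` holds at least as many items as `a`, so `next()` yields `Some` here.
        let other = rest.next()?;
        merged.push(merge_items(item, other)?);
    }
    // Whatever is left of the longer sequence is carried over unchanged.
    merged.extend(rest);
    Some(merged)
}
```

Under these assumptions the argument order does not matter: merging `[1, 2]` with `[1, 2, 3]` gives the same result as merging `[1, 2, 3]` with `[1, 2]`, and any pairwise mismatch makes the whole merge fail with `None`.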


@ -1,3 +1,53 @@
//! # Representing, reading, and writing Preserves `Value`s as Rust data
//!
//! ```
//! use preserves::value::{IOValue, text, packed};
//! let v: IOValue = text::iovalue_from_str("<hi>")?;
//! let w: IOValue = packed::iovalue_from_bytes(b"\xb4\xb3\x02hi\x84")?;
//! assert_eq!(v, w);
//! assert_eq!(text::TextWriter::encode_iovalue(&v)?, "<hi>");
//! assert_eq!(packed::PackedWriter::encode_iovalue(&v)?, b"\xb4\xb3\x02hi\x84");
//! # Ok::<(), std::io::Error>(())
//! ```
//!
//! Preserves `Value`s are categorized in the following way. The core representation type,
//! [crate::value::repr::Value], reflects this structure. However, most of the time you will
//! work with [IOValue] or some other implementation of trait [NestedValue], which augments an
//! underlying [Value] with [*annotations*][crate::value::repr::Annotations] (e.g. comments) and fixes a strategy
//! for memory management.
//!
#![doc = include_str!("../../doc/value-grammar.md")]
//!
//! ## Memory management
//!
//! Each implementation of [NestedValue] chooses a different point in the space of possible
//! approaches to memory management for `Value`s.
//!
//! ##### `IOValue`
//!
//! The most commonly-used and versatile implementation, [IOValue], uses [std::sync::Arc] for
//! internal links in compound `Value`s. Unlike many of the other implementations of
//! [NestedValue], [IOValue] doesn't offer flexibility in the Rust data type to be used for
//! Preserves [embedded values](https://preserves.dev/preserves.html#embeddeds): instead,
//! embedded values in an [IOValue] are themselves [IOValue]s.
//!
//! ##### `ArcValue<D>`, `RcValue<D>`, and `PlainValue<D>`
//!
//! For control over the Rust type to use for embedded values, choose [ArcValue], [RcValue], or
//! [PlainValue]. Use [ArcValue] when you wish to transfer values among threads. [RcValue] is
//! more niche; it may be useful for complex terms that do not need to cross thread boundaries.
//! [PlainValue] is even more niche: it does not use a reference-counted pointer type, meaning
//! it does not offer any kind of aliasing or sharing among subterms at all.
//!
//! # Parsing, pretty-printing, encoding and decoding `Value`s
//!
//! Modules [reader] and [writer] supply generic [Reader] and [Writer] traits for parsing and
//! unparsing Preserves data. Implementations of [Reader] and [Writer] connect Preserves data
//! to specific transfer syntaxes:
//!
//! - module [packed] supplies tools for working with the machine-oriented binary syntax
//! - module [text] supplies tools for working with human-readable text syntax
pub mod boundary;
pub mod de;
pub mod domain;
@ -56,6 +106,7 @@ pub use text::TextReader;
pub use text::TextWriter;
pub use writer::Writer;
#[doc(hidden)]
pub fn invert_map<A, B>(m: &Map<A, B>) -> Map<B, A>
where
A: Clone,
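
The memory-management notes in the module documentation of this file contrast `Arc`-based and `Rc`-based value types. A minimal std-only sketch of what that choice buys: the `Term` type below is a toy, not the crate's `Value`, and the function names are invented for illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Toy value type (hypothetical): compound nodes hold their children behind
/// `Arc`, so cloning a subterm copies a pointer rather than the subtree.
#[derive(Debug)]
enum Term {
    Symbol(String),
    Sequence(Vec<Arc<Term>>),
}

/// Count the symbols reachable from `t`.
fn count_symbols(t: &Term) -> usize {
    match t {
        Term::Symbol(_) => 1,
        Term::Sequence(items) => items.iter().map(|i| count_symbols(i)).sum(),
    }
}

/// Build a term that aliases one subterm twice, then inspect it on another thread.
/// Returns the symbol count seen by the worker and the strong count of the leaf.
fn shared_across_threads() -> (usize, usize) {
    let leaf = Arc::new(Term::Symbol("hi".to_string()));
    let root = Arc::new(Term::Sequence(vec![leaf.clone(), leaf.clone()]));
    let for_worker = Arc::clone(&root);
    // `Arc<Term>` is `Send`, so the clone may move into the spawned thread.
    // With `Rc` in place of `Arc`, this call would be rejected at compile time.
    let counted = thread::spawn(move || count_symbols(&for_worker))
        .join()
        .expect("worker thread panicked");
    // The leaf is referenced by the local binding plus two aliases inside `root`.
    (counted, Arc::strong_count(&leaf))
}
```

A plain-value design in the style described for `PlainValue` would store children inline instead, trading this sharing for the absence of reference-counting overhead.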


@ -1,6 +1,9 @@
//! Definitions of the tags used in the binary encoding.
use std::convert::{From, TryFrom};
use std::io;
/// Rust representation of tags used in the binary encoding.
#[derive(Debug, PartialEq, Eq)]
pub enum Tag {
False,
@ -19,8 +22,9 @@ pub enum Tag {
Dictionary,
}
/// Error value representing failure to decode a byte into a [Tag].
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTag(u8);
pub struct InvalidTag(pub u8);
impl From<InvalidTag> for io::Error {
fn from(v: InvalidTag) -> Self {
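
Making the field of `InvalidTag` public lets callers see which byte failed to decode. The sketch below shows the general pattern with a deliberately tiny tag set. The three byte values are read off the doc example `b"\xb4\xb3\x02hi\x84"` (the text `<hi>`); the `SketchTag` enum, its variant names, the `first_tag` helper, and the choice of `ErrorKind::InvalidData` are assumptions for illustration, not the crate's definitions.

```rust
use std::convert::TryFrom;
use std::io;

/// Illustrative subset of tags; not the crate's full `Tag` enum.
#[derive(Debug, PartialEq, Eq)]
enum SketchTag {
    End,
    Symbol,
    Record,
}

/// Carries the offending byte, publicly readable as in `InvalidTag(pub u8)`.
#[derive(Debug, PartialEq, Eq)]
struct InvalidTag(pub u8);

impl TryFrom<u8> for SketchTag {
    type Error = InvalidTag;
    fn try_from(b: u8) -> Result<Self, Self::Error> {
        match b {
            0x84 => Ok(SketchTag::End),
            0xb3 => Ok(SketchTag::Symbol),
            0xb4 => Ok(SketchTag::Record),
            _ => Err(InvalidTag(b)),
        }
    }
}

impl From<InvalidTag> for io::Error {
    fn from(v: InvalidTag) -> Self {
        // The error kind is this sketch's choice.
        io::Error::new(
            io::ErrorKind::InvalidData,
            format!("invalid tag byte {:#04x}", v.0),
        )
    }
}

/// Decode the tag at the front of `bytes`; `?` converts `InvalidTag` to `io::Error`.
fn first_tag(bytes: &[u8]) -> io::Result<SketchTag> {
    let b = *bytes
        .first()
        .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))?;
    Ok(SketchTag::try_from(b)?)
}
```

Because the field is public, a caller that matches on `Err(InvalidTag(b))` can report or log `b` directly instead of parsing it out of an error message.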


@ -1,3 +1,15 @@
//! Implements the Preserves [machine-oriented binary
//! syntax](https://preserves.dev/preserves-binary.html).
//!
//! The main entry points for reading are functions [iovalue_from_bytes],
//! [annotated_iovalue_from_bytes], [from_bytes], and [annotated_from_bytes].
//!
//! The main entry points for writing are [PackedWriter::encode_iovalue] and
//! [PackedWriter::encode].
//!
//! # Summary of Binary Syntax
#![doc = include_str!("../../../doc/cheatsheet-binary-plaintext.md")]
pub mod constants;
pub mod reader;
pub mod writer;
@ -9,6 +21,8 @@ use std::io;
use super::{BinarySource, DomainDecode, IOValue, IOValueDomainCodec, NestedValue, Reader};
/// Reads a value from the given byte slice `bs` using the binary encoding, discarding
/// annotations.
pub fn from_bytes<N: NestedValue, Dec: DomainDecode<N::Embedded>>(
bs: &[u8],
decode_embedded: Dec,
@ -18,10 +32,13 @@ pub fn from_bytes<N: NestedValue, Dec: DomainDecode<N::Embedded>>(
.demand_next(false)
}
/// Reads an [IOValue] from the given byte slice `bs` using the binary encoding, discarding
/// annotations.
pub fn iovalue_from_bytes(bs: &[u8]) -> io::Result<IOValue> {
from_bytes(bs, IOValueDomainCodec)
}
/// As [from_bytes], but includes annotations.
pub fn annotated_from_bytes<N: NestedValue, Dec: DomainDecode<N::Embedded>>(
bs: &[u8],
decode_embedded: Dec,
@ -31,6 +48,7 @@ pub fn annotated_from_bytes<N: NestedValue, Dec: DomainDecode<N::Embedded>>(
.demand_next(true)
}
/// As [iovalue_from_bytes], but includes annotations.
pub fn annotated_iovalue_from_bytes(bs: &[u8]) -> io::Result<IOValue> {
annotated_from_bytes(bs, IOValueDomainCodec)
}


@ -1,3 +1,5 @@
//! Implementation of [Reader] for the binary encoding.
use crate::error::{self, io_syntax_error, is_eof_io_error, ExpectedKind, Received};
use num::bigint::BigInt;
@ -18,6 +20,7 @@ use super::super::{
};
use super::constants::Tag;
/// The binary encoding Preserves reader.
pub struct PackedReader<
'de,
'src,
@ -25,7 +28,9 @@ pub struct PackedReader<
Dec: DomainDecode<N::Embedded>,
S: BinarySource<'de>,
> {
/// Underlying source of bytes.
pub source: &'src mut S,
/// Decoder for producing Rust values embedded in the binary data.
pub decode_embedded: Dec,
phantom: PhantomData<&'de N>,
}
@ -67,6 +72,7 @@ fn out_of_range<I: Into<BigInt>>(i: I) -> error::Error {
impl<'de, 'src, N: NestedValue, Dec: DomainDecode<N::Embedded>, S: BinarySource<'de>>
PackedReader<'de, 'src, N, Dec, S>
{
/// Construct a new reader from a byte source and embedded-value decoder.
#[inline(always)]
pub fn new(source: &'src mut S, decode_embedded: Dec) -> Self {
PackedReader {
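
`PackedReader` pulls its bytes from a `BinarySource`. As a rough picture of what such a source provides, here is a std-only sketch of a slice-backed source with peeking, exact reads, and mark/restore backtracking. `SliceSource` and `try_prefix` are hypothetical names; the real trait has a different signature set (for example, it returns `Cow` values and uses an associated `Mark` type).

```rust
use std::io;

fn eof() -> io::Error {
    io::Error::from(io::ErrorKind::UnexpectedEof)
}

/// Hypothetical slice-backed source holding a byte slice and a cursor.
struct SliceSource<'a> {
    bytes: &'a [u8],
    index: usize,
}

impl<'a> SliceSource<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        SliceSource { bytes, index: 0 }
    }

    /// Remember the current position.
    fn mark(&self) -> usize {
        self.index
    }

    /// Return to a previously remembered position.
    fn restore(&mut self, mark: usize) {
        self.index = mark;
    }

    /// Look at the next byte without consuming it.
    fn peek(&self) -> io::Result<u8> {
        self.bytes.get(self.index).copied().ok_or_else(eof)
    }

    /// Consume one byte.
    fn skip(&mut self) -> io::Result<()> {
        self.peek()?;
        self.index += 1;
        Ok(())
    }

    /// Consume exactly `count` bytes, or fail without consuming anything.
    fn readbytes(&mut self, count: usize) -> io::Result<&'a [u8]> {
        let end = self.index.checked_add(count).ok_or_else(eof)?;
        let chunk = self.bytes.get(self.index..end).ok_or_else(eof)?;
        self.index = end;
        Ok(chunk)
    }
}

/// Try to consume `prefix`; on mismatch, rewind so the caller can try something else.
fn try_prefix(src: &mut SliceSource<'_>, prefix: &[u8]) -> bool {
    let mark = src.mark();
    match src.readbytes(prefix.len()) {
        Ok(bs) if bs == prefix => true,
        _ => {
            src.restore(mark);
            false
        }
    }
}
```

The `try_prefix` helper is the kind of speculative read that mark/restore exists for: a failed attempt leaves the cursor exactly where it started.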


@ -1,3 +1,5 @@
//! Implementation of [Writer] for the binary encoding.
use super::super::boundary as B;
use super::super::suspendable::Suspendable;
use super::super::DomainEncode;
@ -13,9 +15,11 @@ use std::ops::DerefMut;
use super::super::writer::{varint, CompoundWriter, Writer};
/// The binary encoding Preserves writer.
pub struct PackedWriter<W: io::Write>(Suspendable<W>);
impl PackedWriter<&mut Vec<u8>> {
/// Encodes `v` to a byte vector.
#[inline(always)]
pub fn encode<N: NestedValue, Enc: DomainEncode<N::Embedded>>(
enc: &mut Enc,
@ -26,6 +30,7 @@ impl PackedWriter<&mut Vec<u8>> {
Ok(buf)
}
/// Encodes the [IOValue] `v` to a byte vector.
#[inline(always)]
pub fn encode_iovalue(v: &IOValue) -> io::Result<Vec<u8>> {
Self::encode(&mut IOValueDomainCodec, v)
@ -33,26 +38,31 @@ impl PackedWriter<&mut Vec<u8>> {
}
impl<W: io::Write> PackedWriter<W> {
/// Construct a writer from the given byte sink `write`.
#[inline(always)]
pub fn new(write: W) -> Self {
PackedWriter(Suspendable::new(write))
}
/// Retrieve a mutable reference to the underlying byte sink.
#[inline(always)]
pub fn w(&mut self) -> &mut W {
self.0.deref_mut()
}
#[doc(hidden)]
#[inline(always)]
pub fn write_byte(&mut self, b: u8) -> io::Result<()> {
self.w().write_all(&[b])
}
#[doc(hidden)]
#[inline(always)]
pub fn write_integer(&mut self, bs: &[u8]) -> io::Result<()> {
self.write_atom(Tag::SignedInteger, bs)
}
#[doc(hidden)]
#[inline(always)]
pub fn write_atom(&mut self, tag: Tag, bs: &[u8]) -> io::Result<()> {
self.write_byte(tag.into())?;
@ -60,17 +70,20 @@ impl<W: io::Write> PackedWriter<W> {
self.w().write_all(bs)
}
#[doc(hidden)]
#[inline(always)]
pub fn suspend(&mut self) -> Self {
PackedWriter(self.0.suspend())
}
#[doc(hidden)]
#[inline(always)]
pub fn resume(&mut self, other: Self) {
self.0.resume(other.0)
}
}
#[doc(hidden)]
pub struct BinaryOrderWriter(Vec<Vec<u8>>);
impl BinaryOrderWriter {
@ -119,6 +132,7 @@ impl BinaryOrderWriter {
}
}
#[doc(hidden)]
pub trait WriteWriter: Writer {
fn write_raw_bytes(&mut self, v: &[u8]) -> io::Result<()>;
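
The body of `write_atom` is cut by a hunk boundary between the tag byte and the payload. The sketch below assumes the hidden step writes the payload length as a base-128 varint, which is consistent with the `varint` import in this file and with the `\x02` that precedes `hi` in the doc example `b"\xb4\xb3\x02hi\x84"`. The function names and the use of `0xb3` as the symbol tag are taken from that example or invented here, not copied from the crate.

```rust
use std::io::{self, Write};

/// Write `n` as a base-128 varint: seven payload bits per byte, least
/// significant group first, high bit set on every byte except the last.
fn write_varint<W: Write>(w: &mut W, mut n: usize) -> io::Result<()> {
    while n >= 128 {
        w.write_all(&[((n & 0x7f) as u8) | 0x80])?;
        n >>= 7;
    }
    w.write_all(&[n as u8])
}

/// Tag byte, then payload length as a varint, then the payload itself.
fn write_atom_sketch<W: Write>(w: &mut W, tag: u8, payload: &[u8]) -> io::Result<()> {
    w.write_all(&[tag])?;
    write_varint(w, payload.len())?;
    w.write_all(payload)
}

/// Encode a symbol using the tag byte seen in the doc example (an assumption).
fn encode_symbol(name: &str) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    write_atom_sketch(&mut buf, 0xb3, name.as_bytes())?;
    Ok(buf)
}
```

Encoding the symbol `hi` this way yields the bytes `b3 02 68 69`, matching the middle of the doc example once the surrounding record open and close bytes are removed.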


@ -1,3 +1,6 @@
//! Generic [Reader] trait for parsing Preserves [Value][crate::value::repr::Value]s,
//! implemented by code that provides each specific transfer syntax.
use crate::error::{self, io_eof, ExpectedKind, Received};
use std::borrow::Cow;
@ -18,59 +21,104 @@ use super::ViaCodec;
pub type ReaderResult<T> = std::result::Result<T, error::Error>;
/// Tokens produced when performing
/// [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style reading of terms.
pub enum Token<N: NestedValue> {
/// An embedded value was seen and completely decoded.
Embedded(N::Embedded),
/// An atomic value was seen and completely decoded.
Atom(N),
/// A compound value has been opened; its contents follow, and it will be terminated by
/// [Token::End].
Compound(CompoundClass),
/// Closes a previously-opened compound value.
End,
}
/// Generic parser for Preserves.
pub trait Reader<'de, N: NestedValue> {
/// Retrieve the next parseable value or an indication of end-of-input.
///
/// Yields `Ok(Some(...))` if a complete value is available, `Ok(None)` if the end of
/// stream has been reached, or `Err(...)` for parse or IO errors, including
/// incomplete/partial input. See also [Reader::demand_next].
fn next(&mut self, read_annotations: bool) -> io::Result<Option<N>>;
// Hiding these from the documentation for the moment because I don't want to have to
// document the whole Boundary thing.
#[doc(hidden)]
fn open_record(&mut self, arity: Option<usize>) -> ReaderResult<B::Type>;
#[doc(hidden)]
fn open_sequence_or_set(&mut self) -> ReaderResult<B::Item>;
#[doc(hidden)]
fn open_sequence(&mut self) -> ReaderResult<()>;
#[doc(hidden)]
fn open_set(&mut self) -> ReaderResult<()>;
#[doc(hidden)]
fn open_dictionary(&mut self) -> ReaderResult<()>;
#[doc(hidden)]
fn boundary(&mut self, b: &B::Type) -> ReaderResult<()>;
#[doc(hidden)]
// close_compound implies a b.shift(...) and a self.boundary(b).
fn close_compound(&mut self, b: &mut B::Type, i: &B::Item) -> ReaderResult<bool>;
#[doc(hidden)]
fn open_embedded(&mut self) -> ReaderResult<()>;
#[doc(hidden)]
fn close_embedded(&mut self) -> ReaderResult<()>;
/// Allows structured backtracking to an earlier stage in a parse. Useful for layering
/// parser combinators atop a Reader.
type Mark;
/// Retrieve a marker for the current position in the input.
fn mark(&mut self) -> io::Result<Self::Mark>;
/// Seek the input to a previously-saved position.
fn restore(&mut self, mark: &Self::Mark) -> io::Result<()>;
/// Get the next [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style event,
/// discarding annotations.
///
/// The `read_embedded_annotations` parameter controls whether annotations are also skipped
/// on *embedded* values.
fn next_token(&mut self, read_embedded_annotations: bool) -> io::Result<Token<N>>;
/// Get the next [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style event, plus
/// a vector containing any annotations that preceded it.
fn next_annotations_and_token(&mut self) -> io::Result<(Vec<N>, Token<N>)>;
//---------------------------------------------------------------------------
/// Skips the next available complete value. Yields an error if no such value exists.
fn skip_value(&mut self) -> io::Result<()> {
// TODO efficient skipping in specific impls of this trait
let _ = self.demand_next(false)?;
Ok(())
}
/// Retrieve the next parseable value, treating end-of-input as an error.
///
/// Yields `Ok(...)` if a complete value is available or `Err(...)` for parse or IO errors,
/// including incomplete/partial input or end of stream. See also [Reader::next].
fn demand_next(&mut self, read_annotations: bool) -> io::Result<N> {
self.next(read_annotations)?.ok_or_else(io_eof)
}
/// Yields the next value, if it is a `Boolean`, or an error otherwise.
fn next_boolean(&mut self) -> ReaderResult<bool> {
self.demand_next(false)?.value().to_boolean()
}
/// Yields the next value, if it is a `Float`, or an error otherwise.
fn next_float(&mut self) -> ReaderResult<Float> {
Ok(self.demand_next(false)?.value().to_float()?.to_owned())
}
/// Yields the next value, if it is a `Double`, or an error otherwise.
fn next_double(&mut self) -> ReaderResult<Double> {
Ok(self.demand_next(false)?.value().to_double()?.to_owned())
}
/// Yields the next value, if it is a `SignedInteger`, or an error otherwise.
fn next_signedinteger(&mut self) -> ReaderResult<SignedInteger> {
Ok(self
.demand_next(false)?
@ -79,64 +127,92 @@ pub trait Reader<'de, N: NestedValue> {
.to_owned())
}
/// Yields the next value, if it is a `SignedInteger` that fits in [i8], or an error
/// otherwise.
fn next_i8(&mut self) -> ReaderResult<i8> {
self.demand_next(false)?.value().to_i8()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [u8], or an error
/// otherwise.
fn next_u8(&mut self) -> ReaderResult<u8> {
self.demand_next(false)?.value().to_u8()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [i16], or an error
/// otherwise.
fn next_i16(&mut self) -> ReaderResult<i16> {
self.demand_next(false)?.value().to_i16()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [u16], or an error
/// otherwise.
fn next_u16(&mut self) -> ReaderResult<u16> {
self.demand_next(false)?.value().to_u16()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [i32], or an error
/// otherwise.
fn next_i32(&mut self) -> ReaderResult<i32> {
self.demand_next(false)?.value().to_i32()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [u32], or an error
/// otherwise.
fn next_u32(&mut self) -> ReaderResult<u32> {
self.demand_next(false)?.value().to_u32()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [i64], or an error
/// otherwise.
fn next_i64(&mut self) -> ReaderResult<i64> {
self.demand_next(false)?.value().to_i64()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [u64], or an error
/// otherwise.
fn next_u64(&mut self) -> ReaderResult<u64> {
self.demand_next(false)?.value().to_u64()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [i128], or an error
/// otherwise.
fn next_i128(&mut self) -> ReaderResult<i128> {
self.demand_next(false)?.value().to_i128()
}
/// Yields the next value, if it is a `SignedInteger` that fits in [u128], or an error
/// otherwise.
fn next_u128(&mut self) -> ReaderResult<u128> {
self.demand_next(false)?.value().to_u128()
}
/// Yields the next value as an [f32], if it is a `Float`, or an error otherwise.
fn next_f32(&mut self) -> ReaderResult<f32> {
self.demand_next(false)?.value().to_f32()
}
/// Yields the next value as an [f64], if it is a `Double`, or an error otherwise.
fn next_f64(&mut self) -> ReaderResult<f64> {
self.demand_next(false)?.value().to_f64()
}
/// Yields the next value as a [char], if it is parseable by
/// [Value::to_char][crate::value::Value::to_char], or an error otherwise.
fn next_char(&mut self) -> ReaderResult<char> {
self.demand_next(false)?.value().to_char()
}
/// Yields the next value, if it is a `String`, or an error otherwise.
fn next_str(&mut self) -> ReaderResult<Cow<'de, str>> {
Ok(Cow::Owned(
self.demand_next(false)?.value().to_string()?.to_owned(),
))
}
/// Yields the next value, if it is a `ByteString`, or an error otherwise.
fn next_bytestring(&mut self) -> ReaderResult<Cow<'de, [u8]>> {
Ok(Cow::Owned(
self.demand_next(false)?.value().to_bytestring()?.to_owned(),
))
}
/// Yields the next value, if it is a `Symbol`, or an error otherwise.
fn next_symbol(&mut self) -> ReaderResult<Cow<'de, str>> {
Ok(Cow::Owned(
self.demand_next(false)?.value().to_symbol()?.to_owned(),
))
}
#[doc(hidden)]
fn open_option(&mut self) -> ReaderResult<Option<B::Type>> {
let b = self.open_record(None)?;
let label: &str = &self.next_symbol()?;
@ -153,6 +229,7 @@ pub trait Reader<'de, N: NestedValue> {
}
}
#[doc(hidden)]
fn open_simple_record(&mut self, name: &str, arity: Option<usize>) -> ReaderResult<B::Type> {
let b = self.open_record(arity)?;
let label: &str = &self.next_symbol()?;
@ -166,6 +243,7 @@ pub trait Reader<'de, N: NestedValue> {
}
}
/// Constructs a [ConfiguredReader] set with the given value for `read_annotations`.
fn configured(self, read_annotations: bool) -> ConfiguredReader<'de, N, Self>
where
Self: std::marker::Sized,
@ -177,6 +255,7 @@ pub trait Reader<'de, N: NestedValue> {
}
}
#[doc(hidden)]
fn ensure_more_expected(&mut self, b: &mut B::Type, i: &B::Item) -> ReaderResult<()> {
if !self.close_compound(b, i)? {
Ok(())
@ -185,6 +264,7 @@ pub trait Reader<'de, N: NestedValue> {
}
}
#[doc(hidden)]
fn ensure_complete(&mut self, mut b: B::Type, i: &B::Item) -> ReaderResult<()> {
if !self.close_compound(&mut b, i)? {
Err(error::Error::MissingCloseDelimiter)
@ -254,16 +334,27 @@ impl<'r, 'de, N: NestedValue, R: Reader<'de, N>> Reader<'de, N> for &'r mut R {
}
}
/// Generic seekable stream of input bytes.
pub trait BinarySource<'de>: Sized {
/// Allows structured backtracking to an earlier position in an input.
type Mark;
/// Retrieve a marker for the current position in the input.
fn mark(&mut self) -> io::Result<Self::Mark>;
/// Seek the input to a previously-saved position.
fn restore(&mut self, mark: &Self::Mark) -> io::Result<()>;
/// Skip the next byte.
fn skip(&mut self) -> io::Result<()>;
/// Returns the next byte without advancing over it.
fn peek(&mut self) -> io::Result<u8>;
/// Returns and consumes the next `count` bytes, which must all be available. Always yields
/// exactly `count` bytes or an error.
fn readbytes(&mut self, count: usize) -> io::Result<Cow<'de, [u8]>>;
/// As [BinarySource::readbytes], but uses `bs` as destination for the read bytes as well
/// as taking the size of `bs` as the count of bytes to read.
fn readbytes_into(&mut self, bs: &mut [u8]) -> io::Result<()>;
/// Constructs a [PackedReader][super::PackedReader] that will read from `self`.
fn packed<N: NestedValue, Dec: DomainDecode<N::Embedded>>(
&mut self,
decode_embedded: Dec,
@ -271,12 +362,14 @@ pub trait BinarySource<'de>: Sized {
super::PackedReader::new(self, decode_embedded)
}
/// Constructs a [PackedReader][super::PackedReader] that will read [IOValue]s from `self`.
fn packed_iovalues(
&mut self,
) -> super::PackedReader<'de, '_, IOValue, IOValueDomainCodec, Self> {
self.packed(IOValueDomainCodec)
}
/// Constructs a [TextReader][super::TextReader] that will read from `self`.
fn text<N: NestedValue, Dec: DomainParse<N::Embedded>>(
&mut self,
decode_embedded: Dec,
@ -284,6 +377,7 @@ pub trait BinarySource<'de>: Sized {
super::TextReader::new(self, decode_embedded)
}
/// Constructs a [TextReader][super::TextReader] that will read [IOValue]s from `self`.
fn text_iovalues(
&mut self,
) -> super::TextReader<'de, '_, IOValue, ViaCodec<IOValueDomainCodec>, Self> {
@ -291,12 +385,18 @@ pub trait BinarySource<'de>: Sized {
}
}
/// Implementation of [BinarySource] backed by an [`io::Read`]` + `[`io::Seek`] implementation.
pub struct IOBinarySource<R: io::Read + io::Seek> {
/// The underlying byte source.
pub read: R,
#[doc(hidden)]
/// One-place buffer for peeked bytes.
pub buf: Option<u8>,
}
impl<R: io::Read + io::Seek> IOBinarySource<R> {
/// Constructs an [IOBinarySource] from the given [`io::Read`]` + `[`io::Seek`]
/// implementation.
#[inline(always)]
pub fn new(read: R) -> Self {
IOBinarySource { read, buf: None }
@ -364,12 +464,17 @@ impl<'de, R: io::Read + io::Seek> BinarySource<'de> for IOBinarySource<R> {
}
}
/// Implementation of [BinarySource] backed by a slice of [u8].
pub struct BytesBinarySource<'de> {
/// The underlying byte source.
pub bytes: &'de [u8],
#[doc(hidden)]
/// Current position within `bytes`.
pub index: usize,
}
impl<'de> BytesBinarySource<'de> {
/// Constructs a [BytesBinarySource] from the given `u8` slice.
#[inline(always)]
pub fn new(bytes: &'de [u8]) -> Self {
BytesBinarySource { bytes, index: 0 }
@ -432,21 +537,29 @@ impl<'de> BinarySource<'de> for BytesBinarySource<'de> {
}
}
/// A combination of a [Reader] with presets governing its operation.
pub struct ConfiguredReader<'de, N: NestedValue, R: Reader<'de, N>> {
/// The underlying [Reader].
pub reader: R,
/// Configuration as to whether to include or discard annotations while reading.
pub read_annotations: bool,
phantom: PhantomData<&'de N>,
}
impl<'de, N: NestedValue, R: Reader<'de, N>> ConfiguredReader<'de, N, R> {
/// Constructs a [ConfiguredReader] based on the given `reader`, with `read_annotations`
/// initially set to `true`.
pub fn new(reader: R) -> Self {
reader.configured(true)
}
/// Updates the `read_annotations` field of `self`.
pub fn set_read_annotations(&mut self, read_annotations: bool) {
self.read_annotations = read_annotations
}
/// Retrieve the next parseable value, treating end-of-input as an error.
///
/// Delegates directly to [Reader::demand_next].
pub fn demand_next(&mut self) -> io::Result<N> {
self.reader.demand_next(self.read_annotations)
}
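
The relationship between `next`, `demand_next`, and `ConfiguredReader` can be shown in miniature. In the sketch below, `MiniReader`, `PairReader`, `Configured`, and `demo_reader` are hypothetical; items are plain strings, and the `read_annotations` flag merely decides whether an `@annotation` prefix is kept. None of this is the crate's actual `Reader` API.

```rust
use std::io;

/// Hypothetical miniature of the `Reader` trait.
trait MiniReader {
    /// `Ok(None)` signals a clean end of input.
    fn next(&mut self, read_annotations: bool) -> io::Result<Option<String>>;

    /// Same as `next`, except that end of input becomes an error.
    fn demand_next(&mut self, read_annotations: bool) -> io::Result<String> {
        self.next(read_annotations)?
            .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))
    }
}

/// Yields pre-parsed (annotation, value) pairs.
struct PairReader {
    items: std::vec::IntoIter<(&'static str, &'static str)>,
}

impl MiniReader for PairReader {
    fn next(&mut self, read_annotations: bool) -> io::Result<Option<String>> {
        Ok(self.items.next().map(|(ann, value)| {
            if read_annotations {
                format!("@{} {}", ann, value)
            } else {
                value.to_string()
            }
        }))
    }
}

/// A reader bundled with a preset for `read_annotations`, in the manner of
/// `ConfiguredReader`.
struct Configured<R: MiniReader> {
    reader: R,
    read_annotations: bool,
}

impl<R: MiniReader> Configured<R> {
    /// Delegates to the underlying reader using the stored preset.
    fn demand_next(&mut self) -> io::Result<String> {
        self.reader.demand_next(self.read_annotations)
    }
}

/// Two annotated items, read with annotations enabled by default.
fn demo_reader() -> Configured<PairReader> {
    Configured {
        reader: PairReader {
            items: vec![("doc", "1"), ("doc", "2")].into_iter(),
        },
        read_annotations: true,
    }
}
```

The wrapper saves callers from threading the flag through every call, while direct users of the trait can still choose per call and can distinguish a clean end of input (`Ok(None)` from `next`) from an error.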

File diff suppressed because it is too large


@ -1,6 +1,10 @@
//! Support for Serde serialization of Rust data types into Preserves *values* (not syntax).
use crate::value::{repr::Record, IOValue, Map, Value};
use serde::Serialize;
/// Empty/placeholder type for representing serialization errors: serialization to values
/// cannot fail.
#[derive(Debug)]
pub enum Error {}
impl serde::ser::Error for Error {
@ -20,17 +24,22 @@ impl std::fmt::Display for Error {
type Result<T> = std::result::Result<T, Error>;
/// Serde serializer for converting Rust data to in-memory Preserves values, which can then be
/// serialized using text or binary syntax, analyzed further, etc.
pub struct Serializer;
#[doc(hidden)]
pub struct SerializeDictionary {
next_key: Option<IOValue>,
items: Map<IOValue, IOValue>,
}
#[doc(hidden)]
pub struct SerializeRecord {
r: Record<IOValue>,
}
#[doc(hidden)]
pub struct SerializeSequence {
vec: Vec<IOValue>,
}
@ -359,6 +368,7 @@ impl serde::ser::SerializeSeq for SerializeSequence {
}
}
/// Convenience function for directly converting a Serde-serializable `T` to an [IOValue].
pub fn to_value<T>(value: T) -> IOValue
where
T: Serialize,
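
The `pub enum Error {}` in this file is an uninhabited type: it has no variants, so no error value can exist. A small std-only sketch of the pattern follows; `NoError`, `to_text`, and `infallible` are illustrative names and are not part of the crate.

```rust
use std::fmt;

/// An error type with no variants: no value of it can ever be constructed.
#[derive(Debug)]
enum NoError {}

impl fmt::Display for NoError {
    fn fmt(&self, _f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No value exists to format, so an empty match covers every case.
        match *self {}
    }
}

impl std::error::Error for NoError {}

/// Hypothetical conversion that cannot fail but must still return a `Result`,
/// as a trait with an associated error type might require.
fn to_text(n: u32) -> Result<String, NoError> {
    Ok(n.to_string())
}

/// Unwrap without `unwrap()`: the `Err` arm is statically unreachable.
fn infallible<T>(r: Result<T, NoError>) -> T {
    match r {
        Ok(v) => v,
        Err(e) => match e {},
    }
}
```

This is why a function such as `to_value` can return a bare value instead of a `Result`: the error side of the internal `Result` type has no inhabitants to report.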


@ -1,3 +1,6 @@
//! Representation of Preserves `SignedInteger`s as [i128]/[u128] (if they fit) or [BigInt] (if
//! they don't).
use num::bigint::BigInt;
use num::traits::cast::ToPrimitive;
use num::traits::sign::Signed;
@ -7,8 +10,10 @@ use std::convert::TryFrom;
use std::convert::TryInto;
use std::fmt;
// Invariant: if I128 can be used, it will be; otherwise, if U128 can
// be used, it will be; otherwise, Big will be used.
/// Internal representation of Preserves `SignedInteger`s.
///
/// Invariant: if I128 can be used, it will be; otherwise, if U128 can be used, it will be;
/// otherwise, Big will be used.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum SignedIntegerRepr {
I128(i128),
@ -16,6 +21,7 @@ pub enum SignedIntegerRepr {
Big(Box<BigInt>),
}
/// Main representation of Preserves `SignedInteger`s.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SignedInteger(SignedIntegerRepr);
@ -87,18 +93,25 @@ impl PartialOrd for SignedInteger {
}
impl SignedInteger {
/// Extract the internal representation.
pub fn repr(&self) -> &SignedIntegerRepr {
&self.0
}
/// Does this `SignedInteger` fit in an [i128]? (See also [the TryFrom instance for
/// i128](#impl-TryFrom<%26SignedInteger>-for-i128).)
pub fn is_i(&self) -> bool {
matches!(self.0, SignedIntegerRepr::I128(_))
}
/// Does this `SignedInteger` fit in a [u128], but not an [i128]? (See also [the TryFrom
/// instance for u128](#impl-TryFrom<%26SignedInteger>-for-u128).)
pub fn is_u(&self) -> bool {
matches!(self.0, SignedIntegerRepr::U128(_))
}
/// Does this `SignedInteger` fit neither in a [u128] nor an [i128]? (See also [the From
/// instance for BigInt](#impl-From<%26'a+SignedInteger>-for-BigInt).)
pub fn is_big(&self) -> bool {
matches!(self.0, SignedIntegerRepr::Big(_))
}
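
The invariant stated for `SignedIntegerRepr` (use `I128` when possible, else `U128`, else `Big`) can be illustrated without the `num` crate. In this sketch the `Repr` enum is a hypothetical mirror in which `Big` holds decimal digits as a stand-in for `BigInt`, and `from_decimal` is an invented helper.

```rust
use std::convert::TryFrom;

/// Hypothetical mirror of `SignedIntegerRepr`; `Big` holds decimal digits as a
/// stand-in for a real big-integer type.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Repr {
    I128(i128),
    U128(u128),
    Big(String),
}

impl From<i128> for Repr {
    fn from(v: i128) -> Self {
        Repr::I128(v)
    }
}

impl From<u128> for Repr {
    fn from(v: u128) -> Self {
        // Prefer I128 whenever the value fits, falling back to U128 otherwise.
        match i128::try_from(v) {
            Ok(i) => Repr::I128(i),
            Err(_) => Repr::U128(v),
        }
    }
}

impl Repr {
    fn is_i(&self) -> bool {
        matches!(self, Repr::I128(_))
    }
    fn is_u(&self) -> bool {
        matches!(self, Repr::U128(_))
    }
    fn is_big(&self) -> bool {
        matches!(self, Repr::Big(_))
    }
}

/// Pick the narrowest representation for a decimal literal (optional leading `-`).
fn from_decimal(s: &str) -> Option<Repr> {
    let digits = s.strip_prefix('-').unwrap_or(s);
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    if let Ok(i) = s.parse::<i128>() {
        return Some(Repr::I128(i));
    }
    if let Ok(u) = s.parse::<u128>() {
        return Some(Repr::U128(u));
    }
    Some(Repr::Big(s.to_string()))
}
```

Because every constructor funnels through the same preference order, two equal integers always land in the same variant, which is what allows equality and hashing to be derived directly on the representation.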


@ -1,3 +1,5 @@
#![doc(hidden)]
use std::ops::{Deref, DerefMut};
pub enum Suspendable<T> {


@ -1,3 +1,15 @@
//! Implements the Preserves [human-oriented text
//! syntax](https://preserves.dev/preserves-text.html).
//!
//! The main entry points for reading are functions [iovalue_from_str],
//! [annotated_iovalue_from_str], [from_str], and [annotated_from_str].
//!
//! The main entry points for writing are [TextWriter::encode_iovalue] and
//! [TextWriter::encode].
//!
//! # Summary of Text Syntax
#![doc = include_str!("../../../doc/cheatsheet-text-plaintext.md")]
pub mod reader;
pub mod writer;
@ -10,6 +22,7 @@ use std::io;
use super::{DomainParse, IOValue, IOValueDomainCodec, NestedValue, Reader, ViaCodec};
/// Reads a value from the given string using the text syntax, discarding annotations.
pub fn from_str<N: NestedValue, Dec: DomainParse<N::Embedded>>(
s: &str,
decode_embedded: Dec,
@ -17,10 +30,12 @@ pub fn from_str<N: NestedValue, Dec: DomainParse<N::Embedded>>(
TextReader::new(&mut BytesBinarySource::new(s.as_bytes()), decode_embedded).demand_next(false)
}
/// Reads an [IOValue] from the given string using the text syntax, discarding annotations.
pub fn iovalue_from_str(s: &str) -> io::Result<IOValue> {
from_str(s, ViaCodec::new(IOValueDomainCodec))
}
/// As [from_str], but includes annotations.
pub fn annotated_from_str<N: NestedValue, Dec: DomainParse<N::Embedded>>(
s: &str,
decode_embedded: Dec,
@ -28,6 +43,7 @@ pub fn annotated_from_str<N: NestedValue, Dec: DomainParse<N::Embedded>>(
TextReader::new(&mut BytesBinarySource::new(s.as_bytes()), decode_embedded).demand_next(true)
}
/// As [iovalue_from_str], but includes annotations.
pub fn annotated_iovalue_from_str(s: &str) -> io::Result<IOValue> {
annotated_from_str(s, ViaCodec::new(IOValueDomainCodec))
}


@ -1,3 +1,5 @@
//! Implementation of [Reader] for the text syntax.
use crate::error::io_syntax_error;
use crate::error::is_eof_io_error;
use crate::error::syntax_error;
@ -35,8 +37,11 @@ use std::io;
use std::iter::FromIterator;
use std::marker::PhantomData;
/// The text syntax Preserves reader.
pub struct TextReader<'de, 'src, D: Embeddable, Dec: DomainParse<D>, S: BinarySource<'de>> {
/// Underlying source of (utf8) bytes.
pub source: &'src mut S,
/// Decoder for producing Rust values embedded in the text.
pub dec: Dec,
phantom: PhantomData<&'de D>,
}
@ -56,6 +61,7 @@ fn append_codepoint(bs: &mut Vec<u8>, n: u32) -> io::Result<()> {
impl<'de, 'src, D: Embeddable, Dec: DomainParse<D>, S: BinarySource<'de>>
TextReader<'de, 'src, D, Dec, S>
{
/// Construct a new reader from a byte (utf8) source and embedded-value decoder.
pub fn new(source: &'src mut S, dec: Dec) -> Self {
TextReader {
source,
@ -155,6 +161,7 @@ impl<'de, 'src, D: Embeddable, Dec: DomainParse<D>, S: BinarySource<'de>>
}
}
/// Retrieve the next [IOValue] in the input stream, treating end-of-input as an error.
pub fn next_iovalue(&mut self, read_annotations: bool) -> io::Result<IOValue> {
let mut r = TextReader::new(self.source, ViaCodec::new(IOValueDomainCodec));
let v = r.demand_next(read_annotations)?;


@ -1,3 +1,5 @@
//! Implementation of [Writer] for the text syntax.
use crate::hex::HexFormatter;
use crate::value::suspendable::Suspendable;
use crate::value::writer::CompoundWriter;
@ -15,17 +17,26 @@ use std::io;
use super::super::boundary as B;
/// Specifies a comma style for printing using [TextWriter].
#[derive(Clone, Copy, Debug)]
pub enum CommaStyle {
/// No commas will be printed. (Preserves text syntax treats commas as whitespace (!).)
None,
/// Commas will be used to separate subterms.
Separating,
/// Commas will be used to terminate subterms.
Terminating,
}
/// The (optionally pretty-printing) text syntax Preserves writer.
pub struct TextWriter<W: io::Write> {
w: Suspendable<W>,
/// Selects a comma style to use when printing.
pub comma_style: CommaStyle,
/// Specifies indentation to use when pretty-printing; 0 disables pretty-printing.
pub indentation: usize,
/// An aid to use of printed terms in shell scripts: set `true` to escape spaces embedded
/// in strings and symbols.
pub escape_spaces: bool,
indent: String,
}
@ -37,6 +48,8 @@ impl std::default::Default for CommaStyle {
}
impl TextWriter<&mut Vec<u8>> {
/// Writes `v` to `f` using text syntax. Selects indentation mode based on
/// [`f.alternate()`][std::fmt::Formatter::alternate].
pub fn fmt_value<N: NestedValue, Enc: DomainEncode<N::Embedded>>(
f: &mut std::fmt::Formatter<'_>,
enc: &mut Enc,
@ -52,6 +65,7 @@ impl TextWriter<&mut Vec<u8>> {
.map_err(|_| io::Error::new(io::ErrorKind::Other, "could not append to Formatter"))
}
/// Encode `v` to a [String].
pub fn encode<N: NestedValue, Enc: DomainEncode<N::Embedded>>(
enc: &mut Enc,
v: &N,
@ -61,12 +75,14 @@ impl TextWriter<&mut Vec<u8>> {
Ok(String::from_utf8(buf).expect("valid UTF-8 from TextWriter"))
}
/// Encode [IOValue] `v` to a [String], using the default [IOValueDomainCodec].
pub fn encode_iovalue(v: &IOValue) -> io::Result<String> {
Self::encode(&mut IOValueDomainCodec, v)
}
}
impl<W: io::Write> TextWriter<W> {
/// Construct a writer from the given byte sink `w`.
pub fn new(w: W) -> Self {
TextWriter {
w: Suspendable::new(w),
@ -77,16 +93,19 @@ impl<W: io::Write> TextWriter<W> {
}
}
/// Update selected comma-printing style.
pub fn set_comma_style(mut self, v: CommaStyle) -> Self {
self.comma_style = v;
self
}
/// Update selected space-escaping style.
pub fn set_escape_spaces(mut self, v: bool) -> Self {
self.escape_spaces = v;
self
}
#[doc(hidden)]
pub fn suspend(&mut self) -> Self {
TextWriter {
w: self.w.suspend(),
@ -95,10 +114,12 @@ impl<W: io::Write> TextWriter<W> {
}
}
#[doc(hidden)]
pub fn resume(&mut self, other: Self) {
self.w.resume(other.w)
}
#[doc(hidden)]
pub fn write_stringlike_char_fallback<F>(&mut self, c: char, f: F) -> io::Result<()>
where
F: FnOnce(&mut W, char) -> io::Result<()>,
@ -114,22 +135,26 @@ impl<W: io::Write> TextWriter<W> {
}
}
#[doc(hidden)]
pub fn write_stringlike_char(&mut self, c: char) -> io::Result<()> {
self.write_stringlike_char_fallback(c, |w, c| write!(w, "{}", c))
}
#[doc(hidden)]
pub fn add_indent(&mut self) {
for _ in 0..self.indentation {
self.indent.push(' ')
}
}
#[doc(hidden)]
pub fn del_indent(&mut self) {
if self.indentation > 0 {
self.indent.truncate(self.indent.len() - self.indentation)
}
}
#[doc(hidden)]
pub fn indent(&mut self) -> io::Result<()> {
if self.indentation > 0 {
write!(self.w, "{}", &self.indent)
@ -138,6 +163,7 @@ impl<W: io::Write> TextWriter<W> {
}
}
#[doc(hidden)]
pub fn indent_sp(&mut self) -> io::Result<()> {
if self.indentation > 0 {
write!(self.w, "{}", &self.indent)
@ -146,6 +172,7 @@ impl<W: io::Write> TextWriter<W> {
}
}
/// Borrow the underlying byte sink.
pub fn borrow_write(&mut self) -> &mut W {
&mut self.w
}

View File

@ -1,3 +1,6 @@
//! Generic [Writer] trait for unparsing Preserves [Value]s, implemented by code that provides
//! each specific transfer syntax.
use super::boundary as B;
use super::repr::{Double, Float, NestedValue, Value};
use super::signed_integer::SignedIntegerRepr;
@ -5,61 +8,103 @@ use super::DomainEncode;
use num::bigint::BigInt;
use std::io;
#[doc(hidden)]
/// Utility trait for tracking unparser state during production of compound `Value`s.
pub trait CompoundWriter: Writer {
fn boundary(&mut self, b: &B::Type) -> io::Result<()>;
}
/// Generic unparser for Preserves.
pub trait Writer: Sized {
// Hiding these from the documentation for the moment because I don't want to have to
// document the whole Boundary thing.
#[doc(hidden)]
type AnnWriter: CompoundWriter;
#[doc(hidden)]
type RecWriter: CompoundWriter;
#[doc(hidden)]
type SeqWriter: CompoundWriter;
#[doc(hidden)]
type SetWriter: CompoundWriter;
#[doc(hidden)]
type DictWriter: CompoundWriter;
#[doc(hidden)]
type EmbeddedWriter: Writer;
#[doc(hidden)]
fn start_annotations(&mut self) -> io::Result<Self::AnnWriter>;
#[doc(hidden)]
fn end_annotations(&mut self, ann: Self::AnnWriter) -> io::Result<()>;
#[doc(hidden)]
fn write_bool(&mut self, v: bool) -> io::Result<()>;
#[doc(hidden)]
fn write_f32(&mut self, v: f32) -> io::Result<()>;
#[doc(hidden)]
fn write_f64(&mut self, v: f64) -> io::Result<()>;
#[doc(hidden)]
fn write_i8(&mut self, v: i8) -> io::Result<()>;
#[doc(hidden)]
fn write_u8(&mut self, v: u8) -> io::Result<()>;
#[doc(hidden)]
fn write_i16(&mut self, v: i16) -> io::Result<()>;
#[doc(hidden)]
fn write_u16(&mut self, v: u16) -> io::Result<()>;
#[doc(hidden)]
fn write_i32(&mut self, v: i32) -> io::Result<()>;
#[doc(hidden)]
fn write_u32(&mut self, v: u32) -> io::Result<()>;
#[doc(hidden)]
fn write_i64(&mut self, v: i64) -> io::Result<()>;
#[doc(hidden)]
fn write_u64(&mut self, v: u64) -> io::Result<()>;
#[doc(hidden)]
fn write_i128(&mut self, v: i128) -> io::Result<()>;
#[doc(hidden)]
fn write_u128(&mut self, v: u128) -> io::Result<()>;
#[doc(hidden)]
fn write_int(&mut self, v: &BigInt) -> io::Result<()>;
#[doc(hidden)]
fn write_string(&mut self, v: &str) -> io::Result<()>;
#[doc(hidden)]
fn write_bytes(&mut self, v: &[u8]) -> io::Result<()>;
#[doc(hidden)]
fn write_symbol(&mut self, v: &str) -> io::Result<()>;
#[doc(hidden)]
fn start_record(&mut self, field_count: Option<usize>) -> io::Result<Self::RecWriter>;
#[doc(hidden)]
fn end_record(&mut self, rec: Self::RecWriter) -> io::Result<()>;
#[doc(hidden)]
fn start_sequence(&mut self, item_count: Option<usize>) -> io::Result<Self::SeqWriter>;
#[doc(hidden)]
fn end_sequence(&mut self, seq: Self::SeqWriter) -> io::Result<()>;
#[doc(hidden)]
fn start_set(&mut self, item_count: Option<usize>) -> io::Result<Self::SetWriter>;
#[doc(hidden)]
fn end_set(&mut self, set: Self::SetWriter) -> io::Result<()>;
#[doc(hidden)]
fn start_dictionary(&mut self, entry_count: Option<usize>) -> io::Result<Self::DictWriter>;
#[doc(hidden)]
fn end_dictionary(&mut self, dict: Self::DictWriter) -> io::Result<()>;
#[doc(hidden)]
fn start_embedded(&mut self) -> io::Result<Self::EmbeddedWriter>;
#[doc(hidden)]
fn end_embedded(&mut self, ptr: Self::EmbeddedWriter) -> io::Result<()>;
/// Flushes any buffered output.
fn flush(&mut self) -> io::Result<()>;
//---------------------------------------------------------------------------
/// Writes [NestedValue] `v` to the output of this [Writer].
fn write<N: NestedValue, Enc: DomainEncode<N::Embedded>>(
&mut self,
enc: &mut Enc,
@ -88,6 +133,7 @@ pub trait Writer: Sized {
Ok(())
}
/// Writes [Value] `v` to the output of this [Writer].
fn write_value<N: NestedValue, Enc: DomainEncode<N::Embedded>>(
&mut self,
enc: &mut Enc,
@ -167,6 +213,13 @@ pub trait Writer: Sized {
}
}
/// Writes a [varint](https://protobuf.dev/programming-guides/encoding/#varints) to `w`.
/// Returns the number of bytes written.
///
/// ```text
/// varint(n) = [n] if n < 128
/// [(n & 127) | 128] ++ varint(n >> 7) if n ≥ 128
/// ```
pub fn varint<W: io::Write>(w: &mut W, mut v: u64) -> io::Result<usize> {
let mut byte_count = 0;
loop {
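The hunk above cuts off inside the loop, so the following is a minimal sketch of the recurrence given in the doc comment, not the crate's actual function body. The names `write_varint` and `read_varint` are hypothetical, and the decoder is added purely for illustration.

```rust
use std::io;

/// Sketch of the documented recurrence: emit seven bits per byte,
/// least-significant group first, setting the high bit on every byte
/// except the last. Returns the number of bytes written.
pub fn write_varint<W: io::Write>(w: &mut W, mut v: u64) -> io::Result<usize> {
    let mut byte_count = 0;
    loop {
        byte_count += 1;
        if v < 128 {
            // Final group: high bit clear.
            w.write_all(&[v as u8])?;
            return Ok(byte_count);
        }
        // More groups follow: low seven bits with the continuation bit set.
        w.write_all(&[((v & 0x7f) | 0x80) as u8])?;
        v >>= 7;
    }
}

/// Hypothetical inverse, for illustration only: decode a varint from the
/// front of `bytes`, returning the value and the number of bytes consumed.
/// Returns `None` on truncated input or on more groups than fit in a u64.
pub fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut acc: u64 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        acc |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some((acc, i + 1));
        }
    }
    None
}
```

Under this scheme 300 encodes as the two bytes `0xAC 0x02`: the low seven bits (44) with the continuation bit set, followed by `300 >> 7 = 2`.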

View File

@ -4,7 +4,7 @@ title: "Preserves Schema"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
February 2023. Version 0.3.1.
October 2023. Version 0.3.3.
[abnf]: https://tools.ietf.org/html/rfc7405
@ -189,12 +189,14 @@ with algebraic data types would produce a labelled-sum-of-products type.
### Alternation definitions.
OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
OrPattern = [orsep] AltPattern 1*(orsep AltPattern) [orsep]
orsep = 1*"/"
The right-hand-side of a definition may supply two or more
*alternatives*. When parsing, the alternatives are tried in order; the
result of the first successful alternative is the result of the entire
parse.
The right-hand-side of a definition may supply two or more *alternatives*.
Alternatives are separated by any number of slashes `/`, and leading or
trailing slashes are ignored. When parsing, the alternatives are tried in
order; the result of the first successful alternative is the result of the
entire parse.
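As a rough illustration of the separator rule, the sketch below splits an already-tokenized right-hand side into alternatives. It assumes tokens are plain strings and that `split_alternatives` is a hypothetical helper; a real schema reader works over Preserves values rather than strings. Runs of the separator collapse into one, leading and trailing separators are dropped, and fewer than two alternatives means the input is not an `OrPattern`.

```rust
/// Split `tokens` into alternatives separated by runs of `sep`.
/// Empty segments (from leading, trailing, or repeated separators)
/// are discarded. Returns `None` unless at least two alternatives remain.
pub fn split_alternatives<'a>(tokens: &[&'a str], sep: &str) -> Option<Vec<Vec<&'a str>>> {
    let alts: Vec<Vec<&'a str>> = tokens
        .split(|t| *t == sep)
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| chunk.to_vec())
        .collect();
    if alts.len() >= 2 {
        Some(alts)
    } else {
        None
    }
}
```

The same helper could be reused with `"&"` as the separator, since intersection definitions follow the same shape of rule.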
**Host-language types.** The type corresponding to an `OrPattern` is an
algebraic sum type, a union type, a variant type, or a concrete subclass
@ -205,31 +207,39 @@ definition-unique *name*. The name is used to uniquely label the
alternative's host-language representation (for example, a subclass, or
a member of a tagged union type).
A variant name can either be given explicitly as `@name` (see discussion
of `NamedPattern` below) or inferred. It can only be inferred from the
label of a record pattern, from the name of a reference to another
definition, or from the text of a "sufficiently identifierlike" literal
pattern - one that matches a string, symbol, number or boolean:
A variant name can either be given explicitly as `@name` or
inferred.[^variant-names-unlike-binding-names] It can only be inferred
from the label of a record pattern, from the name of a reference to
another definition, or from the text of a "sufficiently identifierlike"
literal pattern - one that matches a string, symbol, number or boolean:
AltPattern = "@" id SimplePattern
AltPattern = "@" id Pattern
/ "<" id PatternSequence ">"
/ Ref
/ LiteralPattern -- with a side condition
A host language will likely use the same ordering of its types as
specified by the schema. It is therefore recommended to specify first
the alternative best suited as a default initialization value (if
[^variant-names-unlike-binding-names]: Note that explicitly-given
*variant* names are unlike *binding* names in that binding names give
rise to a field in the record type for a definition, while variant
names are used as labels for alternatives in a sum type for a
definition.
A host language will likely use the same ordering of variants in a sum
type as specified by the schema. It is therefore recommended to specify
first the alternative best suited as a default initialization value (if
there is any).
### Intersection definitions.
AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
AndPattern = [andsep] NamedPattern 1*(andsep NamedPattern) [andsep]
andsep = 1*"&"
The right-hand-side of a definition may supply two or more patterns, the
*intersection* of whose denotations is the denotation of the overall
definition. When parsing, every pattern is tried: if all succeed, the
resulting information is combined into a single type; otherwise, the
overall parse fails.
definition. The patterns are separated by any number of ampersands `&`,
and leading or trailing ampersands are ignored. When parsing, every
pattern is tried: if all succeed, the resulting information is combined
into a single type; otherwise, the overall parse fails.
When serializing, the terms resulting from serializing at each pattern
are *merged* together.

View File

@ -23,14 +23,29 @@ ABNF allows easy definition of US-ASCII-based languages. However,
Preserves is a Unicode-based language. Therefore, we reinterpret ABNF as
a grammar for recognising sequences of Unicode scalar values.
<a id="encoding"></a>
**Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using
UTF-8 where possible.
<a id="whitespace"></a>
**Whitespace.** Whitespace is defined as any number of spaces, tabs,
carriage returns, line feeds, or commas.
ws = *(%x20 / %x09 / CR / LF / ",")
<a id="delimiters"></a>
**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]
delimiter = ws
/ "<" / ">" / "[" / "]" / "{" / "}"
/ "#" / ":" / DQUOTE / "|" / "@" / ";"
[^delimiters-lookahead]: The addition of this constraint means that
implementations must now use some kind of lookahead to make sure a
delimiter follows a `Boolean`; this should not be onerous, as
something similar is required to read `SymbolOrNumber`s correctly.
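The lookahead described here can be sketched as a one-character peek. The function names below are hypothetical and not taken from any Preserves implementation. The sketch also reads `ws` in the `delimiter` rule as a single whitespace character, which is an assumption about intent, since `ws` as written can match the empty string.

```rust
/// A single whitespace character per the `ws` rule:
/// space, tab, carriage return, line feed, or comma.
pub fn is_ws(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\r' | '\n' | ',')
}

/// A single delimiter character per the `delimiter` rule.
pub fn is_delimiter(c: char) -> bool {
    is_ws(c)
        || matches!(
            c,
            '<' | '>' | '[' | ']' | '{' | '}' | '#' | ':' | '"' | '|' | '@' | ';'
        )
}

/// True when the token occupying `input[..token_len]` is followed either
/// by a delimiter or by the end of the input. `token_len` is a byte length
/// and must fall on a character boundary.
pub fn token_is_terminated(input: &str, token_len: usize) -> bool {
    match input[token_len..].chars().next() {
        None => true,
        Some(c) => is_delimiter(c),
    }
}
```

With this check, `#t]` is accepted as a `Boolean` followed by `]`, while `#tx` is rejected because `x` is not a delimiter.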
## Grammar
Standalone documents may have trailing whitespace.

View File

@ -109,7 +109,7 @@ label, then by field sequence.
labels as specially-formatted lists.
[^iri-labels]: It is occasionally (but seldom) necessary to
interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a
interpret such `Symbol` labels as IRIs. Where a
label can be read as a relative IRI, it is notionally interpreted
with respect to the IRI
`urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can

View File

@ -5,10 +5,17 @@ title: "Open questions"
Q. Should "symbols" instead be URIs? Relative, usually; relative to
what? Some domain-specific base URI?
> No. They may be interpreted as URIs, of course; see
> [here](preserves.html#fn:iri-labels).
Q. Literal small integers: are they pulling their weight? They're not
absolutely necessary. A. No, they have been removed (as part of the changes
at version 0.990).
> No. They were removed in the simplification of the syntax that was the
> outcome of [issue
> 41](https://gitlab.com/preserves/preserves/-/issues/41).
Q. Should we go for trying to make the data ordering line up with the
encoding ordering? We'd have to only use streaming forms, and avoid
the small integer encoding, and not store record arities, and sort
@ -38,3 +45,8 @@ require any whitespace at all between elements of a list, making it
ambiguous: does `[123]` denote a single-element or a three-element
list? Compare JSON where `[1,2,3]` is unambiguously different from
`[123]`.
> With the addition of the notion of
> [delimiters](preserves-text.html#delimiters) to the text syntax, we at
> least answer the question of how `[123]` parses: it must yield a
> single-element list.