diff --git a/README.md b/README.md
index 0d1d26b..21493c4 100644
--- a/README.md
+++ b/README.md
@@ -36,23 +36,30 @@ automatic, perfect-fidelity conversion between syntaxes.

 ## Implementations

-Implementations of the data model, plus the textual and/or binary transfer syntaxes:
+#### Implementations of the data model, plus Preserves textual and binary transfer syntax

- - [Preserves for Nim](https://git.syndicate-lang.org/ehmry/preserves-nim)
- - [Preserves for Python]({{page.projecttree}}/implementations/python/) ([`pip install preserves`](https://pypi.org/project/preserves/); [documentation available online](python/latest/))
- - [Preserves for Racket]({{page.projecttree}}/implementations/racket/preserves/) ([`raco pkg install preserves`](https://pkgs.racket-lang.org/package/preserves))
- - [Preserves for Rust]({{page.projecttree}}/implementations/rust/) ([crates.io package](https://crates.io/crates/preserves))
- - [Preserves for Squeak Smalltalk](https://squeaksource.com/Preserves.html) (`Installer ss project: 'Preserves'; install: 'Preserves'`)
- - [Preserves for TypeScript and JavaScript]({{page.projecttree}}/implementations/javascript/) ([`yarn add @preserves/core`](https://www.npmjs.com/package/@preserves/core))
- - (Pre-alpha) Preserves for [C]({{page.projecttree}}/implementations/c/) and [C++]({{page.projecttree}}/implementations/cpp/)
+| Language[^pre-alpha-implementations] | Code | Package | Docs |
+|-----------------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------------|-------------------------------------------|
+| Nim | [git.syndicate-lang.org](https://git.syndicate-lang.org/ehmry/preserves-nim) | | |
+| Python | [preserves.dev]({{page.projecttree}}/implementations/python/) | [`pip install preserves`](https://pypi.org/project/preserves/) | [docs](python/latest/) |
+| Racket | [preserves.dev]({{page.projecttree}}/implementations/racket/preserves/) | [`raco pkg install preserves`](https://pkgs.racket-lang.org/package/preserves) | |
+| Rust | [preserves.dev]({{page.projecttree}}/implementations/rust/) | [`cargo add preserves`](https://crates.io/crates/preserves) | [docs](https://docs.rs/preserves/latest/) |
+| Squeak Smalltalk | [SqueakSource](https://squeaksource.com/Preserves.html) | `Installer ss project: 'Preserves';`<br>`  install: 'Preserves'` | |
+| TypeScript/JavaScript | [preserves.dev]({{page.projecttree}}/implementations/javascript/) | [`yarn add @preserves/core`](https://www.npmjs.com/package/@preserves/core) | |

-Implementations of the data model, plus Syrup transfer syntax:
+[^pre-alpha-implementations]: Pre-alpha implementations also exist for
+  [C]({{page.projecttree}}/implementations/c/) and
+  [C++]({{page.projecttree}}/implementations/cpp/).

- - [Syrup for Racket](https://github.com/ocapn/syrup/blob/master/impls/racket/syrup/syrup.rkt)
- - [Syrup for Guile](https://github.com/ocapn/syrup/blob/master/impls/guile/syrup.scm)
- - [Syrup for Python](https://github.com/ocapn/syrup/blob/master/impls/python/syrup.py)
- - [Syrup for JavaScript](https://github.com/zarutian/agoric-sdk/blob/zarutian/captp_variant/packages/captp/lib/syrup.js)
- - [Syrup for Haskell](https://github.com/zenhack/haskell-preserves)
+#### Implementations of the data model, plus Syrup transfer syntax
+
+| Language | Code |
+|------------|----------------------------------------------------------------------------------------------------------------------------------|
+| Guile | [github.com/ocapn/syrup](https://github.com/ocapn/syrup/blob/master/impls/guile/syrup.scm) |
+| Haskell | [github.com/zenhack/haskell-preserves](https://github.com/zenhack/haskell-preserves) |
+| JavaScript | [github.com/zarutian/agoric-sdk](https://github.com/zarutian/agoric-sdk/blob/zarutian/captp_variant/packages/captp/lib/syrup.js) |
+| Python | [github.com/ocapn/syrup](https://github.com/ocapn/syrup/blob/master/impls/python/syrup.py) |
+| Racket | [github.com/ocapn/syrup](https://github.com/ocapn/syrup/blob/master/impls/racket/syrup/syrup.rkt) |

 ## Tools

@@ -81,3 +88,5 @@ The contents of this repository are made available to you under the
 [Apache License, version 2.0](LICENSE)
 (), and are
 Copyright 2018-2022 Tony Garnock-Jones.
+
+## Notes
diff --git a/_config.yml b/_config.yml
index 4397296..b97fce5 100644
--- a/_config.yml
+++ b/_config.yml
@@ -14,4 +14,4 @@ defaults:
 title: "Preserves"
 version_date: "October 2023"
-version: "0.990.0"
+version: "0.990.1"
diff --git a/_includes/cheatsheet-binary-plaintext.md b/_includes/cheatsheet-binary-plaintext.md
new file mode 100644
index 0000000..a90cda0
--- /dev/null
+++ b/_includes/cheatsheet-binary-plaintext.md
@@ -0,0 +1,33 @@
+For a value `V`, we write `«V»` for the binary encoding of `V`.
+
+```text
+                      «#f» = [0x80]
+                      «#t» = [0x81]
+
+                    «@W V» = [0x85] ++ «W» ++ «V»
+                     «#!V» = [0x86] ++ «V»
+
+          «V» if V ∈ Float = [0x87, 0x04] ++ binary32(V)
+         «V» if V ∈ Double = [0x87, 0x08] ++ binary64(V)
+
+  «V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V)
+         «V» if V ∈ String = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V)
+     «V» if V ∈ ByteString = [0xB2] ++ varint(|V|) ++ V
+         «V» if V ∈ Symbol = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V)
+
+           «<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
+             «[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
+            «#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
+     «{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
+
+                 varint(n) = [n] if n < 128
+                             [(n & 127) | 128] ++ varint(n >> 7) if n ≥ 128
+
+               intbytes(n) = the empty sequence if n = 0, otherwise signedBigEndian(n)
+
+        signedBigEndian(n) = [n & 255] if -128 ≤ n ≤ 127
+                             signedBigEndian(n >> 8) ++ [n & 255] otherwise
+```
+
+The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
+8-byte IEEE 754 binary representations of `F` and `D`, respectively.
diff --git a/_includes/cheatsheet-binary.md b/_includes/cheatsheet-binary.md
index 9abec85..876a0b9 100644
--- a/_includes/cheatsheet-binary.md
+++ b/_includes/cheatsheet-binary.md
@@ -51,36 +51,3 @@ division](https://en.wikipedia.org/wiki/Euclidean_division); that is, if
 *n* = *dq* + *r* and 0 ≤ *r* < |d|.
 -->
-
-
diff --git a/_includes/cheatsheet-text-plaintext.md b/_includes/cheatsheet-text-plaintext.md
new file mode 100644
index 0000000..6c3a1ff
--- /dev/null
+++ b/_includes/cheatsheet-text-plaintext.md
@@ -0,0 +1,44 @@
+```text
+Document      := Value ws
+Value         := ws (Record | Collection | Atom | Embedded | Annotated)
+Collection    := Sequence | Dictionary | Set
+Atom          := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
+ws            := (space | tab | cr | lf | `,`)*
+
+Record        := `<` Value+ ws `>`
+Sequence      := `[` Value* ws `]`
+Dictionary    := `{` (Value ws `:` Value)* ws `}`
+Set           := `#{` Value* ws `}`
+
+Boolean       := `#t` | `#f`
+ByteString    := `#"` binchar* `"`
+               | `#x"` (ws hex hex)* ws `"`
+               | `#[` (ws base64char)* ws `]`
+String        := `"` («any unicode scalar except `\` or `"`» | escaped | `\"`)* `"`
+QuotedSymbol  := `|` («any unicode scalar except `\` or `|`» | escaped | `\|`)* `|`
+Symbol        := (`A`..`Z` | `a`..`z` | `0`..`9` | sympunct | symuchar)+
+Number        := Float | Double | SignedInteger
+Float         := flt (`f`|`F`) | `#xf"` (ws hex hex)4 ws `"`
+Double        := flt | `#xd"` (ws hex hex)8 ws `"`
+SignedInteger := int
+
+Embedded      := `#!` Value
+Annotated     := Annotation Value
+Annotation    := `@` Value | `;` «any unicode scalar except cr or lf»* (cr | lf)
+
+escaped       := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\u` hex hex hex hex
+binescaped    := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\x` hex hex
+binchar       := «any scalar ≥32 and ≤126, except `\` or `"`» | binescaped | `\"`
+base64char    := `A`..`Z` | `a`..`z` | `0`..`9` | `+` | `/` | `-` | `_` | `=`
+sympunct      := `~` | `!` | `$` | `%` | `^` | `&` | `*` | `?`
+               | `_` | `=` | `+` | `-` | `/` | `.`
+symuchar      := «any scalar value ≥128 whose Unicode category is
+                  Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pc,
+                  Pd, Po, Sc, Sm, Sk, So, or Co»
+
+flt           := int ( frac exp | frac | exp )
+int           := (`-`|`+`) (`0`..`9`)+
+frac          := `.` (`0`..`9`)+
+exp           := (`e`|`E`) (`-`|`+`) (`0`..`9`)+
+hex           := `A`..`F` | `a`..`f` | `0`..`9`
+```
diff
--git a/_includes/value-grammar.md b/_includes/value-grammar.md index c42f773..5ad8fde 100644 --- a/_includes/value-grammar.md +++ b/_includes/value-grammar.md @@ -1,17 +1,18 @@ - Value = Atom - | Compound - | Embedded +```text + Value = Atom + | Compound + | Embedded - Atom = Boolean - | Float - | Double - | SignedInteger - | String - | ByteString - | Symbol - - Compound = Record - | Sequence - | Set - | Dictionary + Atom = Boolean + | Float + | Double + | SignedInteger + | String + | ByteString + | Symbol + Compound = Record + | Sequence + | Set + | Dictionary +``` diff --git a/cheatsheet-plaintext.md b/cheatsheet-plaintext.md new file mode 100644 index 0000000..8cbe30c --- /dev/null +++ b/cheatsheet-plaintext.md @@ -0,0 +1,11 @@ +--- +no_site_title: true +title: "Preserves Quick Reference (Plaintext)" +--- + +Tony Garnock-Jones +{{ site.version_date }}. Version {{ site.version }}. + +{% include cheatsheet-binary-plaintext.md %} + +{% include cheatsheet-text-plaintext.md %} diff --git a/doc/preserves-schema-rs.md b/doc/preserves-schema-rs.md index aa205ed..6d7332a 100644 --- a/doc/preserves-schema-rs.md +++ b/doc/preserves-schema-rs.md @@ -14,7 +14,7 @@ inputs. You will usually not need to use the `preserves-schema-rs` command-line program. Instead, access the preserves-schema compiler API from your `build.rs`. The following example is taken from -[`build.rs` for the `preserves-path` crate](https://gitlab.com/preserves/preserves/-/blob/18ac9168996026073ee16164fce108054b2a0ed7/implementations/rust/preserves-path/build.rs): +[`build.rs` for the `preserves-path` crate](https://gitlab.com/preserves/preserves/-/blob/af5de5b836ffc51999db93797d1995ff677cf6f8/implementations/rust/preserves-path/build.rs): use preserves_schema::compiler::*; @@ -30,14 +30,14 @@ API from your `build.rs`. 
The following example is taken from let mut c = CompilerConfig::new(gen_dir, "crate::schemas".to_owned()); let inputs = expand_inputs(&vec!["path.bin".to_owned()])?; - c.load_schemas_and_bundles(&inputs)?; + c.load_schemas_and_bundles(&inputs, &vec![])?; compile(&c) } This approach also requires an `include!` from your main, hand-written source tree. The following is a snippet from -[`preserves-path/src/lib.rs`](https://gitlab.com/preserves/preserves/-/blob/18ac9168996026073ee16164fce108054b2a0ed7/implementations/rust/preserves-path/src/lib.rs): +[`preserves-path/src/lib.rs`](https://gitlab.com/preserves/preserves/-/blob/af5de5b836ffc51999db93797d1995ff677cf6f8/implementations/rust/preserves-path/src/lib.rs): pub mod schemas { include!(concat!(env!("OUT_DIR"), "/src/schemas/mod.rs")); @@ -52,20 +52,23 @@ Then, `cargo install preserves-schema`. ## Usage - preserves-schema 1.0.0 + preserves-schema 3.990.2 USAGE: - preserves-schema-rs [OPTIONS] --output-dir --prefix [--] [input-glob]... + preserves-schema-rs [FLAGS] [OPTIONS] --output-dir --prefix + [--] [input-glob]... FLAGS: - -h, --help Prints help information - -V, --version Prints version information + -h, --help Prints help information + --rustfmt-skip + -V, --version Prints version information OPTIONS: --module ... -o, --output-dir -p, --prefix --support-crate + --xref ... ARGS: ... diff --git a/git-hooks/pre-commit b/git-hooks/pre-commit index 18720af..f6b14ef 100755 --- a/git-hooks/pre-commit +++ b/git-hooks/pre-commit @@ -3,6 +3,12 @@ set -e exec 1>&2 +COMMAND=cmp +if [ "$1" = "--fix" ]; +then + COMMAND=cp +fi + # https://gitlab.com/preserves/preserves/-/issues/30 # # So it turns out that Racket's git-checkout mechanism pays attention @@ -16,10 +22,19 @@ exec 1>&2 # Ensure that various copies of schema.prs, schema.bin, path.bin, # samples.pr and samples.bin are in fact identical. 
-cmp path/path.bin implementations/python/preserves/path.prb -cmp path/path.bin implementations/rust/preserves-path/path.bin -cmp schema/schema.bin implementations/python/preserves/schema.prb -cmp schema/schema.prs implementations/racket/preserves/preserves-schema/schema.prs -cmp tests/samples.bin implementations/python/tests/samples.bin -cmp tests/samples.pr implementations/python/tests/samples.pr -cmp tests/samples.pr implementations/racket/preserves/preserves/tests/samples.pr +${COMMAND} path/path.bin implementations/python/preserves/path.prb +${COMMAND} path/path.bin implementations/rust/preserves-path/path.bin + +${COMMAND} schema/schema.bin implementations/python/preserves/schema.prb +${COMMAND} schema/schema.prs implementations/racket/preserves/preserves-schema/schema.prs + +${COMMAND} tests/samples.bin implementations/python/tests/samples.bin +${COMMAND} tests/samples.pr implementations/python/tests/samples.pr +${COMMAND} tests/samples.pr implementations/racket/preserves/preserves/tests/samples.pr + +${COMMAND} _includes/what-is-preserves.md implementations/rust/preserves/doc/what-is-preserves.md +${COMMAND} _includes/cheatsheet-binary-plaintext.md implementations/rust/preserves/doc/cheatsheet-binary-plaintext.md +${COMMAND} _includes/cheatsheet-text-plaintext.md implementations/rust/preserves/doc/cheatsheet-text-plaintext.md +${COMMAND} _includes/value-grammar.md implementations/rust/preserves/doc/value-grammar.md + +${COMMAND} _includes/what-is-preserves-schema.md implementations/rust/preserves-schema/doc/what-is-preserves-schema.md diff --git a/implementations/javascript/packages/schema-cli/.gitignore b/implementations/javascript/packages/schema-cli/.gitignore new file mode 100644 index 0000000..d21f3bc --- /dev/null +++ b/implementations/javascript/packages/schema-cli/.gitignore @@ -0,0 +1,2 @@ +dist/ +lib/ diff --git a/implementations/javascript/packages/schema-cli/.npmignore b/implementations/javascript/packages/schema-cli/.npmignore new file mode 
100644 index 0000000..e69de29 diff --git a/implementations/javascript/packages/schema-cli/.yarnrc b/implementations/javascript/packages/schema-cli/.yarnrc new file mode 100644 index 0000000..29d68d6 --- /dev/null +++ b/implementations/javascript/packages/schema-cli/.yarnrc @@ -0,0 +1 @@ +version-tag-prefix javascript-@preserves/schema-cli@ diff --git a/implementations/javascript/packages/schema-cli/README.md b/implementations/javascript/packages/schema-cli/README.md new file mode 100644 index 0000000..41d5c55 --- /dev/null +++ b/implementations/javascript/packages/schema-cli/README.md @@ -0,0 +1 @@ +# Preserves Schema for TypeScript/JavaScript: Command-line tools diff --git a/implementations/javascript/packages/schema/bin/preserves-schema-ts.js b/implementations/javascript/packages/schema-cli/bin/preserves-schema-ts.js similarity index 100% rename from implementations/javascript/packages/schema/bin/preserves-schema-ts.js rename to implementations/javascript/packages/schema-cli/bin/preserves-schema-ts.js diff --git a/implementations/javascript/packages/schema/bin/preserves-schemac.js b/implementations/javascript/packages/schema-cli/bin/preserves-schemac.js similarity index 100% rename from implementations/javascript/packages/schema/bin/preserves-schemac.js rename to implementations/javascript/packages/schema-cli/bin/preserves-schemac.js diff --git a/implementations/javascript/packages/schema-cli/package.json b/implementations/javascript/packages/schema-cli/package.json new file mode 100644 index 0000000..9374815 --- /dev/null +++ b/implementations/javascript/packages/schema-cli/package.json @@ -0,0 +1,39 @@ +{ + "name": "@preserves/schema-cli", + "version": "0.990.1", + "description": "Command-line tools for Preserves Schema", + "homepage": "https://gitlab.com/preserves/preserves", + "license": "Apache-2.0", + "publishConfig": { + "access": "public" + }, + "repository": "gitlab:preserves/preserves", + "author": "Tony Garnock-Jones ", + "scripts": { + "clean": "rm 
-rf lib dist", + "prepare": "yarn compile && yarn rollup", + "compile": "tsc", + "compile:watch": "yarn compile -w", + "rollup": "rollup -c", + "rollup:watch": "yarn rollup -w", + "test": "true", + "veryclean": "yarn run clean && rm -rf node_modules" + }, + "bin": { + "preserves-schema-ts": "./bin/preserves-schema-ts.js", + "preserves-schemac": "./bin/preserves-schemac.js" + }, + "devDependencies": { + "@types/glob": "^7.1", + "@types/minimatch": "^3.0" + }, + "dependencies": { + "@preserves/core": "^0.990.0", + "@preserves/schema": "^0.990.1", + "chalk": "^4.1", + "chokidar": "^3.5", + "commander": "^7.2", + "glob": "^7.1", + "minimatch": "^3.0" + } +} diff --git a/implementations/javascript/packages/schema-cli/rollup.config.mjs b/implementations/javascript/packages/schema-cli/rollup.config.mjs new file mode 100644 index 0000000..be21d34 --- /dev/null +++ b/implementations/javascript/packages/schema-cli/rollup.config.mjs @@ -0,0 +1,17 @@ +import terser from '@rollup/plugin-terser'; + +function cli(name) { + return { + input: `lib/bin/${name}.js`, + output: [{file: `dist/bin/${name}.js`, format: 'commonjs'}], + external: [ + '@preserves/core', + '@preserves/schema', + ], + }; +} + +export default [ + cli('preserves-schema-ts'), + cli('preserves-schemac'), +]; diff --git a/implementations/javascript/packages/schema/src/bin/cli-utils.ts b/implementations/javascript/packages/schema-cli/src/bin/cli-utils.ts similarity index 98% rename from implementations/javascript/packages/schema/src/bin/cli-utils.ts rename to implementations/javascript/packages/schema-cli/src/bin/cli-utils.ts index 7e71e8d..50bc9fa 100644 --- a/implementations/javascript/packages/schema/src/bin/cli-utils.ts +++ b/implementations/javascript/packages/schema-cli/src/bin/cli-utils.ts @@ -1,10 +1,9 @@ import fs from 'fs'; import path from 'path'; import { glob } from 'glob'; -import { IdentitySet, formatPosition, Position } from '@preserves/core'; -import { readSchema } from '../reader'; import chalk 
from 'chalk'; -import * as M from '../meta'; +import { IdentitySet, formatPosition, Position } from '@preserves/core'; +import { readSchema, Meta as M } from '@preserves/schema'; export interface Diagnostic { type: 'warn' | 'error'; diff --git a/implementations/javascript/packages/schema/src/bin/preserves-schema-ts.ts b/implementations/javascript/packages/schema-cli/src/bin/preserves-schema-ts.ts similarity index 99% rename from implementations/javascript/packages/schema/src/bin/preserves-schema-ts.ts rename to implementations/javascript/packages/schema-cli/src/bin/preserves-schema-ts.ts index 337af94..b00a90b 100644 --- a/implementations/javascript/packages/schema/src/bin/preserves-schema-ts.ts +++ b/implementations/javascript/packages/schema-cli/src/bin/preserves-schema-ts.ts @@ -1,9 +1,8 @@ -import { compile } from '../index'; import fs from 'fs'; import path from 'path'; import minimatch from 'minimatch'; import { Command } from 'commander'; -import * as M from '../meta'; +import { compile, Meta as M } from '@preserves/schema'; import chalk from 'chalk'; import { is, Position } from '@preserves/core'; import chokidar from 'chokidar'; diff --git a/implementations/javascript/packages/schema/src/bin/preserves-schemac.ts b/implementations/javascript/packages/schema-cli/src/bin/preserves-schemac.ts similarity index 97% rename from implementations/javascript/packages/schema/src/bin/preserves-schemac.ts rename to implementations/javascript/packages/schema-cli/src/bin/preserves-schemac.ts index 0c3025f..ca620c3 100644 --- a/implementations/javascript/packages/schema/src/bin/preserves-schemac.ts +++ b/implementations/javascript/packages/schema-cli/src/bin/preserves-schemac.ts @@ -2,7 +2,7 @@ import { Command } from 'commander'; import { canonicalEncode, KeyedDictionary, underlying } from '@preserves/core'; import fs from 'fs'; import path from 'path'; -import * as M from '../meta'; +import { Meta as M } from '@preserves/schema'; import { expandInputGlob, formatFailures 
} from './cli-utils'; export type CommandLineArguments = { diff --git a/implementations/javascript/packages/schema-cli/tsconfig.json b/implementations/javascript/packages/schema-cli/tsconfig.json new file mode 100644 index 0000000..299625c --- /dev/null +++ b/implementations/javascript/packages/schema-cli/tsconfig.json @@ -0,0 +1,16 @@ +{ + "compilerOptions": { + "target": "ES2017", + "lib": ["es2019", "DOM"], + "declaration": true, + "baseUrl": "./src", + "rootDir": "./src", + "outDir": "./lib", + "declarationDir": "./lib", + "esModuleInterop": true, + "moduleResolution": "node", + "sourceMap": true, + "strict": true + }, + "include": ["src/**/*"] +} diff --git a/implementations/javascript/packages/schema/README.md b/implementations/javascript/packages/schema/README.md index 798d56b..0bd77ef 100644 --- a/implementations/javascript/packages/schema/README.md +++ b/implementations/javascript/packages/schema/README.md @@ -2,3 +2,7 @@ This is an implementation of [Preserves Schema](https://preserves.dev/preserves-schema.html) for TypeScript and JavaScript. + +This package implements a Schema runtime and a Schema-to-TypeScript compiler, but offers no +command line interfaces. See `@preserves/schema-cli` for command-line tools for working with +Schema and compiling from Schema to TypeScript. 
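The binary encoding rules added in `_includes/cheatsheet-binary-plaintext.md` (`varint`, `intbytes`, `signedBigEndian`, and the `0xB0`/`0xB1` tags) can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not code from the `preserves` crate: the function names below are hypothetical, and `i64` stands in for the arbitrary-precision `SignedInteger` of the data model.

```rust
/// `varint(n)` from the cheat sheet: 7 bits per byte, low-order bits
/// first, with the high bit set on every byte except the last.
pub fn varint(mut n: usize) -> Vec<u8> {
    let mut out = Vec::new();
    while n >= 128 {
        out.push(((n & 127) | 128) as u8);
        n >>= 7;
    }
    out.push(n as u8);
    out
}

/// `signedBigEndian(n)`: shortest big-endian two's-complement form of `n`.
/// Assumption: `i64` replaces the unbounded integers of the specification.
pub fn signed_big_endian(n: i64) -> Vec<u8> {
    if (-128..=127).contains(&n) {
        vec![(n & 255) as u8]
    } else {
        // `>>` on i64 is an arithmetic shift, so the sign is preserved.
        let mut out = signed_big_endian(n >> 8);
        out.push((n & 255) as u8);
        out
    }
}

/// `intbytes(n)`: the empty sequence for zero, else `signedBigEndian(n)`.
pub fn intbytes(n: i64) -> Vec<u8> {
    if n == 0 {
        Vec::new()
    } else {
        signed_big_endian(n)
    }
}

/// «V» for V ∈ SignedInteger: tag 0xB0, varint length, then the bytes.
pub fn encode_signed_integer(n: i64) -> Vec<u8> {
    let body = intbytes(n);
    let mut out = vec![0xB0];
    out.extend(varint(body.len()));
    out.extend(body);
    out
}

/// «V» for V ∈ String: tag 0xB1, varint of the UTF-8 length, UTF-8 bytes.
pub fn encode_string(s: &str) -> Vec<u8> {
    let mut out = vec![0xB1];
    out.extend(varint(s.len()));
    out.extend_from_slice(s.as_bytes());
    out
}
```

For instance, `varint(300)` yields `[0xAC, 0x02]`, and `encode_signed_integer(-129)` yields `[0xB0, 0x02, 0xFF, 0x7F]`. A real implementation would need a big-integer type in place of `i64`.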
diff --git a/implementations/javascript/packages/schema/package.json b/implementations/javascript/packages/schema/package.json index 07e8c42..12339ba 100644 --- a/implementations/javascript/packages/schema/package.json +++ b/implementations/javascript/packages/schema/package.json @@ -1,6 +1,6 @@ { "name": "@preserves/schema", - "version": "0.990.0", + "version": "0.990.1", "description": "Schema support for Preserves data serialization format", "homepage": "https://gitlab.com/preserves/preserves", "license": "Apache-2.0", @@ -13,7 +13,7 @@ "types": "lib/index.d.ts", "author": "Tony Garnock-Jones ", "scripts": { - "regenerate": "rm -rf ./src/gen && yarn copy-schema && ./bin/preserves-schema-ts.js --output ./src/gen ./dist:schema.prs", + "regenerate": "rm -rf ./src/gen && yarn copy-schema && ../schema-cli/bin/preserves-schema-ts.js --output ./src/gen ./dist:schema.prs", "clean": "rm -rf lib dist", "prepare": "yarn compile && yarn rollup && yarn copy-schema", "compile": "tsc", @@ -25,18 +25,7 @@ "test:watch": "jest --watch", "veryclean": "yarn run clean && rm -rf node_modules" }, - "bin": { - "preserves-schema-ts": "./bin/preserves-schema-ts.js", - "preserves-schemac": "./bin/preserves-schemac.js" - }, "dependencies": { - "@preserves/core": "^0.990.0", - "@types/glob": "^7.1", - "@types/minimatch": "^3.0", - "chalk": "^4.1", - "chokidar": "^3.5", - "commander": "^7.2", - "glob": "^7.1", - "minimatch": "^3.0" + "@preserves/core": "^0.990.0" } } diff --git a/implementations/javascript/packages/schema/rollup.config.mjs b/implementations/javascript/packages/schema/rollup.config.mjs index 8195d34..9653e8b 100644 --- a/implementations/javascript/packages/schema/rollup.config.mjs +++ b/implementations/javascript/packages/schema/rollup.config.mjs @@ -31,13 +31,6 @@ function cli(name) { output: [{file: `dist/bin/${name}.js`, format: 'commonjs'}], external: [ '@preserves/core', - 'chalk', - 'chokidar', - 'fs', - 'glob', - 'minimatch', - 'path', - 'commander', ], }; } @@ -53,6 
+46,4 @@ export default [ ], external: ['@preserves/core'], }, - cli('preserves-schema-ts'), - cli('preserves-schemac'), ]; diff --git a/implementations/javascript/watchall b/implementations/javascript/watchall index 4b1fcf9..fe0d265 100755 --- a/implementations/javascript/watchall +++ b/implementations/javascript/watchall @@ -20,5 +20,7 @@ open "cd packages/core; yarn run test:watch" open "cd packages/schema; yarn run compile:watch" open "cd packages/schema; yarn run rollup:watch" open "cd packages/schema; yarn run test:watch" +open "cd packages/schema-cli; yarn run compile:watch" +open "cd packages/schema-cli; yarn run rollup:watch" tmux select-layout even-vertical diff --git a/implementations/rust/Makefile b/implementations/rust/Makefile index 0aa547f..4432efb 100644 --- a/implementations/rust/Makefile +++ b/implementations/rust/Makefile @@ -5,6 +5,9 @@ all: cargo build --all-targets +doc: + cargo doc --workspace + x86_64-binary: x86_64-binary-release x86_64-binary-release: diff --git a/implementations/rust/preserves-schema/Cargo.toml b/implementations/rust/preserves-schema/Cargo.toml index c4ab2d7..d43deb6 100644 --- a/implementations/rust/preserves-schema/Cargo.toml +++ b/implementations/rust/preserves-schema/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "preserves-schema" -version = "3.990.0" +version = "3.990.3" authors = ["Tony Garnock-Jones "] edition = "2018" description = "Implementation of Preserves Schema code generation and support for Rust." diff --git a/implementations/rust/preserves-schema/README.md b/implementations/rust/preserves-schema/README.md index 75f73a5..5392f8c 100644 --- a/implementations/rust/preserves-schema/README.md +++ b/implementations/rust/preserves-schema/README.md @@ -1,4 +1,6 @@ -# Preserves Schema for Rust +```shell +cargo add preserves preserves-schema +``` -This is an implementation of [Preserves Schema](https://preserves.dev/preserves-schema.html) -for Rust. 
+This crate ([`preserves-schema` on crates.io](https://crates.io/crates/preserves-schema)) is an +implementation of [Preserves Schema](https://preserves.dev/preserves-schema.html) for Rust. diff --git a/implementations/rust/preserves-schema/doc/example.md b/implementations/rust/preserves-schema/doc/example.md new file mode 100644 index 0000000..8520813 --- /dev/null +++ b/implementations/rust/preserves-schema/doc/example.md @@ -0,0 +1,112 @@ +# Example + +[preserves-schemac]: https://preserves.dev/doc/preserves-schemac.html +[preserves-schema-rs]: https://preserves.dev/doc/preserves-schema-rs.html + +Preserves schemas are written in a syntax that (ab)uses [Preserves text +syntax][preserves::value::text] as a kind of S-expression. Schema source code looks like this: + +```preserves-schema +version 1 . +Present = . +Says = . +UserStatus = . +Status = =here / . +TimeStamp = string . +``` + +Conventionally, schema source code is stored in `*.prs` files. In this example, the source code +above is placed in `simpleChatProtocol.prs`. + +The Rust code generator for schemas requires not source code, but instances of the [Preserves +metaschema](https://preserves.dev/preserves-schema.html#appendix-metaschema). To compile schema +source code to metaschema instances, use [preserves-schemac][]: + +```shell +yarn global add @preserves/schema +preserves-schemac .:simpleChatProtocol.prs > simpleChatProtocol.prb +``` + +Binary-syntax metaschema instances are conventionally stored in `*.prb` files. If you have a +whole directory tree of `*.prs` files, you can supply just "`.`" without the "`:`"-prefixed +fileglob part.[^converting-metaschema-to-text] See the [preserves-schemac documentation][preserves-schemac]. 
+ +[^converting-metaschema-to-text]: + Converting the `simpleChatProtocol.prb` file to Preserves text syntax lets us read the + metaschema instance corresponding to the source code: + ```shell + cat simpleChatProtocol.prb | preserves-tool convert + ``` + The result: + ```preserves + > + ]>> + Says: > + > + ]>> + Status: + ] + [ + "away" + > + ]>> + ] + ]> + TimeStamp: + UserStatus: > + > + ]>> + } + embeddedType: #f + version: 1 + }> + }> + ``` + +#### Generating Rust code from a schema + +Generate Rust definitions corresponding to a metaschema instance with [preserves-schema-rs][]. +The best way to use it is to integrate it into your `build.rs` (see [the +docs][preserves-schema-rs]), but you can also use it as a standalone command-line tool. + +The following command generates a directory `./rs/chat` containing rust sources for a module +that expects to be called `chat` in Rust code: + +```shell +preserves-schema-rs --output-dir rs/chat --prefix chat simpleChatProtocol.prb +``` + +Representative excerpts from one of the generated files, `./rs/chat/simple_chat_protocol.rs`: + +```rust,noplayground +pub struct Present { + pub username: std::string::String +} +pub struct Says { + pub who: std::string::String, + pub what: std::string::String +} +pub struct UserStatus { + pub username: std::string::String, + pub status: Status +} +pub enum Status { + Here, + Away { + since: std::boxed::Box + } +} +pub struct TimeStamp(pub std::string::String); +``` diff --git a/implementations/rust/preserves-schema/doc/what-is-preserves-schema.md b/implementations/rust/preserves-schema/doc/what-is-preserves-schema.md new file mode 100644 index 0000000..20c47d0 --- /dev/null +++ b/implementations/rust/preserves-schema/doc/what-is-preserves-schema.md @@ -0,0 +1,16 @@ +A Preserves schema connects Preserves `Value`s to host-language data +structures. 
Each definition within a schema can be processed by a +compiler to produce + + - a simple host-language *type definition*; + + - a partial *parsing* function from `Value`s to instances of the + produced type; and + + - a total *serialization* function from instances of the type to + `Value`s. + +Every parsed `Value` retains enough information to always be able to +be serialized again, and every instance of a host-language data +structure contains, by construction, enough information to be +successfully serialized. diff --git a/implementations/rust/preserves-schema/src/bin/preserves-schema-rs.rs b/implementations/rust/preserves-schema/src/bin/preserves-schema-rs.rs index d269723..6987812 100644 --- a/implementations/rust/preserves-schema/src/bin/preserves-schema-rs.rs +++ b/implementations/rust/preserves-schema/src/bin/preserves-schema-rs.rs @@ -1,3 +1,6 @@ +//! Command-line Rust code generator for Preserves Schema. See the documentation at +//! . + use std::io::Error; use std::io::ErrorKind; use std::path::PathBuf; diff --git a/implementations/rust/preserves-schema/src/compiler/mod.rs b/implementations/rust/preserves-schema/src/compiler/mod.rs index 67c965a..4633b23 100644 --- a/implementations/rust/preserves-schema/src/compiler/mod.rs +++ b/implementations/rust/preserves-schema/src/compiler/mod.rs @@ -1,3 +1,39 @@ +//! Implementation of the Schema-to-Rust compiler; this is the core of the +//! [preserves-schema-rs][] program. +//! +//! See the [documentation for preserves-schema-rs][preserves-schema-rs] for examples of how to +//! use the compiler programmatically from a `build.rs` script, but very briefly, use +//! [preserves-schemac](https://preserves.dev/doc/preserves-schemac.html) to generate a +//! metaschema instance `*.prb` file, and then put something like this in `build.rs`: +//! +//! ```rust,ignore +//! use preserves_schema::compiler::*; +//! +//! const PATH_TO_PRB_FILE: &'static str = "your-metaschema-instance-file.prb"; +//! +//! 
fn main() -> Result<(), std::io::Error> { +//! let buildroot = std::path::PathBuf::from(std::env::var_os("OUT_DIR").unwrap()); +//! +//! let mut gen_dir = buildroot.clone(); +//! gen_dir.push("src/schemas"); +//! let mut c = CompilerConfig::new(gen_dir, "crate::schemas".to_owned()); +//! +//! let inputs = expand_inputs(&vec![PATH_TO_PRB_FILE.to_owned()])?; +//! c.load_schemas_and_bundles(&inputs, &vec![])?; +//! compile(&c) +//! } +//! ``` +//! +//! plus something like this in your `lib.rs` or main program: +//! +//! ```rust,ignore +//! pub mod schemas { +//! include!(concat!(env!("OUT_DIR"), "/src/schemas/mod.rs")); +//! } +//! ``` +//! +//! [preserves-schema-rs]: https://preserves.dev/doc/preserves-schema-rs.html + pub mod context; pub mod cycles; pub mod names; @@ -29,11 +65,18 @@ use std::io::Read; use std::io::Write; use std::path::PathBuf; +/// Names a Schema module within a (collection of) Schema bundle(s). pub type ModulePath = Vec; +/// Implement this trait to extend the compiler with custom code generation support. The main +/// code generators are also implemented as plugins. +/// +/// For an example of its use outside the core compiler, see [`build.rs` for the `syndicate-rs` project](https://git.syndicate-lang.org/syndicate-lang/syndicate-rs/src/commit/60e6c6badfcbcbccc902994f4f32db6048f60d1f/syndicate/build.rs). pub trait Plugin: std::fmt::Debug { + /// Use `_module_ctxt` to emit code at a per-module level. fn generate_module(&self, _module_ctxt: &mut ModuleContext) {} + /// Use `module_ctxt` to emit code at a per-Schema-[Definition] level. fn generate_definition( &self, module_ctxt: &mut ModuleContext, @@ -110,17 +153,30 @@ impl ExternalModule { } } +/// Main entry point to the compiler. #[derive(Debug)] pub struct CompilerConfig { + /// All known Schema modules, indexed by [ModulePath] and annotated with a [Purpose]. pub bundle: Map, + /// Where output Rust code files will be placed. 
pub output_dir: PathBuf, + /// Fully-qualified Rust module prefix to use for each generated module. pub fully_qualified_module_prefix: String, + /// Rust module path to the [preserves_schema::support][crate::support] module. pub support_crate: String, + /// External modules for cross-referencing. pub external_modules: Map, + /// Plugins active in this compiler instance. pub plugins: Vec>, + /// If true, a directive is emitted in each module instructing + /// [rustfmt](https://github.com/rust-lang/rustfmt) to ignore it. pub rustfmt_skip: bool, } +/// Loads a [Schema] or [Bundle] from path `i` into `bundle` for the given `purpose`. +/// +/// If `i` holds a [Schema], then the file stem of `i` is used as the module name when placing +/// the schema in `bundle`. pub fn load_schema_or_bundle_with_purpose( bundle: &mut Map, i: &PathBuf, @@ -134,6 +190,11 @@ pub fn load_schema_or_bundle_with_purpose( Ok(()) } +/// Loads a [Schema] or [Bundle] from raw binary encoded value `input` into `bundle` for the +/// given `purpose`. +/// +/// If `input` corresponds to a [Schema], then `prefix` is used as its module name; otherwise, +/// it's a [Bundle], and `prefix` is ignored. pub fn load_schema_or_bundle_bin_with_purpose( bundle: &mut Map, prefix: &str, @@ -165,6 +226,10 @@ fn bundle_prefix(i: &PathBuf) -> io::Result<&str> { }) } +/// Loads a [Schema] or [Bundle] from path `i` into `bundle`. +/// +/// If `i` holds a [Schema], then the file stem of `i` is used as the module name when placing +/// the schema in `bundle`. pub fn load_schema_or_bundle(bundle: &mut Map, i: &PathBuf) -> io::Result<()> { let mut f = File::open(&i)?; let mut bs = vec![]; @@ -172,6 +237,10 @@ pub fn load_schema_or_bundle(bundle: &mut Map, i: &PathBuf) load_schema_or_bundle_bin(bundle, bundle_prefix(i)?, &bs[..]) } +/// Loads a [Schema] or [Bundle] from raw binary encoded value `input` into `bundle`. 
+/// +/// If `input` corresponds to a [Schema], then `prefix` is used as its module name; otherwise, +/// it's a [Bundle], and `prefix` is ignored. pub fn load_schema_or_bundle_bin( bundle: &mut Map, prefix: &str, @@ -199,6 +268,8 @@ pub fn load_schema_or_bundle_bin( } impl CompilerConfig { + /// Construct a [CompilerConfig] configured to send output files to `output_dir`, and to + /// use `fully_qualified_module_prefix` as the Rust module prefix for generated code. pub fn new(output_dir: PathBuf, fully_qualified_module_prefix: String) -> Self { CompilerConfig { bundle: Map::new(), @@ -277,6 +348,7 @@ impl CompilerConfig { } } +/// Expands a vector of [mod@glob]s to a vector of actual paths. pub fn expand_inputs(globs: &Vec) -> io::Result> { let mut result = Vec::new(); for g in globs.iter() { @@ -322,6 +394,7 @@ impl Schema { } } +/// Main entry point: runs the compilation process. pub fn compile(config: &CompilerConfig) -> io::Result<()> { let mut b = BundleContext::new(config); diff --git a/implementations/rust/preserves-schema/src/lib.rs b/implementations/rust/preserves-schema/src/lib.rs index 4f045e9..289c037 100644 --- a/implementations/rust/preserves-schema/src/lib.rs +++ b/implementations/rust/preserves-schema/src/lib.rs @@ -1,4 +1,12 @@ +#![doc = concat!( + include_str!("../README.md"), + "# What is Preserves Schema?\n\n", + include_str!("../doc/what-is-preserves-schema.md"), + include_str!("../doc/example.md"), +)] + pub mod compiler; +/// Auto-generated Preserves Schema Metaschema types, parsers, and unparsers. pub mod gen; pub mod support; pub mod syntax; diff --git a/implementations/rust/preserves-schema/src/support/interpret.rs b/implementations/rust/preserves-schema/src/support/interpret.rs index ccc3f97..a746d5a 100644 --- a/implementations/rust/preserves-schema/src/support/interpret.rs +++ b/implementations/rust/preserves-schema/src/support/interpret.rs @@ -1,3 +1,6 @@ +//! 
Interpreter for instances of Preserves Schema Metaschema, for schema-directed dynamic +//! parsing and unparsing of terms. + use crate::gen::schema::*; use preserves::value::merge::merge2; @@ -5,8 +8,10 @@ use preserves::value::Map; use preserves::value::NestedValue; use preserves::value::Value; +/// Represents an environment mapping schema module names to [Schema] instances. pub type Env = Map, Schema>; +/// Context for a given interpretation of a [Schema]. #[derive(Debug)] pub struct Context<'a, V: NestedValue> { pub env: &'a Env, @@ -20,6 +25,7 @@ enum DynField { } impl<'a, V: NestedValue> Context<'a, V> { + /// Construct a new [Context] with the given [Env]. pub fn new(env: &'a Env) -> Self { Context { env, @@ -27,6 +33,8 @@ impl<'a, V: NestedValue> Context<'a, V> { } } + /// Parse `v` using the rule named `name` from the module at path `module` in `self.env`. + /// Yields `Some(...)` if the parse succeeds, and `None` otherwise. pub fn dynamic_parse(&mut self, module: &Vec, name: &str, v: &V) -> Option { let old_module = (module.len() > 0).then(|| std::mem::replace(&mut self.module, module.clone())); @@ -39,6 +47,7 @@ impl<'a, V: NestedValue> Context<'a, V> { result } + #[doc(hidden)] pub fn dynamic_unparse(&mut self, _module: &Vec, _name: &str, _w: &V) -> Option { panic!("Not yet implemented"); } diff --git a/implementations/rust/preserves-schema/src/support/mod.rs b/implementations/rust/preserves-schema/src/support/mod.rs index e53e0ac..fa457bc 100644 --- a/implementations/rust/preserves-schema/src/support/mod.rs +++ b/implementations/rust/preserves-schema/src/support/mod.rs @@ -1,3 +1,7 @@ +//! The runtime support library for compiled Schemas. + +#[doc(hidden)] +/// Reexport lazy_static for generated code to use. pub use lazy_static::lazy_static; pub use preserves; @@ -21,10 +25,16 @@ use std::sync::Arc; use thiserror::Error; +/// Every [language][crate::define_language] implements [NestedValueCodec] as a marker trait. 
pub trait NestedValueCodec {} // marker trait impl NestedValueCodec for () {} +/// Implementors of [Parse] can produce instances of themselves from a [Value], given a +/// supporting [language][crate::define_language]. All Schema-compiler-produced types implement +/// [Parse]. pub trait Parse: Sized { + /// Decode the given `value` (using auxiliary structure from the `language` instance) to + /// produce an instance of [Self]. fn parse(language: L, value: &Value) -> Result; } @@ -34,7 +44,10 @@ impl<'a, T: NestedValueCodec, Value: NestedValue> Parse<&'a T, Value> for Value } } +/// Implementors of [Unparse] can convert themselves into a [Value], given a supporting +/// [language][crate::define_language]. All Schema-compiler-produced types implement [Unparse]. pub trait Unparse { + /// Encode `self` into a [Value] (using auxiliary structure from the `language` instance). fn unparse(&self, language: L) -> Value; } @@ -44,8 +57,13 @@ impl<'a, T: NestedValueCodec, Value: NestedValue> Unparse<&'a T, Value> for Valu } } +/// Every [language][crate::define_language] implements [Codec], which supplies convenient +/// shorthand for invoking [Parse::parse] and [Unparse::unparse]. pub trait Codec { + /// Delegates to [`T::parse`][Parse::parse], using `self` as language and the given `value` + /// as input. fn parse<'a, T: Parse<&'a Self, N>>(&'a self, value: &N) -> Result; + /// Delegates to [`value.unparse`][Unparse::unparse], using `self` as language. fn unparse<'a, T: Unparse<&'a Self, N>>(&'a self, value: &T) -> N; } @@ -59,6 +77,11 @@ impl Codec for L { } } +/// Implementors of [Deserialize] can produce instances of themselves from a [Value]. All +/// Schema-compiler-produced types implement [Deserialize]. +/// +/// The difference between [Deserialize] and [Parse] is that implementors of [Deserialize] know +/// which [language][crate::define_language] to use. 
pub trait Deserialize where Self: Sized, @@ -66,10 +89,14 @@ where fn deserialize<'de, R: Reader<'de, N>>(r: &mut R) -> Result; } +/// Extracts a simple literal term from a byte array using +/// [PackedReader][preserves::value::packed::PackedReader]. No embedded values are permitted. pub fn decode_lit(bs: &[u8]) -> io::Result { preserves::value::packed::from_bytes(bs, NoEmbeddedDomainCodec) } +/// When `D` can parse itself from an [IOValue], this function parses all embedded [IOValue]s +/// into `D`s. pub fn decode_embedded(v: &IOValue) -> Result>, ParseError> where for<'a> D: TryFrom<&'a IOValue, Error = ParseError>, @@ -77,6 +104,8 @@ where v.copy_via(&mut |d| Ok(Value::Embedded(Arc::new(D::try_from(d)?)))) } +/// When `D` can unparse itself into an [IOValue], this function converts all embedded `D`s +/// into [IOValue]s. pub fn encode_embedded(v: &ArcValue>) -> IOValue where for<'a> IOValue: From<&'a D>, @@ -85,10 +114,13 @@ where .unwrap() } +/// Error value yielded when parsing of an [IOValue] into a Schema-compiler-produced type fails. #[derive(Error, Debug)] pub enum ParseError { + /// Signalled when the input does not match the Preserves Schema associated with the type. #[error("Input not conformant with Schema: {0}")] ConformanceError(&'static str), + /// Signalled when the underlying Preserves library signals an error. #[error(transparent)] Preserves(preserves::error::Error), } @@ -120,10 +152,12 @@ impl From for io::Error { } impl ParseError { + /// Constructs a [ParseError::ConformanceError]. pub fn conformance_error(context: &'static str) -> Self { ParseError::ConformanceError(context) } + /// True iff `self` is a [ParseError::ConformanceError].
pub fn is_conformance_error(&self) -> bool { return if let ParseError::ConformanceError(_) = self { true diff --git a/implementations/rust/preserves-schema/src/syntax/block.rs b/implementations/rust/preserves-schema/src/syntax/block.rs index 3d4b47a..7bf1abc 100644 --- a/implementations/rust/preserves-schema/src/syntax/block.rs +++ b/implementations/rust/preserves-schema/src/syntax/block.rs @@ -1,12 +1,21 @@ +//! A library for emitting pretty-formatted structured source code. +//! +//! The main entry points are [Formatter::to_string] and [Formatter::write], plus the utilities +//! in the [macros] submodule. + use std::fmt::Write; use std::str; +/// Default width for pretty-formatting, in columns. pub const DEFAULT_WIDTH: usize = 80; +/// All pretty-formattable items must implement this trait. pub trait Emittable: std::fmt::Debug { + /// Serializes `self`, as pretty-printed code, on `f`. fn write_on(&self, f: &mut Formatter); } +/// Tailoring of behaviour for [Vertical] groupings. #[derive(Clone, PartialEq, Eq)] pub enum VerticalMode { Variable, @@ -14,13 +23,16 @@ pub enum VerticalMode { ExtraNewline, } +/// Vertical formatting for [Emittable]s. pub trait Vertical { fn set_vertical_mode(&mut self, mode: VerticalMode); fn write_vertically_on(&self, f: &mut Formatter); } +/// Polymorphic [Emittable], used consistently in the API. pub type Item = std::rc::Rc; +/// A possibly-vertical sequence of items with item-separating and -terminating text. #[derive(Clone)] pub struct Sequence { pub items: Vec, @@ -29,6 +41,8 @@ pub struct Sequence { pub terminator: &'static str, } +/// A sequence of items, indented when formatted vertically, surrounded by opening and closing +/// text. #[derive(Clone)] pub struct Grouping { pub sequence: Sequence, @@ -36,14 +50,18 @@ pub struct Grouping { pub close: &'static str, } +/// State needed for pretty-formatting of [Emittable]s. pub struct Formatter { + /// Number of available columns. 
Used to decide between horizontal and vertical layouts. pub width: usize, indent_delta: String, current_indent: String, + /// Mutable output buffer. Accumulates emitted text during writing. pub buffer: String, } impl Formatter { + /// Construct a Formatter using [DEFAULT_WIDTH] and a four-space indent. pub fn new() -> Self { Formatter { width: DEFAULT_WIDTH, @@ -53,6 +71,7 @@ impl Formatter { } } + /// Construct a Formatter just like `self` but with an empty `buffer`. pub fn copy_empty(&self) -> Formatter { Formatter { width: self.width, @@ -62,28 +81,37 @@ impl Formatter { } } + /// Yields the indent size. pub fn indent_size(self) -> usize { self.indent_delta.len() } + /// Updates the indent size. pub fn set_indent_size(&mut self, n: usize) { self.indent_delta = str::repeat(" ", n) } + /// Accumulates a text serialization of `e` in `buffer`. pub fn write(&mut self, e: E) { e.write_on(self) } + /// Emits a newline followed by indentation into `buffer`. pub fn newline(&mut self) { self.buffer.push_str(&self.current_indent) } + /// Creates a default Formatter, uses it to [write][Formatter::write] `e`, and yields the + /// contents of its `buffer`. pub fn to_string(e: E) -> String { let mut f = Formatter::new(); f.write(e); f.buffer } + /// Calls `f` in a context where the indentation has been increased by + /// [Formatter::indent_size] spaces. Restores the indentation level after `f` returns. + /// Yields the result of the call to `f`. 
pub fn with_indent R>(&mut self, f: F) -> R { let old_indent = self.current_indent.clone(); self.current_indent += &self.indent_delta; @@ -93,6 +121,12 @@ impl Formatter { } } +impl Default for Formatter { + fn default() -> Self { + Self::new() + } +} + impl Default for VerticalMode { fn default() -> Self { Self::Variable @@ -238,6 +272,12 @@ impl std::fmt::Debug for Grouping { //--------------------------------------------------------------------------- +/// Escapes `s` by substituting `\\` for `\`, `\"` for `"`, and `\u{...}` for characters +/// outside the range 32..126, inclusive. +/// +/// This process is intended to generate literals compatible with `rustc`; see [the language +/// reference on "Character and string +/// literals"](https://doc.rust-lang.org/reference/tokens.html#character-and-string-literals). pub fn escape_string(s: &str) -> String { let mut buf = String::new(); buf.push('"'); @@ -253,6 +293,13 @@ pub fn escape_string(s: &str) -> String { buf } +/// Escapes `bs` into a Rust byte string literal, treating each byte as its ASCII equivalent +/// except producing `\\` for 0x5c, `\"` for 0x22, and `\x..` for bytes outside the range +/// 0x20..0x7e, inclusive. +/// +/// This process is intended to generate literals compatible with `rustc`; see [the language +/// reference on "Byte string +/// literals"](https://doc.rust-lang.org/reference/tokens.html#byte-string-literals). 
pub fn escape_bytes(bs: &[u8]) -> String { let mut buf = String::new(); buf.push_str("b\""); @@ -262,7 +309,7 @@ pub fn escape_bytes(bs: &[u8]) -> String { '\\' => buf.push_str("\\\\"), '"' => buf.push_str("\\\""), _ if c >= ' ' && c <= '~' => buf.push(c), - _ => write!(&mut buf, "\\x{{{:02x}}}", b).expect("no IO errors building a string"), + _ => write!(&mut buf, "\\x{:02x}", b).expect("no IO errors building a string"), } } buf.push('"'); @@ -271,6 +318,7 @@ pub fn escape_bytes(bs: &[u8]) -> String { //--------------------------------------------------------------------------- +/// Utilities for constructing many useful kinds of [Sequence] and [Grouping]. pub mod constructors { use super::Emittable; use super::Grouping; @@ -279,10 +327,12 @@ pub mod constructors { use super::Vertical; use super::VerticalMode; + /// Produces a polymorphic, reference-counted [Item] from some generic [Emittable]. pub fn item(i: E) -> Item { std::rc::Rc::new(i) } + /// *a*`::`*b*`::`*...*`::`*z* pub fn name(pieces: Vec) -> Sequence { Sequence { items: pieces, @@ -292,6 +342,7 @@ pub mod constructors { } } + /// *ab...z* (directly adjacent, no separators or terminators) pub fn seq(items: Vec) -> Sequence { Sequence { items: items, @@ -301,6 +352,7 @@ pub mod constructors { } } + /// *a*`, `*b*`, `*...*`, `*z* pub fn commas(items: Vec) -> Sequence { Sequence { items: items, @@ -310,6 +362,7 @@ pub mod constructors { } } + /// `(`*a*`, `*b*`, `*...*`, `*z*`)` pub fn parens(items: Vec) -> Grouping { Grouping { sequence: commas(items), @@ -318,6 +371,7 @@ pub mod constructors { } } + /// `[`*a*`, `*b*`, `*...*`, `*z*`]` pub fn brackets(items: Vec) -> Grouping { Grouping { sequence: commas(items), @@ -326,6 +380,7 @@ pub mod constructors { } } + /// `<`*a*`, `*b*`, `*...*`, `*z*`>` pub fn anglebrackets(items: Vec) -> Grouping { Grouping { sequence: commas(items), @@ -334,6 +389,7 @@ pub mod constructors { } } + /// `{`*a*`, `*b*`, `*...*`, `*z*`}` pub fn braces(items: Vec) -> Grouping { 
Grouping { sequence: commas(items), @@ -342,6 +398,7 @@ pub mod constructors { } } + /// `{`*a*` `*b*` `*...*` `*z*`}` pub fn block(items: Vec) -> Grouping { Grouping { sequence: Sequence { @@ -355,10 +412,12 @@ pub mod constructors { } } + /// As [block], but always vertical pub fn codeblock(items: Vec) -> Grouping { vertical(false, block(items)) } + /// `{`*a*`; `*b*`; `*...*`; `*z*`}` pub fn semiblock(items: Vec) -> Grouping { Grouping { sequence: Sequence { @@ -372,6 +431,9 @@ pub mod constructors { } } + /// Overrides `v` to be always vertical. + /// + /// If `spaced` is true, inserts an extra newline between items. pub fn vertical(spaced: bool, mut v: V) -> V { v.set_vertical_mode(if spaced { VerticalMode::ExtraNewline @@ -381,6 +443,7 @@ pub mod constructors { v } + /// Adds a layer of indentation to the given [Sequence]. pub fn indented(sequence: Sequence) -> Grouping { Grouping { sequence, @@ -390,52 +453,84 @@ pub mod constructors { } } +/// Ergonomic syntax for using the constructors in submodule [constructors]; see the +/// documentation for the macros, which appears on the [page for the crate +/// itself][crate#macros]. pub mod macros { + /// `name!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ *a*`::`*b*`::`*...*`::`*z* + /// + /// See [super::constructors::name]. #[macro_export] macro_rules! name { ($($item:expr),*) => {$crate::syntax::block::constructors::name(vec![$(std::rc::Rc::new($item)),*])} } + /// `seq!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ *ab...z* + /// + /// See [super::constructors::seq]. #[macro_export] macro_rules! seq { ($($item:expr),*) => {$crate::syntax::block::constructors::seq(vec![$(std::rc::Rc::new($item)),*])} } + /// `commas!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ *a*`, `*b*`, `*...*`, `*z* + /// + /// See [super::constructors::commas]. #[macro_export] macro_rules! 
commas { ($($item:expr),*) => {$crate::syntax::block::constructors::commas(vec![$(std::rc::Rc::new($item)),*])} } + /// `parens!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `(`*a*`, `*b*`, `*...*`, `*z*`)` + /// + /// See [super::constructors::parens]. #[macro_export] macro_rules! parens { ($($item:expr),*) => {$crate::syntax::block::constructors::parens(vec![$(std::rc::Rc::new($item)),*])} } + /// `brackets!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `[`*a*`, `*b*`, `*...*`, `*z*`]` + /// + /// See [super::constructors::brackets]. #[macro_export] macro_rules! brackets { ($($item:expr),*) => {$crate::syntax::block::constructors::brackets(vec![$(std::rc::Rc::new($item)),*])} } + /// `anglebrackets!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `<`*a*`, `*b*`, `*...*`, `*z*`>` + /// + /// See [super::constructors::anglebrackets]. #[macro_export] macro_rules! anglebrackets { ($($item:expr),*) => {$crate::syntax::block::constructors::anglebrackets(vec![$(std::rc::Rc::new($item)),*])} } + /// `braces!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `{`*a*`, `*b*`, `*...*`, `*z*`}` + /// + /// See [super::constructors::braces]. #[macro_export] macro_rules! braces { ($($item:expr),*) => {$crate::syntax::block::constructors::braces(vec![$(std::rc::Rc::new($item)),*])} } + /// `block!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `{`*a*` `*b*` `*...*` `*z*`}` + /// + /// See [super::constructors::block]. #[macro_export] macro_rules! block { ($($item:expr),*) => {$crate::syntax::block::constructors::block(vec![$(std::rc::Rc::new($item)),*])} } + /// As [`block`]`!`, but always vertical. See + /// [constructors::codeblock][super::constructors::codeblock]. #[macro_export] macro_rules! codeblock { ($($item:expr),*) => {$crate::syntax::block::constructors::codeblock(vec![$(std::rc::Rc::new($item)),*])} } + /// `semiblock!(`*a*`, `*b*`, `*...*`, `*z*`)` ⟶ `{`*a*`; `*b*`; `*...*`; `*z*`}` + /// + /// See [super::constructors::semiblock]. #[macro_export] macro_rules! 
semiblock { ($($item:expr),*) => {$crate::syntax::block::constructors::semiblock(vec![$(std::rc::Rc::new($item)),*])} diff --git a/implementations/rust/preserves-schema/src/syntax/mod.rs b/implementations/rust/preserves-schema/src/syntax/mod.rs index a863eaa..5aee0e8 100644 --- a/implementations/rust/preserves-schema/src/syntax/mod.rs +++ b/implementations/rust/preserves-schema/src/syntax/mod.rs @@ -1 +1,3 @@ +//! A library for emitting pretty-formatted structured source code. + pub mod block; diff --git a/implementations/rust/preserves/Cargo.toml b/implementations/rust/preserves/Cargo.toml index 3d1df3a..6d8bd77 100644 --- a/implementations/rust/preserves/Cargo.toml +++ b/implementations/rust/preserves/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "preserves" -version = "3.990.0" +version = "3.990.2" authors = ["Tony Garnock-Jones "] edition = "2018" description = "Implementation of the Preserves serialization format via serde." diff --git a/implementations/rust/preserves/README.md b/implementations/rust/preserves/README.md new file mode 100644 index 0000000..e09a241 --- /dev/null +++ b/implementations/rust/preserves/README.md @@ -0,0 +1,23 @@ +```shell +cargo add preserves +``` + +This crate ([`preserves` on crates.io](https://crates.io/crates/preserves)) implements +[Preserves](https://preserves.dev/) for Rust. It provides the core +[semantics](https://preserves.dev/preserves.html#semantics) as well as both the [human-readable +text syntax][crate::value::text] (a superset of JSON) and [machine-oriented binary +format][crate::value::packed] (including +[canonicalization](https://preserves.dev/canonical-binary.html)) for Preserves. 
+ +This crate is the foundation for others such as + + - [`preserves-schema`](https://docs.rs/preserves-schema/), which implements [Preserves + Schema](https://preserves.dev/preserves-schema.html); + - [`preserves-path`](https://docs.rs/preserves-path/), which implements [Preserves + Path](https://preserves.dev/preserves-path.html); and + - [`preserves-tools`](https://crates.io/crates/preserves-tools), which provides command-line + utilities for working with Preserves, in particular + [`preserves-tool`](https://preserves.dev/doc/preserves-tool.html), a kind of Preserves + Swiss-army knife. + +It also includes [Serde](https://serde.rs/) support (modules [de], [ser], [symbol], [set]). diff --git a/implementations/rust/preserves/doc/cheatsheet-binary-plaintext.md b/implementations/rust/preserves/doc/cheatsheet-binary-plaintext.md new file mode 100644 index 0000000..a90cda0 --- /dev/null +++ b/implementations/rust/preserves/doc/cheatsheet-binary-plaintext.md @@ -0,0 +1,33 @@ +For a value `V`, we write `«V»` for the binary encoding of `V`. 
+ +```text + «#f» = [0x80] + «#t» = [0x81] + + «@W V» = [0x85] ++ «W» ++ «V» + «#!V» = [0x86] ++ «V» + + «V» if V ∈ Float = [0x87, 0x04] ++ binary32(V) + «V» if V ∈ Double = [0x87, 0x08] ++ binary64(V) + + «V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V) + «V» if V ∈ String = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V) + «V» if V ∈ ByteString = [0xB2] ++ varint(|V|) ++ V + «V» if V ∈ Symbol = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V) + + «<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84] + «[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84] + «#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84] + «{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84] + + varint(n) = [n] if n < 128 + [(n & 127) | 128] ++ varint(n >> 7) if n ≥ 128 + + intbytes(n) = the empty sequence if n = 0, otherwise signedBigEndian(n) + + signedBigEndian(n) = [n & 255] if -128 ≤ n ≤ 127 + signedBigEndian(n >> 8) ++ [n & 255] otherwise +``` + +The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and +8-byte IEEE 754 binary representations of `F` and `D`, respectively.
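The `varint`, `intbytes`, and `signedBigEndian` rules in the binary cheatsheet can be sketched in Rust roughly as follows. This is an illustrative transcription of the cheatsheet only, not the `preserves` crate's own encoder: all function names here are hypothetical, and integers are restricted to `i64` for brevity, whereas Preserves integers are arbitrary-precision.

```rust
/// varint(n): 7 bits per byte, least-significant group first;
/// the high bit is set on every byte except the last.
fn varint(mut n: usize) -> Vec<u8> {
    let mut out = Vec::new();
    while n >= 128 {
        out.push(((n & 127) | 128) as u8);
        n >>= 7;
    }
    out.push(n as u8);
    out
}

/// signedBigEndian(n): minimal-length two's-complement, big-endian.
/// Assumption: i64 stands in for an arbitrary-precision integer.
fn signed_big_endian(n: i64) -> Vec<u8> {
    if (-128..=127).contains(&n) {
        vec![(n & 255) as u8]
    } else {
        // `>>` on i64 is an arithmetic shift, so the sign is preserved.
        let mut out = signed_big_endian(n >> 8);
        out.push((n & 255) as u8);
        out
    }
}

/// intbytes(n): the empty sequence for zero, otherwise signedBigEndian(n).
fn intbytes(n: i64) -> Vec<u8> {
    if n == 0 {
        Vec::new()
    } else {
        signed_big_endian(n)
    }
}

/// «V» for V ∈ SignedInteger: tag 0xB0, varint length, then the bytes.
fn encode_signed_integer(n: i64) -> Vec<u8> {
    let body = intbytes(n);
    let mut out = vec![0xB0];
    out.extend(varint(body.len()));
    out.extend(body);
    out
}

/// «V» for V ∈ String: tag 0xB1, varint length, then the UTF-8 bytes.
fn encode_string(s: &str) -> Vec<u8> {
    let mut out = vec![0xB1];
    out.extend(varint(s.len()));
    out.extend_from_slice(s.as_bytes());
    out
}
```

Under these definitions, `encode_signed_integer(256)` yields `[0xB0, 0x02, 0x01, 0x00]`, `encode_signed_integer(0)` yields `[0xB0, 0x00]`, and `encode_string("hi")` yields `[0xB1, 0x02, 0x68, 0x69]`.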
diff --git a/implementations/rust/preserves/doc/cheatsheet-text-plaintext.md b/implementations/rust/preserves/doc/cheatsheet-text-plaintext.md new file mode 100644 index 0000000..6c3a1ff --- /dev/null +++ b/implementations/rust/preserves/doc/cheatsheet-text-plaintext.md @@ -0,0 +1,44 @@ +```text +Document := Value ws +Value := ws (Record | Collection | Atom | Embedded | Annotated) +Collection := Sequence | Dictionary | Set +Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number +ws := (space | tab | cr | lf | `,`)* + +Record := `<` Value+ ws `>` +Sequence := `[` Value* ws `]` +Dictionary := `{` (Value ws `:` Value)* ws `}` +Set := `#{` Value* ws `}` + +Boolean := `#t` | `#f` +ByteString := `#"` binchar* `"` + | `#x"` (ws hex hex)* ws `"` + | `#[` (ws base64char)* ws `]` +String := `"` («any unicode scalar except `\` or `"`» | escaped | `\"`)* `"` +QuotedSymbol := `|` («any unicode scalar except `\` or `|`» | escaped | `\|`)* `|` +Symbol := (`A`..`Z` | `a`..`z` | `0`..`9` | sympunct | symuchar)+ +Number := Float | Double | SignedInteger +Float := flt (`f`|`F`) | `#xf"` (ws hex hex)4 ws `"` +Double := flt | `#xd"` (ws hex hex)8 ws `"` +SignedInteger := int + +Embedded := `#!` Value +Annotated := Annotation Value +Annotation := `@` Value | `;` «any unicode scalar except cr or lf»* (cr | lf) + +escaped := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\u` hex hex hex hex +binescaped := `\\` | `\/` | `\b` | `\f` | `\n` | `\r` | `\t` | `\x` hex hex +binchar := «any scalar ≥32 and ≤126, except `\` or `"`» | binescaped | `\"` +base64char := `A`..`Z` | `a`..`z` | `0`..`9` | `+` | `/` | `-` | `_` | `=` +sympunct := `~` | `!` | `$` | `%` | `^` | `&` | `*` | `?` + | `_` | `=` | `+` | `-` | `/` | `.` +symuchar := «any scalar value ≥128 whose Unicode category is + Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pc, + Pd, Po, Sc, Sm, Sk, So, or Co» + +flt := int ( frac exp | frac | exp ) +int := (`-`|`+`) (`0`..`9`)+ +frac := `.` (`0`..`9`)+ +exp := (`e`|`E`) 
(`-`|`+`) (`0`..`9`)+ +hex := `A`..`F` | `a`..`f` | `0`..`9` +``` diff --git a/implementations/rust/preserves/doc/value-grammar.md b/implementations/rust/preserves/doc/value-grammar.md new file mode 100644 index 0000000..5ad8fde --- /dev/null +++ b/implementations/rust/preserves/doc/value-grammar.md @@ -0,0 +1,18 @@ +```text + Value = Atom + | Compound + | Embedded + + Atom = Boolean + | Float + | Double + | SignedInteger + | String + | ByteString + | Symbol + + Compound = Record + | Sequence + | Set + | Dictionary +``` diff --git a/implementations/rust/preserves/doc/what-is-preserves.md b/implementations/rust/preserves/doc/what-is-preserves.md new file mode 100644 index 0000000..3c8d1bc --- /dev/null +++ b/implementations/rust/preserves/doc/what-is-preserves.md @@ -0,0 +1,12 @@ +*Preserves* is a data model, with associated serialization formats. + +It supports *records* with user-defined *labels*, embedded +*references*, and the usual suite of atomic and compound data types, +including *binary* data as a distinct type from text strings. Its +*annotations* allow separation of data from metadata such as comments, +trace information, and provenance information. + +Preserves departs from many other data languages in defining how to +*compare* two values. Comparison is based on the data model, not on +syntax or on data structures of any particular implementation +language. diff --git a/implementations/rust/preserves/src/de.rs b/implementations/rust/preserves/src/de.rs index 5b3f51c..2209b2b 100644 --- a/implementations/rust/preserves/src/de.rs +++ b/implementations/rust/preserves/src/de.rs @@ -1,3 +1,5 @@ +//! Support for Serde deserialization of Preserves terms described by Rust data types. 
+ use serde::de::{DeserializeSeed, EnumAccess, MapAccess, SeqAccess, VariantAccess, Visitor}; use serde::Deserialize; @@ -11,13 +13,21 @@ use super::value::{IOValue, IOValueDomainCodec, PackedReader, TextReader, ViaCod pub use super::error::Error; +/// A [std::result::Result] type including [Error], the Preserves Serde deserialization error +/// type, as its error. pub type Result = std::result::Result; +/// Serde deserializer for Preserves-encoded Rust data. Use [Deserializer::from_reader] to +/// construct instances, or [from_bytes]/[from_text]/[from_read]/[from_reader] etc to +/// deserialize single terms directly. pub struct Deserializer<'de, 'r, R: Reader<'de, IOValue>> { + /// The underlying Preserves [reader][crate::value::reader::Reader]. pub read: &'r mut R, phantom: PhantomData<&'de ()>, } +/// Deserialize a `T` from `bytes`, which must contain a Preserves [machine-oriented binary +/// syntax][crate::value::packed] term corresponding to the Serde serialization of a `T`. pub fn from_bytes<'de, T>(bytes: &'de [u8]) -> Result where T: Deserialize<'de>, @@ -28,6 +38,8 @@ where )) } +/// Deserialize a `T` from `text`, which must contain a Preserves [text +/// syntax][crate::value::text] term corresponding to the Serde serialization of a `T`. pub fn from_text<'de, T>(text: &'de str) -> Result where T: Deserialize<'de>, @@ -38,6 +50,8 @@ where )) } +/// Deserialize a `T` from `read`, which must yield a Preserves [machine-oriented binary +/// syntax][crate::value::packed] term corresponding to the Serde serialization of a `T`. pub fn from_read<'de, 'r, IOR: io::Read + io::Seek, T>(read: &'r mut IOR) -> Result where T: Deserialize<'de>, @@ -48,6 +62,8 @@ where )) } +/// Deserialize a `T` from `read`, which must yield a Preserves term corresponding to the Serde +/// serialization of a `T`. 
pub fn from_reader<'r, 'de, R: Reader<'de, IOValue>, T>(read: &'r mut R) -> Result where T: Deserialize<'de>, @@ -58,6 +74,7 @@ where } impl<'r, 'de, R: Reader<'de, IOValue>> Deserializer<'de, 'r, R> { + /// Construct a Deserializer from `read`, a Preserves [reader][crate::value::Reader]. pub fn from_reader(read: &'r mut R) -> Self { Deserializer { read, @@ -344,6 +361,7 @@ impl<'r, 'de, 'a, R: Reader<'de, IOValue>> serde::de::Deserializer<'de> } } +#[doc(hidden)] pub struct Seq<'de, 'r, 'a, R: Reader<'de, IOValue>> { b: B::Type, i: B::Item, diff --git a/implementations/rust/preserves/src/error.rs b/implementations/rust/preserves/src/error.rs index 4e92ac6..c67638a 100644 --- a/implementations/rust/preserves/src/error.rs +++ b/implementations/rust/preserves/src/error.rs @@ -1,27 +1,47 @@ +//! Serde and plain-Preserves codec errors. + use num::bigint::BigInt; use std::convert::From; use std::io; +/// Representation of parse, deserialization, and other conversion errors. #[derive(Debug)] pub enum Error { + /// Generic IO error. Io(io::Error), + /// Generic message for the user. Message(String), + /// Invalid unicode scalar `n` found during interpretation of a `` record + /// as a Rust `char`. InvalidUnicodeScalar(u32), + /// Preserves supports arbitrary integers; when these are converted to specific Rust + /// machine word types, sometimes they exceed the available range. NumberOutOfRange(BigInt), + /// Serde has limited support for deserializing free-form data; this error is signalled + /// when one of the limits is hit. CannotDeserializeAny, + /// Syntax error: missing closing delimiter (`)`, `]`, `}`, `>` in text syntax; `0x84` in binary syntax; etc.) MissingCloseDelimiter, + /// Signalled when an expected term is not present. MissingItem, + /// Signalled when what was received did not match expectations. Expected(ExpectedKind, Received), + #[doc(hidden)] // TODO remove this enum variant? 
It isn't used StreamingSerializationUnsupported, } +/// Used in [Error::Expected] to indicate what was received. #[derive(Debug)] pub enum Received { + #[doc(hidden)] // TODO remove this enum variant? It isn't used ReceivedSomethingElse, + /// Received a record with the given label symbol text. ReceivedRecordWithLabel(String), + /// Received some other value, described in the `String` ReceivedOtherValue(String), } +/// Used in [Error::Expected] to indicate what was expected. #[derive(Debug, PartialEq)] pub enum ExpectedKind { Boolean, @@ -35,7 +55,9 @@ pub enum ExpectedKind { ByteString, Symbol, + /// Expected a record, either of a specific arity (length) or of no specific arity Record(Option), + /// Expected a record with a symbol label with text `String`, perhaps of some specific arity SimpleRecord(String, Option), Sequence, Set, @@ -87,14 +109,17 @@ impl std::fmt::Display for Error { //--------------------------------------------------------------------------- +/// True iff `e` is `Error::Io` pub fn is_io_error(e: &Error) -> bool { matches!(e, Error::Io(_)) } +/// Produce the generic "end of file" error, `Error::Io(`[io_eof]`())` pub fn eof() -> Error { Error::Io(io_eof()) } +/// True iff `e` is an "end of file" error; see [is_eof_io_error] pub fn is_eof_error(e: &Error) -> bool { if let Error::Io(ioe) = e { is_eof_io_error(ioe) @@ -103,10 +128,12 @@ pub fn is_eof_error(e: &Error) -> bool { } } +/// Produce a syntax error bearing the message `s` pub fn syntax_error(s: &str) -> Error { Error::Io(io_syntax_error(s)) } +/// True iff `e` is a syntax error; see [is_syntax_io_error] pub fn is_syntax_error(e: &Error) -> bool { if let Error::Io(ioe) = e { is_syntax_io_error(ioe) @@ -117,18 +144,22 @@ pub fn is_syntax_error(e: &Error) -> bool { //--------------------------------------------------------------------------- +/// Produce an [io::Error] of [io::ErrorKind::UnexpectedEof]. 
pub fn io_eof() -> io::Error { io::Error::new(io::ErrorKind::UnexpectedEof, "EOF") } +/// True iff `e` is [io::ErrorKind::UnexpectedEof] pub fn is_eof_io_error(e: &io::Error) -> bool { matches!(e.kind(), io::ErrorKind::UnexpectedEof) } +/// Produce a syntax error ([io::ErrorKind::InvalidData]) bearing the message `s` pub fn io_syntax_error(s: &str) -> io::Error { io::Error::new(io::ErrorKind::InvalidData, s) } +/// True iff `e` is an [io::ErrorKind::InvalidData] (a syntax error) pub fn is_syntax_io_error(e: &io::Error) -> bool { matches!(e.kind(), io::ErrorKind::InvalidData) } diff --git a/implementations/rust/preserves/src/hex.rs b/implementations/rust/preserves/src/hex.rs index 4e171e6..94f1f76 100644 --- a/implementations/rust/preserves/src/hex.rs +++ b/implementations/rust/preserves/src/hex.rs @@ -1,19 +1,38 @@ +//! Utilities for producing and flexibly parsing strings containing hexadecimal binary data. + +/// Utility for parsing hex binary data from strings. pub enum HexParser { + /// "Liberal" parsing simply ignores characters that are not (case-insensitive) hex digits. Liberal, + /// "Whitespace allowed" parsing ignores whitespace, but fails a parse on anything other + /// than hex or whitespace. WhitespaceAllowed, + /// "Strict" parsing accepts only (case-insensitive) hex digits; no whitespace, no other + /// characters. Strict, } +/// Utility for formatting binary data as hex. pub enum HexFormatter { + /// Produces LF-separated lines with a maximum of `usize` hex digits in each line. Lines(usize), + /// Simply packs hex digits in as tightly as possible. Packed, } +/// Convert a number 0..15 to a hex digit [char]. +/// +/// # Panics +/// +/// Panics if given `v` outside the range 0..15 inclusive. +/// pub fn hexdigit(v: u8) -> char { char::from_digit(v as u32, 16).expect("hexadecimal digit value") } impl HexParser { + /// Decode `s` according to the given rules for `self`; see [HexParser]. + /// If the parse fails, yield `None`. 
pub fn decode(&self, s: &str) -> Option> { let mut result = Vec::new(); let mut buf: u8 = 0; @@ -49,6 +68,7 @@ impl HexParser { } impl HexFormatter { + /// Encode `bs` according to the given rules for `self`; see [HexFormatter]. pub fn encode(&self, bs: &[u8]) -> String { match self { HexFormatter::Lines(max_line_length) => { diff --git a/implementations/rust/preserves/src/lib.rs b/implementations/rust/preserves/src/lib.rs index 68c5f50..c82d923 100644 --- a/implementations/rust/preserves/src/lib.rs +++ b/implementations/rust/preserves/src/lib.rs @@ -1,3 +1,9 @@ +#![doc = concat!( + include_str!("../README.md"), + "# What is Preserves?\n\n", + include_str!("../doc/what-is-preserves.md"), +)] + pub mod de; pub mod error; pub mod hex; diff --git a/implementations/rust/preserves/src/ser.rs b/implementations/rust/preserves/src/ser.rs index 6167ba8..da910f5 100644 --- a/implementations/rust/preserves/src/ser.rs +++ b/implementations/rust/preserves/src/ser.rs @@ -1,3 +1,5 @@ +//! Support for Serde serialization of Rust data types into Preserves terms. + use super::value::boundary as B; use super::value::writer::{CompoundWriter, Writer}; use super::value::IOValueDomainCodec; @@ -7,11 +9,16 @@ pub use super::error::Error; type Result = std::result::Result; #[derive(Debug)] +/// Serde serializer for Preserves-encoding Rust data. Construct via [Serializer::new], and use +/// with [serde::Serialize::serialize] methods. pub struct Serializer<'w, W: Writer> { + /// The underlying Preserves [writer][crate::value::writer::Writer]. pub write: &'w mut W, } impl<'w, W: Writer> Serializer<'w, W> { + /// Construct a new [Serializer] targeting the given + /// [writer][crate::value::writer::Writer].
pub fn new(write: &'w mut W) -> Self { Serializer { write } } @@ -22,6 +29,7 @@ enum SequenceVariant { Record(W::RecWriter), } +#[doc(hidden)] pub struct SerializeCompound<'a, 'w, W: Writer> { b: B::Type, i: B::Item, @@ -29,6 +37,7 @@ pub struct SerializeCompound<'a, 'w, W: Writer> { c: SequenceVariant, } +#[doc(hidden)] pub struct SerializeDictionary<'a, 'w, W: Writer> { b: B::Type, ser: &'a mut Serializer<'w, W>, @@ -442,6 +451,8 @@ impl<'a, 'w, W: Writer> serde::ser::SerializeSeq for SerializeCompound<'a, 'w, W } } +/// Convenience function for directly serializing a Serde-serializable `T` to the given +/// `write`, a Preserves [writer][crate::value::writer::Writer]. pub fn to_writer(write: &mut W, value: &T) -> Result<()> { Ok(value.serialize(&mut Serializer::new(write))?) } diff --git a/implementations/rust/preserves/src/set.rs b/implementations/rust/preserves/src/set.rs index 13bef7a..001ab51 100644 --- a/implementations/rust/preserves/src/set.rs +++ b/implementations/rust/preserves/src/set.rs @@ -1,7 +1,26 @@ +//! Serde support for serializing Rust collections as Preserves sets. +//! +//! Serde doesn't include sets in its data model, so we do some somewhat awful tricks to force +//! things to come out the way we want them. +//! +//! # Example +//! +//! Annotate collection-valued fields that you want to (en|de)code as Preserves `Set`s with +//! `#[serde(with = "preserves::set")]`: +//! +//! ```rust +//! #[derive(serde::Serialize, serde::Deserialize)] +//! struct Example { +//! #[serde(with = "preserves::set")] +//! items: preserves::value::Set, +//! } +//! 
``` + use crate::value::{self, to_value, IOValue, UnwrappedIOValue}; use serde::{Deserialize, Deserializer, Serialize, Serializer}; use std::iter::IntoIterator; +#[doc(hidden)] pub fn serialize(s: T, serializer: S) -> Result where S: Serializer, @@ -12,6 +31,7 @@ where UnwrappedIOValue::from(s).wrap().serialize(serializer) } +#[doc(hidden)] pub fn deserialize<'de, D, T>(deserializer: D) -> Result where D: Deserializer<'de>, diff --git a/implementations/rust/preserves/src/symbol.rs b/implementations/rust/preserves/src/symbol.rs index 61cd8ce..b275256 100644 --- a/implementations/rust/preserves/src/symbol.rs +++ b/implementations/rust/preserves/src/symbol.rs @@ -1,5 +1,25 @@ +//! Serde support for serializing Rust data as Preserves symbols. +//! +//! Serde doesn't include symbols in its data model, so we do some somewhat awful tricks to +//! force things to come out the way we want them. +//! +//! # Example +//! +//! Either use [Symbol] directly in your data types, or annotate [String]-valued fields that +//! you want to (en|de)code as Preserves `Symbol`s with `#[serde(with = "preserves::symbol")]`: +//! +//! ```rust +//! #[derive(serde::Serialize, serde::Deserialize)] +//! struct Example { +//! sym1: preserves::symbol::Symbol, +//! #[serde(with = "preserves::symbol")] +//! sym2: String, +//! } +//! ``` + use crate::value::{IOValue, NestedValue}; +/// Wrapper for a string to coerce its Preserves-serialization to `Symbol`. 
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)] pub struct Symbol(pub String); @@ -26,6 +46,7 @@ impl<'de> serde::Deserialize<'de> for Symbol { } } +#[doc(hidden)] pub fn serialize(s: &str, serializer: S) -> Result where S: serde::Serializer, @@ -34,6 +55,7 @@ where Symbol(s.to_string()).serialize(serializer) } +#[doc(hidden)] pub fn deserialize<'de, D>(deserializer: D) -> Result where D: serde::Deserializer<'de>, diff --git a/implementations/rust/preserves/src/value/boundary.rs b/implementations/rust/preserves/src/value/boundary.rs index 86e85cf..24f7f77 100644 --- a/implementations/rust/preserves/src/value/boundary.rs +++ b/implementations/rust/preserves/src/value/boundary.rs @@ -1,3 +1,5 @@ +#![doc(hidden)] + #[derive(Default, Clone, Debug)] pub struct Type { pub closing: Option, diff --git a/implementations/rust/preserves/src/value/de.rs b/implementations/rust/preserves/src/value/de.rs index adae205..c767751 100644 --- a/implementations/rust/preserves/src/value/de.rs +++ b/implementations/rust/preserves/src/value/de.rs @@ -1,3 +1,5 @@ +//! Support for Serde deserialization of Rust data types from Preserves *values* (not syntax). + use crate::error::{Error, ExpectedKind, Received}; use crate::value::repr::{Double, Float}; use crate::value::{IOValue, Map, NestedValue, UnwrappedIOValue, Value}; @@ -7,10 +9,14 @@ use std::iter::Iterator; pub type Result = std::result::Result; +/// Serde deserializer for constructing Rust data from an in-memory Preserves value. Use +/// [Deserializer::from_value] to construct instances, or [from_value] to deserialize single +/// values directly. pub struct Deserializer<'de> { input: &'de IOValue, } +/// Deserialize a `T` from `v`, a Preserves [IOValue]. pub fn from_value<'a, T>(v: &'a IOValue) -> Result where T: Deserialize<'a>, @@ -21,6 +27,7 @@ where } impl<'de> Deserializer<'de> { + /// Construct a Deserializer from `v`, an [IOValue].
pub fn from_value(v: &'de IOValue) -> Self { Deserializer { input: v } } @@ -331,6 +338,7 @@ impl<'de, 'a> serde::de::Deserializer<'de> for &'a mut Deserializer<'de> { } } +#[doc(hidden)] pub struct VecSeq<'a, 'de: 'a, I: Iterator> { iter: I, de: &'a mut Deserializer<'de>, @@ -359,6 +367,7 @@ impl<'de, 'a, I: Iterator> SeqAccess<'de> for VecSeq<'a, 'd } } +#[doc(hidden)] pub struct DictMap<'a, 'de: 'a> { pending: Option<&'de IOValue>, iter: Box + 'a>, diff --git a/implementations/rust/preserves/src/value/domain.rs b/implementations/rust/preserves/src/value/domain.rs index d45df47..1374c2e 100644 --- a/implementations/rust/preserves/src/value/domain.rs +++ b/implementations/rust/preserves/src/value/domain.rs @@ -1,3 +1,6 @@ +//! Traits for working with Preserves [embedded +//! values](https://preserves.dev/preserves.html#embeddeds). + use std::io; use super::packed; @@ -9,10 +12,12 @@ use super::NestedValue; use super::Reader; use super::Writer; +/// Implementations parse [IOValue]s to their own particular [Embeddable] values of type `D`. pub trait DomainParse { fn parse_embedded(&mut self, v: &IOValue) -> io::Result; } +/// Implementations read and parse from `src` to produce [Embeddable] values of type `D`. pub trait DomainDecode { fn decode_embedded<'de, 'src, S: BinarySource<'de>>( &mut self, @@ -21,6 +26,7 @@ pub trait DomainDecode { ) -> io::Result; } +/// Implementations unparse and write `D`s to `w`, a [writer][crate::value::writer::Writer]. pub trait DomainEncode { fn encode_embedded(&mut self, w: &mut W, d: &D) -> io::Result<()>; } @@ -41,6 +47,9 @@ impl<'a, D: Embeddable, T: DomainDecode> DomainDecode for &'a mut T { } } +/// Convenience codec: use this as embedded codec for encoding (only) when embedded values +/// should be serialized as Preserves `String`s holding their Rust [std::fmt::Debug] +/// representation. 
pub struct DebugDomainEncode; impl DomainEncode for DebugDomainEncode { @@ -49,6 +58,8 @@ impl DomainEncode for DebugDomainEncode { } } +/// Convenience codec: use this as embedded codec for decoding (only) when embedded values are +/// expected to conform to the syntax implicit in their [std::str::FromStr] implementation. pub struct FromStrDomainParse; impl, D: Embeddable + std::str::FromStr> DomainParse @@ -59,6 +70,8 @@ impl, D: Embeddable + std::str::FromStr> DomainP } } +/// Use this as embedded codec when embedded data are already [IOValue]s that can be directly +/// serialized and deserialized without further transformation. pub struct IOValueDomainCodec; impl DomainDecode for IOValueDomainCodec { @@ -77,6 +90,7 @@ impl DomainEncode for IOValueDomainCodec { } } +/// Use this as embedded codec to forbid use of embedded values; an [io::Error] is signalled. pub struct NoEmbeddedDomainCodec; impl DomainDecode for NoEmbeddedDomainCodec { @@ -101,9 +115,12 @@ impl DomainEncode for NoEmbeddedDomainCodec { } } +/// If some `C` implements [DomainDecode] but not [DomainParse], or vice versa, use `ViaCodec` +/// to promote the one to the other. Construct instances with [ViaCodec::new]. pub struct ViaCodec(C); impl ViaCodec { + /// Constructs a `ViaCodec` wrapper around an underlying codec of type `C`. pub fn new(c: C) -> Self { ViaCodec(c) } diff --git a/implementations/rust/preserves/src/value/magic.rs b/implementations/rust/preserves/src/value/magic.rs index 9a5acda..966e343 100644 --- a/implementations/rust/preserves/src/value/magic.rs +++ b/implementations/rust/preserves/src/value/magic.rs @@ -1,3 +1,12 @@ +#![doc(hidden)] + +//! A horrifying hack to Serde-serialize [IOValue] instances to Preserves *as themselves*. +//! +//! Frankly I think this portion of the codebase might not survive for long. I can't think of a +//! better way of achieving this, but the drawbacks of having this functionality are *severe*. +//! +//! See . 
+ use super::repr::IOValue; pub static MAGIC: &str = "$____Preserves_Serde_Magic"; diff --git a/implementations/rust/preserves/src/value/merge.rs b/implementations/rust/preserves/src/value/merge.rs index 4eea4c3..3243be3 100644 --- a/implementations/rust/preserves/src/value/merge.rs +++ b/implementations/rust/preserves/src/value/merge.rs @@ -1,8 +1,13 @@ +//! Implements the Preserves +//! [merge](https://preserves.dev/preserves.html#appendix-merging-values) of values. + use super::Map; use super::NestedValue; use super::Record; use super::Value; +/// Merge two sequences of values according to [the +/// specification](https://preserves.dev/preserves.html#appendix-merging-values). pub fn merge_seqs(mut a: Vec, mut b: Vec) -> Option> { if a.len() > b.len() { std::mem::swap(&mut a, &mut b); @@ -16,6 +21,8 @@ pub fn merge_seqs(mut a: Vec, mut b: Vec) -> Option Some(r) } +/// Merge two values according to [the +/// specification](https://preserves.dev/preserves.html#appendix-merging-values). pub fn merge2(v: N, w: N) -> Option { let (mut v_anns, v_val) = v.pieces(); let (w_anns, w_val) = w.pieces(); @@ -52,6 +59,8 @@ pub fn merge2(v: N, w: N) -> Option { } } +/// Merge several values into a single value according to [the +/// specification](https://preserves.dev/preserves.html#appendix-merging-values). pub fn merge>(vs: I) -> Option { let mut vs = vs.into_iter(); let mut v = vs.next().expect("at least one value in merge()"); diff --git a/implementations/rust/preserves/src/value/mod.rs b/implementations/rust/preserves/src/value/mod.rs index 6c0dad1..df627fb 100644 --- a/implementations/rust/preserves/src/value/mod.rs +++ b/implementations/rust/preserves/src/value/mod.rs @@ -1,3 +1,53 @@ +//! # Representing, reading, and writing Preserves `Value`s as Rust data +//! +//! ``` +//! use preserves::value::{IOValue, text, packed}; +//! let v: IOValue = text::iovalue_from_str("")?; +//! let w: IOValue = packed::iovalue_from_bytes(b"\xb4\xb3\x02hi\x84")?; +//! 
assert_eq!(v, w); +//! assert_eq!(text::TextWriter::encode_iovalue(&v)?, ""); +//! assert_eq!(packed::PackedWriter::encode_iovalue(&v)?, b"\xb4\xb3\x02hi\x84"); +//! # Ok::<(), std::io::Error>(()) +//! ``` +//! +//! Preserves `Value`s are categorized in the following way. The core representation type, +//! [crate::value::repr::Value], reflects this structure. However, most of the time you will +//! work with [IOValue] or some other implementation of trait [NestedValue], which augments an +//! underlying [Value] with [*annotations*][crate::value::repr::Annotations] (e.g. comments) and fixes a strategy +//! for memory management. +//! +#![doc = include_str!("../../doc/value-grammar.md")] +//! +//! ## Memory management +//! +//! Each implementation of [NestedValue] chooses a different point in the space of possible +//! approaches to memory management for `Value`s. +//! +//! ##### `IOValue` +//! +//! The most commonly-used and versatile implementation, [IOValue], uses [std::sync::Arc] for +//! internal links in compound `Value`s. Unlike many of the other implementations of +//! [NestedValue], [IOValue] doesn't offer flexibility in the Rust data type to be used for +//! Preserves [embedded values](https://preserves.dev/preserves.html#embeddeds): instead, +//! embedded values in an [IOValue] are themselves [IOValue]s. +//! +//! ##### `ArcValue`, `RcValue`, and `PlainValue` +//! +//! For control over the Rust type to use for embedded values, choose [ArcValue], [RcValue], or +//! [PlainValue]. Use [ArcValue] when you wish to transfer values among threads. [RcValue] is +//! more niche; it may be useful for complex terms that do not need to cross thread boundaries. +//! [PlainValue] is even more niche: it does not use a reference-counted pointer type, meaning +//! it does not offer any kind of aliasing or sharing among subterms at all. +//! +//! # Parsing, pretty-printing, encoding and decoding `Value`s +//! +//! 
Modules [reader] and [writer] supply generic [Reader] and [Writer] traits for parsing and +//! unparsing Preserves data. Implementations of [Reader] and [Writer] connect Preserves data +//! to specific transfer syntaxes: +//! +//! - module [packed] supplies tools for working with the machine-oriented binary syntax +//! - module [text] supplies tools for working with human-readable text syntax + pub mod boundary; pub mod de; pub mod domain; @@ -56,6 +106,7 @@ pub use text::TextReader; pub use text::TextWriter; pub use writer::Writer; +#[doc(hidden)] pub fn invert_map(m: &Map) -> Map where A: Clone, diff --git a/implementations/rust/preserves/src/value/packed/constants.rs b/implementations/rust/preserves/src/value/packed/constants.rs index 66a0840..a9cb44c 100644 --- a/implementations/rust/preserves/src/value/packed/constants.rs +++ b/implementations/rust/preserves/src/value/packed/constants.rs @@ -1,6 +1,9 @@ +//! Definitions of the tags used in the binary encoding. + use std::convert::{From, TryFrom}; use std::io; +/// Rust representation of tags used in the binary encoding. #[derive(Debug, PartialEq, Eq)] pub enum Tag { False, @@ -19,8 +22,9 @@ pub enum Tag { Dictionary, } +/// Error value representing failure to decode a byte into a [Tag]. #[derive(Debug, PartialEq, Eq)] -pub struct InvalidTag(u8); +pub struct InvalidTag(pub u8); impl From for io::Error { fn from(v: InvalidTag) -> Self { diff --git a/implementations/rust/preserves/src/value/packed/mod.rs b/implementations/rust/preserves/src/value/packed/mod.rs index cf069ca..a450e99 100644 --- a/implementations/rust/preserves/src/value/packed/mod.rs +++ b/implementations/rust/preserves/src/value/packed/mod.rs @@ -1,3 +1,15 @@ +//! Implements the Preserves [machine-oriented binary +//! syntax](https://preserves.dev/preserves-binary.html). +//! +//! The main entry points for reading are functions [iovalue_from_bytes], +//! [annotated_iovalue_from_bytes], [from_bytes], and [annotated_from_bytes]. +//! +//! 
The main entry points for writing are [PackedWriter::encode_iovalue] and +//! [PackedWriter::encode]. +//! +//! # Summary of Binary Syntax +#![doc = include_str!("../../../doc/cheatsheet-binary-plaintext.md")] + pub mod constants; pub mod reader; pub mod writer; @@ -9,6 +21,8 @@ use std::io; use super::{BinarySource, DomainDecode, IOValue, IOValueDomainCodec, NestedValue, Reader}; +/// Reads a value from the given byte vector `bs` using the binary encoding, discarding +/// annotations. pub fn from_bytes>( bs: &[u8], decode_embedded: Dec, @@ -18,10 +32,13 @@ pub fn from_bytes>( .demand_next(false) } +/// Reads an [IOValue] from the given byte vector `bs` using the binary encoding, discarding +/// annotations. pub fn iovalue_from_bytes(bs: &[u8]) -> io::Result { from_bytes(bs, IOValueDomainCodec) } +/// As [from_bytes], but includes annotations. pub fn annotated_from_bytes>( bs: &[u8], decode_embedded: Dec, @@ -31,6 +48,7 @@ pub fn annotated_from_bytes>( .demand_next(true) } +/// As [iovalue_from_bytes], but includes annotations. pub fn annotated_iovalue_from_bytes(bs: &[u8]) -> io::Result { annotated_from_bytes(bs, IOValueDomainCodec) } diff --git a/implementations/rust/preserves/src/value/packed/reader.rs b/implementations/rust/preserves/src/value/packed/reader.rs index b1fb423..eb78c13 100644 --- a/implementations/rust/preserves/src/value/packed/reader.rs +++ b/implementations/rust/preserves/src/value/packed/reader.rs @@ -1,3 +1,5 @@ +//! Implementation of [Reader] for the binary encoding. + use crate::error::{self, io_syntax_error, is_eof_io_error, ExpectedKind, Received}; use num::bigint::BigInt; @@ -18,6 +20,7 @@ use super::super::{ }; use super::constants::Tag; +/// The binary encoding Preserves reader. pub struct PackedReader< 'de, 'src, @@ -25,7 +28,9 @@ pub struct PackedReader< Dec: DomainDecode, S: BinarySource<'de>, > { + /// Underlying source of bytes. pub source: &'src mut S, + /// Decoder for producing Rust values embedded in the binary data. 
pub decode_embedded: Dec, phantom: PhantomData<&'de N>, } @@ -67,6 +72,7 @@ fn out_of_range>(i: I) -> error::Error { impl<'de, 'src, N: NestedValue, Dec: DomainDecode, S: BinarySource<'de>> PackedReader<'de, 'src, N, Dec, S> { + /// Construct a new reader from a byte source and embedded-value decoder. #[inline(always)] pub fn new(source: &'src mut S, decode_embedded: Dec) -> Self { PackedReader { diff --git a/implementations/rust/preserves/src/value/packed/writer.rs b/implementations/rust/preserves/src/value/packed/writer.rs index 140f7f6..72d6294 100644 --- a/implementations/rust/preserves/src/value/packed/writer.rs +++ b/implementations/rust/preserves/src/value/packed/writer.rs @@ -1,3 +1,5 @@ +//! Implementation of [Writer] for the binary encoding. + use super::super::boundary as B; use super::super::suspendable::Suspendable; use super::super::DomainEncode; @@ -13,9 +15,11 @@ use std::ops::DerefMut; use super::super::writer::{varint, CompoundWriter, Writer}; +/// The binary encoding Preserves writer. pub struct PackedWriter(Suspendable); impl PackedWriter<&mut Vec> { + /// Encodes `v` to a byte vector. #[inline(always)] pub fn encode>( enc: &mut Enc, @@ -26,6 +30,7 @@ impl PackedWriter<&mut Vec> { Ok(buf) } + /// Encodes `v` to a byte vector. #[inline(always)] pub fn encode_iovalue(v: &IOValue) -> io::Result> { Self::encode(&mut IOValueDomainCodec, v) @@ -33,26 +38,31 @@ impl PackedWriter<&mut Vec> { } impl PackedWriter { + /// Construct a writer from the given byte sink `write`. #[inline(always)] pub fn new(write: W) -> Self { PackedWriter(Suspendable::new(write)) } + /// Retrieve a mutable reference to the underlying byte sink. 
#[inline(always)] pub fn w(&mut self) -> &mut W { self.0.deref_mut() } + #[doc(hidden)] #[inline(always)] pub fn write_byte(&mut self, b: u8) -> io::Result<()> { self.w().write_all(&[b]) } + #[doc(hidden)] #[inline(always)] pub fn write_integer(&mut self, bs: &[u8]) -> io::Result<()> { self.write_atom(Tag::SignedInteger, bs) } + #[doc(hidden)] #[inline(always)] pub fn write_atom(&mut self, tag: Tag, bs: &[u8]) -> io::Result<()> { self.write_byte(tag.into())?; @@ -60,17 +70,20 @@ impl PackedWriter { self.w().write_all(bs) } + #[doc(hidden)] #[inline(always)] pub fn suspend(&mut self) -> Self { PackedWriter(self.0.suspend()) } + #[doc(hidden)] #[inline(always)] pub fn resume(&mut self, other: Self) { self.0.resume(other.0) } } +#[doc(hidden)] pub struct BinaryOrderWriter(Vec>); impl BinaryOrderWriter { @@ -119,6 +132,7 @@ impl BinaryOrderWriter { } } +#[doc(hidden)] pub trait WriteWriter: Writer { fn write_raw_bytes(&mut self, v: &[u8]) -> io::Result<()>; diff --git a/implementations/rust/preserves/src/value/reader.rs b/implementations/rust/preserves/src/value/reader.rs index fc4b8dd..1900e66 100644 --- a/implementations/rust/preserves/src/value/reader.rs +++ b/implementations/rust/preserves/src/value/reader.rs @@ -1,3 +1,6 @@ +//! Generic [Reader] trait for parsing Preserves [Value][crate::value::repr::Value]s, +//! implemented by code that provides each specific transfer syntax. + use crate::error::{self, io_eof, ExpectedKind, Received}; use std::borrow::Cow; @@ -18,59 +21,104 @@ use super::ViaCodec; pub type ReaderResult = std::result::Result; +/// Tokens produced when performing +/// [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style reading of terms. pub enum Token { + /// An embedded value was seen and completely decoded. Embedded(N::Embedded), + /// An atomic value was seen and completely decoded. Atom(N), + /// A compound value has been opened; its contents follow, and it will be terminated by + /// [Token::End]. 
Compound(CompoundClass), + /// Closes a previously-opened compound value. End, } +/// Generic parser for Preserves. pub trait Reader<'de, N: NestedValue> { + /// Retrieve the next parseable value or an indication of end-of-input. + /// + /// Yields `Ok(Some(...))` if a complete value is available, `Ok(None)` if the end of + /// stream has been reached, or `Err(...)` for parse or IO errors, including + /// incomplete/partial input. See also [Reader::demand_next]. fn next(&mut self, read_annotations: bool) -> io::Result>; + + // Hiding these from the documentation for the moment because I don't want to have to + // document the whole Boundary thing. + #[doc(hidden)] fn open_record(&mut self, arity: Option) -> ReaderResult; + #[doc(hidden)] fn open_sequence_or_set(&mut self) -> ReaderResult; + #[doc(hidden)] fn open_sequence(&mut self) -> ReaderResult<()>; + #[doc(hidden)] fn open_set(&mut self) -> ReaderResult<()>; + #[doc(hidden)] fn open_dictionary(&mut self) -> ReaderResult<()>; + #[doc(hidden)] fn boundary(&mut self, b: &B::Type) -> ReaderResult<()>; + #[doc(hidden)] // close_compound implies a b.shift(...) and a self.boundary(b). fn close_compound(&mut self, b: &mut B::Type, i: &B::Item) -> ReaderResult; + #[doc(hidden)] fn open_embedded(&mut self) -> ReaderResult<()>; + #[doc(hidden)] fn close_embedded(&mut self) -> ReaderResult<()>; + /// Allows structured backtracking to an earlier stage in a parse. Useful for layering + /// parser combinators atop a Reader. type Mark; + /// Retrieve a marker for the current position in the input. fn mark(&mut self) -> io::Result; + /// Seek the input to a previously-saved position. fn restore(&mut self, mark: &Self::Mark) -> io::Result<()>; + /// Get the next [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style event, + /// discarding annotations. + /// + /// The `read_embedded_annotations` parameter controls whether annotations are also skipped on + /// *embedded* values or not.
fn next_token(&mut self, read_embedded_annotations: bool) -> io::Result>; + /// Get the next [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style event, plus + /// a vector containing any annotations that preceded it. fn next_annotations_and_token(&mut self) -> io::Result<(Vec, Token)>; //--------------------------------------------------------------------------- + /// Skips the next available complete value. Yields an error if no such value exists. fn skip_value(&mut self) -> io::Result<()> { // TODO efficient skipping in specific impls of this trait let _ = self.demand_next(false)?; Ok(()) } + /// Retrieve the next parseable value, treating end-of-input as an error. + /// + /// Yields `Ok(...)` if a complete value is available or `Err(...)` for parse or IO errors, + /// including incomplete/partial input or end of stream. See also [Reader::next]. fn demand_next(&mut self, read_annotations: bool) -> io::Result { self.next(read_annotations)?.ok_or_else(io_eof) } + /// Yields the next value, if it is a `Boolean`, or an error otherwise. fn next_boolean(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_boolean() } + /// Yields the next value, if it is a `Float`, or an error otherwise. fn next_float(&mut self) -> ReaderResult { Ok(self.demand_next(false)?.value().to_float()?.to_owned()) } + /// Yields the next value, if it is a `Double`, or an error otherwise. fn next_double(&mut self) -> ReaderResult { Ok(self.demand_next(false)?.value().to_double()?.to_owned()) } + /// Yields the next value, if it is a `SignedInteger`, or an error otherwise. fn next_signedinteger(&mut self) -> ReaderResult { Ok(self .demand_next(false)? @@ -79,64 +127,92 @@ pub trait Reader<'de, N: NestedValue> { .to_owned()) } + /// Yields the next value, if it is a `SignedInteger` that fits in [i8], or an error + /// otherwise. 
fn next_i8(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_i8() } + /// Yields the next value, if it is a `SignedInteger` that fits in [u8], or an error + /// otherwise. fn next_u8(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_u8() } + /// Yields the next value, if it is a `SignedInteger` that fits in [i16], or an error + /// otherwise. fn next_i16(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_i16() } + /// Yields the next value, if it is a `SignedInteger` that fits in [u16], or an error + /// otherwise. fn next_u16(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_u16() } + /// Yields the next value, if it is a `SignedInteger` that fits in [i32], or an error + /// otherwise. fn next_i32(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_i32() } + /// Yields the next value, if it is a `SignedInteger` that fits in [u32], or an error + /// otherwise. fn next_u32(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_u32() } + /// Yields the next value, if it is a `SignedInteger` that fits in [i64], or an error + /// otherwise. fn next_i64(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_i64() } + /// Yields the next value, if it is a `SignedInteger` that fits in [u64], or an error + /// otherwise. fn next_u64(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_u64() } + /// Yields the next value, if it is a `SignedInteger` that fits in [i128], or an error + /// otherwise. fn next_i128(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_i128() } + /// Yields the next value, if it is a `SignedInteger` that fits in [u128], or an error + /// otherwise. fn next_u128(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_u128() } + /// Yields the next value as an [f32], if it is a `Float`, or an error otherwise. 
fn next_f32(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_f32() } + /// Yields the next value as an [f64], if it is a `Double`, or an error otherwise. fn next_f64(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_f64() } + /// Yields the next value as a [char], if it is parseable by + /// [Value::to_char][crate::value::Value::to_char], or an error otherwise. fn next_char(&mut self) -> ReaderResult { self.demand_next(false)?.value().to_char() } + /// Yields the next value, if it is a `String`, or an error otherwise. fn next_str(&mut self) -> ReaderResult> { Ok(Cow::Owned( self.demand_next(false)?.value().to_string()?.to_owned(), )) } + /// Yields the next value, if it is a `ByteString`, or an error otherwise. fn next_bytestring(&mut self) -> ReaderResult> { Ok(Cow::Owned( self.demand_next(false)?.value().to_bytestring()?.to_owned(), )) } + /// Yields the next value, if it is a `Symbol`, or an error otherwise. fn next_symbol(&mut self) -> ReaderResult> { Ok(Cow::Owned( self.demand_next(false)?.value().to_symbol()?.to_owned(), )) } + #[doc(hidden)] fn open_option(&mut self) -> ReaderResult> { let b = self.open_record(None)?; let label: &str = &self.next_symbol()?; @@ -153,6 +229,7 @@ pub trait Reader<'de, N: NestedValue> { } } + #[doc(hidden)] fn open_simple_record(&mut self, name: &str, arity: Option) -> ReaderResult { let b = self.open_record(arity)?; let label: &str = &self.next_symbol()?; @@ -166,6 +243,7 @@ pub trait Reader<'de, N: NestedValue> { } } + /// Constructs a [ConfiguredReader] set with the given value for `read_annotations`. fn configured(self, read_annotations: bool) -> ConfiguredReader<'de, N, Self> where Self: std::marker::Sized, @@ -177,6 +255,7 @@ pub trait Reader<'de, N: NestedValue> { } } + #[doc(hidden)] fn ensure_more_expected(&mut self, b: &mut B::Type, i: &B::Item) -> ReaderResult<()> { if !self.close_compound(b, i)? 
{ Ok(()) @@ -185,6 +264,7 @@ pub trait Reader<'de, N: NestedValue> { } } + #[doc(hidden)] fn ensure_complete(&mut self, mut b: B::Type, i: &B::Item) -> ReaderResult<()> { if !self.close_compound(&mut b, i)? { Err(error::Error::MissingCloseDelimiter) @@ -254,16 +334,27 @@ impl<'r, 'de, N: NestedValue, R: Reader<'de, N>> Reader<'de, N> for &'r mut R { } } +/// Generic seekable stream of input bytes. pub trait BinarySource<'de>: Sized { + /// Allows structured backtracking to an earlier position in an input. type Mark; + /// Retrieve a marker for the current position in the input. fn mark(&mut self) -> io::Result; + /// Seek the input to a previously-saved position. fn restore(&mut self, mark: &Self::Mark) -> io::Result<()>; + /// Skip the next byte. fn skip(&mut self) -> io::Result<()>; + /// Returns the next byte without advancing over it. fn peek(&mut self) -> io::Result; + /// Returns and consumes the next `count` bytes, which must all be available. Always yields + /// exactly `count` bytes or an error. fn readbytes(&mut self, count: usize) -> io::Result>; + /// As [BinarySource::readbytes], but uses `bs` as destination for the read bytes as well + /// as taking the size of `bs` as the count of bytes to read. fn readbytes_into(&mut self, bs: &mut [u8]) -> io::Result<()>; + /// Constructs a [PackedReader][super::PackedReader] that will read from `self`. fn packed>( &mut self, decode_embedded: Dec, @@ -271,12 +362,14 @@ pub trait BinarySource<'de>: Sized { super::PackedReader::new(self, decode_embedded) } + /// Constructs a [PackedReader][super::PackedReader] that will read [IOValue]s from `self`. fn packed_iovalues( &mut self, ) -> super::PackedReader<'de, '_, IOValue, IOValueDomainCodec, Self> { self.packed(IOValueDomainCodec) } + /// Constructs a [TextReader][super::TextReader] that will read from `self`. 
fn text>( &mut self, decode_embedded: Dec, @@ -284,6 +377,7 @@ pub trait BinarySource<'de>: Sized { super::TextReader::new(self, decode_embedded) } + /// Constructs a [TextReader][super::TextReader] that will read [IOValue]s from `self`. fn text_iovalues( &mut self, ) -> super::TextReader<'de, '_, IOValue, ViaCodec, Self> { @@ -291,12 +385,18 @@ pub trait BinarySource<'de>: Sized { } } +/// Implementation of [BinarySource] backed by an [`io::Read`]` + `[`io::Seek`] implementation. pub struct IOBinarySource { + /// The underlying byte source. pub read: R, + #[doc(hidden)] + /// One-place buffer for peeked bytes. pub buf: Option, } impl IOBinarySource { + /// Constructs an [IOBinarySource] from the given [`io::Read`]` + `[`io::Seek`] + /// implementation. #[inline(always)] pub fn new(read: R) -> Self { IOBinarySource { read, buf: None } @@ -364,12 +464,17 @@ impl<'de, R: io::Read + io::Seek> BinarySource<'de> for IOBinarySource { } } +/// Implementation of [BinarySource] backed by a slice of [u8]. pub struct BytesBinarySource<'de> { + /// The underlying byte source. pub bytes: &'de [u8], + #[doc(hidden)] + /// Current position within `bytes`. pub index: usize, } impl<'de> BytesBinarySource<'de> { + /// Constructs a [BytesBinarySource] from the given `u8` slice. #[inline(always)] pub fn new(bytes: &'de [u8]) -> Self { BytesBinarySource { bytes, index: 0 } @@ -432,21 +537,29 @@ impl<'de> BinarySource<'de> for BytesBinarySource<'de> { } } +/// A combination of a [Reader] with presets governing its operation. pub struct ConfiguredReader<'de, N: NestedValue, R: Reader<'de, N>> { + /// The underlying [Reader]. pub reader: R, + /// Configuration as to whether to include or discard annotations while reading. pub read_annotations: bool, phantom: PhantomData<&'de N>, } impl<'de, N: NestedValue, R: Reader<'de, N>> ConfiguredReader<'de, N, R> { + /// Constructs a [ConfiguredReader] based on the given `reader`. 
pub fn new(reader: R) -> Self { reader.configured(true) } + /// Updates the `read_annotations` field of `self`. pub fn set_read_annotations(&mut self, read_annotations: bool) { self.read_annotations = read_annotations } + /// Retrieve the next parseable value, treating end-of-input as an error. + /// + /// Delegates directly to [Reader::demand_next]. pub fn demand_next(&mut self) -> io::Result { self.reader.demand_next(self.read_annotations) } diff --git a/implementations/rust/preserves/src/value/repr.rs b/implementations/rust/preserves/src/value/repr.rs index 274eb55..d739b8b 100644 --- a/implementations/rust/preserves/src/value/repr.rs +++ b/implementations/rust/preserves/src/value/repr.rs @@ -1,3 +1,5 @@ +//! In-memory representation of Preserves `Value`s. + use num::bigint::BigInt; use num::traits::cast::ToPrimitive; use std::borrow::Cow; @@ -26,12 +28,19 @@ use super::TextWriter; use super::Writer; use crate::error::{Error, ExpectedKind, Received}; +/// A `Domain` implementation allows a Rust value to be placed as a Preserves [embedded +/// value](https://preserves.dev/preserves.html#embeddeds) inside a Preserves term. (See also +/// [Embeddable].) pub trait Domain: Sized + Debug + Eq + Hash + Ord { fn debug_encode(&self, w: &mut W) -> io::Result<()> { w.write_string(&format!("{:?}", self)) } } +/// Any Rust value that implements [`Domain`] and `Clone` is automatically `Embeddable`, and +/// may be placed as a Preserves [embedded +/// value](https://preserves.dev/preserves.html#embeddeds) inside a Preserves term. (See also +/// [Domain].) pub trait Embeddable: Domain + Clone {} impl Embeddable for T where T: Domain + Clone {} @@ -41,9 +50,15 @@ impl Domain for Arc { } } +/// This is the **primary programming interface** to Preserves values. The most common and +/// useful implementations of this trait are first [IOValue] and second [ArcValue]. 
pub trait NestedValue: Sized + Debug + Clone + Eq + Hash + Ord { + /// Every representation of Preserves values has an associated type: that of the Rust data + /// able to be [embedded](https://preserves.dev/preserves.html#embeddeds) inside a value. type Embedded: Embeddable; + /// `v` can be converted to a [Value]; `new` does this and then [wrap][Value::wrap]s it to + /// yield an instance of [Self]. #[inline(always)] fn new(v: V) -> Self where @@ -52,6 +67,8 @@ pub trait NestedValue: Sized + Debug + Clone + Eq + Hash + Ord { Value::from(v).wrap() } + /// [Embeds](https://preserves.dev/preserves.html#embeddeds) `e` to a Preserves embedded + /// value; `e` is first converted to [Self::Embedded]. #[inline(always)] fn domain(e: E) -> Self where @@ -60,28 +77,38 @@ pub trait NestedValue: Sized + Debug + Clone + Eq + Hash + Ord { Value::Embedded(e.into()).wrap() } + /// Yields a Preserves `Symbol` embodying the given text, `n`. #[inline(always)] fn symbol(n: &str) -> Self { Value::symbol(n).wrap() } + /// Yields a Preserves `ByteString`. #[inline(always)] fn bytestring<'a, V: Into>>(v: V) -> Self { Value::bytestring(v).wrap() } + /// Attaches the given [Annotations] to the [Value]. fn wrap(anns: Annotations, v: Value) -> Self; + /// Retrieves any annotations attached to `self`. fn annotations(&self) -> &Annotations; + /// Retrieves the underlying [Value] represented by `self`. fn value(&self) -> &Value; + /// Consumes `self`, yielding its annotations and underlying [Value]. fn pieces(self) -> (Annotations, Value); + /// Consumes `self`, yielding its underlying [Value] and discarding its annotations. fn value_owned(self) -> Value; + /// Retrieves the [ValueClass] of `self`. #[inline(always)] fn value_class(&self) -> ValueClass { self.value().value_class() } + /// Supplies an opportunity to customize debug formatting for `self`. Defaults to writing + /// `@`-prefixed annotations followed by the underlying value. 
fn debug_fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { for ann in self.annotations().slice() { write!(f, "@{:?} ", ann)?; @@ -89,10 +116,12 @@ pub trait NestedValue: Sized + Debug + Clone + Eq + Hash + Ord { self.value().fmt(f) } + /// Yields a deep copy of `self` with all annotations (recursively) removed. fn strip_annotations>(&self) -> M { M::wrap(Annotations::empty(), self.value().strip_annotations()) } + /// Yields a deep copy of `self`, mapping embedded values to a new type via `f`. fn copy_via(&self, f: &mut F) -> Result where F: FnMut(&Self::Embedded) -> Result, Err>, @@ -103,6 +132,7 @@ pub trait NestedValue: Sized + Debug + Clone + Eq + Hash + Ord { )) } + /// Calls `f` once for each (recursively) embedded [Self::Embedded] value in `self`. fn foreach_embedded(&self, f: &mut F) -> Result<(), Err> where F: FnMut(&Self::Embedded) -> Result<(), Err>, @@ -173,46 +203,56 @@ pub struct Float(pub f32); #[derive(Clone, Copy, Debug)] pub struct Double(pub f64); -/// A Record `Value` -- INVARIANT: length always non-zero +/// A Record `Value`. +/// +/// INVARIANT: The length of the contained vector **MUST** always be non-zero. #[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)] pub struct Record(pub Vec); impl Record { + /// Retrieve the record's label. #[inline(always)] pub fn label(&self) -> &N { &self.0[0] } + /// Retrieve a mutable reference to the record's label. #[inline(always)] pub fn label_mut(&mut self) -> &mut N { &mut self.0[0] } + /// Retrieve the arity of the record, the number of fields it has. #[inline(always)] pub fn arity(&self) -> usize { self.0.len() - 1 } + /// Retrieve a slice containing the fields of the record. #[inline(always)] pub fn fields(&self) -> &[N] { &self.0[1..] } + /// Retrieve a mutable slice containing the fields of the record. #[inline(always)] pub fn fields_mut(&mut self) -> &mut [N] { &mut self.0[1..] } + /// Retrieve a reference to a vector containing the record's label and fields. 
#[inline(always)] pub fn fields_vec(&self) -> &Vec { &self.0 } + /// Retrieve a mutable reference to a vector containing the record's label and fields. #[inline(always)] pub fn fields_vec_mut(&mut self) -> &mut Vec { &mut self.0 } + /// Converts `self` into a [Value]. #[inline(always)] pub fn finish(self) -> Value where @@ -493,10 +533,12 @@ impl< //--------------------------------------------------------------------------- impl Value { + /// Converts `self` to a [NestedValue] by supplying an empty collection of annotations. pub fn wrap(self) -> N { N::wrap(Annotations::empty(), self) } + /// Retrieves the [ValueClass] of `self`. fn value_class(&self) -> ValueClass { match self { Value::Boolean(_) => ValueClass::Atomic(AtomClass::Boolean), @@ -514,6 +556,11 @@ impl Value { } } + /// Retrieve a vector of the "children" of `self`. + /// + /// For atoms, this is an empty vector. For records, it's all the fields (but not the + /// label). For sequences and sets, it's the contained values. For dictionaries, it's all + /// the values in the key-value mappings (but not the keys). pub fn children(&self) -> Vec { match self { Value::Boolean(_) @@ -539,11 +586,13 @@ impl Value { ) } + /// True iff this is a [Value::Boolean]. #[inline(always)] pub fn is_boolean(&self) -> bool { self.as_boolean().is_some() } + /// Yields `Some` iff this is a [Value::Boolean]. #[inline(always)] pub fn as_boolean(&self) -> Option { if let Value::Boolean(b) = self { @@ -553,6 +602,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained boolean value iff this is a [Value::Boolean]. #[inline(always)] pub fn as_boolean_mut(&mut self) -> Option<&mut bool> { if let Value::Boolean(b) = self { @@ -562,17 +612,20 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Boolean]; else [Error::Expected]. #[inline(always)] pub fn to_boolean(&self) -> Result { self.as_boolean() .ok_or_else(|| self.expected(ExpectedKind::Boolean)) } + /// True iff this is a [Value::Float]. 
#[inline(always)] pub fn is_float(&self) -> bool { self.as_float().is_some() } + /// Yields `Some` iff this is a [Value::Float]. #[inline(always)] pub fn as_float(&self) -> Option<&Float> { if let Value::Float(f) = self { @@ -582,6 +635,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained [Float] value iff this is a [Value::Float]. #[inline(always)] pub fn as_float_mut(&mut self) -> Option<&mut Float> { if let Value::Float(f) = self { @@ -591,37 +645,44 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Float]; else [Error::Expected]. #[inline(always)] pub fn to_float(&self) -> Result<&Float, Error> { self.as_float() .ok_or_else(|| self.expected(ExpectedKind::Float)) } + /// As [Self::is_float]. #[inline(always)] pub fn is_f32(&self) -> bool { self.is_float() } + /// As [Self::as_float], but yields [f32] instead of [Float]. #[inline(always)] pub fn as_f32(&self) -> Option { self.as_float().map(|f| f.0) } + /// As [Self::as_float_mut], but [f32] instead of [Float]. #[inline(always)] pub fn as_f32_mut(&mut self) -> Option<&mut f32> { self.as_float_mut().map(|f| &mut f.0) } + /// As [Self::to_float], but with [f32] instead of [Float]. #[inline(always)] pub fn to_f32(&self) -> Result { self.to_float().map(|f| f.0) } + /// True iff this is a [Value::Double]. #[inline(always)] pub fn is_double(&self) -> bool { self.as_double().is_some() } + /// Yields `Some` iff this is a [Value::Double]. #[inline(always)] pub fn as_double(&self) -> Option<&Double> { if let Value::Double(f) = self { @@ -631,6 +692,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained [Double] value iff this is a [Value::Double]. #[inline(always)] pub fn as_double_mut(&mut self) -> Option<&mut Double> { if let Value::Double(f) = self { @@ -640,37 +702,44 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Double]; else [Error::Expected]. 
#[inline(always)] pub fn to_double(&self) -> Result<&Double, Error> { self.as_double() .ok_or_else(|| self.expected(ExpectedKind::Double)) } + /// As [Self::is_double]. #[inline(always)] pub fn is_f64(&self) -> bool { self.is_double() } + /// As [Self::as_double], but yields [f64] instead of [Double]. #[inline(always)] pub fn as_f64(&self) -> Option { self.as_double().map(|f| f.0) } + /// As [Self::as_double_mut], but [f64] instead of [Double]. #[inline(always)] pub fn as_f64_mut(&mut self) -> Option<&mut f64> { self.as_double_mut().map(|f| &mut f.0) } + /// As [Self::to_double], but with [f64] instead of [Double]. #[inline(always)] pub fn to_f64(&self) -> Result { self.to_double().map(|f| f.0) } + /// True iff this is a [Value::SignedInteger]. #[inline(always)] pub fn is_signedinteger(&self) -> bool { self.as_signedinteger().is_some() } + /// Yields `Some` iff this is a [Value::SignedInteger]. #[inline(always)] pub fn as_signedinteger(&self) -> Option<&SignedInteger> { if let Value::SignedInteger(n) = self { @@ -680,6 +749,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained SignedInteger value iff this is a [Value::SignedInteger]. #[inline(always)] pub fn as_signedinteger_mut(&mut self) -> Option<&mut SignedInteger> { if let Value::SignedInteger(n) = self { @@ -689,93 +759,114 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::SignedInteger]; else [Error::Expected]. #[inline(always)] pub fn to_signedinteger(&self) -> Result<&SignedInteger, Error> { self.as_signedinteger() .ok_or_else(|| self.expected(ExpectedKind::SignedInteger)) } + /// True iff [Self::as_i] yields `Some`. #[inline(always)] pub fn is_i(&self) -> bool { self.as_i().is_some() } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [i128]. #[inline(always)] pub fn as_i(&self) -> Option { self.as_signedinteger().and_then(|n| n.try_into().ok()) } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [i128]; else [Error::Expected]. 
#[inline(always)] pub fn to_i(&self) -> Result { self.as_i() .ok_or_else(|| self.expected(ExpectedKind::SignedIntegerI128)) } + /// True iff [Self::as_u] yields `Some`. #[inline(always)] pub fn is_u(&self) -> bool { self.as_u().is_some() } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [u128]. #[inline(always)] pub fn as_u(&self) -> Option { self.as_signedinteger().and_then(|n| n.try_into().ok()) } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [u128]; else [Error::Expected]. #[inline(always)] pub fn to_u(&self) -> Result { self.as_u() .ok_or_else(|| self.expected(ExpectedKind::SignedIntegerU128)) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [u8]. #[inline(always)] pub fn as_u8(&self) -> Option { self.as_u().and_then(|i| i.to_u8()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [i8]. #[inline(always)] pub fn as_i8(&self) -> Option { self.as_i().and_then(|i| i.to_i8()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [u16]. #[inline(always)] pub fn as_u16(&self) -> Option { self.as_u().and_then(|i| i.to_u16()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [i16]. #[inline(always)] pub fn as_i16(&self) -> Option { self.as_i().and_then(|i| i.to_i16()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [u32]. #[inline(always)] pub fn as_u32(&self) -> Option { self.as_u().and_then(|i| i.to_u32()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [i32]. #[inline(always)] pub fn as_i32(&self) -> Option { self.as_i().and_then(|i| i.to_i32()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [u64]. #[inline(always)] pub fn as_u64(&self) -> Option { self.as_u().and_then(|i| i.to_u64()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [i64]. 
#[inline(always)] pub fn as_i64(&self) -> Option { self.as_i().and_then(|i| i.to_i64()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [u128]. #[inline(always)] pub fn as_u128(&self) -> Option { - self.as_u().and_then(|i| i.to_u128()) + self.as_u() } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [i128]. #[inline(always)] pub fn as_i128(&self) -> Option { - self.as_i().and_then(|i| i.to_i128()) + self.as_i() } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [usize]. #[inline(always)] pub fn as_usize(&self) -> Option { self.as_u().and_then(|i| i.to_usize()) } + /// Yields `Some` if `self` is a [Value::SignedInteger] that fits in [isize]. #[inline(always)] pub fn as_isize(&self) -> Option { self.as_i().and_then(|i| i.to_isize()) } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [i8]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_i8(&self) -> Result { match self.as_i() { @@ -786,6 +877,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [u8]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_u8(&self) -> Result { match self.as_u() { @@ -796,6 +889,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [i16]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_i16(&self) -> Result { match self.as_i() { @@ -806,6 +901,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [u16]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_u16(&self) -> Result { match self.as_u() { @@ -816,6 +913,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [i32]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. 
#[inline(always)] pub fn to_i32(&self) -> Result { match self.as_i() { @@ -826,6 +925,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [u32]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_u32(&self) -> Result { match self.as_u() { @@ -836,6 +937,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [i64]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_i64(&self) -> Result { match self.as_i() { @@ -846,6 +949,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [u64]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_u64(&self) -> Result { match self.as_u() { @@ -856,6 +961,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [i128]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_i128(&self) -> Result { match self.as_i() { @@ -866,6 +973,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [u128]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_u128(&self) -> Result { match self.as_u() { @@ -876,6 +985,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [isize]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. #[inline(always)] pub fn to_isize(&self) -> Result { match self.as_i() { @@ -886,6 +997,8 @@ impl Value { } } + /// Yields `Ok` if `self` is a [Value::SignedInteger] that fits in [usize]; + /// otherwise, [Error::Expected] or [Error::NumberOutOfRange]. 
#[inline(always)] pub fn to_usize(&self) -> Result { match self.as_u() { @@ -896,6 +1009,10 @@ impl Value { } } + /// Yields `Ok` if `self` is a record with label a symbol `UnicodeScalar` and single field + /// a SignedInteger that can represent a valid Unicode scalar value. Otherwise, + /// [Error::Expected], [Error::InvalidUnicodeScalar], or + /// [Error::NumberOutOfRange]. #[inline(always)] pub fn to_char(&self) -> Result { let fs = self.to_simple_record("UnicodeScalar", Some(1))?; @@ -903,11 +1020,13 @@ impl Value { char::try_from(c).map_err(|_| Error::InvalidUnicodeScalar(c)) } + /// True iff this is a [Value::String]. #[inline(always)] pub fn is_string(&self) -> bool { self.as_string().is_some() } + /// Yields `Some` iff this is a [Value::String]. #[inline(always)] pub fn as_string(&self) -> Option<&String> { if let Value::String(s) = self { @@ -917,6 +1036,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained String value iff this is a [Value::String]. #[inline(always)] pub fn as_string_mut(&mut self) -> Option<&mut String> { if let Value::String(s) = self { @@ -926,6 +1046,7 @@ impl Value { } } + /// Consumes `self`, yielding a `String` iff `self` is a [Value::String]. #[inline(always)] pub fn into_string(self) -> Option { match self { @@ -934,22 +1055,26 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::String]; else [Error::Expected]. #[inline(always)] pub fn to_string(&self) -> Result<&String, Error> { self.as_string() .ok_or_else(|| self.expected(ExpectedKind::String)) } + /// Constructs a [Value::ByteString] from `v`. #[inline(always)] pub fn bytestring<'a, V: Into>>(v: V) -> Self { Value::ByteString(v.into().into_owned()) } + /// True iff this is a [Value::ByteString]. #[inline(always)] pub fn is_bytestring(&self) -> bool { self.as_bytestring().is_some() } + /// Yields `Some` iff this is a [Value::ByteString]. 
#[inline(always)] pub fn as_bytestring(&self) -> Option<&Vec> { if let Value::ByteString(s) = self { @@ -959,6 +1084,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained bytes value iff this is a [Value::ByteString]. #[inline(always)] pub fn as_bytestring_mut(&mut self) -> Option<&mut Vec> { if let Value::ByteString(s) = self { @@ -968,6 +1094,7 @@ impl Value { } } + /// Consumes `self`, yielding a `Vec` iff `self` is a [Value::ByteString]. #[inline(always)] pub fn into_bytestring(self) -> Option> { match self { @@ -976,22 +1103,26 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::ByteString]; else [Error::Expected]. #[inline(always)] pub fn to_bytestring(&self) -> Result<&Vec, Error> { self.as_bytestring() .ok_or_else(|| self.expected(ExpectedKind::ByteString)) } + /// Constructs a [Value::Symbol] from `s`. #[inline(always)] pub fn symbol(s: &str) -> Value { Value::Symbol(s.to_string()) } + /// True iff this is a [Value::Symbol]. #[inline(always)] pub fn is_symbol(&self) -> bool { self.as_symbol().is_some() } + /// Yields `Some` iff this is a [Value::Symbol]. #[inline(always)] pub fn as_symbol(&self) -> Option<&String> { if let Value::Symbol(s) = self { @@ -1001,6 +1132,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained Symbol's string iff this is a [Value::Symbol]. #[inline(always)] pub fn as_symbol_mut(&mut self) -> Option<&mut String> { if let Value::Symbol(s) = self { @@ -1010,6 +1142,7 @@ impl Value { } } + /// Consumes `self`, yielding a `String` iff `self` is a [Value::Symbol]. #[inline(always)] pub fn into_symbol(self) -> Option { match self { @@ -1018,12 +1151,16 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Symbol]; else [Error::Expected]. #[inline(always)] pub fn to_symbol(&self) -> Result<&String, Error> { self.as_symbol() .ok_or_else(|| self.expected(ExpectedKind::Symbol)) } + /// Constructs a record with the given label and expected arity. 
The new record will + /// initially not have any fields, but will be allocated with capacity for `expected_arity` + /// fields. #[inline(always)] pub fn record(label: N, expected_arity: usize) -> Record { let mut v = Vec::with_capacity(expected_arity + 1); @@ -1031,11 +1168,13 @@ impl Value { Record(v) } + /// True iff this is a [Value::Record]. #[inline(always)] pub fn is_record(&self) -> bool { matches!(*self, Value::Record(_)) } + /// Yields `Some` iff this is a [Value::Record]. #[inline(always)] pub fn as_record(&self, arity: Option) -> Option<&Record> { if let Value::Record(r) = self { @@ -1049,6 +1188,7 @@ impl Value { } } + /// Consumes `self`, yielding a `Record` iff `self` is a [Value::Record]. #[inline(always)] pub fn into_record(self) -> Option> { match self { @@ -1057,6 +1197,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained Record value iff this is a [Value::Record]. #[inline(always)] pub fn as_record_mut(&mut self, arity: Option) -> Option<&mut Record> { if let Value::Record(r) = self { @@ -1070,22 +1211,27 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Record]; else [Error::Expected]. #[inline(always)] pub fn to_record(&self, arity: Option) -> Result<&Record, Error> { self.as_record(arity) .ok_or_else(|| self.expected(ExpectedKind::Record(arity))) } + /// Like [Self::record], but for the common case where the label is to be a `Symbol` with a + /// given text. #[inline(always)] pub fn simple_record(label: &str, expected_arity: usize) -> Record { Self::record(N::symbol(label), expected_arity) } + /// Constructs a record with label a symbol with text `label`, and no fields. #[inline(always)] pub fn simple_record0(label: &str) -> Value { Self::simple_record(label, 0).finish() } + /// Constructs a record with label a symbol with text `label`, and one field. 
#[inline(always)] pub fn simple_record1(label: &str, field: N) -> Value { let mut r = Self::simple_record(label, 1); @@ -1093,11 +1239,15 @@ impl Value { r.finish() } + /// True iff `self` is a record with label a symbol with text `label` and arity matching + /// `arity`: any arity, if `arity == None`, or the specific `usize` concerned otherwise. #[inline(always)] pub fn is_simple_record(&self, label: &str, arity: Option) -> bool { self.as_simple_record(label, arity).is_some() } + /// Yields `Some` containing a reference to the record's fields iff + /// [`Self::is_simple_record`]`(label, arity)` returns true. #[inline(always)] pub fn as_simple_record(&self, label: &str, arity: Option) -> Option<&[N]> { self.as_record(arity).and_then(|r| match r.label().value() { @@ -1106,12 +1256,14 @@ impl Value { }) } + /// Like [Self::as_simple_record], but yields [Error::Expected] on failure. #[inline(always)] pub fn to_simple_record(&self, label: &str, arity: Option) -> Result<&[N], Error> { self.as_simple_record(label, arity) .ok_or_else(|| self.expected(ExpectedKind::SimpleRecord(label.to_owned(), arity))) } + /// Serde's "option" type is encoded in Preserves as `<None>` or `<Some v>`. #[inline(always)] pub fn to_option(&self) -> Result, Error> { match self.as_simple_record("None", Some(0)) { @@ -1123,11 +1275,13 @@ impl Value { } } + /// True iff this is a [Value::Sequence]. #[inline(always)] pub fn is_sequence(&self) -> bool { self.as_sequence().is_some() } + /// Yields `Some` iff this is a [Value::Sequence]. #[inline(always)] pub fn as_sequence(&self) -> Option<&Vec> { if let Value::Sequence(s) = self { @@ -1137,6 +1291,7 @@ impl Value { } } + /// Consumes `self`, yielding a [`Vec`] iff `self` is a [Value::Sequence]. #[inline(always)] pub fn into_sequence(self) -> Option> { match self { @@ -1145,6 +1300,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained [`Vec`] iff this is a [Value::Sequence]. 
#[inline(always)] pub fn as_sequence_mut(&mut self) -> Option<&mut Vec> { if let Value::Sequence(s) = self { @@ -1154,17 +1310,20 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Sequence]; else [Error::Expected]. #[inline(always)] pub fn to_sequence(&self) -> Result<&Vec, Error> { self.as_sequence() .ok_or_else(|| self.expected(ExpectedKind::Sequence)) } + /// True iff this is a [Value::Set]. #[inline(always)] pub fn is_set(&self) -> bool { self.as_set().is_some() } + /// Yields `Some` iff this is a [Value::Set]. #[inline(always)] pub fn as_set(&self) -> Option<&Set> { if let Value::Set(s) = self { @@ -1174,6 +1333,7 @@ impl Value { } } + /// Consumes `self`, yielding a [`Set`] iff `self` is a [Value::Set]. #[inline(always)] pub fn into_set(self) -> Option> { match self { @@ -1182,6 +1342,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained Set value iff this is a [Value::Set]. #[inline(always)] pub fn as_set_mut(&mut self) -> Option<&mut Set> { if let Value::Set(s) = self { @@ -1191,17 +1352,20 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Set]; else [Error::Expected]. #[inline(always)] pub fn to_set(&self) -> Result<&Set, Error> { self.as_set() .ok_or_else(|| self.expected(ExpectedKind::Set)) } + /// True iff this is a [Value::Dictionary]. #[inline(always)] pub fn is_dictionary(&self) -> bool { self.as_dictionary().is_some() } + /// Yields `Some` iff this is a [Value::Dictionary]. #[inline(always)] pub fn as_dictionary(&self) -> Option<&Map> { if let Value::Dictionary(s) = self { @@ -1211,6 +1375,7 @@ impl Value { } } + /// Consumes `self`, yielding a [`Map`] iff `self` is a [Value::Dictionary]. #[inline(always)] pub fn into_dictionary(self) -> Option> { match self { @@ -1219,6 +1384,7 @@ impl Value { } } + /// Retrieve a mutable reference to the contained Map value iff this is a [Value::Dictionary]. 
#[inline(always)] pub fn as_dictionary_mut(&mut self) -> Option<&mut Map> { if let Value::Dictionary(s) = self { @@ -1228,17 +1394,20 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Dictionary]; else [Error::Expected]. #[inline(always)] pub fn to_dictionary(&self) -> Result<&Map, Error> { self.as_dictionary() .ok_or_else(|| self.expected(ExpectedKind::Dictionary)) } + /// True iff this is a [Value::Embedded]. #[inline(always)] pub fn is_embedded(&self) -> bool { self.as_embedded().is_some() } + /// Yields `Some` iff this is a [Value::Embedded]. #[inline(always)] pub fn as_embedded(&self) -> Option<&N::Embedded> { if let Value::Embedded(d) = self { @@ -1248,12 +1417,14 @@ impl Value { } } + /// Yields `Ok` iff this is a [Value::Embedded]; else [Error::Expected]. #[inline(always)] pub fn to_embedded(&self) -> Result<&N::Embedded, Error> { self.as_embedded() .ok_or_else(|| self.expected(ExpectedKind::Embedded)) } + /// Yields a deep copy of `self` with all annotations (recursively) removed. pub fn strip_annotations>(&self) -> Value { match self { Value::Boolean(b) => Value::Boolean(*b), @@ -1282,6 +1453,7 @@ impl Value { } } + /// Yields a deep copy of `self`, mapping embedded values to a new type via `f`. pub fn copy_via(&self, f: &mut F) -> Result, Err> where F: FnMut(&N::Embedded) -> Result, Err>, @@ -1319,6 +1491,7 @@ impl Value { }) } + /// Calls `f` once for each (recursively) embedded value in `self`. 
pub fn foreach_embedded(&self, f: &mut F) -> Result<(), Err> where F: FnMut(&N::Embedded) -> Result<(), Err>, @@ -1377,6 +1550,7 @@ impl Index<&N> for Value { //--------------------------------------------------------------------------- // This part is a terrible hack +#[doc(hidden)] impl serde::Serialize for UnwrappedIOValue { fn serialize(&self, serializer: S) -> Result where @@ -1386,6 +1560,7 @@ impl serde::Serialize for UnwrappedIOValue { } } +#[doc(hidden)] impl<'de> serde::Deserialize<'de> for UnwrappedIOValue { fn deserialize(deserializer: D) -> Result where @@ -1397,20 +1572,30 @@ impl<'de> serde::Deserialize<'de> for UnwrappedIOValue { //--------------------------------------------------------------------------- +/// Representation of a collection of annotations to be attached to a [Value] by way of an +/// implementation of trait [NestedValue]. + #[derive(Clone)] -pub struct Annotations(Option>>); +pub struct Annotations( + /// The complex-seeming `Option>>` is used to save memory, since a `Box` is + /// smaller than a `Vec`. + Option>>, +); impl Annotations { + /// Yield the empty [Annotations] sequence. #[inline(always)] pub fn empty() -> Self { Annotations(None) } + /// Yield [Annotations] from a vector of values. #[inline(always)] pub fn new(anns: Option>) -> Self { Annotations(anns.map(Box::new)) } + /// Extract carried annotations, if there are any. #[inline(always)] pub fn maybe_slice(&self) -> Option<&[N]> { match &self.0 { @@ -1419,11 +1604,13 @@ impl Annotations { } } + /// Extract carried annotations, supplying an empty slice if there are none. #[inline(always)] pub fn slice(&self) -> &[N] { self.maybe_slice().unwrap_or(&[]) } + /// Produce a fresh [Vec] of the carried annotations. #[inline(always)] pub fn to_vec(self) -> Vec { use std::ops::DerefMut; @@ -1432,6 +1619,7 @@ impl Annotations { .unwrap_or_default() } + /// Allows in-place updating of the collection of carried annotations. 
pub fn modify(&mut self, f: F) -> &mut Self where F: FnOnce(&mut Vec), @@ -1455,6 +1643,7 @@ impl Annotations { self } + /// Yields a deep copy of `self`, mapping embedded values to a new type via `f`. pub fn copy_via(&self, f: &mut F) -> Result, Err> where F: FnMut(&N::Embedded) -> Result, Err>, @@ -1514,6 +1703,7 @@ impl Ord for AnnotatedValue { //--------------------------------------------------------------------------- +/// A simple tree representation without any reference counting. #[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] pub struct PlainValue(AnnotatedValue>); @@ -1569,6 +1759,7 @@ impl Debug for PlainValue { use std::rc::Rc; +/// A representation of a Preserves Value using [Rc] for reference-counting of subvalues. #[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] pub struct RcValue(Rc>>); @@ -1614,6 +1805,7 @@ impl Debug for RcValue { //--------------------------------------------------------------------------- +/// A representation of a Preserves Value using [Arc] for reference-counting of subvalues. #[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] pub struct ArcValue(Arc>>); @@ -1660,6 +1852,8 @@ impl Debug for ArcValue { //--------------------------------------------------------------------------- +/// A representation of a Preserves Value using [Arc] for reference-counting of subvalues and +/// having [IOValue] as [NestedValue::Embedded]. #[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] pub struct IOValue(Arc>); pub type UnwrappedIOValue = Value; @@ -1738,6 +1932,7 @@ impl<'de> serde::Deserialize<'de> for IOValue { //--------------------------------------------------------------------------- +/// A "dummy" value that has no structure at all. 
#[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] pub struct DummyValue(AnnotatedValue>); @@ -1788,6 +1983,7 @@ impl NestedValue for DummyValue { //--------------------------------------------------------------------------- +#[doc(hidden)] // https://stackoverflow.com/questions/34304593/counting-length-of-repetition-in-macro/34324856 #[macro_export] //#[allow(unused_macros)] @@ -1796,6 +1992,8 @@ macro_rules! count__ { ( $x:tt $($xs:tt)* ) => (1usize + $crate::count__!($($xs)*)); } +/// Convenience syntax for efficiently constructing Preserves +/// [record][crate::value::Value::record] values. #[macro_export] macro_rules! rec { ( $label:expr $(, $item:expr)* ) => { diff --git a/implementations/rust/preserves/src/value/ser.rs b/implementations/rust/preserves/src/value/ser.rs index 117edce..749cb47 100644 --- a/implementations/rust/preserves/src/value/ser.rs +++ b/implementations/rust/preserves/src/value/ser.rs @@ -1,6 +1,10 @@ +//! Support for Serde serialization of Rust data types into Preserves *values* (not syntax). + use crate::value::{repr::Record, IOValue, Map, Value}; use serde::Serialize; +/// Empty/placeholder type for representing serialization errors: serialization to values +/// cannot fail. #[derive(Debug)] pub enum Error {} impl serde::ser::Error for Error { @@ -20,17 +24,22 @@ impl std::fmt::Display for Error { type Result = std::result::Result; +/// Serde serializer for converting Rust data to in-memory Preserves values, which can then be +/// serialized using text or binary syntax, analyzed further, etc. pub struct Serializer; +#[doc(hidden)] pub struct SerializeDictionary { next_key: Option, items: Map, } +#[doc(hidden)] pub struct SerializeRecord { r: Record, } +#[doc(hidden)] pub struct SerializeSequence { vec: Vec, } @@ -359,6 +368,7 @@ impl serde::ser::SerializeSeq for SerializeSequence { } } +/// Convenience function for directly converting a Serde-serializable `T` to an [IOValue]. 
pub fn to_value(value: T) -> IOValue where T: Serialize, diff --git a/implementations/rust/preserves/src/value/signed_integer.rs b/implementations/rust/preserves/src/value/signed_integer.rs index a4b39de..a8142cd 100644 --- a/implementations/rust/preserves/src/value/signed_integer.rs +++ b/implementations/rust/preserves/src/value/signed_integer.rs @@ -1,3 +1,6 @@ +//! Representation of Preserves `SignedInteger`s as [i128]/[u128] (if they fit) or [BigInt] (if +//! they don't). + use num::bigint::BigInt; use num::traits::cast::ToPrimitive; use num::traits::sign::Signed; @@ -7,8 +10,10 @@ use std::convert::TryFrom; use std::convert::TryInto; use std::fmt; -// Invariant: if I128 can be used, it will be; otherwise, if U128 can -// be used, it will be; otherwise, Big will be used. +/// Internal representation of Preserves `SignedInteger`s. +/// +/// Invariant: if I128 can be used, it will be; otherwise, if U128 can be used, it will be; +/// otherwise, Big will be used. #[derive(Clone, Debug, PartialEq, Eq, Hash)] pub enum SignedIntegerRepr { I128(i128), @@ -16,6 +21,7 @@ pub enum SignedIntegerRepr { Big(Box), } +/// Main representation of Preserves `SignedInteger`s. #[derive(Clone, PartialEq, Eq, Hash)] pub struct SignedInteger(SignedIntegerRepr); @@ -87,18 +93,25 @@ impl PartialOrd for SignedInteger { } impl SignedInteger { + /// Extract the internal representation. pub fn repr(&self) -> &SignedIntegerRepr { &self.0 } + /// Does this `SignedInteger` fit in an [i128]? (See also [the TryFrom instance for + /// i128](#impl-TryFrom<%26SignedInteger>-for-i128).) pub fn is_i(&self) -> bool { matches!(self.0, SignedIntegerRepr::I128(_)) } + /// Does this `SignedInteger` fit in a [u128], but not an [i128]? (See also [the TryFrom + /// instance for u128](#impl-TryFrom<%26SignedInteger>-for-u128).) pub fn is_u(&self) -> bool { matches!(self.0, SignedIntegerRepr::U128(_)) } + /// Does this `SignedInteger` fit neither in a [u128] nor an [i128]? 
(See also [the From + /// instance for BigInt](#impl-From<%26'a+SignedInteger>-for-BigInt).) pub fn is_big(&self) -> bool { matches!(self.0, SignedIntegerRepr::Big(_)) } diff --git a/implementations/rust/preserves/src/value/suspendable.rs b/implementations/rust/preserves/src/value/suspendable.rs index 7940492..4e0d789 100644 --- a/implementations/rust/preserves/src/value/suspendable.rs +++ b/implementations/rust/preserves/src/value/suspendable.rs @@ -1,3 +1,5 @@ +#![doc(hidden)] + use std::ops::{Deref, DerefMut}; pub enum Suspendable { diff --git a/implementations/rust/preserves/src/value/text/mod.rs b/implementations/rust/preserves/src/value/text/mod.rs index 94921ee..7fe346b 100644 --- a/implementations/rust/preserves/src/value/text/mod.rs +++ b/implementations/rust/preserves/src/value/text/mod.rs @@ -1,3 +1,15 @@ +//! Implements the Preserves [human-oriented text +//! syntax](https://preserves.dev/preserves-text.html). +//! +//! The main entry points for reading are functions [iovalue_from_str], +//! [annotated_iovalue_from_str], [from_str], and [annotated_from_str]. +//! +//! The main entry points for writing are [TextWriter::encode_iovalue] and +//! [TextWriter::encode]. +//! +//! # Summary of Text Syntax +#![doc = include_str!("../../../doc/cheatsheet-text-plaintext.md")] + pub mod reader; pub mod writer; @@ -10,6 +22,7 @@ use std::io; use super::{DomainParse, IOValue, IOValueDomainCodec, NestedValue, Reader, ViaCodec}; +/// Reads a value from the given string using the text syntax, discarding annotations. pub fn from_str>( s: &str, decode_embedded: Dec, @@ -17,10 +30,12 @@ pub fn from_str>( TextReader::new(&mut BytesBinarySource::new(s.as_bytes()), decode_embedded).demand_next(false) } +/// Reads an [IOValue] from the given string using the text syntax, discarding annotations. pub fn iovalue_from_str(s: &str) -> io::Result { from_str(s, ViaCodec::new(IOValueDomainCodec)) } +/// As [from_str], but includes annotations.
pub fn annotated_from_str>( s: &str, decode_embedded: Dec, @@ -28,6 +43,7 @@ pub fn annotated_from_str>( TextReader::new(&mut BytesBinarySource::new(s.as_bytes()), decode_embedded).demand_next(true) } +/// As [iovalue_from_str], but includes annotations. pub fn annotated_iovalue_from_str(s: &str) -> io::Result { annotated_from_str(s, ViaCodec::new(IOValueDomainCodec)) } diff --git a/implementations/rust/preserves/src/value/text/reader.rs b/implementations/rust/preserves/src/value/text/reader.rs index 2b0f709..dbed1ce 100644 --- a/implementations/rust/preserves/src/value/text/reader.rs +++ b/implementations/rust/preserves/src/value/text/reader.rs @@ -1,3 +1,5 @@ +//! Implementation of [Reader] for the text syntax. + use crate::error::io_syntax_error; use crate::error::is_eof_io_error; use crate::error::syntax_error; @@ -35,8 +37,11 @@ use std::io; use std::iter::FromIterator; use std::marker::PhantomData; +/// The text syntax Preserves reader. pub struct TextReader<'de, 'src, D: Embeddable, Dec: DomainParse, S: BinarySource<'de>> { + /// Underlying source of (utf8) bytes. pub source: &'src mut S, + /// Decoder for producing Rust values embedded in the text. pub dec: Dec, phantom: PhantomData<&'de D>, } @@ -56,6 +61,7 @@ fn append_codepoint(bs: &mut Vec, n: u32) -> io::Result<()> { impl<'de, 'src, D: Embeddable, Dec: DomainParse, S: BinarySource<'de>> TextReader<'de, 'src, D, Dec, S> { + /// Construct a new reader from a byte (utf8) source and embedded-value decoder. pub fn new(source: &'src mut S, dec: Dec) -> Self { TextReader { source, @@ -155,6 +161,7 @@ impl<'de, 'src, D: Embeddable, Dec: DomainParse, S: BinarySource<'de>> } } + /// Retrieve the next [IOValue] in the input stream. 
pub fn next_iovalue(&mut self, read_annotations: bool) -> io::Result { let mut r = TextReader::new(self.source, ViaCodec::new(IOValueDomainCodec)); let v = r.demand_next(read_annotations)?; diff --git a/implementations/rust/preserves/src/value/text/writer.rs b/implementations/rust/preserves/src/value/text/writer.rs index c2fa886..5d37755 100644 --- a/implementations/rust/preserves/src/value/text/writer.rs +++ b/implementations/rust/preserves/src/value/text/writer.rs @@ -1,3 +1,5 @@ +//! Implementation of [Writer] for the text syntax. + use crate::hex::HexFormatter; use crate::value::suspendable::Suspendable; use crate::value::writer::CompoundWriter; @@ -15,17 +17,26 @@ use std::io; use super::super::boundary as B; +/// Specifies a comma style for printing using [TextWriter]. #[derive(Clone, Copy, Debug)] pub enum CommaStyle { + /// No commas will be printed. (Preserves text syntax treats commas as whitespace (!).) None, + /// Commas will be used to separate subterms. Separating, + /// Commas will be used to terminate subterms. Terminating, } +/// The (optionally pretty-printing) text syntax Preserves writer. pub struct TextWriter { w: Suspendable, + /// Selects a comma style to use when printing. pub comma_style: CommaStyle, + /// Specifies indentation to use when pretty-printing; 0 disables pretty-printing. pub indentation: usize, + /// An aid to use of printed terms in shell scripts: set `true` to escape spaces embedded + /// in strings and symbols. pub escape_spaces: bool, indent: String, } @@ -37,6 +48,8 @@ impl std::default::Default for CommaStyle { } impl TextWriter<&mut Vec> { + /// Writes `v` to `f` using text syntax. Selects indentation mode based on + /// [`f.alternate()`][std::fmt::Formatter::alternate]. pub fn fmt_value>( f: &mut std::fmt::Formatter<'_>, enc: &mut Enc, @@ -52,6 +65,7 @@ impl TextWriter<&mut Vec> { .map_err(|_| io::Error::new(io::ErrorKind::Other, "could not append to Formatter")) } + /// Encode `v` to a [String]. 
pub fn encode>( enc: &mut Enc, v: &N, @@ -61,12 +75,14 @@ impl TextWriter<&mut Vec> { Ok(String::from_utf8(buf).expect("valid UTF-8 from TextWriter")) } + /// Encode `v` to a [String]. pub fn encode_iovalue(v: &IOValue) -> io::Result { Self::encode(&mut IOValueDomainCodec, v) } } impl TextWriter { + /// Construct a writer from the given byte sink `w`. pub fn new(w: W) -> Self { TextWriter { w: Suspendable::new(w), @@ -77,16 +93,19 @@ impl TextWriter { } } + /// Update selected comma-printing style. pub fn set_comma_style(mut self, v: CommaStyle) -> Self { self.comma_style = v; self } + /// Update selected space-escaping style. pub fn set_escape_spaces(mut self, v: bool) -> Self { self.escape_spaces = v; self } + #[doc(hidden)] pub fn suspend(&mut self) -> Self { TextWriter { w: self.w.suspend(), @@ -95,10 +114,12 @@ impl TextWriter { } } + #[doc(hidden)] pub fn resume(&mut self, other: Self) { self.w.resume(other.w) } + #[doc(hidden)] pub fn write_stringlike_char_fallback(&mut self, c: char, f: F) -> io::Result<()> where F: FnOnce(&mut W, char) -> io::Result<()>, @@ -114,22 +135,26 @@ impl TextWriter { } } + #[doc(hidden)] pub fn write_stringlike_char(&mut self, c: char) -> io::Result<()> { self.write_stringlike_char_fallback(c, |w, c| write!(w, "{}", c)) } + #[doc(hidden)] pub fn add_indent(&mut self) { for _ in 0..self.indentation { self.indent.push(' ') } } + #[doc(hidden)] pub fn del_indent(&mut self) { if self.indentation > 0 { self.indent.truncate(self.indent.len() - self.indentation) } } + #[doc(hidden)] pub fn indent(&mut self) -> io::Result<()> { if self.indentation > 0 { write!(self.w, "{}", &self.indent) @@ -138,6 +163,7 @@ impl TextWriter { } } + #[doc(hidden)] pub fn indent_sp(&mut self) -> io::Result<()> { if self.indentation > 0 { write!(self.w, "{}", &self.indent) @@ -146,6 +172,7 @@ impl TextWriter { } } + /// Borrow the underlying byte sink. 
pub fn borrow_write(&mut self) -> &mut W { &mut self.w } diff --git a/implementations/rust/preserves/src/value/writer.rs b/implementations/rust/preserves/src/value/writer.rs index 090b6c7..1751437 100644 --- a/implementations/rust/preserves/src/value/writer.rs +++ b/implementations/rust/preserves/src/value/writer.rs @@ -1,3 +1,6 @@ +//! Generic [Writer] trait for unparsing Preserves [Value]s, implemented by code that provides +//! each specific transfer syntax. + use super::boundary as B; use super::repr::{Double, Float, NestedValue, Value}; use super::signed_integer::SignedIntegerRepr; @@ -5,61 +8,103 @@ use super::DomainEncode; use num::bigint::BigInt; use std::io; +#[doc(hidden)] +/// Utility trait for tracking unparser state during production of compound `Value`s. pub trait CompoundWriter: Writer { fn boundary(&mut self, b: &B::Type) -> io::Result<()>; } +/// Generic unparser for Preserves. pub trait Writer: Sized { + // Hiding these from the documentation for the moment because I don't want to have to + // document the whole Boundary thing. 
+ #[doc(hidden)] type AnnWriter: CompoundWriter; + #[doc(hidden)] type RecWriter: CompoundWriter; + #[doc(hidden)] type SeqWriter: CompoundWriter; + #[doc(hidden)] type SetWriter: CompoundWriter; + #[doc(hidden)] type DictWriter: CompoundWriter; + #[doc(hidden)] type EmbeddedWriter: Writer; + #[doc(hidden)] fn start_annotations(&mut self) -> io::Result; + #[doc(hidden)] fn end_annotations(&mut self, ann: Self::AnnWriter) -> io::Result<()>; + #[doc(hidden)] fn write_bool(&mut self, v: bool) -> io::Result<()>; + #[doc(hidden)] fn write_f32(&mut self, v: f32) -> io::Result<()>; + #[doc(hidden)] fn write_f64(&mut self, v: f64) -> io::Result<()>; + #[doc(hidden)] fn write_i8(&mut self, v: i8) -> io::Result<()>; + #[doc(hidden)] fn write_u8(&mut self, v: u8) -> io::Result<()>; + #[doc(hidden)] fn write_i16(&mut self, v: i16) -> io::Result<()>; + #[doc(hidden)] fn write_u16(&mut self, v: u16) -> io::Result<()>; + #[doc(hidden)] fn write_i32(&mut self, v: i32) -> io::Result<()>; + #[doc(hidden)] fn write_u32(&mut self, v: u32) -> io::Result<()>; + #[doc(hidden)] fn write_i64(&mut self, v: i64) -> io::Result<()>; + #[doc(hidden)] fn write_u64(&mut self, v: u64) -> io::Result<()>; + #[doc(hidden)] fn write_i128(&mut self, v: i128) -> io::Result<()>; + #[doc(hidden)] fn write_u128(&mut self, v: u128) -> io::Result<()>; + #[doc(hidden)] fn write_int(&mut self, v: &BigInt) -> io::Result<()>; + #[doc(hidden)] fn write_string(&mut self, v: &str) -> io::Result<()>; + #[doc(hidden)] fn write_bytes(&mut self, v: &[u8]) -> io::Result<()>; + #[doc(hidden)] fn write_symbol(&mut self, v: &str) -> io::Result<()>; + #[doc(hidden)] fn start_record(&mut self, field_count: Option) -> io::Result; + #[doc(hidden)] fn end_record(&mut self, rec: Self::RecWriter) -> io::Result<()>; + #[doc(hidden)] fn start_sequence(&mut self, item_count: Option) -> io::Result; + #[doc(hidden)] fn end_sequence(&mut self, seq: Self::SeqWriter) -> io::Result<()>; + #[doc(hidden)] fn start_set(&mut self, item_count: 
Option) -> io::Result; + #[doc(hidden)] fn end_set(&mut self, set: Self::SetWriter) -> io::Result<()>; + #[doc(hidden)] fn start_dictionary(&mut self, entry_count: Option) -> io::Result; + #[doc(hidden)] fn end_dictionary(&mut self, dict: Self::DictWriter) -> io::Result<()>; + #[doc(hidden)] fn start_embedded(&mut self) -> io::Result; + #[doc(hidden)] fn end_embedded(&mut self, ptr: Self::EmbeddedWriter) -> io::Result<()>; + /// Flushes any buffered output. fn flush(&mut self) -> io::Result<()>; //--------------------------------------------------------------------------- + /// Writes [NestedValue] `v` to the output of this [Writer]. fn write>( &mut self, enc: &mut Enc, @@ -88,6 +133,7 @@ pub trait Writer: Sized { Ok(()) } + /// Writes [Value] `v` to the output of this [Writer]. fn write_value>( &mut self, enc: &mut Enc, @@ -167,6 +213,13 @@ pub trait Writer: Sized { } } +/// Writes a [varint](https://protobuf.dev/programming-guides/encoding/#varints) to `w`. +/// Returns the number of bytes written. +/// +/// ```text +/// varint(n) = [n] if n < 128 +/// [(n & 127) | 128] ++ varint(n >> 7) if n ≥ 128 +/// ``` pub fn varint(w: &mut W, mut v: u64) -> io::Result { let mut byte_count = 0; loop { diff --git a/preserves-schema.md b/preserves-schema.md index b1c5ebc..e4a20fb 100644 --- a/preserves-schema.md +++ b/preserves-schema.md @@ -4,7 +4,7 @@ title: "Preserves Schema" --- Tony Garnock-Jones -February 2023. Version 0.3.1. +October 2023. Version 0.3.3. [abnf]: https://tools.ietf.org/html/rfc7405 @@ -189,12 +189,14 @@ with algebraic data types would produce a labelled-sum-of-products type. ### Alternation definitions. - OrPattern = AltPattern "/" AltPattern *("/" AltPattern) + OrPattern = [orsep] AltPattern 1*(orsep AltPattern) [orsep] + orsep = 1*"/" -The right-hand-side of a definition may supply two or more -*alternatives*. When parsing, the alternatives are tried in order; the -result of the first successful alternative is the result of the entire -parse. 
+The right-hand-side of a definition may supply two or more *alternatives*. +Alternatives are separated by any number of slashes `/`, and leading or +trailing slashes are ignored. When parsing, the alternatives are tried in +order; the result of the first successful alternative is the result of the +entire parse. **Host-language types.** The type corresponding to an `OrPattern` is an algebraic sum type, a union type, a variant type, or a concrete subclass @@ -205,31 +207,39 @@ definition-unique *name*. The name is used to uniquely label the alternative's host-language representation (for example, a subclass, or a member of a tagged union type). -A variant name can either be given explicitly as `@name` (see discussion -of `NamedPattern` below) or inferred. It can only be inferred from the -label of a record pattern, from the name of a reference to another -definition, or from the text of a "sufficiently identifierlike" literal -pattern - one that matches a string, symbol, number or boolean: +A variant name can either be given explicitly as `@name` or +inferred.[^variant-names-unlike-binding-names] It can only be inferred +from the label of a record pattern, from the name of a reference to +another definition, or from the text of a "sufficiently identifierlike" +literal pattern - one that matches a string, symbol, number or boolean: - AltPattern = "@" id SimplePattern + AltPattern = "@" id Pattern / "<" id PatternSequence ">" / Ref / LiteralPattern -- with a side condition -A host language will likely use the same ordering of its types as -specified by the schema. It is therefore recommended to specify first -the alternative best suited as a default initialization value (if +[^variant-names-unlike-binding-names]: Note that explicitly-given + *variant* names are unlike *binding* names in that binding names give + rise to a field in the record type for a definition, while variant + names are used as labels for alternatives in a sum type for a + definition. 
+ +A host language will likely use the same ordering of variants in a sum +type as specified by the schema. It is therefore recommended to specify +first the alternative best suited as a default initialization value (if there is any). ### Intersection definitions. - AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern) + AndPattern = [andsep] NamedPattern 1*(andsep NamedPattern) [andsep] + andsep = 1*"&" The right-hand-side of a definition may supply two or more patterns, the *intersection* of whose denotations is the denotation of the overall -definition. When parsing, every pattern is tried: if all succeed, the -resulting information is combined into a single type; otherwise, the -overall parse fails. +definition. The patterns are separated by any number of ampersands `&`, +and leading or trailing ampersands are ignored. When parsing, every +pattern is tried: if all succeed, the resulting information is combined +into a single type; otherwise, the overall parse fails. When serializing, the terms resulting from serializing at each pattern are *merged* together. diff --git a/preserves-text.md b/preserves-text.md index c70212d..d622307 100644 --- a/preserves-text.md +++ b/preserves-text.md @@ -23,14 +23,29 @@ ABNF allows easy definition of US-ASCII-based languages. However, Preserves is a Unicode-based language. Therefore, we reinterpret ABNF as a grammar for recognising sequences of Unicode scalar values. + **Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where possible. + **Whitespace.** Whitespace is defined as any number of spaces, tabs, carriage returns, line feeds, or commas. 
ws = *(%x20 / %x09 / CR / LF / ",") + +**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be +followed by a `delimiter` or by the end of the input.[^delimiters-lookahead] + + delimiter = ws + / "<" / ">" / "[" / "]" / "{" / "}" + / "#" / ":" / DQUOTE / "|" / "@" / ";" + +[^delimiters-lookahead]: The addition of this constraint means that + implementations must now use some kind of lookahead to make sure a + delimiter follows a `Boolean`; this should not be onerous, as + something similar is required to read `SymbolOrNumber`s correctly. + ## Grammar Standalone documents may have trailing whitespace. diff --git a/preserves.md b/preserves.md index 67ae0e2..6f963aa 100644 --- a/preserves.md +++ b/preserves.md @@ -109,7 +109,7 @@ label, then by field sequence. labels as specially-formatted lists. [^iri-labels]: It is occasionally (but seldom) necessary to - interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a + interpret such `Symbol` labels as IRIs. Where a label can be read as a relative IRI, it is notionally interpreted with respect to the IRI `urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can diff --git a/questions.md b/questions.md index 87ef836..6f6c5f9 100644 --- a/questions.md +++ b/questions.md @@ -5,10 +5,17 @@ title: "Open questions" Q. Should "symbols" instead be URIs? Relative, usually; relative to what? Some domain-specific base URI? +> No. They may be interpreted as URIs, of course; see +> [here](preserves.html#fn:iri-labels). + Q. Literal small integers: are they pulling their weight? They're not absolutely necessary. A. No, they have been removed (as part of the changes at version 0.990). +> No. They were removed in the simplification of the syntax that was the +> outcome of [issue +> 41](https://gitlab.com/preserves/preserves/-/issues/41). + Q. Should we go for trying to make the data ordering line up with the encoding ordering? 
We'd have to only use streaming forms, and avoid the small integer encoding, and not store record arities, and sort @@ -38,3 +45,8 @@ require any whitespace at all between elements of a list, making it ambiguous: does `[123]` denote a single-element or a three-element list? Compare JSON where `[1,2,3]` is unambiguously different from `[123]`. + +> With the addition of the notion of +> [delimiters](preserves-text.html#delimiters) to the text syntax, we at +> least answer the question of how `[123]` parses: it must yield a +> single-element list.
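
The delimiter rule added to `preserves-text.md` in this change, and its consequence for `[123]`, can be illustrated with a minimal sketch. It uses only the Rust standard library and is not the crate's actual `TextReader`. The names `is_delimiter`, `followed_by_delimiter`, and `bare_tokens` are hypothetical helpers introduced here for illustration. The tokenizer is deliberately crude: it assumes the input contains only bare numbers or symbols, with no strings, comments, or annotations.

```rust
/// Assumed helper (not part of the preserves crate): true if `c` is a
/// `delimiter` per the grammar in this diff, i.e. whitespace (space, tab,
/// CR, LF, comma) or one of the listed punctuation characters.
fn is_delimiter(c: char) -> bool {
    matches!(
        c,
        ' ' | '\t' | '\r' | '\n' | ','
            | '<' | '>' | '[' | ']' | '{' | '}'
            | '#' | ':' | '"' | '|' | '@' | ';'
    )
}

/// Assumed helper: a `Boolean` or `SymbolOrNumber` token is only valid if
/// the remaining input is empty or starts with a delimiter.
fn followed_by_delimiter(rest: &str) -> bool {
    rest.chars().next().map_or(true, is_delimiter)
}

/// Assumed helper: splits input into bare tokens at delimiters. This is a
/// simplification that ignores strings, comments, and annotations.
fn bare_tokens(input: &str) -> Vec<&str> {
    input.split(is_delimiter).filter(|t| !t.is_empty()).collect()
}
```

Under these assumptions, `bare_tokens("[123]")` yields the single token `123`, while `bare_tokens("[1,2,3]")` yields three tokens, because commas count as whitespace and whitespace is a delimiter. A real reader needs one character of lookahead to apply `followed_by_delimiter` after a `Boolean`, as the footnote in the diff notes.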