460 Commits

Author SHA1 Message Date
Tony Garnock-Jones 98c76df2f7 Repair accidentally-committed reference to local path (!) 2022-02-04 14:15:28 +01:00
Tony Garnock-Jones 0a0d977a48 Bump deps 2022-02-04 14:13:08 +01:00
Tony Garnock-Jones 8a0675d8ee (cargo-release) version 0.22.0 2022-02-04 14:02:10 +01:00
Tony Garnock-Jones af2578f887 (cargo-release) version 0.17.0 2022-02-04 14:02:10 +01:00
Tony Garnock-Jones 84ebf530d3 (cargo-release) version 0.22.0 2022-02-04 14:02:10 +01:00
Tony Garnock-Jones f88592282d MAJOR REFACTORING OF CORE ASSERTION-TRACKING STRUCTURES. Little impact on API. Read on for details.
2022-02-01 15:22:30 Two problems.

 - If a stop action panics (in `_terminate_facet`), the Facet is dropped before its outbound
   handles are removed. With the code as it stands, this leaks assertions (!!).

 - The logic for removing an outbound handle seems to be running in the wrong facet context???
   (See `f.outbound_handles.remove(&handle)` in the cleanup actions.)
    - I think I need to remove the for_myself mechanism
    - and add some callbacks to run only on successful commit

2022-02-02 12:12:33 This is hard.

Here's the current implementation:

 - assert
    - inserts into outbound_handles of active facet
    - adds cleanup action describing how to do the retraction
    - enqueues the assert action, which
       - calls e.assert()

 - retract
    - looks up & removes the cleanup action, which
       - enqueues the retract action, which
          - removes from outbound_handles of the WRONG facet in the WRONG actor
          - calls e.retract()

 - _terminate_facet
    - uses outbound_handles to retract the facet's assertions
    - doesn't directly touch cleanup actions, relying on retract to do that
    - if one of a facet's stop actions panics, will drop the facet, leaking its assertions
    - actually, even if a stop action yields `Err`, it will drop the facet and leak assertions
    - yikes

 - facet drop
    - panics if outbound_handles is nonempty

 - actor cleanup
    - relies on facet tree to find assertions to retract

Revised plan:

 - ✓ revise Activation/PendingEvents structures
    - rename `cleanup_actions` to `outbound_assertions`
    - remove `for_myself` queues and `final_actions`
    - add `pre_commit_actions`, `rollback_actions` and `commit_actions`

 - ✓ assert
    - as before
    - but on rollback, removes from `outbound_handles` (if the facet still exists) and
      `outbound_assertions` (always)
    - marks the new assertion as "established" on commit

 - ✓ retract
    - lookup in `outbound_assertions` by handle, using presence as indication it hasn't been
      scheduled in this turn
    - on rollback, put it back in `outbound_assertions` ONLY IF IT IS MARKED ESTABLISHED -
      otherwise it is a retraction of an `assert` that has *also* been rolled back in this turn
    - on commit, remove it from `outbound_handles`
    - enqueue the retract action, which just calls e.retract()

 - ✓ _terminate_facet
    - revised quite a bit, now that we rely on `RunningActor::cleanup` to use
      `outbound_assertions` rather than the facet tree.
    - still drops Facets on panic, but this is now mostly harmless (reorders retractions a bit)
    - handles `Err` from a stop action more gracefully
    - slightly cleverer tracking of what needs doing based on a `TerminationDirection`
    - now ONLY applies to ORDERLY cleanup of the facet tree. Disorderly cleanup ignores the
      facet tree and just retracts the assertions willy-nilly.

 - ✓ facet drop
    - warn if outbound_handles is nonempty, but don't do anything about it

 - ✓ actor cleanup
    - doesn't use the facet tree at all.
    - cleanly shutting down is done elsewhere
    - uses the remaining entries in `outbound_assertions` (previously `cleanup_actions`) to
      deal with retractions for dropped facets as well as any other facets that haven't been
      cleanly shut down

 - ✓ activate
    - now has a panic_guard::PanicGuard RAII for conveying a crash to an actor in case the
      activation is happening from a linked task or another thread (this wasn't the case in the
      examples that provoked this work, though)
    - simplified
    - explicit commit/rollback decision

 - ✓ Actor::run
    - no longer uses the same path for crash-termination and success-termination
    - instead, for success-termination, takes a turn that calls Activation::stop_root
       - this cleans up the facet tree using _terminate_facet
       - when the turn ends, it notices that the root facet is gone and shuts down the actor
       - so in principle there will be nothing for actor cleanup to do
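
The assert/retract part of the revised plan can be sketched as below. This is a minimal illustration, not the actual syndicate-rs code: the `Turn` type, its `pending_asserts` and `pending_retracts` fields, and the `Handle` alias are hypothetical names, and the per-facet `outbound_handles` set is left out. Only the "established" flag and the commit/rollback rules come from the plan above.

```rust
use std::collections::HashMap;

type Handle = u64; // hypothetical handle type

/// Hypothetical stand-in for the per-turn bookkeeping.
#[derive(Default)]
struct Turn {
    /// handle -> "established" flag (true once the assert has been committed)
    outbound_assertions: HashMap<Handle, bool>,
    /// Asserts scheduled in the current turn.
    pending_asserts: Vec<Handle>,
    /// Retractions scheduled in the current turn, with the flag they carried.
    pending_retracts: Vec<(Handle, bool)>,
}

impl Turn {
    fn assert(&mut self, h: Handle) {
        // Recorded immediately, but not yet established.
        self.outbound_assertions.insert(h, false);
        self.pending_asserts.push(h);
    }

    /// Returns false if the handle is unknown or its retraction has already
    /// been scheduled in this turn (presence in the map is the indicator).
    fn retract(&mut self, h: Handle) -> bool {
        match self.outbound_assertions.remove(&h) {
            Some(established) => {
                self.pending_retracts.push((h, established));
                true
            }
            None => false,
        }
    }

    fn commit(&mut self) {
        // Surviving asserts become established; retractions become final.
        for h in self.pending_asserts.drain(..) {
            if let Some(flag) = self.outbound_assertions.get_mut(&h) {
                *flag = true;
            }
        }
        self.pending_retracts.clear();
    }

    fn rollback(&mut self) {
        // Asserts made in this turn are always forgotten.
        for h in self.pending_asserts.drain(..) {
            self.outbound_assertions.remove(&h);
        }
        // A retraction is undone ONLY if the assertion was established;
        // otherwise its assert was rolled back in this same turn.
        for (h, established) in self.pending_retracts.drain(..) {
            if established {
                self.outbound_assertions.insert(h, true);
            }
        }
    }
}
```

With this shape, asserting and retracting the same handle within one turn and then rolling back leaves no trace of it, while a rolled-back retraction of a previously committed assertion puts the entry back.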

2022-02-04 13:52:34 This took days. :-(
2022-02-04 13:59:37 +01:00
Tony Garnock-Jones 98731ba968 Merge latest changes from the syndicate-protocols repository 2022-02-03 22:57:58 +01:00
Tony Garnock-Jones d820601eea Better trace messages from dependency tracking 2022-02-03 22:57:21 +01:00
Tony Garnock-Jones 28b0c5b4d5 One-shot daemons shouldn't be considered ready at all, just complete 2022-02-03 22:56:20 +01:00
Tony Garnock-Jones 19c96bdef2 Allow userDefined states 2022-02-03 22:55:06 +01:00
Tony Garnock-Jones 99a027dc26 Remove unwanted commented-out code 2022-02-03 15:59:19 +01:00
Tony Garnock-Jones 9add501124 Remove the (no-op) rollback entirely 2022-02-02 12:21:43 +01:00
Tony Garnock-Jones 38a5279827 Include facet ID in panic message when nonempty outbound_handles at drop time 2022-02-02 12:10:33 +01:00
Tony Garnock-Jones 1244e416d0 clear/deliver -> rollback/commit, and don't commit on drop 2022-02-02 12:10:13 +01:00
Tony Garnock-Jones d7a847de37 Refactor with_facet 2022-02-02 11:52:13 +01:00
Tony Garnock-Jones 4ea07cdd6b Further simplify supervision protocols 2022-01-26 23:37:43 +01:00
Tony Garnock-Jones 70c442ad47 Use a named unit struct instead of () 2022-01-26 23:37:21 +01:00
Tony Garnock-Jones 7e4654c8f7 Simplify and repair stdout/stderr logging in daemons 2022-01-26 23:37:04 +01:00
Tony Garnock-Jones 1111776754 Eliminate need for awkward boot_fn transmission subprotocol 2022-01-26 22:30:47 +01:00
Tony Garnock-Jones cc11120f23 Avoid erasing information immediately prior to it being needed (!) (when we can) 2022-01-26 22:12:45 +01:00
Tony Garnock-Jones e600d59f6e Conditional match expressions. I can't help but feel I'm committing some kind of crime against programming language design here. 2022-01-20 10:17:15 +01:00
Tony Garnock-Jones 9080dc6f1e Fill in the rest of the jolly owl 2022-01-20 10:12:04 +01:00
Tony Garnock-Jones a9f83e0a9d Merge latest changes from the syndicate-protocols repository 2022-01-20 10:12:00 +01:00
Tony Garnock-Jones ab34b62cf1 Refine the trace protocol a bit 2022-01-20 09:40:53 +01:00
Tony Garnock-Jones 4dc613a091 Foundations for causal tracing 2022-01-19 14:40:50 +01:00
Tony Garnock-Jones f7a5edff39 Merge latest changes from the syndicate-protocols repository 2022-01-19 14:36:09 +01:00
Tony Garnock-Jones 5a65256cf3 Syndicate traces 2022-01-19 14:24:21 +01:00
Tony Garnock-Jones 650463ff20 Accommodate extension point 2022-01-17 00:32:16 +01:00
Tony Garnock-Jones c951cea508 Merge latest changes from the syndicate-protocols repository 2022-01-17 00:26:10 +01:00
Tony Garnock-Jones 257c604e2b Repair bad record pattern 2022-01-17 00:22:10 +01:00
Tony Garnock-Jones a06d532006 Extension point. Closes #2 2022-01-16 21:17:36 +01:00
Tony Garnock-Jones 45f9abfd97 (cargo-release) version 0.21.0 2022-01-16 15:15:51 +01:00
Tony Garnock-Jones 894f0a648a (cargo-release) version 0.16.0 2022-01-16 15:15:51 +01:00
Tony Garnock-Jones e6a2a25f62 (cargo-release) version 0.21.0 2022-01-16 15:15:51 +01:00
Tony Garnock-Jones 3d3c1ebf70 Better handling of activation after termination, which repairs a scary-looking-but-harmless panic in config_watcher's private thread 2022-01-16 00:02:33 +01:00
Tony Garnock-Jones a37a2739a0 Log compiled instructions in config_watcher 2022-01-15 23:23:48 +01:00
Tony Garnock-Jones 11894ecb70 Better tracing of supervisor activity 2022-01-15 23:23:18 +01:00
Tony Garnock-Jones b810784750 Script `+=` operator; sketch of `=~` operator 2022-01-15 23:22:51 +01:00
Tony Garnock-Jones 9453408e42 Propagate script compilation errors properly. 2022-01-15 23:22:13 +01:00
Tony Garnock-Jones 2b296d79c7 Repair error in dataspace assertion idempotency.
If a facet, during X, asserts X, for all X, then X includes all
`Observe` assertions. Assertion of X should be a no-op (though
subsequent retractions of X will have no effect!) since duplicates are
ignored. However, the implementation had been ignoring whether it had
seen `Observe` assertions before, and was *always* (re)placing them
into the index, leading to runaway growth.

The repair is to only process `Observe` records on first assertion and
last retraction.

As part of this change, Dataspaces have been given names, and some
cruft from the previous implementation has been removed.
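
A rough sketch of the "first assertion, last retraction" rule follows. It assumes a simple reference-counted store; the `Assertion` enum, the `Dataspace` fields and the `index_entries` counter are hypothetical stand-ins, not the real dataspace types.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Assertion {
    Observe(String), // hypothetical: a subscription to some pattern
    Other(String),   // hypothetical: any other assertion
}

#[derive(Default)]
struct Dataspace {
    /// How many times each assertion is currently asserted.
    counts: HashMap<Assertion, usize>,
    /// Stand-in for the index: number of `Observe` entries installed.
    index_entries: usize,
}

impl Dataspace {
    fn assert(&mut self, a: Assertion) {
        let first = {
            let n = self.counts.entry(a.clone()).or_insert(0);
            *n += 1;
            *n == 1
        };
        // Touch the index only when the assertion goes from absent to present.
        if first {
            if let Assertion::Observe(_) = a {
                self.index_entries += 1;
            }
        }
    }

    fn retract(&mut self, a: &Assertion) {
        let n = match self.counts.get(a) {
            Some(n) => *n,
            None => return, // not asserted: nothing to do
        };
        if n > 1 {
            self.counts.insert(a.clone(), n - 1);
        } else {
            // Last retraction: only now is the index entry removed.
            self.counts.remove(a);
            if let Assertion::Observe(_) = a {
                self.index_entries -= 1;
            }
        }
    }
}
```

Asserting the same `Observe` record repeatedly then leaves `index_entries` at 1 rather than growing with each duplicate.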
2022-01-15 23:18:29 +01:00
Tony Garnock-Jones af4af8b048 Bump deps 2022-01-14 15:55:30 +01:00
Tony Garnock-Jones 78ef7c07db documentation.prs 2022-01-14 15:36:41 +01:00
Tony Garnock-Jones 6325538ea6 (cargo-release) version 0.20.1 2022-01-12 12:28:38 +01:00
Tony Garnock-Jones 7fbe6360e7 Support patterns like <?r <Something _ _ _>> 2022-01-12 12:28:03 +01:00
Tony Garnock-Jones d007da2e94 (cargo-release) version 0.20.0 2022-01-10 13:39:48 +01:00
Tony Garnock-Jones 08c7bd3808 (cargo-release) version 0.15.0 2022-01-10 13:39:48 +01:00
Tony Garnock-Jones 96cfb1d4e7 (cargo-release) version 0.20.0 2022-01-10 13:39:48 +01:00
Tony Garnock-Jones 2d179d1e46 Avoid racy approaches to actor-termination.
They're still there: you can use turn.state.shutdown(), which enqueues
a message for eventual actor shutdown. But it's better to use
turn.stop_root(), which terminates the actor's root facet within the
current turn, ensuring that the actor's exit_status is definitely set
by the time the turn has committed.

This is necessary to avoid a racy panic in supervision: before this
change, an asynchronous SystemMessage::Release was sent when the last
facet of an actor was stopped. Depending on load (!), any retractions
resulting from the shutdown would be delivered before the Release
arrived at the stopping actor. The supervision logic expected
exit_status to be definitely set by the time release() fired, which
wasn't always true. Now that in-turn shutdown has been implemented,
this is a reliable invariant.

A knock-on change is the need to remove
enqueue_for_myself_at_commit(), replacing it with a use of
pending.for_myself.push(). The old enqueue_for_myself_at_commit
approach could lead to lost actions as follows:

    A: start linked task T, which spawns a new tokio coroutine
            T: activate some facet in A and terminate A's root facet
            T: at this point, A transitions to "not running"
    A: spawn B, enqueuing a call to B's boot()
    A: commit turn. Deliveries for others go out as usual,
       but those for A will be discarded since A is "not running".
       This means that the call to B's boot() goes missing.

Using pending.for_myself.push() instead ensures that B's boot will
always run at the end of A's turn, regardless of whether A is in
some terminated state.

I think that this kind of race could have happened before, but
something about switching away from shutdown() seems to trigger it
somewhat reliably.
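
The difference between the two queues can be illustrated with a toy model. Everything here is hypothetical apart from the `for_myself` name: `Actor`, `PendingEvents`, `via_mailbox` and `commit` are simplified stand-ins chosen for this sketch, and real delivery is asynchronous.

```rust
/// Toy actor: `running` flips to false once its root facet has been terminated.
struct Actor {
    running: bool,
    executed: Vec<String>,
}

#[derive(Default)]
struct PendingEvents {
    /// Actions run directly at the end of the turn.
    for_myself: Vec<String>,
    /// Actions delivered to the actor like any other incoming event.
    via_mailbox: Vec<String>,
}

fn commit(actor: &mut Actor, pending: PendingEvents) {
    // These run unconditionally, whatever state the actor is in.
    for action in pending.for_myself {
        actor.executed.push(action);
    }
    // These are discarded if the actor is no longer running.
    for action in pending.via_mailbox {
        if actor.running {
            actor.executed.push(action);
        }
    }
}
```

If a linked task has already marked the actor as not running, an action placed in `via_mailbox` (such as the call to B's boot) is silently lost, whereas the same action placed in `for_myself` still runs.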
2022-01-10 12:52:29 +01:00
Tony Garnock-Jones e06e5fef10 Put thread IDs in logging output 2022-01-10 12:52:12 +01:00
Tony Garnock-Jones c3a9525ef1 Track enough information to allow piecing-together of parent/child relationships among actors 2022-01-10 12:52:12 +01:00