71 Commits

Author SHA1 Message Date
Tony Garnock-Jones bdb0cc1023 Repair severe error in turn rollback 2024-04-01 16:52:24 +02:00
Tony Garnock-Jones 9084c1781e Repair nested-panic situation 2024-03-29 10:23:21 +01:00
Tony Garnock-Jones 55456621d4 Handle refinement to gatekeeper protocol allowing JIT binding and/or direct rejection 2024-03-22 11:22:58 +01:00
Tony Garnock-Jones f4a4b4d595 Reuse a single Activation per actor: this merges RunningActor with Activation 2024-03-04 10:07:31 +01:00
Tony Garnock-Jones b7d4bd4b58 Avoid uselessly computing turn descriptions when there is no listener for them 2024-03-03 14:15:56 +01:00
Tony Garnock-Jones b4f355aa0d Oops, had ExitStatus without derive Debug 2024-02-24 21:58:56 +01:00
Tony Garnock-Jones 1ff222b291 Demote terminate-on-drop to a debug message rather than an error 2024-02-24 13:08:32 +01:00
Tony Garnock-Jones e501d0f76a Repair warnings 2024-02-24 13:06:22 +01:00
Tony Garnock-Jones 0f2d9239f9 Remove now-retired Float references 2024-02-03 15:24:28 +01:00
Tony Garnock-Jones 461ac034f8 Avoid double-execution within a round; see syndicate-lang/syndicate-js#3 2023-12-19 23:12:13 +13:00
Tony Garnock-Jones 090ac8780f Add "KeepAlive" for when a driver is still getting ready to expose an Entity but hasn't done so yet. 2023-11-12 10:14:54 +01:00
Tony Garnock-Jones 1f7930d31a ring.rs 2023-11-08 19:30:26 +01:00
Tony Garnock-Jones 764fb3b866 Remove (trivial) unnecessary clone 2023-11-07 00:40:43 +01:00
Tony Garnock-Jones 726265132f Small initial capacity 2023-11-07 00:11:59 +01:00
Tony Garnock-Jones f6b6dd25f1 Small performance win from avoiding use of HashMap in single-receiver case 2023-11-06 23:54:59 +01:00
Tony Garnock-Jones a74cd19526 Remove apparently-useless drop() call 2023-05-26 13:52:31 +02:00
Tony Garnock-Jones 833be7b293 Update attenuations 2023-02-06 14:48:18 +01:00
Tony Garnock-Jones 94040ae566 More ergonomic guard api 2023-01-30 17:29:25 +01:00
Tony Garnock-Jones dbbbc8c1c6 Breaking change: much improved error API 2023-01-30 14:25:58 +01:00
Tony Garnock-Jones f88592282d MAJOR REFACTORING OF CORE ASSERTION-TRACKING STRUCTURES. Little impact on API. Read on for details.
2022-02-01 15:22:30 Two problems.

 - If a stop action panics (in `_terminate_facet`), the Facet is dropped before its outbound
   handles are removed. With the code as it stands, this leaks assertions (!!).

 - The logic for removing an outbound handle seems to be running in the wrong facet context???
   (See `f.outbound_handles.remove(&handle)` in the cleanup actions.)
    - I think I need to remove the for_myself mechanism
    - and add some callbacks to run only on successful commit

2022-02-02 12:12:33 This is hard.

Here's the current implementation:

 - assert
    - inserts into outbound_handles of active facet
    - adds cleanup action describing how to do the retraction
    - enqueues the assert action, which
       - calls e.assert()

 - retract
    - looks up & removes the cleanup action, which
       - enqueues the retract action, which
          - removes from outbound_handles of the WRONG facet in the WRONG actor
          - calls e.retract()

 - _terminate_facet
    - uses outbound_handles to retract the facet's assertions
    - doesn't directly touch cleanup actions, relying on retract to do that
    - if one of a facet's stop actions panics, will drop the facet, leaking its assertions
    - actually, even if a stop action yields `Err`, it will drop the facet and leak assertions
    - yikes

 - facet drop
    - panics if outbound_handles is nonempty

 - actor cleanup
    - relies on facet tree to find assertions to retract

Revised plan:

 - ✓ revise Activation/PendingEvents structures
    - rename `cleanup_actions` to `outbound_assertions`
    - remove `for_myself` queues and `final_actions`
    - add `pre_commit_actions`, `rollback_actions` and `commit_actions`

 - ✓ assert
    - as before
    - but on rollback, removes from `outbound_handles` (if the facet still exists) and
      `outbound_assertions` (always)
    - marks the new assertion as "established" on commit

 - ✓ retract
    - lookup in `outbound_assertions` by handle, using presence as indication it hasn't been
      scheduled in this turn
    - on rollback, put it back in `outbound_assertions` ONLY IF IT IS MARKED ESTABLISHED -
      otherwise it is a retraction of an `assert` that has *also* been rolled back in this turn
    - on commit, remove it from `outbound_handles`
    - enqueue the retract action, which just calls e.retract()

 - ✓ _terminate_facet
    - revised quite a bit now we rely on `RunningActor::cleanup` to use `outbound_assertions`
      rather than the facet tree.
    - still drops Facets on panic, but this is now mostly harmless (reorders retractions a bit)
    - handles `Err` from a stop action more gracefully
    - slightly cleverer tracking of what needs doing based on a `TerminationDirection`
    - now ONLY applies to ORDERLY cleanup of the facet tree. Disorderly cleanup ignores the
      facet tree and just retracts the assertions willy-nilly.

 - ✓ facet drop
    - warn if outbound_handles is nonempty, but don't do anything about it

 - ✓ actor cleanup
    - doesn't use the facet tree at all.
    - cleanly shutting down is done elsewhere
    - uses the remaining entries in `outbound_assertions` (previously `cleanup_actions`) to
      deal with retractions for dropped facets as well as any other facets that haven't been
      cleanly shut down

 - ✓ activate
    - now has a panic_guard::PanicGuard RAII for conveying a crash to an actor in case the
      activation is happening from a linked task or another thread (this wasn't the case in the
      examples that provoked this work, though)
    - simplified
    - explicit commit/rollback decision

 - ✓ Actor::run
    - no longer uses the same path for crash-termination and success-termination
    - instead, for success-termination, takes a turn that calls Activation::stop_root
       - this cleans up the facet tree using _terminate_facet
       - when the turn ends, it notices that the root facet is gone and shuts down the actor
       - so in principle there will be nothing for actor cleanup to do
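
As a rough illustration of the assert/retract items in the revised plan, here is a minimal sketch of per-turn commit and rollback actions over an `outbound_assertions` table. It is not the code from this commit: the `Actor`, `Turn`, `Outbound` and `Action` types are hypothetical stand-ins, the per-facet `outbound_handles` bookkeeping is left out, and no `e.assert()` / `e.retract()` calls are actually delivered. It only models the "established" flag and what rollback restores.

```rust
use std::collections::HashMap;

type Handle = u64;

/// Hypothetical entry in the actor-wide table of outbound assertions.
#[derive(Debug)]
struct Outbound {
    /// Becomes true only once the turn that made the assertion commits.
    established: bool,
}

/// Hypothetical stand-in for the actor state that survives across turns.
#[derive(Debug, Default)]
struct Actor {
    outbound_assertions: HashMap<Handle, Outbound>,
}

type Action = Box<dyn FnOnce(&mut Actor)>;

/// Hypothetical stand-in for the per-turn pending-action queues.
#[derive(Default)]
struct Turn {
    commit_actions: Vec<Action>,
    rollback_actions: Vec<Action>,
}

impl Turn {
    fn assert(&mut self, actor: &mut Actor, h: Handle) {
        actor
            .outbound_assertions
            .insert(h, Outbound { established: false });
        // On rollback, the assertion never happened: forget it.
        self.rollback_actions.push(Box::new(move |a: &mut Actor| {
            a.outbound_assertions.remove(&h);
        }));
        // On commit, mark the assertion as established.
        self.commit_actions.push(Box::new(move |a: &mut Actor| {
            if let Some(o) = a.outbound_assertions.get_mut(&h) {
                o.established = true;
            }
        }));
    }

    fn retract(&mut self, actor: &mut Actor, h: Handle) {
        // Presence in the table means no retraction has been scheduled yet.
        if let Some(o) = actor.outbound_assertions.remove(&h) {
            // Restore on rollback ONLY IF the assertion was established;
            // otherwise the matching assert is rolled back in this turn too.
            if o.established {
                self.rollback_actions.push(Box::new(move |a: &mut Actor| {
                    a.outbound_assertions.insert(h, o);
                }));
            }
        }
    }

    fn commit(self, actor: &mut Actor) {
        for action in self.commit_actions {
            action(actor);
        }
    }

    fn rollback(self, actor: &mut Actor) {
        // Undo in reverse order of scheduling.
        for action in self.rollback_actions.into_iter().rev() {
            action(actor);
        }
    }
}
```

In this model, a turn that retracts an established handle and then rolls back leaves that handle in the table, while a handle that was both asserted and retracted within the rolled-back turn disappears entirely.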

2022-02-04 13:52:34 This took days. :-(
2022-02-04 13:59:37 +01:00
Tony Garnock-Jones 9add501124 Remove the (no-op) rollback entirely 2022-02-02 12:21:43 +01:00
Tony Garnock-Jones 38a5279827 Include facet ID in panic message when nonempty outbound_handles at drop time 2022-02-02 12:10:33 +01:00
Tony Garnock-Jones 1244e416d0 clear/deliver -> rollback/commit, and don't commit on drop 2022-02-02 12:10:13 +01:00
Tony Garnock-Jones d7a847de37 Refactor with_facet 2022-02-02 11:52:13 +01:00
Tony Garnock-Jones cc11120f23 Avoid erasing information immediately prior to it being needed (!) (when we can) 2022-01-26 22:12:45 +01:00
Tony Garnock-Jones 9080dc6f1e Fill in the rest of the jolly owl 2022-01-20 10:12:04 +01:00
Tony Garnock-Jones 4dc613a091 Foundations for causal tracing 2022-01-19 14:40:50 +01:00
Tony Garnock-Jones 3d3c1ebf70 Better handling of activation after termination, which repairs a scary-looking-but-harmless panic in config_watcher's private thread 2022-01-16 00:02:33 +01:00
Tony Garnock-Jones 2d179d1e46 Avoid racy approaches to actor-termination.
They're still there: you can use turn.state.shutdown(), which enqueues
a message for eventual actor shutdown. But it's better to use
turn.stop_root(), which terminates the actor's root facet within the
current turn, ensuring that the actor's exit_status is definitely set
by the time the turn has committed.

This is necessary to avoid a racy panic in supervision: before this
change, an asynchronous SystemMessage::Release was sent when the last
facet of an actor was stopped. Depending on load (!), any retractions
resulting from the shutdown would be delivered before the Release
arrived at the stopping actor. The supervision logic expected
exit_status to be definitely set by the time release() fired, which
wasn't always true. Now that in-turn shutdown has been implemented,
this is a reliable invariant.

A knock-on change is the need to remove
enqueue_for_myself_at_commit(), replacing it with a use of
pending.for_myself.push(). The old enqueue_for_myself_at_commit
approach could lead to lost actions as follows:

    A: start linked task T, which spawns a new tokio coroutine
            T: activate some facet in A and terminate A's root facet
            T: at this point, A transitions to "not running"
    A: spawn B, enqueuing a call to B's boot()
    A: commit turn. Deliveries for others go out as usual,
       but those for A will be discarded since A is "not running".
       This means that the call to B's boot() goes missing.

Using pending.for_myself.push() instead assures that B's boot will
always run at the end of A's turn, without regard for whether A is in
some terminated state.

I think that this kind of race could have happened before, but
something about switching away from shutdown() seems to trigger it
somewhat reliably.
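
The lost-action scenario above can be modelled in a few lines. This is a toy sketch only, assuming an actor that discards ordinary deliveries once it is "not running"; the `Actor`, `Pending`, `deliver` and `commit` names and the `at_commit` field are made up for the illustration and do not reflect the real structures.

```rust
/// Hypothetical actor: just a running flag and a record of what actually ran.
struct Actor {
    running: bool,
    executed: Vec<String>,
}

impl Actor {
    /// Ordinary delivery path: anything arriving after termination is discarded.
    fn deliver(&mut self, action: &str) -> bool {
        if !self.running {
            return false;
        }
        self.executed.push(action.to_string());
        true
    }
}

/// Hypothetical per-turn queues.
#[derive(Default)]
struct Pending {
    /// Old approach: self-directed actions sent through the ordinary path at commit.
    at_commit: Vec<String>,
    /// New approach: self-directed actions that run at the end of the turn regardless.
    for_myself: Vec<String>,
}

fn commit(actor: &mut Actor, pending: Pending) {
    // These always run, whether or not the actor is still "running".
    for action in pending.for_myself {
        actor.executed.push(action);
    }
    // These go through the ordinary path and may be silently dropped.
    for action in pending.at_commit {
        actor.deliver(&action);
    }
}
```

With `running` already set to false by the linked task, an action queued in `at_commit` never executes, while the same action queued in `for_myself` does.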
2022-01-10 12:52:29 +01:00
Tony Garnock-Jones c3a9525ef1 Track enough information to allow piecing-together of parent/child relationships among actors 2022-01-10 12:52:12 +01:00
Tony Garnock-Jones 58bde1e29d Add Activation::stop_root 2022-01-10 11:23:02 +01:00
Tony Garnock-Jones 82ccbdb282 Simplify and correct facet stop logic; always run stop actions in parent facet context 2022-01-08 15:27:44 +01:00
Tony Garnock-Jones 0d25d76bec Split out (internal) on_facet_stop from on_stop 2022-01-08 15:26:34 +01:00
Tony Garnock-Jones 19b04b82a2 Improve documentation regarding stop/exit actions 2022-01-08 15:25:41 +01:00
Tony Garnock-Jones be27348d29 Activation::facet_ids 2022-01-08 15:24:10 +01:00
Tony Garnock-Jones f956f3d994 Activation::every 2022-01-07 17:15:51 +01:00
Tony Garnock-Jones 34c336e457 More tracing 2021-12-01 11:06:39 +01:00
Tony Garnock-Jones 11363c5776 If an actor panics, make sure to clean up in drop if we can 2021-12-01 11:06:29 +01:00
Tony Garnock-Jones 4713005997 wait_for_all_actors_to_stop 2021-10-08 16:37:26 +02:00
Tony Garnock-Jones baf98d6c54 Better span naming and logging tweaks 2021-10-08 16:37:17 +02:00
Tony Garnock-Jones 0d7ac7441f stop() and stop_facet(facet_id) now return unit 2021-10-07 16:59:34 +02:00
Tony Garnock-Jones f640111f20 Huh, I seem to have left this unfinished 2021-10-06 22:02:27 +02:00
Tony Garnock-Jones 2a7606d626 Track actors globally (eventually for reflection/introspection) 2021-10-05 12:39:28 +02:00
Tony Garnock-Jones ed12c0883e Switch to parking_lot for another performance boost 2021-09-30 13:32:41 +02:00
Tony Garnock-Jones de795219af Fix up daemon retry logic. Also: named fields; better stop logic.
In particular:

1. The root facet is considered inert even if it has outbound
assertions. This is because the only outbound assertion it can have is
a half-link to a peer actor, which shouldn't prevent the actor from
terminating normally if the user-level "root" facet stops.

2. On stop_facet_and_continue, parent-facet continuations execute
inline rather than at commit time. This is so that a user-level "root"
facet can *replace* itself. Remains to be properly exercised/tested.
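
Point 1 amounts to a special case in the inertness check. A minimal sketch, assuming a facet is otherwise inert when it has no children and no outbound assertions; the `Facet` fields here are hypothetical, and the real check may take more into account:

```rust
/// Hypothetical, simplified view of a facet.
struct Facet {
    is_root: bool,
    child_count: usize,
    outbound_handle_count: usize,
}

/// The root facet's only possible outbound assertion is the half-link to a
/// peer actor, so it is ignored when deciding whether the root is inert.
fn is_inert(f: &Facet) -> bool {
    let assertions_allow_inert = f.is_root || f.outbound_handle_count == 0;
    f.child_count == 0 && assertions_allow_inert
}
```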
2021-09-28 17:10:36 +02:00
Tony Garnock-Jones 013e99af70 Greatly improve service lifecycle handling 2021-09-28 12:53:18 +02:00
Tony Garnock-Jones 5a8a508fdc More general on_stop; the old behaviour is now at on_stop_notify 2021-09-24 16:14:55 +02:00
Tony Garnock-Jones ffae9be241 No more distinction between internal/external protocol variants 2021-09-24 13:04:15 +02:00
Tony Garnock-Jones 531d66205b Intra-actor dataflow and fields; `enclose!` macro 2021-09-23 21:43:32 +02:00
Tony Garnock-Jones ccd54be3b2 Adapt to new Preserves major version; stub daemon basis 2021-09-19 16:53:37 +02:00