syndicate-rs

Commit Graph

Author	SHA1	Message	Date
Tony Garnock-Jones	f88592282d	MAJOR REFACTORING OF CORE ASSERTION-TRACKING STRUCTURES. Little impact on API. Read on for details.	2022-02-04 13:59:37 +01:00
    2022-02-01 15:22:30 Two problems.
    - If a stop action panics (in `_terminate_facet`), the Facet is dropped before its outbound handles are removed. With the code as it stands, this leaks assertions (!!).
    - The logic for removing an outbound handle seems to be running in the wrong facet context??? (See `f.outbound_handles.remove(&handle)` in the cleanup actions
    - I think I need to remove the for_myself mechanism
    - and add some callbacks to run only on successful commit
    2022-02-02 12:12:33 This is hard. Here's the current implementation:
    - assert
      - inserts into outbound_handles of active facet
      - adds cleanup action describing how to do the retraction
      - enqueues the assert action, which
        - calls e.assert()
    - retract
      - looks up & removes the cleanup action, which
        - enqueues the retract action, which
          - removes from outbound_handles of the WRONG facet in the WRONG actor
          - calls e.retract()
    - _terminate_facet
      - uses outbound_handles to retract the facet's assertions
      - doesn't directly touch cleanup actions, relying on retract to do that
      - if one of a facet's stop actions panics, will drop the facet, leaking its assertions
      - actually, even if a stop action yields `Err`, it will drop the facet and leak assertions
      - yikes
    - facet drop
      - panics if outbound_handles is nonempty
    - actor cleanup
      - relies on facet tree to find assertions to retract
    Revised plan:
    - ✓ revise Activation/PendingEvents structures
      - rename `cleanup_actions` to `outbound_assertions`
      - remove `for_myself` queues and `final_actions`
      - add `pre_commit_actions`, `rollback_actions` and `commit_actions`
    - ✓ assert
      - as before
      - but on rollback, removes from `outbound_handles` (if the facet still exists) and `outbound_assertions` (always)
      - marks the new assertion as "established" on commit
    - ✓ retract
      - lookup in `outbound_assertions` by handle, using presence as indication it hasn't been scheduled in this turn
      - on rollback, put it back in `outbound_assertions` ONLY IF IT IS MARKED ESTABLISHED
        - otherwise it is a retraction of an `assert` that has also been rolled back in this turn
      - on commit, remove it from `outbound_handles`
      - enqueue the retract action, which just calls e.retract()
    - ✓ _terminate_facet
      - revised quite a bit now we rely on `RunningActor::cleanup` to use `outbound_assertions` rather than the facet tree.
      - still drops Facets on panic, but this is now mostly harmless (reorders retractions a bit)
      - handles `Err` from a stop action more gracefully
      - slightly cleverer tracking of what needs doing based on a `TerminationDirection`
      - now ONLY applies to ORDERLY cleanup of the facet tree. Disorderly cleanup ignores the facet tree and just retracts the assertions willy-nilly.
    - ✓ facet drop
      - warn if outbound_handles is nonempty, but don't do anything about it
    - ✓ actor cleanup
      - doesn't use the facet tree at all.
        - cleanly shutting down is done elsewhere
      - uses the remaining entries in `outbound_assertions` (previously `cleanup_actions`) to deal with retractions for dropped facets as well as any other facets that haven't been cleanly shut down
    - ✓ activate
      - now has a panic_guard::PanicGuard RAII for conveying a crash to an actor in case the activation is happening from a linked task or another thread (this wasn't the case in the examples that provoked this work, though)
      - simplified
      - explicit commit/rollback decision
    - ✓ Actor::run
      - no longer uses the same path for crash-termination and success-termination
      - instead, for success-termination, takes a turn that calls Activation::stop_root
        - this cleans up the facet tree using _terminate_facet
        - when the turn ends, it notices that the root facet is gone and shuts down the actor
        - so in principle there will be nothing for actor cleanup to do
    2022-02-04 13:52:34 This took days. :-(
Tony Garnock-Jones	99a027dc26	Remove unwanted commented-out code	2022-02-03 15:59:19 +01:00
Tony Garnock-Jones	9add501124	Remove the (no-op) rollback entirely	2022-02-02 12:21:43 +01:00
Tony Garnock-Jones	38a5279827	Include facet ID in panic message when nonempty outbound_handles at drop time	2022-02-02 12:10:33 +01:00
Tony Garnock-Jones	1244e416d0	clear/deliver -> rollback/commit, and don't commit on drop	2022-02-02 12:10:13 +01:00
Tony Garnock-Jones	d7a847de37	Refactor with_facet	2022-02-02 11:52:13 +01:00
Tony Garnock-Jones	4ea07cdd6b	Further simplify supervision protocols	2022-01-26 23:37:43 +01:00
Tony Garnock-Jones	70c442ad47	Use a named unit struct instead of ()	2022-01-26 23:37:21 +01:00
Tony Garnock-Jones	1111776754	Eliminate need for awkward boot_fn transmission subprotocol	2022-01-26 22:30:47 +01:00
Tony Garnock-Jones	cc11120f23	Avoid erasing information immediately prior to it being needed (!) (when we can)	2022-01-26 22:12:45 +01:00
Tony Garnock-Jones	9080dc6f1e	Fill in the rest of the jolly owl	2022-01-20 10:12:04 +01:00
Tony Garnock-Jones	4dc613a091	Foundations for causal tracing	2022-01-19 14:40:50 +01:00
Tony Garnock-Jones	650463ff20	Accommodate extension point	2022-01-17 00:32:16 +01:00
Tony Garnock-Jones	3d3c1ebf70	Better handling of activation after termination, which repairs a scary-looking-but-harmless panic in config_watcher's private thread	2022-01-16 00:02:33 +01:00
Tony Garnock-Jones	11894ecb70	Better tracing of supervisor activity	2022-01-15 23:23:18 +01:00
Tony Garnock-Jones	2b296d79c7	Repair error in dataspace assertion idempotency. If a facet, during X, asserts X, for all X, then X includes all `Observe` assertions. Assertion of X should be a no-op (though subsequent retractions of X will have no effect!) since duplicates are ignored. However, the implementation had been ignoring whether it had seen `Observe` assertions before, and was always (re)placing them into the index, leading to runaway growth. The repair is to only process `Observe` records on first assertion and last retraction. As part of this change, Dataspaces have been given names, and some cruft from the previous implementation has been removed.	2022-01-15 23:18:29 +01:00
Tony Garnock-Jones	2d179d1e46	Avoid racy approaches to actor-termination.	2022-01-10 12:52:29 +01:00
    They're still there: you can use turn.state.shutdown(), which enqueues a message for eventual actor shutdown. But it's better to use turn.stop_root(), which terminates the actor's root facet within the current turn, ensuring that the actor's exit_status is definitely set by the time the turn has committed.
    This is necessary to avoid a racy panic in supervision: before this change, an asynchronous SystemMessage::Release was sent when the last facet of an actor was stopped. Depending on load (!), any retractions resulting from the shutdown would be delivered before the Release arrived at the stopping actor. The supervision logic expected exit_status to be definitely set by the time release() fired, which wasn't always true. Now that in-turn shutdown has been implemented, this is a reliable invariant.
    A knock-on change is the need to remove enqueue_for_myself_at_commit(), replacing it with a use of pending.for_myself.push(). The old enqueue_for_myself_at_commit approach could lead to lost actions as follows:
      A: start linked task T, which spawns a new tokio coroutine
      T: activate some facet in A and terminate A's root facet
      T: at this point, A transitions to "not running"
      A: spawn B, enqueuing a call to B's boot()
      A: commit turn. Deliveries for others go out as usual, but those for A will be discarded since A is "not running". This means that the call to B's boot() goes missing.
    Using pending.for_myself.push() instead assures that B's boot will always run at the end of A's turn, without regard for whether A is in some terminated state.
    I think that this kind of race could have happened before, but something about switching away from shutdown() seems to trigger it somewhat reliably.
Tony Garnock-Jones	e06e5fef10	Put thread IDs in logging output	2022-01-10 12:52:12 +01:00
Tony Garnock-Jones	c3a9525ef1	Track enough information to allow piecing-together of parent/child relationships among actors	2022-01-10 12:52:12 +01:00
Tony Garnock-Jones	58bde1e29d	Add Activation::stop_root	2022-01-10 11:23:02 +01:00
Tony Garnock-Jones	a6ea858f1c	Belt and suspenders	2022-01-09 21:01:55 +01:00
Tony Garnock-Jones	82ccbdb282	Simplify and correct facet stop logic; always run stop actions in parent facet context	2022-01-08 15:27:44 +01:00
Tony Garnock-Jones	0d25d76bec	Split out (internal) on_facet_stop from on_stop	2022-01-08 15:26:34 +01:00
Tony Garnock-Jones	19b04b82a2	Improve documentation regarding stop/exit actions	2022-01-08 15:25:41 +01:00
Tony Garnock-Jones	be27348d29	Activation::facet_ids	2022-01-08 15:24:10 +01:00
Tony Garnock-Jones	6f8fb014f2	Update daemon restart policy defaults to line up better with the new supervisor defaults	2022-01-07 22:05:12 +01:00
Tony Garnock-Jones	fce928b5b0	Warn on restart intensity excess	2022-01-07 17:16:20 +01:00
Tony Garnock-Jones	33a0a52d6b	Change SupervisorConfiguration default to RestartPolicy::Always	2022-01-07 17:16:05 +01:00
Tony Garnock-Jones	f956f3d994	Activation::every	2022-01-07 17:15:51 +01:00
Tony Garnock-Jones	bbcc15c74d	Fix length checks	2021-12-13 16:05:43 +01:00
Tony Garnock-Jones	f5b1fec90f	Follow simplifications to sturdy caveats	2021-12-13 16:00:25 +01:00
Tony Garnock-Jones	a831b02ca5	Accommodate changes to dataspacePatterns	2021-12-13 15:43:24 +01:00
Tony Garnock-Jones	730fa2098b	It is OK for an assertion to be placed at an unregistered remote_oid, it turns out	2021-12-01 11:14:02 +01:00
Tony Garnock-Jones	34c336e457	More tracing	2021-12-01 11:06:39 +01:00
Tony Garnock-Jones	11363c5776	If an actor panics, make sure to clean up in drop if we can	2021-12-01 11:06:29 +01:00
Tony Garnock-Jones	2ec35ad868	Process the rest of the turn even when an unknown oid is seen	2021-10-18 17:21:09 +02:00
Tony Garnock-Jones	4713005997	wait_for_all_actors_to_stop	2021-10-08 16:37:26 +02:00
Tony Garnock-Jones	baf98d6c54	Better span naming and logging tweaks	2021-10-08 16:37:17 +02:00
Tony Garnock-Jones	ac6f37cf0c	Clean up error reporting	2021-10-07 18:10:59 +02:00
Tony Garnock-Jones	40025b90a6	More capability-oriented scripting language	2021-10-07 17:00:04 +02:00
Tony Garnock-Jones	0d7ac7441f	stop() and stop_facet(facet_id) now return unit	2021-10-07 16:59:34 +02:00
Tony Garnock-Jones	f640111f20	Huh, I seem to have left this unfinished	2021-10-06 22:02:27 +02:00
Tony Garnock-Jones	7117215963	Binary and text support	2021-10-05 21:11:16 +02:00
Tony Garnock-Jones	9af31cfaad	More debug output	2021-10-05 19:10:30 +02:00
Tony Garnock-Jones	2a7606d626	Track actors globally (eventually for reflection/introspection)	2021-10-05 12:39:28 +02:00
Tony Garnock-Jones	ed12c0883e	Switch to parking_lot for another performance boost	2021-09-30 13:32:41 +02:00
Tony Garnock-Jones	de795219af	Fix up daemon retry logic.	2021-09-28 17:10:36 +02:00
    Also: named fields; better stop logic. In particular:
    1. The root facet is considered inert even if it has outbound assertions. This is because the only outbound assertion it can have is a half-link to a peer actor, which shouldn't prevent the actor from terminating normally if the user-level "root" facet stops.
    2. On stop_facet_and_continue, parent-facet continuations execute inline rather than at commit time. This is so that a user-level "root" facet can replace itself.
    Remains to be properly exercised/tested.
Tony Garnock-Jones	013e99af70	Greatly improve service lifecycle handling	2021-09-28 12:53:18 +02:00
Tony Garnock-Jones	a263a7091d	Tweak debug outputs	2021-09-26 11:02:55 +02:00
Tony Garnock-Jones	5a8a508fdc	More general on_stop; the old behaviour is now at on_stop_notify	2021-09-24 16:14:55 +02:00

87 Commits