synit/notes/process-supervision.md


2021-08-24 08:54:40 +00:00
---
title: 'Survey: Process Supervision'
---
# {{ page.title }}
RedoxOS "fired" -- can't find code or a homepage?
daemontools
- https://cr.yp.to/daemontools.html
- services should be symlinked into a directory monitored by svscan
- programs:
  - svscanboot - starts svscan for /service, plus readproctitle for errors via ps
  - svscan - "starts and monitors a collection of services."
    - starts one `supervise` per service in a service directory (cwd)
    - designed to run forever
    - if `s` is a service, and `s/log` is a service, creates *two*
      `supervise`s, with a pipe between them
    - if any `supervise` terminates, it restarts it
    - reuses the *same* pipe if restarting one end of a connected
      pair of `supervise`s; this way no log messages are lost
  - supervise - (re)starts `./run` for a given service. Writes status to `./supervise/*`
  - svc - talks to a `supervise`
  - svok - predicate: is `supervise` running for a service?
  - svstat - list service status information for zero or more services (given explicitly)
  - fghack - vile hack for antibackgrounding
  - pgrphack - wrap a child in a new process group
  - readproctitle - takes stdin and puts it into its own command-line, to show up in ps output
  - multilog - scriptable filterable actions on each line of stdin; e.g. append to log, replace contents of file etc.
    - a "log" is a directory full of files with a special format
  - tai64n - puts hex TAI timestamps on each line of stdin
  - tai64nlocal - rewrites tai64n timestamps to human-readable
  - setuidgid - command wrapper for setting uid/gid
  - envuidgid - command wrapper for setting environment variables UID and GID
  - envdir - command wrapper for setting environment based on files in a directory
  - softlimit - rlimit
  - setlock - command wrapper for holding a "locked ofile" during the lifetime of the command
    - what is an "ofile"?
- hax to get daemontools svscan as PID 1: https://code.dogmap.org/svscan-1/
  - "For a clean shutdown, we want to kill each service and ensure
    that its logger has written all the logs before killing the
    logger."
See "Artistic considerations" on https://skarnet.org/software/s6/why.html
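
A minimal sketch of the shared-pipe idea from the svscan notes above, in Python rather than daemontools' C, assuming a POSIX system. The "service" and "logger" here are hypothetical stand-in child processes, not daemontools programs. The only point is that the supervising parent owns the pipe, so a restarted writer keeps writing into the same kernel buffer and a late reader still sees everything.

```python
import os
import subprocess
import sys


def run_service_twice_then_log():
    # The supervisor creates the pipe once and keeps both ends open,
    # so the pipe outlives any single service or logger process.
    read_fd, write_fd = os.pipe()

    def start_service(message):
        # Hypothetical "service": prints one line to stdout and exits.
        return subprocess.Popen(
            [sys.executable, "-c", f"print({message!r})"],
            stdout=write_fd,
        )

    start_service("first run").wait()
    # Simulated restart: a new process, but the same pipe.
    start_service("second run").wait()

    # Hypothetical "logger", started late: everything written so far is
    # still buffered in the kernel pipe.
    logger = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read())"],
        stdin=read_fd,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close the supervisor's write end so the logger sees EOF. (A real
    # supervisor would keep it open for the next restart.)
    os.close(write_fd)
    output, _ = logger.communicate()
    os.close(read_fd)
    return output.splitlines()
```

Calling `run_service_twice_then_log()` should return the lines from both runs of the service, even though the logger only started after both had exited.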
systemd
- sd_listen_fds() - LISTEN_PID, LISTEN_FDS, LISTEN_FDNAMES; https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html#
- sd_notify() - NOTIFY_SOCKET; https://www.freedesktop.org/software/systemd/man/sd_notify.html#
- sd_booted() - checks for /run/systemd/system/
- sd_watchdog_enabled() - WATCHDOG_USEC, WATCHDOG_PID; https://www.freedesktop.org/software/systemd/man/sd_watchdog_enabled.html#
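
A rough sketch of the environment-variable protocols behind `sd_listen_fds()` and `sd_notify()`, written in Python instead of using libsystemd. The function names `listen_fds` and `notify` and their `environ`/`pid` parameters are my own, added so the logic can be exercised without systemd; returning `None` for a missing name is also just a choice made here.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first passed descriptor, per sd_listen_fds(3)


def listen_fds(environ=None, pid=None):
    """Return [(fd, name_or_None), ...] for socket-activated descriptors."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID guards against variables inherited from another process.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    names = environ.get("LISTEN_FDNAMES")
    names = names.split(":") if names else []  # colon-separated list
    fds = range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count)
    return [(fd, names[i] if i < len(names) else None)
            for i, fd in enumerate(fds)]


def notify(state, environ=None):
    """Send e.g. "READY=1" to NOTIFY_SOCKET; return False if it is unset."""
    environ = os.environ if environ is None else environ
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

A service would call `notify("READY=1")` once it is actually ready, and could send `notify("WATCHDOG=1")` periodically when the watchdog variables are set.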
Dinit https://github.com/davmac314/dinit
- startup notification lets you signal when the process is actually ready
- has something akin to dpkg's automatic/manual installation of
packages, but for service startedness. You can "release" a service
which is like "stop" but doesn't stop it if some dependent service
is using it.
- can "pin" a service in stopped or started state to prevent it from
starting/stopping.
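
A toy model of the "release" and "pin" ideas as I understand them from the notes above. This is not Dinit's implementation; the `Service` class and the `start`, `release`, and `in_use` helpers are hypothetical names made up for illustration.

```python
class Service:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)
        self.running = False
        self.explicit = False  # started on request, not only as a dependency
        self.pinned = None     # None, "started" or "stopped"


def in_use(service, services):
    """True if some running service depends on this one."""
    return any(s.running and service in s.depends_on for s in services)


def start(service, explicit=True):
    """Start a service and its dependencies; refuse if pinned stopped."""
    if service.pinned == "stopped":
        return False
    if not all(start(dep, explicit=False) for dep in service.depends_on):
        return False
    service.running = True
    service.explicit = service.explicit or explicit
    return True


def release(service, services):
    """Drop the explicit mark; stop only if nothing else needs the service."""
    service.explicit = False
    _stop_if_unneeded(service, services)


def _stop_if_unneeded(service, services):
    if service.pinned == "started" or service.explicit:
        return
    if in_use(service, services):
        return
    service.running = False
    # Dependencies that were only running for this service's sake
    # may now be unneeded too.
    for dep in service.depends_on:
        _stop_if_unneeded(dep, services)
```

With `web` depending on `db`, releasing `db` while `web` is running leaves `db` up; releasing `web` afterwards stops both, much like removing the last manually installed package that pulled in an automatic one.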
s6 has startup notification - is it compatible with systemd?
Reasons why DBUS is the way it is (2015): https://lwn.net/Articles/641277/
- "Message passing or IPC isn't really the most important part of
dbus. Process lifecycle tracking and discovery are more important.
However, by integrating the IPC system with the lifecycle tracking
you can simplify the overall system and avoid race conditions."
- "dbus has a lot of semantic guarantees, such as message ordering,
that reduce application complexity and therefore reduce code and
reduce bugs." - sounds familiar!
- "dbus names are directly modeled on X selections (see ICCCM)" - huh
Horust
- https://news.ycombinator.com/item?id=22657301
- https://horust.dev/ (broken?!?!)
- https://github.com/FedericoPonzi/Horust
Integrate monit with syndicate-system?
s6 - https://skarnet.org/software/s6/ - Laurent Bercot
- see also https://skarnet.com/projects/service-manager.html
- and https://archive.fosdem.org/2017/schedule/event/s6_supervision/
- oh! and this! https://skarnet.org/software/s6-rc/overview.html
- and https://skarnet.org/software/s6-rc/faq.html
- redirfd trickery to get FIFOs set up for dependency resolution for
  early logging: https://skarnet.org/software/s6/s6-svscan-1.html#log
  - In principle, could Syndicate dependency tracking take the place
    of this?
    - Maybe not because: "No logs are ever lost." (from https://skarnet.org/software/s6-linux-init/)
    - On the other hand, maybe, if the actual mechanism for log
      collection is a simple FIFO rather than full Syndicate (which
      would be used for dependency tracking but not communication, for
      this specific subtask). Keeps things UNIXy, keeps things
      accessible, relies on the kernel for buffering stuff...?
Logging in daemontools:
- https://cr.yp.to/daemontools/faq/create.html#runlog
- stdout is piped to stdin of the logger program
runit - see https://docs.voidlinux.org/config/services/index.html
2021-09-02 14:52:11 +00:00
2021-08-24 I asked on IRC: "I have a question about Erlang history, I
wonder if any of the old timers are here. I want to know how and when
supervisor.erl began its life. Joe's HOPL paper mentions "BOS" as a
source of inspiration, but I want to know more..."
> 15:03:25 < okeuday> tonyg: OTP behaviors were attributed to Lennart
> Öhman (working at Sjöland & Thyselius Telecom AB) in the past, but
> there are likely more details involved
Restart policies and lifecycles: daemontools-encore and nosh both use
the service states stopped, starting, started, running, failed, and stopping.
(https://unix.stackexchange.com/questions/271413/is-there-a-retry-count-setting-for-svscan)
Daemontools just always restarts `./run` (after pausing 1 second). s6
is similar, but runs `./finish` if it exists, before restarting.
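
The restart behaviour described above can be sketched as a small loop. This is a simplification under stated assumptions: the real programs run `./run` (and `./finish`) from the service directory and loop forever, whereas this hypothetical `supervise` function takes the commands as argument lists and stops after `max_runs` so that it terminates.

```python
import subprocess
import time


def supervise(run_cmd, finish_cmd=None, max_runs=3, pause=1.0):
    """Run run_cmd repeatedly; return the exit statuses that were seen."""
    statuses = []
    for _ in range(max_runs):  # a real supervisor would loop forever
        statuses.append(subprocess.call(run_cmd))
        if finish_cmd is not None:
            # s6-style hook: runs after the service exits, before restart.
            subprocess.call(finish_cmd)
        time.sleep(pause)  # avoid a tight loop if run_cmd exits at once
    return statuses
```

Leaving `finish_cmd` as `None` gives the plain daemontools behaviour; passing a command models the s6 variant.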
2022-01-20 16:07:45 +00:00
Mac OS X 10.6 "Snow Leopard" introduced "sudden termination" at the Cocoa level: apps can indicate they
are OK with quick shutdown via SIGKILL. There's an analogous `launchd` feature for sudden
termination of daemons and background Mac processes, though confusingly the docs also say that
SIGTERM is sent a few seconds before SIGKILL for daemons.
- https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/Lifecycle.html
- https://developer.apple.com/documentation/foundation/processinfo#1651129
- The time delay between SIGTERM and SIGKILL can be controlled via the `ExitTimeOut` key in the job's plist (defaults to 20s?):
- https://apple.stackexchange.com/a/361723
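
The SIGTERM-then-SIGKILL sequence for daemons can be sketched as follows. This is a generic illustration using Python's `subprocess`, not launchd code; the `stop` function and its `exit_timeout` parameter are hypothetical names, and the 20 second default just mirrors the unverified guess above.

```python
import subprocess


def stop(proc, exit_timeout=20.0):
    """Ask a child to exit with SIGTERM, then force it with SIGKILL."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=exit_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()   # sends SIGKILL; cannot be caught or ignored
        proc.wait()
    # On POSIX a negative return code is the number of the fatal signal.
    return proc.returncode
```

A process that opts in to sudden termination is effectively saying the first step can be skipped.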
Oh this is cool! From <https://www.launchd.info/>:
> ## Daemons and Agents
>
> launchd differentiates between agents and daemons. The main difference is that an agent is
> run on behalf of the logged in user while a daemon runs on behalf of the root user or any
> user you specify with the UserName key.
We have a similar distinction: the "system layer" of supervision vs. programs supervised by
other portions and/or by inferior syndicate-server instances (like inferior DBus/systemd
instances). Maybe holding the *conceptual* distinction between agents and daemons is also
worthwhile!
2022-06-07 07:02:05 +00:00
"The Design and Implementation of the NetBSD rc.d system", Luke Mewburn,
http://www.mewburn.net/luke/papers/rc.d.pdf