---
title: 'Survey: Process Supervision'
---

# {{ page.title }}

RedoxOS "fired" -- can't find code or a homepage?

daemontools
- https://cr.yp.to/daemontools.html

- services should be symlinked into a directory monitored by svscan

- programs:
  - svscanboot - starts svscan for /service, plus readproctitle for errors via ps
  - svscan - "starts and monitors a collection of services."
    - starts one `supervise` per service in a service directory (cwd)
    - designed to run forever
    - if `s` is a service, and `s/log` is a service, creates *two*
      `supervise`s, with a pipe between them
    - if any `supervise` terminates, it restarts it
    - reuses the *same* pipe if restarting one end of a connected
      pair of `supervise`s; this way no log messages are lost
  - supervise - (re)starts ./run for a given service. Writes status to ./supervise/*
  - svc - talks to a `supervise`
  - svok - predicate: is `supervise` running for a service?
  - svstat - list service status information for zero or more services (given explicitly)
  - fghack - vile hack for antibackgrounding
  - pgrphack - wrap a child in a new process group

  - readproctitle - takes stdin and puts it into its own command-line, to show up in ps output
  - multilog - scriptable filterable actions on each line of stdin; e.g. append to log, replace contents of file etc.
    - a "log" is a directory full of files with a special format
  - tai64n - puts hex TAI timestamps on each line of stdin
  - tai64nlocal - rewrites tai64n timestamps to human-readable

  - setuidgid - command wrapper for setting uid/gid
  - envuidgid - command wrapper for setting environment variables UID and GID
  - envdir - command wrapper for setting environment based on files in a directory
  - softlimit - rlimit
  - setlock - command wrapper for holding a "locked ofile" during the lifetime of the command
    - what is an "ofile"?

- hax to get daemontools svscan as PID 1: https://code.dogmap.org/svscan-1/
  - "For a clean shutdown, we want to kill each service and ensure
    that its logger has written all the logs before killing the
    logger."

See "Artistic considerations" on https://skarnet.org/software/s6/why.html

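To make the `tai64n` / `tai64nlocal` entries above concrete, here is a minimal sketch of the timestamp label format: an `@` followed by 16 hex digits of seconds and 8 hex digits of nanoseconds. The function names are hypothetical, and the fixed 10-second TAI-UTC offset is an assumption of this sketch (it ignores leap seconds added since 1972), so treat the conversion as approximate.

```python
import time

TAI64_BASE = 2**62        # TAI64 second counts are biased by 2^62
ASSUMED_TAI_OFFSET = 10   # assumption: fixed TAI-UTC offset, later leap seconds ignored


def tai64n_label(unix_seconds, nanoseconds=0):
    """Format a timestamp as '@' + 16 hex digits (seconds) + 8 hex digits (ns)."""
    seconds = TAI64_BASE + ASSUMED_TAI_OFFSET + unix_seconds
    return "@%016x%08x" % (seconds, nanoseconds)


def tai64n_to_unix(label):
    """Inverse of tai64n_label: return (unix_seconds, nanoseconds)."""
    if len(label) != 25 or not label.startswith("@"):
        raise ValueError("not a TAI64N label: %r" % (label,))
    seconds = int(label[1:17], 16) - TAI64_BASE - ASSUMED_TAI_OFFSET
    nanoseconds = int(label[17:25], 16)
    return seconds, nanoseconds


def stamp_lines(lines, clock=time.time_ns):
    """Prefix each line with a label, in the spirit of piping through tai64n."""
    for line in lines:
        now = clock()
        yield "%s %s" % (tai64n_label(now // 10**9, now % 10**9), line)
```

The `clock` parameter exists only so the sketch can be exercised with a fixed time; a real filter would read stdin and stamp each line as it arrives.
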
systemd
- sd_listen_fds() - LISTEN_PID, LISTEN_FDS, LISTEN_FDNAMES; https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html#
- sd_notify() - NOTIFY_SOCKET; https://www.freedesktop.org/software/systemd/man/sd_notify.html#
- sd_booted() - checks for /run/systemd/system/
- sd_watchdog_enabled() - WATCHDOG_USEC, WATCHDOG_PID; https://www.freedesktop.org/software/systemd/man/sd_watchdog_enabled.html#

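The environment-variable protocols behind the calls listed above are small enough to sketch without libsystemd. The following is a rough Python rendering under my reading of the linked man pages, not a replacement for the real library: the function names are hypothetical, and the `environ` and `pid` parameters exist only so the logic can be tried outside systemd.

```python
import os
import socket

LISTEN_FDS_START = 3  # inherited descriptors start at fd 3 per the man page


def listen_fds(environ=os.environ, pid=None):
    """Return {fd: name} for descriptors passed via LISTEN_PID/LISTEN_FDS."""
    pid = os.getpid() if pid is None else pid
    if environ.get("LISTEN_PID") != str(pid):
        return {}  # the descriptors were meant for some other process
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return {}
    raw_names = environ.get("LISTEN_FDNAMES", "")
    names = raw_names.split(":") if raw_names else []
    return {
        LISTEN_FDS_START + i: (names[i] if i < len(names) else "unknown")
        for i in range(count)
    }


def notify(state, environ=os.environ):
    """Send a state string such as 'READY=1' to NOTIFY_SOCKET, if it is set."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under a notify-aware supervisor
    if path.startswith("@"):
        path = "\0" + path[1:]  # Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def watchdog_usec(environ=os.environ, pid=None):
    """Return the watchdog interval in microseconds, or None if not enabled."""
    pid = os.getpid() if pid is None else pid
    watchdog_pid = environ.get("WATCHDOG_PID")
    if watchdog_pid is not None and watchdog_pid != str(pid):
        return None
    usec = environ.get("WATCHDOG_USEC", "")
    return int(usec) if usec.isdigit() else None
```

A service would typically call `notify("READY=1")` once it is actually able to serve requests, which is the same idea as the startup notification noted for Dinit and s6.
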
Dinit https://github.com/davmac314/dinit
- startup notification lets you signal when the process is actually ready
- has something akin to dpkg's automatic/manual installation of
  packages, but for service startedness. You can "release" a service
  which is like "stop" but doesn't stop it if some dependent service
  is using it.
- can "pin" a service in stopped or started state to prevent it from
  starting/stopping.

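The dpkg analogy can be made concrete with a toy model. This is only my reading of the "release" behaviour described above, not Dinit's implementation: `ServiceSet` and its methods are hypothetical names, services start instantly, and pinning is deliberately left out.

```python
class ServiceSet:
    """Toy model: services are 'explicit' (started by request) or automatic."""

    def __init__(self, deps):
        self.deps = deps       # name -> list of services it depends on
        self.explicit = set()  # started by direct request (dpkg "manual")
        self.running = set()

    def start(self, name, explicit=True):
        # Dependencies are started automatically, never marked explicit.
        for dep in self.deps.get(name, ()):
            self.start(dep, explicit=False)
        self.running.add(name)
        if explicit:
            self.explicit.add(name)

    def release(self, name):
        # Drop the explicit mark; stop only if nothing else needs the service.
        self.explicit.discard(name)
        self._stop_if_unneeded(name)

    def _has_running_dependent(self, name):
        return any(
            name in self.deps.get(other, ())
            for other in self.running
            if other != name
        )

    def _stop_if_unneeded(self, name):
        if name not in self.running:
            return
        if name in self.explicit or self._has_running_dependent(name):
            return
        self.running.discard(name)
        # Automatic dependencies may now be unneeded too (dpkg "autoremove").
        for dep in self.deps.get(name, ()):
            self._stop_if_unneeded(dep)
```

With `deps = {"web": ["db"], "db": []}`, starting both and then releasing `db` leaves it running because `web` still depends on it; releasing `web` afterwards stops both.
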
s6 has startup notification - is it compatible with systemd?

Reasons why DBUS is the way it is (2015): https://lwn.net/Articles/641277/
- "Message passing or IPC isn't really the most important part of
  dbus. Process lifecycle tracking and discovery are more important.
  However, by integrating the IPC system with the lifecycle tracking
  you can simplify the overall system and avoid race conditions."
- "dbus has a lot of semantic guarantees, such as message ordering,
  that reduce application complexity and therefore reduce code and
  reduce bugs." - sounds familiar!
- "dbus names are directly modeled on X selections (see ICCCM)" - huh

Horust
- https://news.ycombinator.com/item?id=22657301
- https://horust.dev/ (broken?!?!)
- https://github.com/FedericoPonzi/Horust

Integrate monit with syndicate-system?

s6 - https://skarnet.org/software/s6/ - Laurent Bercot
- see also https://skarnet.com/projects/service-manager.html
- and https://archive.fosdem.org/2017/schedule/event/s6_supervision/

- oh! and this! https://skarnet.org/software/s6-rc/overview.html
- and https://skarnet.org/software/s6-rc/faq.html

- redirfd trickery to get FIFOs set up for dependency resolution for
  early logging: https://skarnet.org/software/s6/s6-svscan-1.html#log
  -- in principle, could Syndicate dependency tracking take the place
  of this?
  - Maybe not because: "No logs are ever lost." (from https://skarnet.org/software/s6-linux-init/)
  - On the other hand, maybe, if the actual mechanism for log
    collection is a simple FIFO rather than full Syndicate (which
    would be used for dependency tracking but not communication, for
    this specific subtask). Keeps things UNIXy, keeps things
    accessible, relies on the kernel for buffering stuff...?

Logging in daemontools:
- https://cr.yp.to/daemontools/faq/create.html#runlog
- stdout is piped to stdin of the logger program

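A minimal sketch of that arrangement, assuming a POSIX system: the parent creates one pipe, keeps both ends open, and hands the write end to each incarnation of the service and the read end to the logger. Because the pipe outlives any single service run, output from a restarted service lands in the same logger. `run_with_logger` is a hypothetical name, and the "restarts" here are simply a list of commands run one after another.

```python
import os
import subprocess


def run_with_logger(service_runs, logger_cmd):
    """Run each service command in turn, all writing to one long-lived pipe.

    Returns whatever the logger wrote to its own stdout.
    """
    read_end, write_end = os.pipe()
    # The logger reads the pipe on stdin; its stdout is captured for the caller.
    logger = subprocess.Popen(logger_cmd, stdin=read_end, stdout=subprocess.PIPE)
    try:
        for cmd in service_runs:
            # Each "restart" of the service inherits the same write end as stdout.
            subprocess.run(cmd, stdout=write_end)
    finally:
        # Closing the last write end lets the logger see end-of-file.
        os.close(write_end)
    output, _ = logger.communicate()
    os.close(read_end)
    return output.decode()
```

This sketch only suits small outputs, since the logger's stdout is not drained until the services have finished. A real supervisor would also keep the read end open so the logger itself could be restarted without dropping whatever the kernel has buffered in the pipe.
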
runit - see https://docs.voidlinux.org/config/services/index.html

2021-08-24 I asked on IRC: "I have a question about Erlang history, I
wonder if any of the old timers are here. I want to know how and when
supervisor.erl began its life. Joe's HOPL paper mentions "BOS" as a
source of inspiration, but I want to know more..."

> 15:03:25 < okeuday> tonyg: OTP behaviors were attributed to Lennart
> Öhman (working at Sjöland & Thyselius Telecom AB) in the past, but
> there are likely more details involved

Restart policies and lifecycles: daemontools-encore and nosh both use
the states stopped, starting, started, running, failed, and stopping
(https://unix.stackexchange.com/questions/271413/is-there-a-retry-count-setting-for-svscan).

Daemontools just always restarts `./run` (after pausing 1 second). s6
is similar, but runs `./finish` if it exists, before restarting.

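The two restart policies can be sketched as one loop. This is a toy under stated assumptions, not how `supervise` or `s6-supervise` are written: `supervise` and `max_runs` are hypothetical (a real supervisor loops forever), and passing the exit status to `./finish` as its first argument is my assumption about the s6 convention.

```python
import os
import subprocess
import time


def supervise(service_dir, max_runs, pause=1.0, run_finish=True):
    """Run ./run repeatedly; optionally run ./finish after each exit.

    Returns the list of exit statuses of ./run, one per run.
    """
    finish_path = os.path.join(service_dir, "finish")
    statuses = []
    for attempt in range(max_runs):
        # cwd is the service directory, so "./run" resolves inside it.
        status = subprocess.run(["./run"], cwd=service_dir).returncode
        statuses.append(status)
        if run_finish and os.access(finish_path, os.X_OK):
            # s6-style cleanup hook, run before the next restart.
            subprocess.run(["./finish", str(status)], cwd=service_dir)
        if attempt + 1 < max_runs:
            time.sleep(pause)  # daemontools pauses about a second here
    return statuses
```

Calling it with `run_finish=False` approximates the daemontools behaviour described above; the default approximates s6.
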