Survey: Process Supervision


RedoxOS "fired" -- can't find code or a homepage?

daemontools

  • https://cr.yp.to/daemontools.html

  • services should be symlinked into a directory monitored by svscan

  • programs:

    • svscanboot - starts svscan in /service, piping its error output into readproctitle so that errors show up in ps output

    • svscan - "starts and monitors a collection of services."

      • starts one supervise per service in a service directory (cwd)
      • designed to run forever
      • if s is a service, and s/log is a service, creates two supervises, with a pipe between them
      • if any supervise terminates, it restarts it
      • reuses the same pipe if restarting one end of a connected pair of supervises; this way no log messages are lost
    • supervise - (re)starts ./run for a given service. Writes status to ./supervise/*

    • svc - talks to a supervise

    • svok - predicate: is supervise running for a service?

    • svstat - list service status information for zero or more services (given explicitly)

    • fghack - vile hack for antibackgrounding

    • pgrphack - wrap a child in a new process group

    • readproctitle - takes stdin and puts it into its own command-line, to show up in ps output

    • multilog - scriptable filterable actions on each line of stdin; e.g. append to log, replace contents of file etc.

      • a "log" is a directory full of files with a special format
    • tai64n - prefixes each line of stdin with a hex-encoded TAI64N timestamp

    • tai64nlocal - rewrites tai64n timestamps to human-readable

    • setuidgid - command wrapper for setting uid/gid

    • envuidgid - command wrapper for setting environment variables UID and GID

    • envdir - command wrapper for setting environment based on files in a directory

    • softlimit - command wrapper for setting resource limits (rlimits)

    • setlock - command wrapper for holding a "locked ofile" during the lifetime of the command

      • what is an "ofile"?
  • hax to get daemontools svscan as PID 1: https://code.dogmap.org/svscan-1/

    • "For a clean shutdown, we want to kill each service and ensure that its logger has written all the logs before killing the logger."
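The svscan pipe arrangement above, plus the shutdown order from the svscan-1 quote, can be sketched in Python. This is a toy under my own assumptions, not daemontools code: `LoggedService` and its methods are hypothetical names, it handles a single service/logger pair instead of scanning a directory, and it only restarts the service when asked.

```python
import os
import subprocess


class LoggedService:
    """Toy model of the pipe svscan sets up between a service s and its logger s/log.

    The supervisor creates the pipe and keeps both ends open itself, so a
    restarted service is attached to the same pipe and the logger never sees
    EOF (or loses buffered lines) while the service is being restarted.
    """

    def __init__(self, service_argv, logger_argv, log_path):
        self.read_fd, self.write_fd = os.pipe()
        self.service_argv = service_argv
        with open(log_path, "wb") as log_file:
            # Logger reads the pipe on stdin and writes to the log file.
            self.logger = subprocess.Popen(logger_argv, stdin=self.read_fd, stdout=log_file)
        self.service = self._spawn(service_argv)

    def _spawn(self, argv):
        # The service's stdout is the (shared, reused) write end of the pipe.
        return subprocess.Popen(argv, stdout=self.write_fd)

    def restart_service(self, argv=None):
        # Replace the service process; the logger is left untouched.
        if self.service.poll() is None:
            self.service.terminate()
        self.service.wait()
        self.service = self._spawn(argv or self.service_argv)

    def shutdown(self, timeout=5.0):
        # 1. Stop the service first so no new lines can be written.
        if self.service.poll() is None:
            self.service.terminate()
        self.service.wait(timeout)
        # 2. Drop the supervisor's copies of the pipe. Once no write end is
        #    open, the logger reads whatever is left, then gets EOF and exits.
        os.close(self.write_fd)
        os.close(self.read_fd)
        # 3. Only then wait for the logger.
        return self.logger.wait(timeout)
```

The design point is that the supervisor owns the pipe. Because it holds its own copy of the write end, the logger never reads EOF while the service is between restarts; and because the service is stopped before those copies are closed, the logger has consumed every line by the time it exits.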

See "Artistic considerations" on https://skarnet.org/software/s6/why.html

systemd

Dinit https://github.com/davmac314/dinit

  • startup notification lets a service signal when it is actually ready, not merely started
  • has something akin to dpkg's automatic/manual installation of packages, but for service startedness. You can "release" a service which is like "stop" but doesn't stop it if some dependent service is using it.
  • can "pin" a service in stopped or started state to prevent it from starting/stopping.
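A minimal sketch of the Dinit-style "release" and "pin" semantics: a service runs if it was explicitly started or if something running depends on it, much like dpkg's manual vs. automatic packages. `ServiceSet` and its methods are hypothetical and only model the bookkeeping, not process management; the exact rules Dinit applies may differ.

```python
class ServiceSet:
    """Toy model of explicit vs. dependency-driven activation, with pinning.

    Hypothetical API, loosely after the Dinit notes above; not Dinit's code.
    Assumes the dependency graph has no cycles.
    """

    def __init__(self, deps):
        self.deps = deps        # name -> list of names it depends on
        self.explicit = set()   # services someone explicitly asked to start
        self.pins = {}          # name -> "started" or "stopped"
        self.running = set()

    def _can_run(self, name):
        # Runnable unless it, or something it depends on, is pinned stopped.
        if self.pins.get(name) == "stopped":
            return False
        return all(self._can_run(d) for d in self.deps.get(name, []))

    def _depends_on(self, name, target):
        deps = self.deps.get(name, [])
        return target in deps or any(self._depends_on(d, target) for d in deps)

    def _settle(self):
        # Running = explicit requests plus their transitive dependencies,
        # plus anything pinned "started" that is already running.
        seeds = [n for n in self.explicit if self._can_run(n)]
        seeds += [n for n in self.running if self.pins.get(n) == "started"]
        wanted = set()
        while seeds:
            name = seeds.pop()
            if name not in wanted:
                wanted.add(name)
                seeds.extend(self.deps.get(name, []))
        self.running = wanted

    def start(self, name):
        if not self._can_run(name):
            raise RuntimeError(f"{name} or one of its dependencies is pinned stopped")
        self.explicit.add(name)
        self._settle()

    def release(self, name):
        # Withdraw the explicit request; keeps running if a dependent still needs it.
        self.explicit.discard(name)
        self._settle()

    def stop(self, name):
        # Withdraw the request for this service and for everything depending on it.
        self.explicit -= {n for n in self.explicit if n == name or self._depends_on(n, name)}
        self._settle()

    def pin(self, name):
        # Freeze the current state: a stopped service can't start, a started one can't stop.
        self.pins[name] = "started" if name in self.running else "stopped"

    def unpin(self, name):
        self.pins.pop(name, None)
        self._settle()
```

With `{"web": ["db"]}`, starting both and then releasing `db` leaves `db` running because `web` still needs it, whereas stopping `db` takes `web` down with it.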

s6 has startup notification - is it compatible with systemd?
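On the compatibility question: the two readiness protocols are different mechanisms. systemd's sd_notify sends a `READY=1` datagram to the AF_UNIX socket named in `$NOTIFY_SOCKET`; s6 has the daemon write a newline to a file descriptor whose number is listed in the service directory's `notification-fd` file. A sketch of the daemon side of each (the function names are hypothetical, and how the daemon learns the s6 fd number is left to the caller):

```python
import os
import socket


def notify_ready_systemd():
    """systemd style: send READY=1 to the datagram socket named in $NOTIFY_SOCKET."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not running under a notify-aware supervisor
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # leading '@' means a Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(b"READY=1")
    return True


def notify_ready_s6(fd):
    """s6 style: write a newline to the notification fd, then close it."""
    os.write(fd, b"\n")
    os.close(fd)
```

So neither is a drop-in for the other: a daemon that speaks only one protocol needs a small adapter to run under the other kind of supervisor.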

Reasons why DBUS is the way it is (2015): https://lwn.net/Articles/641277/

  • "Message passing or IPC isn't really the most important part of dbus. Process lifecycle tracking and discovery are more important. However, by integrating the IPC system with the lifecycle tracking you can simplify the overall system and avoid race conditions."
  • "dbus has a lot of semantic guarantees, such as message ordering, that reduce application complexity and therefore reduce code and reduce bugs." - sounds familiar!
  • "dbus names are directly modeled on X selections (see ICCCM)" - huh
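To make the "lifecycle tracking plus IPC" point concrete, here is a toy model of D-Bus-style name ownership: a well-known name belongs to a live connection, other connections can queue for it, and when the owner disconnects the bus hands the name to the next in line (much like X selections). `NameBus` is hypothetical; real D-Bus `RequestName` also takes flags controlling queueing and replacement, which are omitted here.

```python
from collections import defaultdict, deque


class NameBus:
    """Toy bus: well-known names are owned by live connections, with a FIFO queue of waiters."""

    def __init__(self):
        self.queues = defaultdict(deque)  # name -> connection ids; queue[0] is the owner
        self.events = []                  # (name, old_owner, new_owner), loosely like NameOwnerChanged

    def owner(self, name):
        q = self.queues.get(name)
        return q[0] if q else None

    def request_name(self, name, conn):
        q = self.queues[name]
        if conn not in q:
            q.append(conn)
            if len(q) == 1:
                self.events.append((name, None, conn))
        return q[0] == conn  # True if conn is now the primary owner

    def disconnect(self, conn):
        # Lifecycle tracking: when a connection goes away, the bus itself drops
        # every name it held or queued for and hands each one to the next waiter.
        for name, q in list(self.queues.items()):
            if conn in q:
                was_owner = q[0] == conn
                q.remove(conn)
                if was_owner:
                    self.events.append((name, conn, q[0] if q else None))
                if not q:
                    del self.queues[name]
```

Because ownership dies with the connection, a client never has to poll to find out whether a service vanished; the bus tells it.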

Horust

Integrate monit with syndicate-system?

s6 - https://skarnet.org/software/s6/ - Laurent Bercot

Logging in daemontools:

runit - see https://docs.voidlinux.org/config/services/index.html

2021-08-24 I asked on IRC: "I have a question about Erlang history, I wonder if any of the old timers are here. I want to know how and when supervisor.erl began its life. Joe's HOPL paper mentions "BOS" as a source of inspiration, but I want to know more..."

15:03:25 < okeuday> tonyg: OTP behaviors were attributed to Lennart Öhman (working at Sjöland & Thyselius Telecom AB) in the past, but there are likely more details involved

Restart policies and lifecycles: daemontools-encore and nosh both use the states stopped, starting, started, running, failed, and stopping. (https://unix.stackexchange.com/questions/271413/is-there-a-retry-count-setting-for-svscan)
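A sketch of those six states as a state machine, with a retry budget of the kind the linked question asks about. Only the state names come from the note: the events, the transition table, and the `max_retries` policy are my own hypothetical reading, not taken from daemontools-encore or nosh.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    STARTED = "started"
    RUNNING = "running"
    FAILED = "failed"
    STOPPING = "stopping"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.STOPPED, "start"): State.STARTING,
    (State.FAILED, "start"): State.STARTING,
    (State.FAILED, "retry"): State.STARTING,
    (State.STARTING, "spawned"): State.STARTED,  # process exists, not yet ready
    (State.STARTED, "ready"): State.RUNNING,     # e.g. after readiness notification
    (State.STARTING, "exited"): State.FAILED,
    (State.STARTED, "exited"): State.FAILED,
    (State.RUNNING, "exited"): State.FAILED,
    (State.STARTED, "stop"): State.STOPPING,
    (State.RUNNING, "stop"): State.STOPPING,
    (State.STOPPING, "exited"): State.STOPPED,
}


class Lifecycle:
    def __init__(self, max_retries=3):
        self.state = State.STOPPED
        self.max_retries = max_retries
        self.retries = 0

    def handle(self, event):
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"no transition from {self.state.value} on {event!r}")
        if event == "retry":
            if self.retries >= self.max_retries:
                return self.state  # budget used up: stay FAILED until an explicit "start"
            self.retries += 1
        elif event == "start" or nxt is State.RUNNING:
            self.retries = 0  # an explicit start or reaching RUNNING resets the budget
        self.state = nxt
        return nxt
```

Daemontools, by contrast, behaves as if `max_retries` were infinite.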

Daemontools just always restarts ./run (after pausing 1 second). s6 is similar, but runs ./finish if it exists, before restarting.
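The daemontools/s6 restart loop, sketched in Python. `supervise()` here is hypothetical and much simpler than the real programs: no status files, no control via svc, and a `max_runs` knob (an assumption, purely so the loop can end in a demo). s6 passes `./finish` details about how `./run` exited; this sketch passes only the exit status as a single argument.

```python
import os
import subprocess
import time


def supervise(service_dir, max_runs=None, pause=1.0):
    """Run ./run in service_dir; when it exits, run ./finish if present, pause, repeat.

    max_runs is a hypothetical knob for demos; daemontools' supervise loops forever.
    Returns the number of times ./run was started.
    """
    runs = 0
    finish = os.path.join(service_dir, "finish")
    while max_runs is None or runs < max_runs:
        result = subprocess.run(["./run"], cwd=service_dir)
        runs += 1
        if os.access(finish, os.X_OK):
            # s6 style: give ./finish a chance to clean up before the restart.
            subprocess.run(["./finish", str(result.returncode)], cwd=service_dir)
        time.sleep(pause)  # avoid a tight crash loop
    return runs
```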

Mac OS X 10.6 "Snow Leopard" introduced "sudden termination" at the Cocoa level: apps can indicate that they are OK with being shut down quickly via SIGKILL. There's an analogous launchd feature for sudden termination of daemons and background Mac processes, though confusingly the docs also say that SIGTERM is sent a few seconds before SIGKILL for daemons.
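The SIGTERM-then-SIGKILL escalation described in the launchd docs for daemons, as a Python sketch. `stop_service` and its `sudden_ok` flag are hypothetical: the flag stands in for a process that has opted into sudden termination, which lets the supervisor skip straight to SIGKILL. The grace period is an arbitrary default, not launchd's.

```python
import signal
import subprocess


def stop_service(proc, grace=5.0, sudden_ok=False):
    """Stop a child process: SIGTERM, wait up to `grace` seconds, then SIGKILL.

    If sudden_ok is True (a stand-in for "this process opted into sudden
    termination"), skip straight to SIGKILL. Returns Popen's returncode,
    which is -N when the process was killed by signal N.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited
    if sudden_ok:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```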

Oh this is cool! From https://www.launchd.info/:

Daemons and Agents

launchd differentiates between agents and daemons. The main difference is that an agent is run on behalf of the logged in user while a daemon runs on behalf of the root user or any user you specify with the UserName key.

We have a similar distinction: the "system layer" of supervision vs. programs supervised by other portions and/or by inferior syndicate-server instances (like inferior DBus/systemd instances). Maybe holding the conceptual distinction between agents and daemons is also worthwhile!
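In launchd configuration, the agent/daemon split shows up mostly as where the job's plist lives and which keys apply. A minimal sketch using Python's plistlib: `Label`, `ProgramArguments`, `RunAtLoad` and `UserName` are real launchd.plist keys, but the helper functions, the label, and the program path in the usage line are hypothetical.

```python
import os
import plistlib


def launchd_job(label, argv, user=None):
    """Build a minimal launchd job dictionary.

    UserName only applies to jobs in the system domain (daemons), which
    launchd otherwise runs as root.
    """
    job = {"Label": label, "ProgramArguments": list(argv), "RunAtLoad": True}
    if user is not None:
        job["UserName"] = user
    return job


def plist_path(label, agent):
    # Conventional locations: per-user agents vs. system-wide daemons.
    if agent:
        return os.path.expanduser(f"~/Library/LaunchAgents/{label}.plist")
    return f"/Library/LaunchDaemons/{label}.plist"
```

For example, `plistlib.dumps(launchd_job("org.example.syndicate", ["/usr/local/bin/syndicate-server"], user="nobody"))` produces the XML for a daemon that drops to `nobody`.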

"The Design and Implementation of the NetBSD rc.d system", Luke Mewburn, http://www.mewburn.net/luke/papers/rc.d.pdf