# The System Layer

*Tony Garnock-Jones
October 2022*

The [*system layer*](glossary.md#system-layer) ([Rice 2019][]; [Corbet 2019][]) is an essential
part of an operating system, mediating between user-facing programs and the kernel. Its
importance lies in its role as the technical foundation for many qualities[^qualities] relevant
to system security, resilience, connectivity, maintainability and usability.

In the Linux world, existing system layer realizations cross-cut many, many projects:
NetworkManager, GNOME, D-Bus, systemd, OpenRC, apt, apk, and so on. Each project has its own
role in the overall system layer, and none takes a strong stance on the overall architecture
that results from their combination. However, there is a group of basic concepts involved in a
system layer that transcend individual subprojects, relating to issues of IPC, discovery, and
whole-machine and application state management.

This document examines the architecture of system layers in general, touching on
responsibilities currently handled at each of these levels, with the aim of bringing the
concept of "system layer" into sharper focus.

## What is a system layer?

The term "system layer" was coined[^as-far-as-i-know] by Benno Rice in
[a 2019 talk](https://youtu.be/o_AIw9bGogo). Here's an excerpt from
[the relevant portion of Rice's talk](https://youtu.be/o_AIw9bGogo?t=911):[^cleaned-up-automated-transcript]

> ... dynamic DHCP, IPv6 auto config, all these kinds of things are
> more dynamic. Time is more dynamic. Some aspects of device handling,
> you know, all of these things are a lot more dynamic now, and we
> need a way of strapping these things together so we can manage them
> that doesn't involve installing 15 different packages that all
> behave differently.
>
> <small>[15:08]</small> **And so what that ends up becoming, is what
> I term the system layer.** Which is a bunch of stuff which might be
> running in user space or might be running in kernel space but is
> **providing systemic level stuff** as opposed to the stuff that
> you're writing or using directly. So this could include things like
> NetworkManager, and udev, and a whole bunch of things.
>
> Systemd as a project ends up **complementing the Linux kernel by
> providing all of this user space system layer**.

(It's a really good talk.) The system layer idea seems to have been
latent for a long time, and only recently to have been given a name.

Some examples include:

- The Mac OS frameworks above the kernel level
- The Android system with its APIs and SDKs
- Various combinations of package manager, init system, service manager, support daemons, and
  user interface (be it ever so minimal); for example, debian+systemd+udevd+GNOME, or
  alpine+OpenRC+eudev+SSH.

Both Android and Mac OS embody substantially complete visions of a system layer, while the
visions are much more fragmented in the Linux world. Even in cases where systemd makes up a
good fraction of a particular system layer, most systems augment it with a wide variety of
other software.

## What does a system layer do?

A system layer addresses myriad system-level problems that applications face but that are
out of scope for the operating system kernel.

It solves these problems so that application developers can rely on shared vocabulary, common
interfaces, and communal development effort. The result is improved interoperability,
compositionality, securability, etc., and reduced duplication of effort, less scope for design
flaws, and so on.

The scope of the system layer changes with time as the needs of applications and users change
and grow. The problems it addresses range from the highly abstract to the relatively concrete.
For example, a system layer may:

- supply services in response to static or dynamic demand
- monitor and react to changes in system state
- give higher-level perspectives to users and applications on system state and resources
- offer access control mechanisms and enforce access control policies
- offer a coherent, system-wide approach to security and privacy
- offer inter-process communication media
- provide name-binding and name resolution services
- provide job-queueing and -scheduling services, including calendar-like and time-based scheduling
- provide user interface facilities
- provide system-wide "cut-and-paste" services for user-controlled IPC
- provide system configuration and user preference databases
- support software package installation, upgrade, and removal
- offer state (data, configuration) replication services
- provide data backup facilities

among other things. All of these areas are common *across* applications, unique to none of
them.

To come up with this list, I surveyed a number of existing open systems such as Linux
distributions, desktop environments, and so on, plus (in a limited way) Android and Mac OS,
looking for commonalities and differences. That is, the list was developed in a largely
informal way. Despite this, I've found it a fruitful starting point for an investigation of the
properties of system layers in general. I welcome additional perspectives that others might
bring.

In the remainder of this document, I'll use each of the topics in the list above as a
perspective from which to examine existing software. I'll then attempt a synthesis of the
results of this analysis into a firmer idea of what form a system layer could and perhaps
should take.

## Service management and system reactivity

An *extremely* common recurring pair of related themes in system layers of all sorts is
**service management** and **system reactivity**. That is, the system layer takes on the tasks
of starting and stopping services in response to static or dynamic demand, and of monitoring
and reacting to changes in system state. While the kernel offers raw sense data plus a
low-level vocabulary for managing the collection of running processes on a system, applications
and users need a higher-level vocabulary for managing running software in terms of services and
service relationships.

These tasks can be broken down into smaller, but still general, pieces:

- primitive ability to start and stop service instances
- declaration of singleton service instances, service classes, and instances of service classes
- declaration of relationships (including runtime dependencies) among services
- facility for managing service names and connecting service names to service instances
- user interface for examining the service namespace and the collection of running and runnable services
- facility for noticing and a medium for publishing and subscribing to changes in system state

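To make the idea of declared service relationships a little more concrete, here is a minimal Python sketch of how runtime dependencies can determine a start order. It is an illustration only: the service names and the `SERVICES` table are hypothetical, and real service managers use their own declaration formats and considerably richer semantics.

```python
from graphlib import TopologicalSorter

# Hypothetical declarations: each service maps to the set of services it requires.
SERVICES = {
    "udev": set(),
    "network": {"udev"},
    "dns-cache": {"network"},
    "web-server": {"network", "dns-cache"},
}


def start_order(services):
    """Return a start order in which every service follows its dependencies.

    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    return list(TopologicalSorter(services).static_order())


def stop_order(services):
    """Stop dependents before the services they rely on."""
    return list(reversed(start_order(services)))


if __name__ == "__main__":
    print("start:", " -> ".join(start_order(SERVICES)))
    print("stop: ", " -> ".join(stop_order(SERVICES)))
```

A topological sort is only one possible design choice. It captures "start dependencies first" but says nothing about readiness, restarts, or on-demand activation, which a service manager has to handle separately.
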
Concrete examples include:

- starting services in response to statically-configured runlevels (OpenRC, systemd, SysV init, etc.)
- starting dependencies before dependent services (OpenRC, systemd, SysV init, etc.)
- restarting terminated or failed services in a supervision hierarchy (daemontools, s6, etc.; Erlang/OTP)
- starting services by service name on demand (D-Bus, etc.)
- starting services by socket activation (systemd, etc.)
- virtual-machine and container lifecycles, including supervision and restart of containers (docker, docker-compose, etc.)
- reacting to hotplugging of a device by installing a driver or starting a program (udevd, etc.)
- reacting to system metrics (e.g. temperature, load average, memory pressure) by changing something
- reacting to network connectivity changes (NetworkManager, etc.)
- setup and naming of devices and network routes (udevd, NetworkManager, etc.)

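As an illustration of the supervision idea in the list above, the following Python sketch restarts a child process whenever it exits with a failure status. It is a deliberately simplified assumption of how a supervisor might behave, not a description of daemontools, s6, or OTP; the `supervise` function and its `max_restarts` and `backoff_seconds` parameters are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, backoff_seconds=0.1):
    """Run argv, restarting it each time it exits with a non-zero status.

    Gives up after max_restarts restarts. Returns the exit codes observed,
    one per run, so that callers can inspect what happened.
    """
    exit_codes = []
    while True:
        code = subprocess.run(argv).returncode
        exit_codes.append(code)
        # Stop on a clean exit, or once the restart budget is used up.
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(backoff_seconds)


if __name__ == "__main__":
    # A stand-in "service" that always fails immediately.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

Real supervisors usually run indefinitely, react to signals rather than polling, and arrange supervisors into a hierarchy; the restart budget here simply keeps the example finite.
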
Laurent Bercot has produced an excellent
[comparison table](https://skarnet.com/projects/service-manager.html#comparison) in a page
describing
[a new service manager for Linux distributions](https://skarnet.com/projects/service-manager.html).

## Higher-level perspectives on and control over system state and resources

An essential system layer task is to give users and applications higher-level perspectives on
system state, resource availability and resource consumption than those offered by the kernel.

For example, the kernel's [`NETLINK_ROUTE`](https://en.wikipedia.org/wiki/Netlink) sockets
allow processes to observe changes in network interface and routing configuration, but
applications often do not need the fine detail on offer: instead, they need higher-level
knowledge such as "a usable default route for IPv4 exists", or "IPv4 connectivity is available,
but metered".

Breaking this task down into smaller pieces yields:

- access to low-level descriptions of system state, resource availability, and resource usage
- ability to either poll for or subscribe to changes in such state
- ability to compute relevant higher-level perspectives on the state
- a medium for communicating such changes to users and applications

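The pieces above can be sketched in a few lines of Python. The example below does not talk to the kernel: `RouteEvent` is a hypothetical, heavily simplified stand-in for a decoded `NETLINK_ROUTE` message, and `ConnectivityMonitor` is an invented name. The point is only to show low-level events being folded into a higher-level fact, with subscribers notified when that fact changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEvent:
    """Hypothetical, simplified stand-in for a decoded NETLINK_ROUTE message."""
    action: str       # "add" or "del"
    family: str       # "ipv4" or "ipv6"
    destination: str  # "default", or a prefix such as "10.0.0.0/8"
    interface: str


class ConnectivityMonitor:
    """Folds low-level route events into a per-family connectivity summary."""

    def __init__(self, metered_interfaces=()):
        self._metered = set(metered_interfaces)
        self._default_routes = set()  # (family, interface) pairs
        self._subscribers = []
        self._last_summary = {}

    def subscribe(self, callback):
        """Register callback(family, summary), called on each summary change."""
        self._subscribers.append(callback)

    def summary(self, family):
        interfaces = {i for (f, i) in self._default_routes if f == family}
        if not interfaces:
            return "unavailable"
        if interfaces <= self._metered:
            return "available, but metered"
        return "available"

    def handle(self, event):
        if event.destination != "default":
            return  # fine detail that most applications do not need
        key = (event.family, event.interface)
        if event.action == "add":
            self._default_routes.add(key)
        elif event.action == "del":
            self._default_routes.discard(key)
        new = self.summary(event.family)
        if self._last_summary.get(event.family, "unavailable") != new:
            self._last_summary[event.family] = new
            for callback in self._subscribers:
                callback(event.family, new)


if __name__ == "__main__":
    monitor = ConnectivityMonitor(metered_interfaces={"wwan0"})
    monitor.subscribe(lambda family, state: print(family, "is", state))
    monitor.handle(RouteEvent("add", "ipv4", "default", "wwan0"))
    monitor.handle(RouteEvent("add", "ipv4", "default", "eth0"))
```

Subscribers hear nothing about individual routes; they are told only when the summary for an address family changes.
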
Concrete examples include:

- computing default-route availability from `NETLINK_ROUTE` events over `netlink` sockets, as discussed
- use of `NETLINK_KOBJECT_UEVENT` by udev to configure and expose hotplugged devices to userland
- interrogation of disk devices and partition tables to provide views on and control over available filesystems (gnome-disks, etc.)
- interrogation of audio devices and audio routing options to provide high-level views and control over audio setup (pipewire, pulseaudio, etc.), e.g. volume level display and volume controls, mute, select input/output channel, play/pause, skip, rewind etc.
- high-level perspectives on devices such as displays, printers, mice, keyboards, touchpads, accelerometers, proximity sensors, temperature monitors and so on (GNOME, XFCE4, KDE, cups, etc.), communicated via D-Bus and friends
- system configuration databases (`/etc`, Windows' Registry, GNOME configuration databases)
- location services mapping from low-level GPS and wifi information to medium-level concrete location coordinates to high-level "you are at home", "you are in the office"-style knowledge about location
- telephony services exposing high-level call management interfaces backed by low-level modem operations

Slightly harder to see, but still certainly an example of the subject of this section, is the
collection of userland tools commonly associated with Unix-like operating systems more
generally. The file system, for example, is firmly a systems concern and not an
application-level concern, so the system layer provides general tools for manipulating,
examining, and repairing the file system. This includes not only tools such as `fsck`, `df`,
and `mount`, but facilities such as automounting, mounting and `fsck`ing at boot, scanning and
manipulating partition tables, configuring `lvm`, and even the humble `ls`, `cp` and friends.
On systems such as Mac OS, the Finder and Disk Utility programs and their associated underlying
system services are analogous parts of the system layer.

## Access control mechanisms and policies, security, and privacy

- offer access control mechanisms and enforce access control policies
- offer a coherent, system-wide approach to security and privacy

- *access control*
  - resource allocation services
  - ACL-based access control for system services and D-Bus objects

### Security and privacy

Existing system layers rely on single-machine approaches to security
and securability that do not scale well: for example, Unix ACLs and
user- and group-ID-based permissions. The theory of object
capabilities (“ocaps”), exemplified in languages such as E and
programming models such as Actors, offers a fine-grained approach that
can be made to scale further than a single machine. However, ocaps
only control access to shared programs. Access controls for shared
data are left implicit. In addition, ideas of location and system
boundary are left implicit in ocap systems.

I will adapt ocaps to syndicated actors. Because the Syndicated Actor model includes a
first-class notion of shared data as well as a layered conception of locations and location
boundaries, syndicated capabilities will reflect these ideas directly. I will generalize the
Syndicated Actor model’s existing notions of place, connecting capabilities not to individual
actors but to individual places and the data held therein. I will draw on existing ocap
literature, including in particular the recent notion of Macaroons ([Birgisson et al 2014][])
and older ideas from SPKI/SDSI ([Ylonen et al 1999][]; [Ellison 1999][]).

**Q. How do you feel dataspaces would most enhance privacy or trust?**

Capability technology offers strong, flexible control over access to any given dataspace
without getting lost in the weeds of identity management: identity is an application-local,
application-private concern.

Dataspaces default to being closed, "invite-only" networks, meaning casual observation of
activity in a dataspace is not possible. But the necessary extension of the capability model to
handle the data-sharing aspects of dataspaces gives benefits in terms of privacy and trust that
go beyond the already considerable benefits a traditional capability model offers.

Traditional capabilities directly control access to behavioural objects, and only indirectly
control access to data held within such objects. Syndicated capabilities, by contrast, directly
control access to shared data held within a space, changes to which may trigger activity in
"objects" participating in the dataspace.

In other words, traditional capabilities encode data access controls in terms of object access
controls; syndicated capabilities, vice versa.

This ability to directly express access to shared data gives system designers a powerful tool
for thinking about permitted information flows, including questions of privacy. Furthermore,
*attenuating* the authority of syndicated capabilities before passing them on to some other
principal allows for strong partitioning of access within a dataspace, offering fine-grained,
local, compositional decisions about access to shared data. Finally, it becomes possible to
expose capabilities to end-users (roughly analogous to URLs), putting that power in their hands
also.

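To give a feel for what attenuation means in practice, here is a small Python sketch in the style of Macaroons ([Birgisson et al 2014][]), restricted to first-party caveats. It is an assumption-laden illustration, not the capability format of any Syndicated Actor implementation: the tuple representation, the function names `mint`, `attenuate` and `verify`, and the caveat strings are all invented for this example.

```python
import hashlib
import hmac


def _chain(key: bytes, message: str) -> bytes:
    """One link of the HMAC chain: sign message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str):
    """Create an unrestricted capability: (identifier, caveats, signature)."""
    return (identifier, (), _chain(root_key, identifier))


def attenuate(capability, caveat: str):
    """Return a weaker capability. The holder does not need the root key."""
    identifier, caveats, signature = capability
    return (identifier, caveats + (caveat,), _chain(signature, caveat))


def verify(root_key: bytes, capability, caveat_holds) -> bool:
    """Check the signature chain, then check every caveat against the request.

    caveat_holds is a predicate supplied by the verifier; it decides whether
    a single caveat string is satisfied by the request being authorized.
    """
    identifier, caveats, signature = capability
    expected = _chain(root_key, identifier)
    for caveat in caveats:
        expected = _chain(expected, caveat)
    if not hmac.compare_digest(expected, signature):
        return False
    return all(caveat_holds(caveat) for caveat in caveats)


if __name__ == "__main__":
    root = b"hypothetical root key"
    full = mint(root, "dataspace-42")
    read_only = attenuate(full, "op = read")
    print(verify(root, read_only, lambda c: c == "op = read"))   # a read request
    print(verify(root, read_only, lambda c: c == "op = write"))  # a write request
```

Each added caveat is folded into the signature, so a holder can narrow a capability before passing it on but cannot strip a caveat off again without the root key.
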
I should also mention that dataspaces can scale from managing activity within a single OS
process up to coordinating activity between machines around the world. A distributed dataspace
could be an excellent foundation for collaborative applications, where privacy concerns come to
the forefront. In effect, a dataspace can become a richly-structured "VPN", containing
application-specific shared data and with application- or schema-specific access controls.

## Inter-process communication and networking

- offer inter-process communication media

- *inter-process communication*
  - D-Bus as a program-to-program communication bus
  - email for use by system services

X11 for IPC

## Name-binding, name-resolution, and namespaces

- provide name-binding and name resolution services

udev - /dev namespace

- *naming services*
  - publishing names for intra-machine services on this system
  - publishing names for LAN services on this system
  - resolving names of intra-machine services on this system
  - resolving names of services on other systems[^libc-resolver]

## Job queueing and job scheduling

- provide job-queueing and -scheduling services, including calendar-like and time-based scheduling

- cron
- at
- systemd timers

cups, lpd

mail queue management?

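As a rough sketch of what time-based scheduling involves at its core, the Python example below keeps one-shot jobs (in the spirit of `at`) and recurring jobs (in the spirit of cron or timers) in a single priority queue ordered by due time. The `JobQueue` class and its method names are hypothetical, and the clock is passed in explicitly so the example stays deterministic; none of this reflects how cron, `at`, or systemd timers are actually implemented.

```python
import heapq
import itertools


class JobQueue:
    """A minimal time-ordered job queue with one-shot and recurring jobs."""

    def __init__(self):
        self._heap = []
        # Unique sequence numbers break ties, keeping same-time jobs in FIFO
        # order and ensuring the heap never has to compare job callables.
        self._sequence = itertools.count()

    def at(self, when, job):
        """Schedule job (a zero-argument callable) to run once at time when."""
        heapq.heappush(self._heap, (when, next(self._sequence), None, job))

    def every(self, interval, job, start=0.0):
        """Schedule job to run at start, then every interval (> 0) thereafter."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (start, next(self._sequence), interval, job))

    def run_until(self, now):
        """Run every job due at or before now, in due-time order."""
        while self._heap and self._heap[0][0] <= now:
            when, _, interval, job = heapq.heappop(self._heap)
            job()
            if interval is not None:
                # Recurring job: queue its next occurrence.
                heapq.heappush(
                    self._heap,
                    (when + interval, next(self._sequence), interval, job),
                )


if __name__ == "__main__":
    queue = JobQueue()
    queue.at(5, lambda: print("one-shot job"))
    queue.every(2, lambda: print("recurring job"), start=2)
    queue.run_until(6)
```

A production scheduler would also need persistence across reboots, calendar expressions, and a policy for missed runs, all of which this sketch leaves out.
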
## User interface

- provide user interface facilities

(TO APPLICATIONS but I guess also for the system layer itself)

- provide system-wide "cut-and-paste" services for user-controlled IPC

- email for talking to users
- notifications - system tray

- ui facilities
  - the thing that asks for user input during apt configuration
  - the alert/prompt boxes in a web browser (?)
  - notifications
  - system tray, applets

## System configuration and user preferences

- provide system configuration and user preference databases

- system configuration database
- system settings manager

## Software management

- support software package installation, upgrade, and removal

- cc
- apt
- apk

## State replication and data backup

- offer state replication services
- provide data backup facilities

- state replication services
  - contact book, address book
  - file replication across machines
  - sticky-notes, google keep
  - todo list

- backup facilities
  - Time Machine

## Synthesis, or, Toward a Complete Vision of a System Layer

I want to make it *easy* to integrate portions of a system layer together. The core of the core
has to be good IPC and state-management and -introspection.

- systemd/udev/D-Bus/NetworkManager/dhcpcd/etc., as sketched above
- init/inetd/crond/etc., the traditional Unix system layer
- daemontools/runit/s6: service supervision software
- OpenRC/[s6-rc](https://skarnet.com/projects/service-manager.html):
  service manager and supervisor used in Alpine
- Android architecture components
- Erlang's OTP, the system layer for the Erlang virtual operating system

In the table below, the columns follow the order of the topics discussed above: SM = service
management; RX = system reactivity; HL = higher-level perspectives; AC = access control;
PR = security and privacy; IPC = inter-process communication; NS = name services; JQ = job
queueing and scheduling; UI = user interface; CF = configuration and preferences; RR = state
replication; BK = backup.

| Component            | SM | RX | HL | AC | PR | IPC | NS | JQ | UI | CF | RR | BK |
|----------------------|----|----|----|----|----|-----|----|----|----|----|----|----|
| Linux kernel         | ✓  | ✓  | ✓  | ✓  | ✓  | ✓   | ✓  | ✓  |    |    |    |    |
| udev                 |    | ✓  |    | ✓  |    |     | ✓  |    |    |    |    |    |
| D-Bus                | ✓  |    |    | ✓  |    | ✓   | ✓  |    |    |    |    |    |
| NetworkManager       |    | ✓  | ✓  | ✓  |    |     |    |    |    |    |    |    |
| dhcpcd               |    |    |    |    |    |     |    |    |    |    |    |    |
| systemd              | ✓  | ✓  |    |    |    |     | ✓  | ✓  |    |    |    |    |
| daemontools/runit/s6 | ✓  |    |    |    |    |     |    |    |    |    |    |    |
| OpenRC               | ✓  |    |    |    |    |     |    |    |    |    |    |    |
| OTP (Erlang)         | ✓  |    |    |    |    | ✓   | ✓  | ✓  | ✓  |    |    |    |
| X11                  |    |    |    | ✓  |    | ✓   | ✓  |    | ✓  |    |    |    |
| Time Machine         |    |    |    |    |    |     |    |    |    |    |    | ✓  |
| Nextcloud            |    |    |    | ✓  |    | ✓   | ✓  |    | ✓  |    | ✓  |    |
| Syncthing            |    |    |    | ✓  |    |     | ✓  |    |    |    | ✓  |    |
| Windows Registry     |    |    |    |    |    |     |    |    |    | ✓  |    |    |
| GNOME                |    | ✓  | ✓  | ✓  |    |     |    |    | ✓  | ✓  |    |    |
| Android              | ✓  | ✓  | ✓  | ✓  | ✓  | ✓   | ✓  | ✓  | ✓  | ✓  |    |    |

## References

[Bass et al 1998]: #ref:bass98
[**Bass et al 1998**] <span id="ref:bass98"> Bass, Len, Paul Clements, and Rick
Kazman. Software Architecture in Practice. Addison-Wesley, 1998.</span>

[Birgisson et al 2014]: #ref:birgisson14
[**Birgisson et al 2014**] <span id="ref:birgisson14"> Birgisson, Arnar, Joe Gibbs Politz,
Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. “Macaroons: Cookies with
Contextual Caveats for Decentralized Authorization in the Cloud.” In Network and Distributed
System Security Symposium. San Diego, California: Internet Society, 2014.</span>

[Clements et al 2001]: #ref:clements01
[**Clements et al 2001**] <span id="ref:clements01"> Clements, Paul, Rick Kazman, and Mark
Klein. Evaluating Software Architectures: Methods and Case Studies. Addison-Wesley,
2001.</span>

[Corbet 2019]: #ref:corbet19
[**Corbet 2019**] <span id="ref:corbet19"> Corbet, Jonathan. “Systemd as Tragedy.” LWN.Net,
January 28, 2019. <https://lwn.net/Articles/777595/>.</span>

[Ellison 1999]: #ref:ellison99
[**Ellison 1999**] <span id="ref:ellison99"> Ellison, Carl. SPKI Requirements. Request for
Comments 2692. RFC Editor, 1999. <https://doi.org/10.17487/RFC2692>.</span>

[Rice 2019]: #ref:rice19
[**Rice 2019**] <span id="ref:rice19"> Rice, Benno. “The Tragedy of Systemd.” Conference
Presentation at linux.conf.au, Christchurch, New Zealand, January 24, 2019.
<https://www.youtube.com/watch?v=o_AIw9bGogo>.</span>

[Ylonen et al 1999]: #ref:ylonen99
[**Ylonen et al 1999**] <span id="ref:ylonen99"> Ylonen, Tatu, Brian Thomas, Butler Lampson,
Carl Ellison, Ronald L. Rivest, and William S. Frantz. SPKI Certificate Theory. Request for
Comments 2693. RFC Editor, 1999. <https://doi.org/10.17487/RFC2693>.</span>

---

#### Notes

[^qualities]: Known in the literature as “-ilities”; see e.g.
  [Bass et al 1998][] or
  [Clements et al 2001][].

[^as-far-as-i-know]: I wrote to Benno Rice to ask him about the term. He replied that he
  doesn't know of any earlier use of "system layer" for this particular bundle of ideas.
  Quoted (with permission) from his email to me: <q>I’m not going to claim to be the first
  who thought of the idea but the name was something I came up with to describe the services
  that run in userspace but provide system-level services. I’m happy to own it if nobody else
  had the idea first. 🙃</q> It looks to me, then, like the term originated with him in 2019.

[^cleaned-up-automated-transcript]: I cut and pasted the automated
  YouTube transcript of the talk, and then cleaned it up.
  (Emphasis mine.)

[^libc-resolver]: The resolver built in to libc plays the major part in this; but things like
  dnsmasq play a role too, especially when combined with virtual machines running within a
  host.