
The System Layer

Tony Garnock-Jones
October 2022

The system layer (Rice 2019; Corbet 2019) is an essential part of an operating system, mediating between user-facing programs and the kernel. Its importance lies in its role as the technical foundation for many qualities¹ relevant to system security, resilience, connectivity, maintainability and usability.

In the Linux world, existing system layer realizations cross-cut many, many projects: NetworkManager, GNOME, DBus, systemd, OpenRC, apt, apk, and so on. Each project has its own role in the overall system layer, and none takes a strong stance on the overall architecture that results from their combination. However, there is a group of basic concepts involved in a system layer that transcend individual subprojects, relating to issues of IPC, discovery, and whole-machine and application state management.

This document examines the architecture of system layers in general, touching on responsibilities currently handled at each of these levels, with the aim of bringing the concept of "system layer" into sharper focus.

What is a system layer?

The term "system layer" was coined² by Benno Rice in a 2019 talk. Here's an excerpt from the relevant portion of Rice's talk:³

... dynamic DHCP, IPv6 auto config, all these kinds of things are more dynamic. Time is more dynamic. Some aspects of device handling, you know, all of these things are a lot more dynamic now, and we need a way of strapping these things together so we can manage them that doesn't involve installing 15 different packages that all behave differently.

[15:08] And so what that ends up becoming, is what I term the system layer. Which is a bunch of stuff which might be running in user space or might be running in kernel space but is providing systemic level stuff as opposed to the stuff that you're writing or using directly. So this could include things like NetworkManager, and udev, and a whole bunch of things.

Systemd as a project ends up complementing the Linux kernel by providing all of this user space system layer.

(It's a really good talk.) The system layer idea seems to have been latent for a long time, and only recently to have been given a name.

Some examples include:

The Mac OS frameworks above the kernel level
The Android system with its APIs and SDKs
Various combinations of package manager, init system, service manager, support daemons, and user interface (be it ever so minimal); for example, debian+systemd+udevd+GNOME, or alpine+OpenRC+eudev+SSH.

Both Android and Mac OS embody substantially complete visions of a system layer, while the visions are much more fragmented in the Linux world. Even in cases where systemd makes up a good fraction of a particular system layer, most systems augment it with a wide variety of other software.

What does a system layer do?

A system layer addresses myriad system-level problems that applications face that are out-of-scope for the operating system kernel.

It solves these problems so that application developers can rely on shared vocabulary, common interfaces, and communal development effort. The result is improved interoperability, compositionality, and securability, along with reduced duplication of effort and less scope for design flaws.

The scope of the system layer changes with time as the needs of applications and users change and grow. The problems it addresses range from the highly abstract to the relatively concrete. For example, a system layer may:

supply services in response to static or dynamic demand
monitor and react to changes in system state
give higher-level perspectives to users and applications on system state and resources
offer access control mechanisms and enforce access control policies
offer a coherent, system-wide approach to security and privacy
offer inter-process communication media
provide name-binding and name resolution services
provide job-queueing and -scheduling services, including calendar-like and time-based scheduling
provide user interface facilities
provide system-wide "cut-and-paste" services for user-controlled IPC
provide system configuration and user preference databases
support software package installation, upgrade, and removal
offer state (data, configuration) replication services
provide data backup facilities

among other things. All of these areas are common across applications, unique to none of them.

To come up with this list, I surveyed a number of existing open systems such as Linux distributions, desktop environments, and so on, plus (in a limited way) Android and Mac OS, looking for commonalities and differences. That is, the list was developed in a largely informal way. Despite this, I've found it a fruitful starting point for an investigation of the properties of system layers in general. I welcome additional perspectives that others might bring.

In the remainder of this document, I'll use each of the topics in the list above as a perspective from which to examine existing software. I'll then attempt a synthesis of the results of this analysis into a firmer idea of what form a system layer could and perhaps should take.

Service management and system reactivity

An extremely common recurring pair of related themes in system layers of all sorts is service management and system reactivity. That is, the system layer takes on the tasks of starting and stopping services in response to static or dynamic demand, and of monitoring and reacting to changes in system state. While the kernel offers raw sense data plus a low-level vocabulary for managing the collection of running processes on a system, applications and users need a higher-level vocabulary for managing running software in terms of services and service relationships.

These tasks can be broken down into smaller, but still general, pieces:

primitive ability to start and stop service instances
declaration of singleton service instances, service classes, and instances of service classes
declaration of relationships (including runtime dependencies) among services
facility for managing service names and connecting service names to service instances
user interface for examining the service namespace and the collection of running and runnable services
facility for noticing and a medium for publishing and subscribing to changes in system state
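
As an illustration of the "declaration of relationships" item above, here is a minimal Python sketch of deriving a start order from declared runtime dependencies. The service names and the `start_order` helper are hypothetical, chosen only for illustration; real service managers use their own declaration formats and also handle parallel startup, readiness, and failure.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(dependencies):
    """Return a start order in which every service follows its dependencies.

    `dependencies` maps a service name to the set of services it requires.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


# Hypothetical service graph, not taken from any real distribution.
services = {
    "udev": set(),
    "network": {"udev"},
    "dns-cache": {"network"},
    "web-server": {"network", "dns-cache"},
}

order = start_order(services)
print(order)
```

Stopping services would walk the same order in reverse, so that dependents go down before the services they rely on.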

Concrete examples include:

starting services in response to statically-configured runlevels (OpenRC, systemd, SysV init, etc.)
starting dependencies before dependent services (OpenRC, systemd, SysV init, etc.)
restarting terminated or failed services in a supervision hierarchy (daemontools, s6, etc.; Erlang/OTP)
starting services by service name on demand (DBus, etc.)
starting services by socket activation (systemd, etc.)
virtual-machine and container lifecycles, including supervision and restart of containers (docker, docker-compose, etc.)
reacting to hotplugging of a device by installing a driver or starting a program (udevd, etc.)
reacting to system metrics (e.g. temperature, load average, memory pressure) by changing something
reacting to network connectivity changes (NetworkManager, etc.)
setup and naming of devices and network routes (udevd, NetworkManager, etc.)
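
To make the supervision example above more tangible, the following Python sketch restarts a child process whenever it exits abnormally. It is a deliberately simplified assumption of how such a loop might look, not a description of how daemontools, s6, or Erlang/OTP work internally; the `supervise` function, its restart limit, and its backoff parameter are all hypothetical.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, backoff_seconds=0.0):
    """Run `argv`, restarting it after each abnormal exit.

    Gives up after `max_restarts` restarts. Returns the exit codes
    observed, oldest first.
    """
    exit_codes = []
    while True:
        code = subprocess.run(argv).returncode
        exit_codes.append(code)
        # Stop on a clean exit, or once the restart budget is spent
        # (the first run is not counted as a restart).
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(backoff_seconds)


# A child that always fails: one initial run plus two restarts.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(supervise(failing, max_restarts=2))
```

A restart limit is only one possible policy; a supervisor could equally restart forever or escalate the failure to a parent supervisor, as in a supervision hierarchy.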

Laurent Bercot has produced an excellent comparison table in a page describing a new service manager for Linux distributions.

Higher-level perspectives on and control over system state and resources

An essential system layer task is to give users and applications higher-level perspectives on system state, resource availability and resource consumption than those offered by the kernel.

For example, the kernel's NETLINK_ROUTE sockets allow processes to observe changes in network interface and routing configuration, but applications often do not need the fine detail on offer: instead, they need higher-level knowledge such as "a usable default route for IPv4 exists", or "IPv4 connectivity is available, but metered".

Breaking this task down into smaller pieces yields:

access to low-level descriptions of system state, resource availability, and resource usage
ability to either poll for or subscribe to changes in such state
ability to compute relevant higher-level perspectives on the state
a medium for communicating such changes to users and applications
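
The four pieces above can be sketched together in a few lines of Python. This is an illustrative assumption rather than any real tool's design: the `Route` record stands in for low-level state that a real implementation would read from NETLINK_ROUTE, and `ConnectivityMonitor` is a hypothetical in-process stand-in for a system-wide publish/subscribe medium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Simplified stand-in for a low-level routing table entry."""
    destination: str    # "0.0.0.0/0" denotes the IPv4 default route
    interface: str
    interface_up: bool


def has_default_ipv4_route(routes):
    """Higher-level perspective: is a usable IPv4 default route present?"""
    return any(r.destination == "0.0.0.0/0" and r.interface_up for r in routes)


class ConnectivityMonitor:
    """Publishes the derived fact to subscribers, only when it changes."""

    def __init__(self):
        self._subscribers = []
        self._last = None

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def update(self, routes):
        current = has_default_ipv4_route(routes)
        if current != self._last:
            self._last = current
            for callback in self._subscribers:
                callback(current)


events = []
monitor = ConnectivityMonitor()
monitor.subscribe(events.append)
monitor.update([Route("0.0.0.0/0", "eth0", True)])     # route appears
monitor.update([Route("0.0.0.0/0", "eth0", True),
                Route("10.0.0.0/8", "eth0", True)])    # no high-level change
monitor.update([Route("0.0.0.0/0", "eth0", False)])    # interface goes down
print(events)
```

Subscribers see only changes in the derived fact, not the stream of low-level route updates that produced it.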

Concrete examples include:

computing default-route availability from NETLINK_ROUTE events over netlink sockets, as discussed
use of NETLINK_KOBJECT_UEVENT by udev to configure and expose hotplugged devices to userland
interrogation of disk devices and partition tables to provide views on and control over available filesystems (gnome-disks, etc.)
interrogation of audio devices and audio routing options to provide high-level views and control over audio setup (pipewire, pulseaudio, etc.), e.g. volume level display and volume controls, mute, select input/output channel, play/pause, skip, rewind etc.
high-level perspectives on devices such as displays, printers, mice, keyboards, touchpads, accelerometers, proximity sensors, temperature monitors and so on (GNOME, XFCE4, KDE, cups, etc.), communicated via DBus and friends
system configuration databases (/etc, Windows' Registry, GNOME configuration databases)
location services mapping from low-level GPS and wifi information to medium-level concrete location coordinates to high-level "you are at home", "you are in the office"-style knowledge about location
telephony services exposing high-level call management interfaces backed by low-level modem operations

Slightly harder to see, but still certainly an example of the subject of this section, is the collection of userland tools commonly associated with Unix-like operating systems more generally. The file system, for example, is firmly a systems concern and not an application-level concern, so the system layer provides general tools for manipulating, examining, and repairing the file system. This includes not only tools such as fsck, df, and mount, but facilities such as automounting, mounting and fscking at boot, scanning and manipulating partition tables, configuring lvm, and even the humble ls, cp and friends. On systems such as Mac OS, the Finder and Disk Utility programs and their associated underlying system services are analogous parts of the system layer.

Access control mechanisms and policies, security, and privacy

offer access control mechanisms and enforce access control policies
offer a coherent, system-wide approach to security and privacy
access control
- resource allocation services
- ACL-based access control for system services and DBus objects

Security and privacy

Existing system layers rely on single-machine approaches to security and securability that do not scale well: for example, Unix ACLs and user- and group-ID-based permissions. The theory of object capabilities (“ocaps”), exemplified in languages such as E and programming models such as Actors, offers a fine-grained approach that can be made to scale further than a single machine. However, ocaps only control access to shared programs. Access controls for shared data are left implicit. In addition, ideas of location and system boundary are left implicit in ocap systems.

I will adapt ocaps to syndicated actors. Because the Syndicated Actor model includes a first-class notion of shared data as well as a layered conception of locations and location boundaries, syndicated capabilities will reflect these ideas directly. I will generalize the Syndicated Actor model’s existing notions of place, connecting capabilities not to individual actors but to individual places and the data held therein. I will draw on existing ocap literature, including in particular the recent notion of Macaroons (Birgisson et al 2014) and older ideas from SPKI/SDSI (Ylonen et al 1999; Ellison 1999).

Q. How do you feel dataspaces would most enhance privacy or trust?

Capability technology offers strong, flexible control over access to any given dataspace without getting lost in the weeds of identity management: identity is an application-local, application-private concern.

Dataspaces default to being closed, "invite-only" networks, meaning casual observation of activity in a dataspace is not possible. But the necessary extension of the capability model to handle the data-sharing aspects of dataspaces gives benefit in terms of privacy and trust that goes beyond the already considerable benefits a traditional capability model offers.

Traditional capabilities directly control access to behavioural objects, and only indirectly control access to data held within such objects. Syndicated capabilities, by contrast, directly control access to shared data held within a space - changes to which may trigger activity in "objects" participating in the dataspace.

In other words, traditional capabilities encode data access controls in terms of object access controls; syndicated capabilities, vice versa.

This ability to directly express access to shared data gives system designers a powerful tool for thinking about permitted information flows, including questions of privacy. Furthermore, attenuating the authority of syndicated capabilities before passing them on to some other principal allows for strong partitioning of access within a dataspace, offering fine-grained, local, compositional decisions about access to shared data. Finally, it becomes possible to expose capabilities to end-users (roughly analogous to URLs), putting that power in their hands also.
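
The idea of attenuating a capability before passing it on can be illustrated with a small Python sketch in the spirit of Macaroons (Birgisson et al 2014), where each added caveat is folded into an HMAC chain. This is an assumption-laden illustration, not the actual format of syndicated capabilities: the tuple layout, the function names, and the caveat strings are all hypothetical.

```python
import hashlib
import hmac


def _chain(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str):
    """Create an unrestricted capability: (identifier, caveats, signature)."""
    return (identifier, (), _chain(root_key, identifier))


def attenuate(cap, caveat: str):
    """Return a weaker capability; this needs no access to the root key."""
    identifier, caveats, signature = cap
    return (identifier, caveats + (caveat,), _chain(signature, caveat))


def verify(root_key: bytes, cap, satisfied) -> bool:
    """Recompute the chain, then check every caveat with `satisfied`."""
    identifier, caveats, signature = cap
    expected = _chain(root_key, identifier)
    for caveat in caveats:
        expected = _chain(expected, caveat)
    return hmac.compare_digest(expected, signature) and all(
        satisfied(c) for c in caveats
    )


root_key = b"hypothetical-dataspace-secret"
full = mint(root_key, "dataspace:example")
read_only = attenuate(full, "observe-only")

print(verify(root_key, read_only, lambda c: c == "observe-only"))
# Stripping the caveat while keeping the signature does not verify.
forged = (read_only[0], (), read_only[2])
print(verify(root_key, forged, lambda c: True))
```

The holder of `read_only` can add further caveats and hand the result on, but cannot recover `full`, which is the property that allows local, compositional decisions about access.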

I should also mention that dataspaces can scale from managing activity within a single OS process up to coordinating activity between machines around the world. A distributed dataspace could be an excellent foundation for collaborative applications, where privacy concerns come to the forefront. In effect, a dataspace can become a richly-structured "VPN", containing application-specific shared data and with application- or schema-specific access controls.

Inter-process communication and networking

offer inter-process communication media
inter-process communication
- DBus as a program-to-program communication bus
- email for use by system services

X11 for IPC

Name-binding, name-resolution, and namespaces

provide name-binding and name resolution services

udev - /dev namespace

naming services
- publishing names for intra-machine services on this system
- publishing names for LAN services on this system
- resolving names of intra-machine services on this system
- resolving names of services on other systems⁴

Job queueing and job scheduling

provide job-queueing and -scheduling services, including calendar-like and time-based scheduling

cron, at, systemd timers

cups, lpd

mail queue management?
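
As a minimal sketch of time-based scheduling, the following Python fragment keeps jobs in a priority queue ordered by due time. The `JobQueue` class and its methods are hypothetical, and times are plain numbers supplied by the caller; this does not model calendar expressions of the kind cron or systemd timers accept.

```python
import heapq
import itertools


class JobQueue:
    """Time-ordered job queue; the caller supplies the current time."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so jobs due at the same time run in insertion order
        # and job callables themselves are never compared.
        self._counter = itertools.count()

    def schedule(self, run_at, job):
        heapq.heappush(self._heap, (run_at, next(self._counter), job))

    def run_due(self, now):
        """Run every job whose due time is <= now; return their results."""
        results = []
        while self._heap and self._heap[0][0] <= now:
            _, _, job = heapq.heappop(self._heap)
            results.append(job())
        return results


queue = JobQueue()
queue.schedule(10, lambda: "rotate-logs")
queue.schedule(5, lambda: "sync-clock")
queue.schedule(20, lambda: "backup")
print(queue.run_due(10))
```

Passing the time in explicitly, rather than reading the clock inside the queue, keeps the sketch deterministic and easy to test.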

User interface

provide user interface facilities

(TO APPLICATIONS but I guess also for the system layer itself)

provide system-wide "cut-and-paste" services for user-controlled IPC

email for talking to users
notifications - system tray

ui facilities
- the thing that asks for user input during apt configuration
- the alert/prompt boxes in a web browser (?)
- notifications
- system tray, applets

System configuration and user preferences

provide system configuration and user preference databases
system configuration database
- system settings manager

Software management

support software package installation, upgrade, and removal

cc apt apk

State replication and data backup

offer state replication services
provide data backup facilities
state replication services
- contact book, address book
- file replication across machines
- sticky-notes, google keep
- todo list
backup facilities
- Time Machine

Synthesis, or, Toward a Complete Vision of a System Layer

The goal is to make it easy to integrate portions of a system layer together. The core of the core has to be good IPC, state management, and state introspection.

systemd/udev/D-Bus/NetworkManager/dhcpcd/etc., as sketched above
init/inetd/crond/etc., the traditional Unix system layer
daemontools/runit/s6: service supervision software
OpenRC/s6-rc: service manager and supervisor used in Alpine
Android architecture components
Erlang's OTP, the system layer for the Erlang virtual operating system

Component	SM	RX	HL	AC	PR	IPC	NS	JQ	UI	CF	RR	BK
Linux kernel	✓	✓	✓	✓	✓	✓	✓	✓
udev		✓		✓			✓
D-Bus	✓			✓		✓	✓
NetworkManager		✓	✓	✓
dhcpcd
systemd	✓	✓					✓	✓
daemontools/runit/s6	✓
OpenRC	✓
OTP (Erlang)	✓					✓	✓	✓	✓
X11				✓		✓	✓		✓
Time Machine												✓
Nextcloud				✓		✓	✓		✓		✓
Syncthing				✓			✓				✓
Windows Registry										✓
GNOME		✓	✓	✓					✓	✓
Android	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓

References

[Bass et al 1998] Bass, Len, Paul Clements, and Rick Kazman. Software Architecture in Practice. Addison-Wesley, 1998.

[Birgisson et al 2014] Birgisson, Arnar, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. “Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud.” In Network and Distributed System Security Symposium. San Diego, California: Internet Society, 2014.

[Clements et al 2001] Clements, Paul, Rick Kazman, and Mark Klein. Evaluating Software Architectures: Methods and Case Studies. Addison-Wesley, 2001.

[Corbet 2019] Corbet, Jonathan. “Systemd as Tragedy.” LWN.Net, January 28, 2019. https://lwn.net/Articles/777595/.

[Ellison 1999] Ellison, Carl. SPKI Requirements. Request for Comments 2692. RFC Editor, 1999. https://doi.org/10.17487/RFC2692.

[Rice 2019] Rice, Benno. “The Tragedy of Systemd.” Conference Presentation at linux.conf.au, Christchurch, New Zealand, January 24, 2019. https://www.youtube.com/watch?v=o_AIw9bGogo.

[Ylonen et al 1999] Ylonen, Tatu, Brian Thomas, Butler Lampson, Carl Ellison, Ronald L. Rivest, and William S. Frantz. SPKI Certificate Theory. Request for Comments 2693. RFC Editor, 1999. https://doi.org/10.17487/RFC2693.

Notes

¹ Known in the literature as “-ilities”; see e.g. Bass et al 1998 or Clements et al 2001.
² I wrote to Benno Rice to ask him about the term. He replied that he doesn't know of any earlier use of "system layer" for this particular bundle of ideas. Quoted (with permission) from his email to me: “I’m not going to claim to be the first who thought of the idea but the name was something I came up with to describe the services that run in userspace but provide system-level services. I’m happy to own it if nobody else had the idea first. 🙃” It looks to me, then, like the term originated with him in 2019.
³ I cut and pasted the automated YouTube transcript of the talk, and then cleaned it up. (Emphasis mine.)
⁴ The resolver built in to libc plays the major part in this; but things like dnsmasq play a role too, especially when combined with virtual machines running within a host.
