What is Mac OS X?
© Amit Singh. All Rights Reserved. Written in December 2003.
XNU: The Kernel
The Mac OS X kernel is called XNU. It can be viewed as consisting of the following components:
XNU contains code based on Mach, the legendary architecture that originated as a research project at Carnegie Mellon University in the mid-1980s and has been part of many important systems. (Mach itself traces its philosophy to the Accent operating system, also developed at CMU.) Early versions of Mach had monolithic kernels, with much of BSD's code in the kernel. Mach 3.0 was the first microkernel implementation.
XNU's Mach component is based on Mach 3.0, although it's not used as a microkernel. The BSD subsystem is part of the kernel and so are various other subsystems that are typically implemented as user-space servers in microkernel systems. XNU's Mach is responsible for various low-level aspects of the system, such as:
- preemptive multitasking, including kernel threads (POSIX threads on Mac OS X are implemented using kernel threads)
- protected memory
- virtual memory management
- inter-process communication
- interrupt management
- real-time support
- kernel debugging support (the built-in low-level kernel debugger, ddb, is part of XNU's Mach component, and so is kdp, a remote kernel debugging protocol implementation)
- console I/O
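To make the threading item above concrete, here is a minimal sketch of two POSIX threads updating a shared counter. Nothing in it is specific to Mac OS X: it uses only the standard pthread calls, and the names `Counter`, `worker` and `run_two_workers` are invented for illustration. According to the text, each such thread would be backed by a kernel thread on Mac OS X, so either one may be preempted at any point, which is why the counter is guarded by a mutex.

```cpp
#include <cassert>
#include <pthread.h>

// Shared state guarded by a mutex, because either thread may be
// preempted between reading and writing the counter.
struct Counter {
    pthread_mutex_t lock;
    long value;
};

// Thread entry point: increment the shared counter 100000 times.
static void *worker(void *arg) {
    Counter *c = static_cast<Counter *>(arg);
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&c->lock);
        ++c->value;
        pthread_mutex_unlock(&c->lock);
    }
    return nullptr;
}

// Hypothetical helper: start two threads, wait for both, return the total.
long run_two_workers() {
    Counter c;
    pthread_mutex_init(&c.lock, nullptr);
    c.value = 0;

    pthread_t t1, t2;
    int rc1 = pthread_create(&t1, nullptr, worker, &c);
    int rc2 = pthread_create(&t2, nullptr, worker, &c);
    assert(rc1 == 0 && rc2 == 0);

    pthread_join(t1, nullptr);
    pthread_join(t2, nullptr);
    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```

Calling `run_two_workers()` should return 200000 regardless of how the two threads are interleaved; without the mutex the result could be lower.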
The sequence of events before control is passed to the kernel is described in Booting Mac OS X. The secondary bootloader eventually calls the kernel's "startup" code, forwarding various boot arguments to it. This low-level code is where every processor in the system starts (from the kernel's point of view). Various important variables, such as the maximum virtual and physical addresses and the threshold temperature for throttling down a CPU's speed, are initialized here; BAT registers are cleared, AltiVec (if present) is initialized, caches are initialized, and so on. Eventually this code jumps to the boot initialization code for the architecture (ppc_init() on the PowerPC). Thereafter:
- A template thread is filled in, and an initial thread is created from this template. It is set to be the "current" thread.
- Some CPU housekeeping is done.
- The "Platform Expert" (see below) is initialized (PE_init_platform()), with a flag indicating that the VM is not yet initialized. This saves the boot arguments, the device tree and display information in a state variable. Another call to PE_init_platform() is made after the VM is initialized.
- Mach VM is initialized.
- The function machine_startup() is called. It takes some actions based on the boot arguments, performs some housekeeping, starts thermal monitoring for the CPU, and calls setup_main().
- setup_main() performs a lot of work: initializing the scheduler, IPC, kernel extension loading, clock, timers, tasks, threads, etc. It finally creates a kernel thread called startup_thread that creates further kernel threads.
- startup_thread creates a number of other threads (the idle threads, service threads for clock and device, ...). It also initializes the thread reaper, the stack swapin and the periodic scheduler mechanism. It is here that the BSD subsystem is initialized (via bsd_init()).
- startup_thread becomes the pageout daemon once it finishes its work.
At this point, Mach is up and running.
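The ordering of the steps above can be traced with a toy program. This is only a sketch: every `sketch_` function is a stand-in that records its name and calls the next step, none of it is XNU code, and the real signatures are not reproduced. In particular, the real startup_thread runs as a separate kernel thread, whereas here it is an ordinary function call.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Ordered log of the steps taken, so the sequence can be inspected.
static const char *trace[16];
static int trace_len = 0;

static void record(const char *step) {
    assert(trace_len < 16);
    trace[trace_len++] = step;
    std::printf("%2d. %s\n", trace_len, step);
}

// Stand-ins for the functions named in the text (assumed shapes only).
static void sketch_bsd_init() { record("bsd_init"); }

static void sketch_startup_thread() {
    record("startup_thread");
    sketch_bsd_init();           // the BSD subsystem is initialized from here
    record("pageout daemon");    // what startup_thread finally becomes
}

static void sketch_setup_main() {
    record("setup_main");
    sketch_startup_thread();     // a new kernel thread in the real system
}

static void sketch_machine_startup() {
    record("machine_startup");
    sketch_setup_main();
}

static void sketch_PE_init_platform(bool vm_initialized) {
    record(vm_initialized ? "PE_init_platform (VM ready)"
                          : "PE_init_platform (VM not ready)");
}

static void sketch_vm_init() { record("Mach VM init"); }

// Entry point of the sketch, mirroring the PowerPC path in the text.
void sketch_ppc_init() {
    record("ppc_init");
    sketch_PE_init_platform(false);  // first call, before the VM exists
    sketch_vm_init();
    sketch_PE_init_platform(true);   // second call, after VM initialization
    sketch_machine_startup();
}
```

Calling `sketch_ppc_init()` prints the nine steps in order and leaves them in `trace` for inspection.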
XNU's BSD component uses FreeBSD as the primary reference codebase (although some code might be traced to other BSDs). Darwin 7.x (Mac OS X 10.3.x) uses FreeBSD 5.x. As mentioned before, BSD runs not as an external (or user-level) server, but is part of the kernel itself. Some aspects that BSD is responsible for include:
- process model
- user ids, permissions, basic security policies
- POSIX API, BSD style system calls
- TCP/IP stack, BSD sockets, firewall
- VFS and filesystems (see Mac OS X Filesystems for details)
- System V IPC
- crypto framework
- various synchronization mechanisms
Note that XNU has a unified buffer cache but it ties in to Mach's VM.
XNU uses a synchronization abstraction (built on top of Mach mutexes) called funnels to serialize access to the BSD portion of the kernel. The kernel variables pointing to these funnels have the _flock suffix, such as network_flock. When Mach initializes the BSD subsystem via a call to bsd_init(), the first operation performed is the allocation of funnels (the kernel funnel's state is set to
- The kernel memory allocator is initialized.
- The "Platform Expert" (see below) is called upon to see if there are any boot arguments for BSD.
- VFS buffers/hash tables are allocated and initialized.
- Process related structures are allocated/initialized. This includes the list of all processes, the list of zombie processes, hash tables for process ids and process groups.
- Process 0 is created and initialized (credentials, file descriptor table, audit information, limits, etc.). The variable kernproc points to process 0.
- The machine dependent real-time clock's time and date are initialized.
- The Unified Buffer Cache is initialized (via ubc_init(), which essentially initializes a Mach VM Zone via zinit(), which allocates a region of memory from the page-level allocator).
- Various VFS structures/mechanisms are initialized: the vnode table, the filesystem event mechanism, the vnode name cache, etc. Each present filesystem type is also initialized.
- mbufs (memory buffers, used heavily in network memory management) are initialized via
- Facilities/subsystems such as aio and System V IPC are initialized.
- The kernel's generic MIB (management information base) is initialized.
- The data link interface layer is initialized.
- Sockets and protocol families are initialized.
- Kernel profiling is started, and BSD is "published" as a resource in the IOKit.
- Ethernet devices are initialized.
- A Mach Zone is initialized for the vnode pager.
- BSD tries to mount the root filesystem (which could be coming over the network, for example, a Mac OS X disk image (.dmg) exported over NFS).
- devfs is mounted on
- A new process is created (cloned) from kernproc (process 0). This newly created process has pid 1, and is set to become mach_init, which starts
- mach_init is loaded and run via bsdinit_task(), which is called by the BSD asynchronous trap handler (
The rest of the user space startup is described in Mac OS X System Startup.
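The funnels mentioned earlier in this section can be pictured as named locks that a thread must hold while it runs code in the region the funnel protects. The following is a user-space sketch of that idea only, assuming a plain mutex underneath. `Funnel`, `FunnelGuard` and the two `sketch_` functions are hypothetical names, and the two `_flock` variables merely borrow the naming convention from the text; the kernel's actual funnel interface is not reproduced here.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a funnel: a named mutex plus a flag that
// records whether some thread currently holds it.
struct Funnel {
    explicit Funnel(const char *n) : name(n), held(false) {}
    const char *name;
    std::mutex  mutex;   // the text says funnels are built on Mach mutexes
    bool        held;
};

// Named after the _flock convention described in the text.
static Funnel kernel_flock("kernel");
static Funnel network_flock("network");

// Holds a funnel for the lifetime of the guard object.
class FunnelGuard {
public:
    explicit FunnelGuard(Funnel &f) : funnel_(f) {
        funnel_.mutex.lock();
        funnel_.held = true;
    }
    ~FunnelGuard() {
        funnel_.held = false;
        funnel_.mutex.unlock();
    }
private:
    Funnel &funnel_;
};

static int open_sockets = 0;
static int next_pid = 0;

// Networking work is serialized by the network funnel.
int sketch_socket_create() {
    FunnelGuard guard(network_flock);
    assert(network_flock.held);
    return ++open_sockets;
}

// Other BSD work is serialized by the kernel funnel.
int sketch_allocate_pid() {
    FunnelGuard guard(kernel_flock);
    assert(kernel_flock.held);
    return next_pid++;
}
```

Using two funnels rather than one lets networking code and the rest of the BSD code proceed independently of each other, while each region on its own still runs one thread at a time.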
I/O Kit, the object-oriented device driver framework of the XNU kernel, is radically different from that on traditional systems.
I/O Kit uses a restricted subset of C++ (based on Embedded C++) as its programming language. This system is implemented by the libkern library. Features of C++ that are not allowed in this subset include:
- multiple inheritance
- RTTI (run-time type information), although I/O Kit has its own run-time typing system
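One way a run-time typing system can work without compiler RTTI is for every class to carry a small descriptor that points at its parent's descriptor. The sketch below shows that general technique under stated assumptions: `TypeInfo`, `isKindOf`, `asDiskDriver` and the class names are all invented for illustration and are not the I/O Kit's own classes or macros. It uses single inheritance only and no `dynamic_cast` or `typeid`, so it stays within the restrictions listed above.

```cpp
#include <cassert>

// Hand-written type descriptor: a name plus a link to the parent class.
struct TypeInfo {
    const char     *name;
    const TypeInfo *parent;
};

class Object {
public:
    virtual ~Object() {}
    // Each class overrides this to return its own descriptor.
    virtual const TypeInfo *typeInfo() const { return &kType; }
    static const TypeInfo kType;
};
const TypeInfo Object::kType = {"Object", nullptr};

class Driver : public Object {
public:
    const TypeInfo *typeInfo() const override { return &kType; }
    static const TypeInfo kType;
};
const TypeInfo Driver::kType = {"Driver", &Object::kType};

class DiskDriver : public Driver {
public:
    const TypeInfo *typeInfo() const override { return &kType; }
    static const TypeInfo kType;
};
const TypeInfo DiskDriver::kType = {"DiskDriver", &Driver::kType};

class NetDriver : public Driver {
public:
    const TypeInfo *typeInfo() const override { return &kType; }
    static const TypeInfo kType;
};
const TypeInfo NetDriver::kType = {"NetDriver", &Driver::kType};

// Walk the parent chain to see whether obj is a `target` or a subclass of it.
bool isKindOf(const Object *obj, const TypeInfo *target) {
    assert(obj != nullptr);
    for (const TypeInfo *t = obj->typeInfo(); t != nullptr; t = t->parent) {
        if (t == target) return true;
    }
    return false;
}

// Checked downcast that never relies on dynamic_cast.
DiskDriver *asDiskDriver(Object *obj) {
    return isKindOf(obj, &DiskDriver::kType) ? static_cast<DiskDriver *>(obj)
                                             : nullptr;
}
```

A `DiskDriver` instance passes `isKindOf` for `Driver::kType` and `Object::kType`, while `asDiskDriver` returns a null pointer when handed a `NetDriver`.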
The device driver model provided by the I/O Kit has several useful features (in no particular order):
- numerous device families (ATA/ATAPI, FireWire, Graphics, HID, Network, PCI, USB, ...)
- object-oriented abstractions of devices that can be shared
- plug-and-play and hot-plugging
- power management
- preemptive multitasking, threading, symmetric multiprocessing, memory protection and data management
- dynamic matching and loading of drivers (multiple bus types)
- a database for tracking and maintaining detailed information on instantiated objects (the I/O Registry)
- a database of all I/O Kit classes available on a system (the I/O Catalog)
- an extensive API
- mechanisms/interfaces for applications and user-space drivers to communicate with the I/O Kit
- driver stacking
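The idea of dynamic driver matching from the list above can be sketched as a lookup in a catalog of driver descriptions. Everything here is an assumption made for illustration: the `Nub` and `DriverEntry` records, the driver names, and the rule that the highest score wins when several entries match. The I/O Kit's real matching data and rules are richer than this.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical description of a detected device.
struct Nub {
    const char *bus;         // for example "PCI" or "USB"
    const char *deviceType;  // for example "network"
};

// Hypothetical catalog entry describing what a driver can attach to.
struct DriverEntry {
    const char *name;
    const char *bus;         // bus the driver attaches to
    const char *deviceType;  // device type it claims, or nullptr for any
    int         score;       // assumed tie-breaker: higher wins
};

static bool matches(const DriverEntry &d, const Nub &n) {
    if (std::strcmp(d.bus, n.bus) != 0) return false;
    return d.deviceType == nullptr ||
           std::strcmp(d.deviceType, n.deviceType) == 0;
}

// Return the best-scoring matching entry, or nullptr if nothing matches.
const DriverEntry *findDriver(const DriverEntry *catalog, int count,
                              const Nub &nub) {
    assert(catalog != nullptr || count == 0);
    const DriverEntry *best = nullptr;
    for (int i = 0; i < count; ++i) {
        if (matches(catalog[i], nub) &&
            (best == nullptr || catalog[i].score > best->score)) {
            best = &catalog[i];
        }
    }
    return best;
}

// Invented sample catalog spanning two bus types.
static const DriverEntry kCatalog[] = {
    {"GenericPCIDriver",  "PCI", nullptr,   0},
    {"PCIEthernetDriver", "PCI", "network", 100},
    {"USBKeyboardDriver", "USB", "HID",     100},
};
```

With this catalog, a PCI network device resolves to the specific Ethernet entry rather than the generic PCI one, an unrecognized PCI device falls back to the generic entry, and a device on a bus with no entries matches nothing.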
I/O Kit's implementation consists of three C++ libraries that are present in the kernel and available to loadable drivers:
Kernel/IOKit. The I/O Kit includes a modular, layered run-time architecture that presents an abstraction of the underlying hardware by capturing the dynamic relationships between the various hardware/software components (involved in an I/O connection).
Various tools such as kextcache, etc. let you explore and control various aspects of I/O Kit. For example, the following command shows the status of dynamically loaded kernel extensions:
% kextstat
Index Refs Address    Size       Wired      Name (Version) <Linked Against>
    1    1 0x0        0x0        0x0        com.apple.kernel (7.2)
    2    1 0x0        0x0        0x0        com.apple.kpi.bsd (7.2)
    3    1 0x0        0x0        0x0        com.apple.kpi.iokit (7.2)
    4    1 0x0        0x0        0x0        com.apple.kpi.libkern (7.2)
The following command lists the details of the I/O Kit registry in excruciating detail:
% ioreg -l -w 0
+-o Root <class IORegistryEntry, retain count 12>
| "IOKitBuildVersion" = "IOKit Component Version 7.2:
Thu Dec 11 16:15:20 PST 2003;
| "IONDRVFramebufferGeneration" = <0000000200000002>
/* thousands of lines of output */
The Platform Expert is an object (one can think of it as a driver) that knows the type of platform that the system is running on. I/O Kit registers a nub (see below) for the Platform Expert. This nub then loads the correct platform-specific driver, which further discovers the buses present on the system, registering a nub for each bus found. The I/O Kit loads a matching driver for each bus nub, which discovers the devices connected to the bus, and so on. Thus, the Platform Expert is responsible for actions such as:
- Building the device tree (as described above)
- Parsing certain boot arguments
- Identifying the machine (including processor and bus clock speeds)
- Initializing a "user interface" to be used in case of kernel panics
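The discovery process described above is recursive: a nub is registered, a driver attaches to it, and that driver registers a nub for everything it finds. A minimal sketch of that recursion follows, assuming a made-up machine description. The `Hardware` record, the device names and the `discover` function are all hypothetical, and no driver is actually loaded.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical machine description: each node lists what a driver
// attached to it would discover.
struct Hardware {
    const char     *name;
    const Hardware *children;
    int             childCount;
};

static const Hardware kPciDevices[] = {
    {"ethernet", nullptr, 0},
    {"display",  nullptr, 0},
};
static const Hardware kUsbDevices[] = {
    {"keyboard", nullptr, 0},
};
static const Hardware kBuses[] = {
    {"pci", kPciDevices, 2},
    {"usb", kUsbDevices, 1},
};
static const Hardware kPlatform = {"platform-expert", kBuses, 2};

// Register a nub for hw, then let its (imaginary) driver register a nub
// for each thing it discovers. Returns the number of nubs registered.
int discover(const Hardware &hw, int depth) {
    assert(depth >= 0);
    std::printf("%*s+-o %s\n", depth * 2, "", hw.name);
    int registered = 1;
    for (int i = 0; i < hw.childCount; ++i) {
        registered += discover(hw.children[i], depth + 1);
    }
    return registered;
}
```

Calling `discover(kPlatform, 0)` prints an indented tree rooted at the platform node and registers six nubs in total: the platform, two buses and three devices.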
libkern and libsa
As described earlier, the I/O Kit uses a restricted subset of C++. This system, implemented by libkern, provides features such as:
- Dynamic object allocation, construction, destruction (including data structures such as Arrays, Booleans, Dictionaries, ...)
- Certain atomic operations, miscellaneous functions (
- Provisions for tracking the number of current instances for each class
- Ways to avoid the "Fragile Base Class Problem"
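Tracking the number of current instances of a class, as mentioned in the list above, can be done with a per-class counter that constructors increment and the destructor decrements. This is a sketch of the general technique only; the `Packet` class and its members are hypothetical, and libkern's own bookkeeping is not reproduced.

```cpp
#include <cassert>

// Hypothetical class that counts how many of its instances are alive.
class Packet {
public:
    Packet() { ++sLive; }
    Packet(const Packet &) { ++sLive; }   // copies are instances too
    ~Packet() {
        assert(sLive > 0);                // a destructor implies a live instance
        --sLive;
    }
    static int liveInstances() { return sLive; }
private:
    static int sLive;                     // one counter per class
};
int Packet::sLive = 0;
```

A counter like this makes leaks visible: if `Packet::liveInstances()` is nonzero after all users of the class have finished, some instance was never destroyed.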
libsa provides functions for miscellaneous purposes: binary searching, symbol remangling (used for gcc 2.95 to 3.3, for example), dgraphs, catalogs, kernel extension management, sorting, patching vtables, etc.