Extended Binary Format Support for Mac OS X

© Amit Singh. All Rights Reserved. Written in January 2009


Executive Summary

This document discusses XBinary, new software that lets you add kernel-level support for executing files in arbitrary binary formats on Mac OS X.


Introduction

A common activity that you implicitly do as a computer user is executing programs. In Mac OS X implementation terms, executing a program boils down to a process invoking a system call like posix_spawn(2) or execve(2) on the program's executable file. The latter is often called a "binary," even though executables can just as well be ASCII (or some other "readable" encoding) text, such as in the case of a shell script.

Natively, at the kernel level, Mac OS X recognizes the following executable formats: Mach-O binaries, Universal (fat) binaries, and interpreter scripts.

It's an ELF World

Most modern operating systems and environments, with Mac OS X (Mach-O) and Microsoft Windows (PE) being notable exceptions, use the ELF object file format.

Fat binaries were originally a feature of the NEXTSTEP operating system, one of Mac OS X's ancestors. On Mac OS X, kernel support for executing fat binaries was actually present in the first release—10.4—of Mac OS X "Tiger", even though it wasn't needed until the first x86 version: 10.4.4.

You can find details of how the Mac OS X kernel handles program execution in Section 7.5 (pages 812 through 827) of the Mac OS X Internals book.

Extended Executability

Now, there can be scenarios in which you might want to seamlessly execute binaries that are not natively supported by the operating system. There exist programs—emulators, loaders, and such—that let you run non-native binaries from a variety of operating systems and environments. Apple's own Rosetta is an example: it lets you seamlessly run PowerPC binaries on the x86 version of Mac OS X, without you having to do anything special or different. However, Rosetta handling is hardcoded into the Mac OS X kernel and similar support for handling other binary formats is not possible today on Mac OS X.

Let us consider a specific non-Apple example. Apout is a portable C program that lets you run PDP-11 Unix binaries on modern operating systems. Apout simulates user-mode PDP-11 instructions and converts system call requests to native (Mac OS X in our case) system calls. You would normally use Apout by giving it a PDP-11 binary as an argument along with any other arguments that need to be passed to the PDP-11 binary. For example, the following is how you would run the ls program from Fifth Edition UNIX.

$ /usr/local/bin/apout /work/unixv5/bin/ls -l
total 23
drwx------ 9 245 306 Jan 14 20:45 Desktop
drwx------ 17 245 578 Jan 14 23:00 Documents
drwx------ 25 245 850 Dec 25 12:43 Downloads
drwx------ 49 245 1666 Sep 25 00:04 Library
drwx------ 5 245 170 Dec 15 23:49 Movies
drwx------ 5 245 170 Mar 2 07:51 Music
drwx------ 30 245 1020 Dec 10 05:04 Pictures
drwxr-xr-x 5 245 170 Oct 27 00:22 Public
drwxr-xr-x 9 245 306 Mar 4 06:56 Sites

It would be nicer, more convenient, and cooler if we could simply run the PDP-11 binary—and in general any binary that can be run through programs such as Apout—seamlessly without having to specify any emulator or launcher program—just as if it were a "native" binary. That is:

$ /work/unixv5/bin/ls -l
total 23
drwx------ 9 245 306 Jan 14 20:45 Desktop
...

For this to work at the lowest level, the Mac OS X kernel must first recognize the PDP-11 binary. Moreover, since the kernel obviously doesn't know how to load and run PDP-11 binaries, it must somehow arrange for Apout to handle the binary's execution, passing Apout all arguments and environment variables. You could think of what we're talking about as a kernel-level Launch Services mechanism.

Launch Services

Mac OS X has a high-level API called Launch Services that lets you bind "documents" to applications. Using Launch Services, a program can open applications, documents, and URLs as per preestablished bindings. In particular, when you double click on a file or folder icon in the Finder, it is Launch Services that the Finder calls, if necessary, to query how to handle your request. That said, Launch Services is high-level (much above the kernel) and isn't seamless in that lower (non-GUI) layers of the operating system do not go through this API. In the aforementioned PDP-11 example, you can't just use Launch Services to run the binary as we did on the command line and have the desired result.

As another example, consider the Native Client (NaCl) technology from Google. NaCl uses custom ELF binaries. To run such binaries standalone, NaCl comes with a loader program, sel_ldr (NaCl Simple/Secure ELF Loader), which parses an ELF binary, allocates memory, loads the relocatable image from the binary into memory, relocates it, and finally runs it. Again, if we want NaCl ELF binaries to seamlessly be part of our system's executable namespace, we need a way for the kernel to recognize such binaries and hand them over to sel_ldr for execution.

Many Uses

Even more examples would be those of seamlessly executing Microsoft Windows binaries through WINE, Vx32 ELF binaries, and Java applications (class files or JAR files).

In fact, we could even "execute" files that aren't executable in any traditional sense. For example, a JPEG image when executed could open up in an image viewer. A C source code file when executed could be dynamically compiled and run as if it were a script.

You could also use such a mechanism as a component of your own encrypted, signed, or sandboxed binary scheme.

Extending Executability

Mac OS X does not provide any kernel-level or user-level interfaces to extend binary format support. To make such things possible on Mac OS X, we'll need to write special software from scratch. This being new kernel functionality, the software will involve a kernel extension. Our goal is to extend the kernel such that it can recognize arbitrary binary formats and execute them through specified handler programs. Specifically, we'll place the following high-level requirements on the software.

Let us call the new software that implements these features XBinary.

binfmt_misc

Those familiar with the Linux kernel will realize that functionality similar to what's been described here exists in Linux as the binfmt_misc kernel feature. XBinary is conceptually similar in many ways to binfmt_misc, but their implementations are unrelated. As we will see, XBinary also has some Mac OS X specific features.

XBinary

The XBinary software consists of a kernel extension (xbinary.kext) and a command-line tool (xbinary). To get started, simply download and install the XBinary package. Both the kernel extension and the tool are installed under /Library/Application Support/xbinary/. The kernel extension must be loaded for the XBinary facility to be available. The tool is used to enable, manage, and disable the facility.

NB: XBinary should be regarded as research software at this point. My goal in releasing it is to make experimentation involving new binary formats easy for developers, researchers, and power users. Without such software, even to add support for one new type of binary format, you would have to add code to the core Mac OS X kernel and recompile the kernel, making the exercise rather painful, time consuming, and inconvenient. In contrast, XBinary is a configurable facility that can be dynamically loaded and unloaded on a stock operating system. Besides, low-level feature parity with other operating systems (Linux in this case) is generally a nice thing.

Let us take XBinary for a spin. You'll be using the xbinary tool for all interaction with the facility. We'll assume that the tool is in your PATH. (You could make a symbolic link to it in /usr/local/bin/ or another directory of your choosing.) The tool produces copious "help" output, which is reproduced below for reference.

XBinary: extended binary format support for Mac OS X
Copyright (c) 2009 Amit Singh. All Rights Reserved.
http://osxbook.com

The XBinary software allows you to extend the Mac OS X kernel such that it can recognize arbitrary binary formats and execute them through specified handler programs. (Mac OS X natively supports executing only Mach-O binaries, Universal (fat) binaries, and interpreter scripts.)

XBinary consists of a kernel extension (xbinary.kext) and this command-line tool, which lets you control the XBinary facility. This requires superuser privileges, so you should run this tool using sudo(8).

The XBinary kext must be loaded for the facility to be available.

    -E, --enable_facility     enable facility and load kext if necessary
    -D, --disable_facility    disable facility (will not unload kext)
    -U, --unload_facility     disable facility and unload kext

You can add and manipulate in-kernel entries that enable recognition of binary formats.

    -a NAME OTHER_ARGS..., --add_entry NAME OTHER_ARGS...
                              create new entry with unique name NAME
    -r NAME, --remove_entry NAME
                              remove existing entry named NAME
    -e NAME, --enable_entry NAME
                              enable existing entry named NAME
    -d NAME, --disable_entry NAME
                              disable existing entry named NAME
    -l, --list_entries        list existing entries
    -P, --purge_entries       remove all existing entries

Each entry must have as its name a unique identifier string up to 31 bytes in size. Additionally, a set of other arguments specify to the kernel how to recognize that binary format and which interpreter to invoke to handle it. XBinary can recognize a binary EITHER by matching magic bytes within the first page of the file OR by matching a file extension. OTHER_ARGS must be a valid combination of the following arguments (some are optional).

    -m MAGIC, --magic MAGIC   magic bytes (up to 64 bytes)
    -o OFFSET, --offset OFFSET
                              optional magic offset in bytes (default 0)
    -M MASK, --mask MASK      optional magic mask (default all 0xff)
    -x EXT, --extension EXT   file extension to match (up to 31 bytes)
    -i INTERP, --interpreter INTERP
                              path to interpreter (up to 303 bytes)
    -p POS, --position POS    add entry at position POS (default -1)
    -s STATE, --state STATE   "enabled" (default) or "disabled"

You can also specify flags that affect how a matched entry is processed. By default, the argument vector IS adjusted and setuid/setgid binaries are NOT allowed.

    -A, --preserve_argv       do not adjust the argument vector
    -S, --allow_sugid         allow setuid/segid binaries

Other notes:

* Mandatory arguments for a new entry specification include NAME, INTERP, and either EXT or MAGIC. MAGIC can be optionally qualified by OFFSET and MASK.
* MAGIC must be specified in hexadecimal with 2 ASCII characters per byte and no "0x" prefix. Use this tool's -H argument for examples.
* All MAGIC bytes (that is, OFFSET + length(MAGIC)) must lie within the first page (4096 bytes) of the file.
* MASK is specified in the same format as MAGIC. If specified, MASK must have the same length as MAGIC.
* The kernel considers entries in the order they are shown listed by the tool. By default, new entries go to the end of the list. You can use the position argument to insert an entry at a specific position. 0 represents the head of the list and -1 represents the end.
* Unless you specify 'allow_sugid' while adding an entry, setuid/segid bits on matched binaries are ignored.
* Entries reside in kernel memory and will disappear if the XBinary kext is unloaded. Only disabling the facility will not destroy any entries though.
* See http://osxbook.com/software/xbinary for more details, including how the kernel invokes an interpreter.

Use the -H argument to see some examples.
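The constraints listed in the help output can be collected into a small checker. The following is a minimal sketch in Python, not part of XBinary: the function name validate_entry and its parameters are hypothetical, it reads "either EXT or MAGIC" as "exactly one of the two" (an interpretation), and it measures sizes in UTF-8 bytes (an assumption). Whether the real tool performs these checks in this way is not something the help text states.

```python
PAGE_SIZE = 4096  # the help text's "first page (4096 bytes)"


def validate_entry(name, interp, magic=None, offset=0, mask=None, ext=None):
    """Return a list of problems with an entry spec; an empty list means none found."""
    problems = []
    if not 1 <= len(name.encode()) <= 31:
        problems.append("NAME must be 1 to 31 bytes")
    if not 1 <= len(interp.encode()) <= 303:
        problems.append("INTERP must be 1 to 303 bytes")
    if (magic is None) == (ext is None):
        # Assumption: "either EXT or MAGIC" means exactly one of them.
        problems.append("give exactly one of MAGIC or EXT")
    if ext is not None and not 1 <= len(ext.encode()) <= 31:
        problems.append("EXT must be 1 to 31 bytes")
    if magic is None:
        if mask is not None or offset != 0:
            problems.append("OFFSET and MASK only qualify MAGIC")
        return problems
    try:
        # bytes.fromhex rejects a "0x" prefix and odd-length strings.
        magic_bytes = bytes.fromhex(magic)
    except ValueError:
        problems.append("MAGIC must be hex, 2 characters per byte, no 0x prefix")
        return problems
    if not 1 <= len(magic_bytes) <= 64:
        problems.append("MAGIC must be 1 to 64 bytes")
    if offset < 0 or offset + len(magic_bytes) > PAGE_SIZE:
        problems.append("OFFSET + length(MAGIC) must fit in the first page")
    if mask is not None:
        try:
            mask_bytes = bytes.fromhex(mask)
        except ValueError:
            problems.append("MASK must use the same hex format as MAGIC")
        else:
            if len(mask_bytes) != len(magic_bytes):
                problems.append("MASK must have the same length as MAGIC")
    return problems


print(validate_entry("PDP-11 Executables", "/usr/local/bin/apout", magic="0701"))
print(validate_entry("Bad Entry", "/usr/local/bin/apout", magic="0x0701"))
```

The first call reports no problems; the second is flagged because of the "0x" prefix.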

Let us first ensure that the facility is enabled. When you load the XBinary kernel extension, the facility is enabled by default. The -E option loads the kernel extension if it isn't loaded already. If it is loaded but the facility has been explicitly disabled through the -D option, the -E option reenables it.

$ sudo xbinary -E
$

Of course, to begin with, there are no configured entries. Let us consider the case of PDP-11 binaries. As expected, by default, a PDP-11 binary will be rejected with an ENOEXEC error.

$ /work/unixv5/bin/ls
bash: /work/unixv5/bin/ls: cannot execute binary file

Let us create some entries for PDP-11 executables. Magic numbers for PDP-11 executables can be found in Section 5 of the UNIX Programmer's Manual, Volume 1. (See a.out(5).)

These examples assume that you are on an x86 (little-endian) machine. On PowerPC (big-endian) systems, you may need to byte-swap the magic/mask specifications when appropriate.

$ sudo xbinary -a "PDP-11 Old Overlay Executables" -m 0501 -i /usr/local/bin/apout
$ sudo xbinary -a "PDP-11 Executables" -m 0701 -i /usr/local/bin/apout
$ sudo xbinary -a "PDP-11 Pure Executables" -m 0801 -i /usr/local/bin/apout
$ sudo xbinary -a "PDP-11 Separate I&D Executables" -m 0901 -i /usr/local/bin/apout

Our entries for PDP-11 executables all specify magic bytes to match. The absence of a magic offset means the bytes begin at offset zero within an executable. The absence of a magic mask means the bytes must match exactly. We can now use the -l argument to list the entries that are now in the kernel.

$ sudo xbinary -l
Entry 0
    name = PDP-11 Old Overlay Executables
    state = enabled
    flags = default
    magic_offset = 0
    magic_bytes = 0501
    interpreter = /usr/local/bin/apout
Entry 1
    name = PDP-11 Executables
    state = enabled
    flags = default
    magic_offset = 0
    magic_bytes = 0701
    interpreter = /usr/local/bin/apout
Entry 2
    name = PDP-11 Pure Executables
    state = enabled
    flags = default
    magic_offset = 0
    magic_bytes = 0801
    interpreter = /usr/local/bin/apout
Entry 3
    name = PDP-11 Separate I&D Executables
    state = enabled
    flags = default
    magic_offset = 0
    magic_bytes = 0901
    interpreter = /usr/local/bin/apout
XBinary is globally enabled. 4 entries total.
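The magic_bytes values above follow from how a 16-bit magic number is laid out on disk. The sketch below (Python, purely illustrative) assumes the a.out magic numbers are the octal values 0405, 0407, 0410, and 0411, which is my reading of a.out(5) rather than something stated in this article, and that the PDP-11 stores the word low byte first. The helper name magic_hex is hypothetical.

```python
import struct

# Assumed a.out magic numbers, written in octal as in the UNIX manuals.
AOUT_MAGICS = {
    "PDP-11 Old Overlay Executables": 0o405,
    "PDP-11 Executables": 0o407,
    "PDP-11 Pure Executables": 0o410,
    "PDP-11 Separate I&D Executables": 0o411,
}


def magic_hex(value, byte_order="<"):
    """Hex string for a 16-bit magic number; "<" is little-endian, ">" big-endian."""
    return struct.pack(byte_order + "H", value).hex()


for name, value in AOUT_MAGICS.items():
    # Low byte first, so octal 0407 (hex 0107) is written as "0701".
    print(f'{name}: -m {magic_hex(value)}')
```

Passing ">" as the byte order gives the byte-swapped form, which is the kind of adjustment the note about big-endian systems refers to.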

Let us try to execute our Fifth Edition UNIX binary again, assuming /usr/local/bin/apout is a proper installation of Apout.

$ /work/unixv5/bin/ls
Desktop
Documents
Downloads
Library
Movies
Music
Pictures
Public
Sites

If we disable the relevant entry, things should go back to the old behavior.

$ sudo xbinary -d "PDP-11 Executables"
$ sudo xbinary -l
Entry 0
    name = PDP-11 Executables
    state = disabled
...
$ /work/unixv5/bin/ls
bash: /work/unixv5/bin/ls: cannot execute binary file

An entry for NaCl ELF executables would involve using a magic mask value. At the time of this writing, to tag its ELF binaries, NaCl uses 123 (0x7B) as the value for the OS ABI in the e_ident field of the ELF header.

$ sudo xbinary -a "Native Client ELF Executables" \
    -m 7f454c460000017B00000000000000000000030001 \
    -M ffffffff0000ffff00000000000000000000ff00ff \
    -i /path/to/sel_ldr
$ sudo xbinary -l
...
Entry 3
    name = Native Client ELF Executables
    state = enabled
    flags = default
    magic_offset = 0
    magic_bytes = 7f454c460000017b00000000000000000000030001
                    E L F       {
    magic_mask = ffffffff0000ffff00000000000000000000ff00ff
    interpreter = /path/to/sel_ldr
XBinary is globally enabled. 5 entries total.
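To see what the mask buys us, here is a minimal sketch in Python of masked magic matching. It assumes the usual convention that a file byte matches when it equals the magic byte in every bit position where the mask bit is 1; that is consistent with "default all 0xff" meaning an exact match, but the article does not show XBinary's kernel code, so treat this as an illustration only. The function magic_matches and the helper elf_header are hypothetical names.

```python
def magic_matches(data, magic_hex, mask_hex=None, offset=0):
    """Compare data against magic at offset, looking only at bits set in the mask."""
    magic = bytes.fromhex(magic_hex)
    mask = bytes.fromhex(mask_hex) if mask_hex else b"\xff" * len(magic)
    if len(mask) != len(magic):
        raise ValueError("mask must have the same length as magic")
    window = data[offset:offset + len(magic)]
    if len(window) < len(magic):
        return False  # file too short to contain the magic bytes
    return all((d & k) == (m & k) for d, m, k in zip(window, magic, mask))


NACL_MAGIC = "7f454c460000017B00000000000000000000030001"
NACL_MASK = "ffffffff0000ffff00000000000000000000ff00ff"


def elf_header(os_abi):
    """Build the first 24 bytes of a made-up 32-bit little-endian x86 ELF header."""
    ident = b"\x7fELF" + bytes([1, 1, 1, os_abi]) + bytes(8)
    e_type = (2).to_bytes(2, "little")
    e_machine = (3).to_bytes(2, "little")
    e_version = (1).to_bytes(4, "little")
    return ident + e_type + e_machine + e_version


print(magic_matches(elf_header(0x7B), NACL_MAGIC, NACL_MASK))  # OS ABI 123
print(magic_matches(elf_header(0x00), NACL_MAGIC, NACL_MASK))  # ordinary ELF
```

The zero bytes in the mask make the match ignore the ELF class, data encoding, padding, and file type fields, so only the ELF signature, ELF version, OS ABI, machine, and the low byte of the version word decide the outcome.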

Let us look at Java applications next. Normally, when you compile and run Java applications from the command line, you compile using a Java compiler (javac in our case) and run using a Java application launcher (java in our case). By default, the first non-option argument to the launcher is the name of the class to be invoked, as illustrated by the following example.

$ ls
HelloWorld.java
$ cat HelloWorld.java
class HelloWorld {
    public static void main(String args[]) {
        System.out.println("Hello, World!");
    }
}
$ javac HelloWorld.java
$ ls
HelloWorld.class HelloWorld.java
$ java HelloWorld
Hello, World!

The fact that the Java launcher needs a class name by default means our XBinary entry for Java applications won't be as straightforward as, say, for PDP-11 binaries. We'll employ a wrapper script that will determine the startup class name given a Java class file and subsequently invoke the Java application launcher. We'll then specify this wrapper script as the interpreter in our XBinary entry. We don't even have to write such scripts: they already exist for use with the aforementioned binfmt_misc facility from Linux. You can download xbinary-java.tar.gz and place its following constituent files in /usr/local/bin/: javawrapper, javaclassname, and jarwrapper. Once these files are in place, we are ready for Java binary support.

$ chmod +x HelloWorld.class
$ ./HelloWorld.class
bash: ./HelloWorld.class: cannot execute binary file
$ sudo xbinary -a "Java Programs" -m cafebabe -i /usr/local/bin/javawrapper
$ ./HelloWorld.class
Hello, World!

You can use /usr/local/bin/jarwrapper to support executable JAR files.

Cafe Babe

Note that the magic number for compiled Java class files (0xcafebabe) is actually the same as the one used by fat binaries.

Earlier, we talked about "executable" C source files. Again, we can reuse something from the Linux world: binfmtc, a program that dynamically compiles and executes C programs as if they were scripts. In fact, binfmtc supports other languages besides C—see its documentation for details. We'll assume that you have compiled and installed it as /usr/local/bin/binfmtc-interpreter.

To make a C source file be handled by binfmtc, you must have /*BINFMTC: compile-time-options as the first line of the file. (Of course, the comment must also be closed on a subsequent line.) The following example shows how you can have C programs be treated as executable scripts.

$ perl -e 'print unpack("H*", "/*BINFMTC:"), "\n";'
2f2a42494e464d54433a
$ sudo xbinary -a "Executable C Programs" -m 2f2a42494e464d54433a \
    -i /usr/local/bin/binfmtc-interpreter
$ sudo xbinary -l
...
    name = Executable C Programs
    state = enabled
    flags = default
    magic_offset = 0
    magic_bytes = 2f2a42494e464d54433a
                  / * B I N F M T C :
    interpreter = /usr/local/bin/binfmtc-interpreter
...
$ cat c-exec.c
/*BINFMTC:
 An "executable" C source file.
*/
#include <stdio.h>

int
main(int argc, char** argv)
{
    int i;

    printf("This is an executable C source file!\n");
    for (i = 0; i < argc; i++) {
        printf("argv[%d] = %s\n", i, argv[i]);
    }

    return 0;
}
$ chmod +x c-exec.c
$ ./c-exec.c arg1 arg2
This is an executable C source file!
argv[0] = ./c-exec.c
argv[1] = arg1
argv[2] = arg2

Finally, we can have JPEG image files be associated—at the kernel level—with Preview.app. For variety, we'll create an XBinary entry based on file name extension (.jpg) rather than a magic number.

$ sudo xbinary -a "JPEG Images" -x jpg \
    -i /Applications/Preview.app/Contents/MacOS/Preview
$ sudo xbinary -l
...
    name = JPEG Images
    state = enabled
    flags = default
    extension = jpg
    interpreter = /Applications/Preview.app/Contents/MacOS/Preview
...
$ chmod +x /path/to/some/image.jpg
$ /path/to/some/image.jpg
...

Remember that the XBinary entries live in kernel memory—specifically, in the memory owned by the XBinary kernel extension. Therefore, if you unload the XBinary kernel extension, the entries go away and things return to their original state. However, the XBinary facility can be disabled (the -D option) without removing any entries. When the facility is disabled, the system behaves as if XBinary weren't present. XBinary can be reenabled, with all entries being intact, through the -E option of the xbinary tool. Be sure to read the help output from the tool for other usage notes.
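The interplay between entries, per-entry state, and the global facility switch can be modeled in a few lines. The following Python sketch is a user-space toy, not XBinary's implementation: the class and field names are hypothetical, masks are left out for brevity, and the lookup rule (first enabled entry in list order wins) is my reading of the tool's help output.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    interpreter: str
    magic: bytes = b""    # matched at `offset` when non-empty
    offset: int = 0
    extension: str = ""   # matched against the file name when non-empty
    enabled: bool = True


class EntryTable:
    def __init__(self):
        self.entries = []              # list order is the order of consideration
        self.facility_enabled = True   # toggled by -E and -D in the real tool

    def add(self, entry, position=-1):
        """Insert an entry; 0 is the head of the list and -1 is the end."""
        if any(e.name == entry.name for e in self.entries):
            raise ValueError("entry names must be unique")
        if position == -1:
            self.entries.append(entry)
        else:
            self.entries.insert(position, entry)

    def lookup(self, path, first_page):
        """Return the first enabled entry that matches, or None."""
        if not self.facility_enabled:
            return None  # behaves as if no entries existed
        for e in self.entries:
            if not e.enabled:
                continue
            if e.extension and path.endswith("." + e.extension):
                return e
            if e.magic and first_page[e.offset:e.offset + len(e.magic)] == e.magic:
                return e
        return None


table = EntryTable()
table.add(Entry("PDP-11 Executables", "/usr/local/bin/apout", magic=bytes.fromhex("0701")))
table.add(Entry("JPEG Images", "/Applications/Preview.app/Contents/MacOS/Preview", extension="jpg"))

table.facility_enabled = False                        # like xbinary -D
print(table.lookup("/work/unixv5/bin/ls", b"\x07\x01"))
table.facility_enabled = True                         # like xbinary -E
print(table.lookup("/work/unixv5/bin/ls", b"\x07\x01").interpreter)
```

Disabling the facility makes every lookup fail without touching the list, which is why the entries are intact once the facility is reenabled.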

Bonus Feature: Extra Fat Binaries

The fat binary mechanism is simple and useful. As noted earlier, a fat binary is merely a wrapper—a concatenation of multiple binaries, if you will—that is recognized by the kernel. The kernel chooses and executes one of the binaries from within the possibly multiple binaries contained in the wrapper. In doing so, the discriminant used by the kernel is the processor architecture of each constituent binary in the wrapper.

It would be even more useful if it were possible to have discriminants other than the processor architecture in fat binaries. In the past, I've had both need and desire for a fat binary mechanism that could take into account the operating system version—that is, have a Universal binary containing, say, Tiger and Leopard versions of a program. Depending on the nature of the program and the APIs it uses, this can simplify code creation and maintenance.

Let us look at an example. Consider a 2-way fat binary containing i386 and x86_64 architectures. We can use the lipo command-line tool to show information on a fat binary.

$ lipo -info some_fat_binary
Architectures in the fat file: some_fat_binary are: i386 x86_64

Now think of an "extended" fat binary mechanism that incorporates operating system versions in addition to processor architectures. We'll assume there's an extended version of the lipo tool as well. Let us call it xlipo.

$ xlipo -info some_xfat_binary
Architectures in the fat file: some_xfat_binary are: x86_64_10.6 x86_64_10.5 \
    x86_64_10.4 x86_64 i386_10.6 i386_10.5 i386_10.4 i386

We see that our hypothetical extended fat binary contains eight "architectures". For each of the two original architectures, x86_64 and i386, we've extended the architecture by adding an operating system version. A resultant tuple, say, { i386, 10.5 } represents an x86 binary that's meant for Mac OS X Leopard. If no operating system version is specified, we could have that tuple match any operating system version. When the kernel looks at such a binary, we could have the default matching algorithm be along the following lines.

Match the "closest" { processor architecture, OS version } found in the binary. For the case of a 64-bit Leopard machine, we would want the kernel to look for { x86_64, 10.5 } first. If that fails, look for { i386, 10.5 } next. If that fails, look for a generic (no operating system version specified) binary, that is, { x86_64, * } and then { i386, * }. If that fails, look for operating system versions older than the current one. If that too fails, look for operating system versions newer than the current one. Of course, one could have other matching algorithms and even more parameters based on which to match.
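The matching order described above can be sketched as follows. This is a hypothetical Python model, not XBinary's kernel code: the names RUNNABLE, candidate_order, and choose_slice are made up for illustration. Within the "older" and "newer" groups, the sketch tries the nearest version first and prefers the 64-bit slice within a version; the text does not spell out those tie-breaks, so they are assumptions.

```python
# Architectures each host can run, most preferred first (an assumption).
RUNNABLE = {"x86_64": ["x86_64", "i386"], "i386": ["i386"]}


def candidate_order(host_arch, host_os, slices):
    """List (arch, os_version) tuples in the order they would be tried.

    os_version is a tuple such as (10, 5), or None for a generic slice.
    """
    archs = RUNNABLE[host_arch]
    versions = {v for _, v in slices if v is not None}
    older = sorted((v for v in versions if v < host_os), reverse=True)
    newer = sorted(v for v in versions if v > host_os)
    order = [(a, host_os) for a in archs]   # exact OS version first
    order += [(a, None) for a in archs]     # then generic slices
    for v in older + newer:                 # then older, and finally newer
        order += [(a, v) for a in archs]
    return order


def choose_slice(host_arch, host_os, slices):
    """Return the best available slice for the host, or None if nothing fits."""
    available = set(slices)
    for candidate in candidate_order(host_arch, host_os, slices):
        if candidate in available:
            return candidate
    return None


# An 8-way extended fat binary, examined on a 64-bit Leopard (10.5) host.
slices = [(a, v) for a in ("x86_64", "i386")
          for v in ((10, 6), (10, 5), (10, 4), None)]
print(choose_slice("x86_64", (10, 5), slices))
```

Removing the chosen slice from the list and calling choose_slice again walks down the preference order one step at a time.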

Since XBinary is experimental, why not experiment with such a feature too? Therefore, I added an implementation of an extended fat binary mechanism to the x86 version of XBinary. The implementation uses the aforementioned matching algorithm. In contrast with the fat binary magic number (0xcafebabe), the magic number used by the extended fat mechanism is 0xcafed00d. To play with this mechanism, download xbinary-xfat.tar.gz, which includes a modified version of lipo along with a test program. The following example shows this feature at work.

$ tar -xzvf xbinary-xfat.tar.gz
$ cd xbinary-xfat
$ ls
Makefile hello_10.4.c hello_10.5.c hello_10.6.c hello_64.c
hello.c hello_10.4_64.c hello_10.5_64.c hello_10.6_64.c xlipo
$ cat hello_10.5_64.c
#include <stdio.h>

main()
{
    printf("This is 64-bit 10.5.\n");
}

The various hello*.c files represent { processor architecture, OS version }-specific implementations of a program. Running make would create hello_fat, an 8-way extended fat binary.

$ make
...
$ ./xlipo -detailed_info ./hello_fat
Fat header in: hello_fat
fat_magic 0xcafed00d
nfat_arch 8
architecture x86_64_10.6
...
architecture i386
    cputype CPU_TYPE_I386
    cpusubtype CPU_SUBTYPE_I386_ALL
    offset 102400
    size 12588
    align 2^12 (4096)

With the XBinary facility disabled, our extended fat binary would be rejected by the operating system. With the facility enabled, the kernel would choose and run the most appropriate binary. On a 64-bit Leopard machine, it should run the program contained in hello_10.5_64.c.

$ sudo xbinary -D # Disable XBinary
$ sudo xbinary -l # Check
XBinary is globally disabled. No entries.
$ ./hello_fat
bash: ./hello_fat: cannot execute binary file
$ sudo xbinary -E # Reenable XBinary
$ ./hello_fat
This is 64-bit 10.5.

We can also try removing architectures from the extended fat binary and see how the kernel portion of XBinary chooses the next best binary.

$ ./xlipo -remove x86_64_10.5 hello_fat -output hello_fat
$ ./hello_fat
This is 32-bit 10.5.
$ ./xlipo -remove i386_10.5 hello_fat -output hello_fat
$ ./hello_fat
This is 64-bit vanilla.
$ ./xlipo -remove x86_64 hello_fat -output hello_fat
$ ./hello_fat
This is 32-bit vanilla.
$ ./xlipo -remove i386 hello_fat -output hello_fat
$ ./hello_fat
This is 64-bit 10.4.
...

Note that the extended fat feature as implemented by XBinary is not a complete implementation—other components of the operating system would need to be updated for complete support. For example, XBinary wouldn't be able to help the dynamic linker with choosing the best library from an extended fat library file. Tools like nm, otool, and ar would need to be extended as well.

Security

You need superuser privileges to install, load, enable, disable, and otherwise control XBinary. In particular, you need superuser privileges to create or modify an XBinary entry.

By default, setuid/setgid bits on both the target binary and the associated interpreter are ignored. Honoring them must be explicitly enabled on a per-entry basis with the -S (--allow_sugid) flag.

Download


You must read and agree to this site's terms and conditions before downloading or using any software or other material available from this site.


XBinary requires Mac OS X 10.5.x (Leopard).

xbinary.pkg.zip

Enjoy executing!