Instruction Re-writing :: Function Re-routing© Amit Singh. All Rights Reserved.Written in Early 2003
Consider the following common (well, maybe not so common) problem: You have written a loadable kernel module (kernel extension, in Mac OS X parlance). There is some function
foo() in the kernel that you would like to re-route, that is, replace with your own version called
my_foo(). What's more, you would like to be able to call the original
There are several things to be observed here:
- You may not have access to the source code of the kernel.
- Even if you do have access to the source, you may not want to modify the kernel source for various reasons (legal, recompilation, ...).
- You may not know, or want to know every single detail of
foo()- all you want is to do one or more of the following: pre-process the function's arguments, call it and post-process the result.
This kind of thing is really easy to do with VFSs. Traditionally, you deal with a collection of function pointers, and it is easy to stack such collections where a specific function
bar() in each layer do many things: pass through to the layer below, pre- or post- process arguments/results, block the call from passing through and so on.
Things are a little harder when there are no function pointers. In the situation described earlier,
foo() is a function at some address in the kernel. There may be an arbitrary number of callers that invoke this function. What follows is a scheme that allows you to replace
my_foo() so that calls to
foo() result in
my_foo() being invoked.
my_foo() can call
foo() itself though. Consider the picture shown on the right.
my_foo() are present as shown, with their respective instructions being
I0 I1 I2 I3 ... and
I'0 I'1 I'2 I'3 ... respectively.
A caller of
foo() eventually executes
I0. If we were to re-route
foo() so that
my_foo() would be called instead, an obvious thing to do would be to replace
I0 with an unconditional branch to
my_foo(). That would effectively leave
foo() out of the picture, and all calls to it would implicitly end up invoking
my_foo(). This is fine if we do not wish to call the real
foo() at all, but more often than not you do not wish to clobber
foo(), but want to be able to call it, say from within
my_foo(), as discussed earlier. A little more work is needed to achieve this.
I0, the clobbered instruction from
foo() somewhere. Then we allocate a region of memory large enough to hold a few instructions and mark it as executable (via a call such as
mprotect()). We call this region
foo_stub. Note that it is not necessary to dynamically allocate this region of memory - a very convenient way to "pre-allocate" such a region would be to have an actual function (that does nothing)
my_foo(). Finally, we copy
I0 to the beginning of
foo_stub. The very next instruction would be an unconditional branch to
foo + 4. Thereafter, the "original"
foo() can be invoked via
Note that while the demonstration source code accessible from this page (for PowerPC/Mac OS X) does work, it should be considered a prototype. If you were to do such re-routing at production level, you would have to worry about various issues: invalidating the instruction cache, multi-processor complications, the maximum possible distance between
foo_stub() etc. These issues will greatly depend on the specific processor you are working on (SPARC, PowerPC, x86 ...).
The "distance" issue is more severe on certain processors. A jump (unconditional branch) on MIPS uses 6 bits for the operand field and 26 bits for the address field. The effective addressable distance is 28 bits (4 times more) because MIPS refers to the number of words instead of bytes. This works because all instructions are 4 bytes long. 28 bits gives you 256 MB of leeway.
SPARC uses a 22 bit signed integer for branch address, but has two zero bits appended to it, effectively a 24 bit program counter relative reachability. This gives you 16 MB (+- 8MB) of leeway.
The unconditional branch in our target architecture (PowerPC) has a 24 bit address field, but as in MIPS, all instructions are 32 bits long, so PowerPC refers to words instead of bytes. The effective branch address (
LI || 0b00) is 26 bits wide, which gives you 64 MB (+- 32 MB) as the maximum distance you can jump. Refer to the picture below for details.
In the provided sample code, we assume that our jump will be less than 32 MB in absolute value (no exception handling is attempted if this is not the case!). You are very likely to run into situations when this would not be the case: it could be that you are trying to re-route a Mac OS X kernel function that is too far from your version of that function. Though innovative solutions might be possible depending upon the instruction set/processor in question, but a straightforward solution would be to find and use an unused region of memory near that function.
Finally, please note that on a production Mac OS X system utilizing this technique, you would use something like
icbf to flush the cache(s). The sample code does not do that, but simply calls an arbitrary function (say,
sync()) which effectively has the same result in our case.