Objective-C for the AVR, part 3

Matt Jacobson
September 2021

This is part three in an occasional series about my AVR Objective-C runtime. See also the other entries on the topic (1, 2, 4).

I've managed to implement a lot of functionality in avr-objc without doing any load-time work. The compiler has set up data structures in memory and invoked the runtime whenever it needed dynamic behavior.

But there are certain features that are difficult—or even impossible—unless the runtime gets a chance to run some code before the program begins. I'll discuss implementing them in this entry.

Implementing `objc_getClass`

So far, all of the operations I've had to do on class objects have involved messaging them. In a message to a class object, the compiler passes the address of the class as the self parameter to the message dispatcher routine, and the dispatcher can just operate on it.

What if the program doesn't know the name of the class it wants to use at compile time? For example, there might be a set of classes that all implement an identical interface (whether through inheritance or protocol adoption), and the name of the specific class to use is read from disk. (This is essentially how nibs work on macOS and iOS.)

To support this pattern, the runtime provides a function called objc_getClass(), which takes a class name and returns the class object with that name. Implementing this function requires knowledge of the full set of available classes.

Reading the class list

This knowledge is provided by compiler-generated metadata. As described in an earlier entry, the exact format of this metadata varies among the different runtime ABIs.

I'll discuss the format used under the objc4 modern ABI, which avr-objc aims to conform to. Here, the list of classes is provided as a simple array of pointers to class objects.

The length of that array is a little tricky. On NeXTSTEP-derived OSes, the array is stored in a special Mach-O section called __DATA, __objc_classlist. The size of the array—the number of class objects—is implicit: the size of the Mach-O section can be determined at runtime (from the Mach-O header metadata), and it's guaranteed to be a multiple of the pointer size. The number of pointers stored in the section is the number of classes.

In my case, though, I don't have access to executable metadata at runtime. That stuff is all stripped out when the binary is programmed into flash. So while my code can know the address of the start of the array, it doesn't know its size—that is, the number of classes.

One easy way to solve this is to take advantage of the fact that while the compiler and runtime don't know the size of the section, the linker does. By using a linker script, I can define a symbol at the end of the class list. The runtime can then use this symbol as a landmark to avoid running past the end of the list.

Unfortunately, the code GCC uses to place the class list in its own section is currently specific to Darwin targets. So I ended up patching the compiler to do this correctly for general ELF targets, too.

With that fix in place, I amended my linker script like so:

.data :
{
  /* ... */

  __OBJC_CLASSLIST_BEGIN = .;
  KEEP(*(objc_classlist))
  __OBJC_CLASSLIST_END = .;

  /* ... */
}

This code does two important things. First, it defines the explicit beginning/end symbols to be used by the runtime. Second, it ensures that the class list, which is now in a separate section, is linked into the .data section, even though no other symbol explicitly references it.

With the symbols defined, I then wrote some code in the runtime to walk the list:

extern const Class classlist[]   __asm__("__OBJC_CLASSLIST_BEGIN");
extern const Class classlist_end __asm__("__OBJC_CLASSLIST_END");

Class objc_getClass(const char *const name) {
    for (const Class *ptr = classlist; ptr != &classlist_end; ptr++) {
        const Class cls = *ptr;

        if (!strcmp(name, cls->rodata->name)) {
            return cls;
        }
    }

    return Nil;
}

In a more professional runtime, I might dynamically construct a hash map from names to Classes. But here, I'm operating in a memory-constrained environment that's unlikely to have many classes or many calls to objc_getClass. So a linear scan of the class list seems fine, at least for now.

I spent a while worrying whether the comparison between ptr and &classlist_end was valid, but I believe that the standard says it is.^[1]
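To experiment with the lookup on a host machine, where there's no linker script to provide the begin/end symbols, the same walk can be written against an ordinary array and its one-past-the-end pointer. This is only a sketch: the trimmed-down structs, the lookup_class name, and the sample classes are stand-ins I've made up for illustration, not the runtime's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the runtime's types. */
struct class_ro { const char *name; };
struct objc_class { const struct class_ro *rodata; };
typedef struct objc_class *Class;

/* Look up a class by name in the half-open range [begin, end). */
static Class lookup_class(const Class *begin, const Class *end,
                          const char *name) {
    assert(name != NULL);

    for (const Class *ptr = begin; ptr != end; ptr++) {
        if (strcmp(name, (*ptr)->rodata->name) == 0) {
            return *ptr;
        }
    }

    return NULL; /* plays the role of Nil */
}

/* Sample data standing in for compiler-generated metadata. */
static const struct class_ro led_ro = { "LED" };
static const struct class_ro button_ro = { "Button" };
static struct objc_class led_class = { &led_ro };
static struct objc_class button_class = { &button_ro };
static const Class sample_classlist[] = { &led_class, &button_class };
```

Calling lookup_class(sample_classlist, sample_classlist + 2, "Button") returns the address of button_class, and an unknown name falls through to NULL.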

Categories

Categories are documented by the compiler in almost exactly the same way as classes. The compiler emits a list of pointers to Category data structures into a special section, and once again I can modify the linker script to allow the runtime to walk through the list safely.

A Category adds instance and/or class methods to a particular class. But so far, my class_lookupMethod has been looking directly at class metadata to find method implementations. That won't suffice now that the method might be provided by a category.

In fact, another feature of Objective-C is the ability to add (and remove) methods at runtime. It therefore makes sense to write general functionality for dynamically added methods; then, loading a category is just a matter of making use of that functionality at load time.

I discussed the basics of messaging in my first entry on this runtime. Each Class has an associated class_ro struct hanging off of it, off which hangs an array of method structs, each containing the name of the method and its IMP. Messaging is a "simple" matter of probing the method array for the right method and then jumping to it. See the earlier entry for exposition of the scare quotes.

Adding a method would ideally be as simple as appending to the existing method array. But that's not easy: there's no guarantee of any particular amount of extra space at the end of that array (and in fact the amount of space is likely zero, as the compiler will try to pack its data structures in tightly to minimize wasted bytes).

I could copy the method array into dynamically allocated memory. But since the compiler's data structures lie in memory considered to be statically allocated, the old copy can't be reclaimed. So if a class has a large number of methods in its main @implementation, that could mean a lot of wasted memory just to insert a single dynamically added method.

Instead, what most runtimes do is to maintain a second method list for dynamically added methods. This second method list need only exist for dynamically modified classes.

How should this second list be associated with the Class? As discussed peripherally in that earlier entry, each Class has pointer-sized members cache and vtable, which are for the runtime's use. They are initially populated with pointers, respectively, to objc_empty_cache and objc_empty_vtable; since this runtime makes no use of method caches or vtables, it defines both of those symbols to zero.

cache and vtable therefore represent usable storage in each Class from which I can hang runtime-generated data. Since I may want to add other runtime data later, I defined a struct to contain all runtime data:

struct class_rw {
    uint8_t num_added_methods;
    struct method added_methods[MAX_ADDED_METHODS];
};

The name class_rw is also what the objc4 runtime uses, in obvious analogy with class_ro, though it hangs it off the class in a different way.

I then renamed the void *vtable member to struct class_rw *rwdata and defined it to point at the class_rw, if any. If no runtime data has been set up for the class yet, rwdata continues to contain its old value of objc_empty_vtable. Since objc_empty_vtable is defined to zero, it's also a valid null pointer here.

While I was at it, I renamed void *cache to uintptr_t flags. This gives me 16 flag bits I can set on a class without having to pointer-chase into the rwdata. I'll make use of this later. On AVR, uintptr_t is the same size as a data pointer (16 bits), so this replacement fits, and the initial value of objc_empty_cache initializes all the flags to zero, which is what I want.

Anyway, back to the method list. To keep things simple, I defined a static cap for the number of added methods. Mostly, I just felt like getting something running without fiddling with reallocf and available/used counts. I can revisit this in the future as needed.

Adding a method is then as simple as getting the rwdata (creating it if needed) and appending a method to the array, assuming there's room (or aborting otherwise). As for messaging, the class_lookupMethod routine now has two places to look for methods. Conventionally, Objective-C method dispatchers look at dynamically added methods before static methods; this enables the questionable and often unintentional practice of category stomping, in which a main-@implementation method is replaced by a method from a category.^[2]
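Roughly, the two halves fit together like the sketch below. It is not the actual avr-objc code: the struct layouts are simplified, methods are matched by name with strcmp rather than by selector, superclass traversal is left out, MAX_ADDED_METHODS is given an arbitrary value, and add_method and lookup_method are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ADDED_METHODS 4 /* assumed cap; pick whatever fits in RAM */

typedef void (*IMP)(void);

struct method { const char *name; IMP imp; };
struct method_list { uint8_t count; const struct method *methods; };
struct class_ro { const struct method_list *methods; };
struct class_rw {
    uint8_t num_added_methods;
    struct method added_methods[MAX_ADDED_METHODS];
};
struct objc_class {
    const struct class_ro *rodata;
    struct class_rw *rwdata; /* NULL until the first method is added */
};

/* Append to the dynamic list, creating rwdata on first use. */
static void add_method(struct objc_class *cls, const char *name, IMP imp) {
    assert(cls != NULL && name != NULL);

    if (cls->rwdata == NULL) {
        cls->rwdata = calloc(1, sizeof *cls->rwdata);
        if (cls->rwdata == NULL) abort();
    }

    struct class_rw *rw = cls->rwdata;
    if (rw->num_added_methods >= MAX_ADDED_METHODS) abort(); /* no room */
    rw->added_methods[rw->num_added_methods++] = (struct method){ name, imp };
}

/* Search dynamically added methods first, then the compiler's list. */
static IMP lookup_method(const struct objc_class *cls, const char *name) {
    const struct class_rw *rw = cls->rwdata;
    for (uint8_t i = 0; rw != NULL && i < rw->num_added_methods; i++) {
        if (strcmp(rw->added_methods[i].name, name) == 0)
            return rw->added_methods[i].imp;
    }

    const struct method_list *list = cls->rodata->methods;
    for (uint8_t i = 0; list != NULL && i < list->count; i++) {
        if (strcmp(list->methods[i].name, name) == 0)
            return list->methods[i].imp;
    }

    return NULL;
}

/* Sample data: one class with a single compiler-emitted method. */
static int impl_calls;
static void static_impl(void) { impl_calls += 1; }
static void added_impl(void) { impl_calls += 10; }
static const struct method sample_methods[] = { { "blink", static_impl } };
static const struct method_list sample_list = { 1, sample_methods };
static const struct class_ro sample_ro = { &sample_list };
static struct objc_class sample_class = { &sample_ro, NULL };
```

Before any call to add_method, looking up "blink" on sample_class yields static_impl. After add_method(&sample_class, "blink", added_impl), the same lookup yields added_impl, which is exactly the stomping behavior described above.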

With rudimentary method adding support added, implementing categories is straightforward. Each Category looks like this:

struct objc_category {
    char *name;
    struct objc_class *cls;
    struct method_list *instanceMethods;
    struct method_list *classMethods;
};

instanceMethods and classMethods are the same kinds of method lists found in classes' rodatas. To install a category, the runtime adds the instanceMethods to the class itself and the classMethods to the class's metaclass.
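A minimal sketch of that installation step might look like the following. To keep it self-contained, I've assumed a simplified class that embeds its added-method array directly and reaches its metaclass through an isa field; add_methods, install_category, and the sample data are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ADDED_METHODS 4 /* assumed cap */

typedef void (*IMP)(void);
struct method { const char *name; IMP imp; };
struct method_list { uint8_t count; const struct method *methods; };

/* Simplified class: just the metaclass link and the dynamic methods. */
struct objc_class {
    struct objc_class *isa; /* for a class object, its metaclass */
    uint8_t num_added_methods;
    struct method added_methods[MAX_ADDED_METHODS];
};

struct objc_category {
    const char *name;
    struct objc_class *cls;
    const struct method_list *instanceMethods;
    const struct method_list *classMethods;
};

/* Copy every method in list (which may be NULL) into cls. */
static void add_methods(struct objc_class *cls,
                        const struct method_list *list) {
    if (list == NULL) return;

    for (uint8_t i = 0; i < list->count; i++) {
        if (cls->num_added_methods >= MAX_ADDED_METHODS) abort();
        cls->added_methods[cls->num_added_methods++] = list->methods[i];
    }
}

/* Instance methods go to the class, class methods to its metaclass. */
static void install_category(const struct objc_category *cat) {
    assert(cat != NULL && cat->cls != NULL && cat->cls->isa != NULL);
    add_methods(cat->cls, cat->instanceMethods);
    add_methods(cat->cls->isa, cat->classMethods);
}

/* Sample data: a category with one instance and one class method. */
static int cat_calls;
static void cat_instance_impl(void) { cat_calls += 1; }
static void cat_class_impl(void) { cat_calls += 10; }
static const struct method cat_imethods[] = { { "toggle", cat_instance_impl } };
static const struct method cat_cmethods[] = { { "sharedLED", cat_class_impl } };
static const struct method_list cat_ilist = { 1, cat_imethods };
static const struct method_list cat_clist = { 1, cat_cmethods };
static struct objc_class sample_meta;
static struct objc_class sample_class = { .isa = &sample_meta };
static const struct objc_category sample_category = {
    "Extras", &sample_class, &cat_ilist, &cat_clist
};
```

After install_category(&sample_category), sample_class holds the "toggle" method and sample_meta holds "sharedLED".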

Ivar offsets and class sizes

There's one other interesting load-time task that's worth implementing at this point: non-fragile ivars.

The "modern" objc4 ABI introduced non-fragile ivars—the ability for the ivar layout of a superclass to change without breaking binary compatibility with subclasses. Greg Parker has a great succinct description of the problem here that I won't attempt to match. In short, since the full ivar layout (including superclasses) is not known until runtime, all ivar accesses use a runtime-computed offset. Those offsets need to be computed at load time, before any ivars are accessed.

To do that, I once again need to walk the class list. I can use the same approach I showed above in objc_getClass; in this case, for each class, I'll fix up its ivar offsets.

Like its methods, a class's ivars hang off of its rodata, in the ivars member.^[3] Each ivar carries a bunch of metadata: its name, @encode type, alignment, and size. But here I'm only interested in its uint16_t *offset member: a pointer to the uint16_t holding the offset that needs to be updated.

How does the runtime know what to update the offsets to? First, it needs to know the size of the ivars from the class's superclass, if any. So the first thing it does is recurse to the superclass.

Since a class can have more than one subclass, it's valuable to ensure that a class only undergoes this process once. This is a good use for the flags member I added earlier. I defined a flag CLASS_SETUP to denote that the load-time work for a class is done; the objc_setup_class routine can then early-out if the flag is set (and set it otherwise).

After setting up the superclass, I may assume that superclass->rodata->instance_size has been corrected. (Either the superclass was a root class, in which case the compiler's value was correct, or it wasn't, in which case it was fixed up.) Since an object's ivars are ordered from root class to leaf class, the value of superclass->rodata->instance_size represents how many bytes lie between the start of the object and the current class's ivars.

I can compare that value against cls->rodata->instance_start, which is what the compiler thought it would be, based on the ivars it could see at compile time. If the two values match, then nothing needs to be updated: the values the compiler entered are still valid at runtime.

But if they don't, then each ivar's offset needs to be adjusted by the difference between the two values. cls->rodata->instance_size needs to be adjusted by the same amount (so that subclasses can depend on it).
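Put together, the fix-up pass could be sketched as follows. The struct layouts are pared down to the fields the algorithm touches, the bit chosen for CLASS_SETUP is an assumption, alignment is ignored, and setup_class is a stand-in name for the per-class routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLASS_SETUP ((uintptr_t)1 << 0) /* assumed bit position */

struct ivar { uint16_t *offset; }; /* name, type, etc. omitted */
struct ivar_list { uint8_t count; const struct ivar *ivars; };
struct class_ro {
    uint16_t instance_start;
    uint16_t instance_size;
    const struct ivar_list *ivars;
};
struct objc_class {
    struct objc_class *superclass;
    uintptr_t flags;
    struct class_ro *rodata;
};

static void setup_class(struct objc_class *cls) {
    assert(cls != NULL);

    if (cls->flags & CLASS_SETUP) return; /* already handled */
    cls->flags |= CLASS_SETUP;

    struct objc_class *super = cls->superclass;
    if (super == NULL) return; /* root class: compiler's values stand */

    setup_class(super); /* the superclass's size must be final first */

    const uint16_t actual = super->rodata->instance_size;
    const uint16_t expected = cls->rodata->instance_start;
    if (actual == expected) return; /* layout unchanged */

    /* Slide every ivar, and the total size, by the same delta. */
    const int32_t delta = (int32_t)actual - (int32_t)expected;
    const struct ivar_list *list = cls->rodata->ivars;
    for (uint8_t i = 0; list != NULL && i < list->count; i++) {
        uint16_t *offset = list->ivars[i].offset;
        *offset = (uint16_t)(*offset + delta);
    }
    cls->rodata->instance_size =
        (uint16_t)(cls->rodata->instance_size + delta);
}

/* Sample data: Base grew from 2 to 4 bytes after Sub was compiled. */
static struct class_ro base_ro = { 0, 4, NULL };
static struct objc_class base_class = { NULL, 0, &base_ro };
static uint16_t sub_level_offset = 2; /* compiled against the old Base */
static const struct ivar sub_ivars[] = { { &sub_level_offset } };
static const struct ivar_list sub_ivar_list = { 1, sub_ivars };
static struct class_ro sub_ro = { 2, 3, &sub_ivar_list };
static struct objc_class sub_class = { &base_class, 0, &sub_ro };
```

In the sample data, Sub expected its ivars to start at byte 2, but Base now occupies 4 bytes. Running setup_class(&sub_class) moves the ivar from offset 2 to offset 4 and grows Sub's instance size from 3 to 5. A second call is a no-op thanks to the flag.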

Implementing `class_createInstance`

Also, now that I've resolved the instance_size of each class, I can easily implement a correct version of object instantiation, instead of using @defs (which breaks under non-fragile ivars) like I did in my first entry.

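id class_createInstance(Class cls) {
    const size_t size = cls->rodata->instance_size;
    const id object = calloc(1, size);  // zero-filled, so ivars start out as 0/nil

    if (object) {
        *(struct objc_class **)object = cls;  // the isa pointer comes first
    }

    return object;  // nil if the allocation failed
}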

Simple.

Calling `+load`

Objective-C allows classes and categories to run code at load time by implementing a +load method. avr-objc needs to call those methods as part of its load-time work.

Calling the +load of a particular class is pretty easy. The only trick is that the method may or may not be present, so I effectively need to do a responds-to-selector check before calling. Currently, the only code that looks up a method in a class is class_lookupMethod, used in the standard messaging path. But since the messaging code can't handle a NULL IMP return, class_lookupMethod aborts if the method is missing.

So I refactored class_lookupMethod into class_lookupMethodIfPresent, which will happily return NULL, and a new class_lookupMethod, which simply calls class_lookupMethodIfPresent and aborts if it returns NULL.
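The shape of that split is simple enough to sketch. The function names below come from the text above, but the signatures are guesses: this version matches methods by name string, looks at a single flat method array, and skips superclass traversal, none of which necessarily reflects the real runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*IMP)(void);
struct method { const char *name; IMP imp; };

/* Simplified class: one flat method array. */
struct objc_class {
    uint8_t method_count;
    const struct method *methods;
};

/* Returns NULL when the class does not implement the method. */
static IMP class_lookupMethodIfPresent(const struct objc_class *cls,
                                       const char *name) {
    assert(cls != NULL && name != NULL);

    for (uint8_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].name, name) == 0) {
            return cls->methods[i].imp;
        }
    }

    return NULL;
}

/* Messaging path: the dispatcher can't handle NULL, so abort instead. */
static IMP class_lookupMethod(const struct objc_class *cls,
                              const char *name) {
    const IMP imp = class_lookupMethodIfPresent(cls, name);
    if (imp == NULL) abort();
    return imp;
}

/* Sample data: one class with a load method, one without. */
static int load_calls;
static void sample_load(void) { load_calls++; }
static const struct method sample_methods[] = { { "load", sample_load } };
static const struct objc_class with_load = { 1, sample_methods };
static const struct objc_class without_load = { 0, NULL };
```

A caller that can tolerate a missing method, like the +load pass, checks the result of class_lookupMethodIfPresent for NULL; the dispatcher keeps calling class_lookupMethod and never sees a NULL IMP.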

Then it's just a matter of (once again) walking the class list and using class_lookupMethodIfPresent to see if the class has a +load method. Since I'm looking for a class method, the class passed to class_lookupMethodIfPresent is the metaclass. To ensure that superclasses get +load before subclasses, I recurse to the superclass first, using a new flag CLASS_LOADED to avoid hitting the same class twice.

Categories complicate things a bit. As a special case, if a category implements +load, the method does not stomp any +load in the class itself; rather, the category's +load is called in addition to the class's +load. This requires two changes. First, a category's +load should be skipped when adding methods to the class's method list. Second, there needs to be a second pass over the categories to call their +loads, if any.
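Here's a rough model of the resulting call order. To keep it short, each class and category carries a plain function pointer in place of a real +load lookup on the metaclass; that field, the LoadFn type, the bit chosen for CLASS_LOADED, and the function names are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLASS_LOADED ((uintptr_t)1 << 1) /* assumed bit position */

typedef void (*LoadFn)(void);

struct objc_class {
    struct objc_class *superclass;
    uintptr_t flags;
    LoadFn load; /* stand-in for looking up +load on the metaclass */
};

struct objc_category {
    struct objc_class *cls;
    LoadFn load; /* the category's own +load, or NULL */
};

/* Superclasses get +load first; each class is visited at most once. */
static void load_class(struct objc_class *cls) {
    if (cls == NULL || (cls->flags & CLASS_LOADED)) return;
    cls->flags |= CLASS_LOADED;

    load_class(cls->superclass);
    if (cls->load != NULL) cls->load();
}

/* First pass over the classes, second pass over the categories. */
static void call_load_methods(struct objc_class *const *classes,
                              size_t nclasses,
                              const struct objc_category *const *cats,
                              size_t ncats) {
    assert(classes != NULL || nclasses == 0);
    assert(cats != NULL || ncats == 0);

    for (size_t i = 0; i < nclasses; i++) load_class(classes[i]);

    for (size_t i = 0; i < ncats; i++) {
        if (cats[i]->load != NULL) cats[i]->load();
    }
}

/* Sample data: Sub inherits from Base; Base also has a category. */
static char load_log[8];
static size_t load_log_len;
static void record(char c) {
    if (load_log_len < sizeof load_log - 1) load_log[load_log_len++] = c;
}
static void base_load(void) { record('B'); }
static void sub_load(void) { record('S'); }
static void cat_load(void) { record('C'); }
static struct objc_class base_class = { NULL, 0, base_load };
static struct objc_class sub_class = { &base_class, 0, sub_load };
static const struct objc_category base_cat = { &base_class, cat_load };
static struct objc_class *const sample_classes[] = { &sub_class, &base_class };
static const struct objc_category *const sample_cats[] = { &base_cat };
```

Even though Sub appears before Base in sample_classes, calling call_load_methods(sample_classes, 2, sample_cats, 1) runs Base's +load, then Sub's, then the category's, each exactly once.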

That's all for now. In the next entry, I'll discuss switching over to compile with clang.

[1] ISO C11, §6.5.9, paragraph 6: Two pointers compare equal if ... one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. ... For the purposes of [equality comparison], a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type. ... Two objects may be adjacent in memory ... because the implementation chose to place them so, even though they are unrelated.
[2] This is why it's best (if insufficiently common) practice to add prefixes to category method names to denote the owner of the category. For example, AppKit adds a method -NS_view to the CALayer class in CoreAnimation.
[3] Yeah, it's a little weird that we're modifying data in the "read-only" data.