Objective-C for the AVR, part 4

Matt Jacobson
November 2021

This is part four in an occasional series about my AVR Objective-C runtime. See also my previous entries on the topic (1, 2, 3).

In this entry, I'll describe some of the work I've done to make avr-objc work with Clang—and vice versa.

Motivation

So far, I've been using GCC, since it has the most mature AVR backend of any open-source compiler. And while its Objective-C support is solid, it lacks some newer language features—most annoyingly the ability to declare ivars in class extensions and implementations (but also more advanced stuff like ARC, modern literals syntax, and more).

Clang's Objective-C support is fully up-to-date essentially by definition, since it's the compiler Apple uses and actively develops. In fact, without any meaningful documentation of modern language features from Apple, the Clang source is essentially the language and ABI spec. LLVM is Clang's backend, and as of recently it has an "experimental" AVR backend (which seems to exist mostly as a target for Rust).

I'll describe the bugs roughly in the order I encountered and fixed them.

Fixing linker order; optimized function prologue

Modern LLVM is unlike GCC in that it uses its own built-in assembler, not an external assembler like the GNU assembler. (Clang is therefore immune to the GNU assembler bug I fixed in my first entry on avr-objc.) However, at least currently, it does depend on an external linker to produce the final linked product.

avr-objc depends on some library functions from avr-libc, and avr-libc (which is compiled with GCC) in turn depends on some helper routines from GCC's runtime library, libgcc. All three libraries are linked into the final product. For the most part, this is fine, but it runs afoul of a quirk in how the GNU linker handles library archives.

An archive (the kind operated upon by ar) contains a set of discrete object files. The object files in an archive generally depend on one another, but they exist in an unlinked state and can be independently included in the final product by a linker. Many libraries—avr-libc and libgcc among them—make use of this fact by splitting their code into many small object files; only the objects containing required symbols are included in the product, helping to minimize code size.

The GNU linker handles this selection using a very simple process. The linker keeps track of a list of undefined symbols (that is, symbols which some object has referenced but for which a definition has not yet been encountered). When presented with an archive, the linker selects objects in the archive that resolve any undefined symbols—and discards the other objects. The linker proceeds with the selected objects as if they'd been specified directly. After processing the selected objects, the linker moves on to the next input file.

If a later input file adds an undefined symbol that would have been resolved by a discarded object from an earlier archive, you're out of luck: the linker does not go back to previous archives to re-evaluate them.

As it happens, both Clang and GCC specify the "system libraries" in this order: -l:crtatmega1284.o -lgcc -lm -lc -latmega1284. Notice that libgcc precedes libc (and libm). Under Clang, this can result in rather cryptic linker failures like this:

/opt/local/lib/gcc/avr/10.3.0/../../../../bin/avr-ld: /opt/local/lib/gcc/avr/10.3.0/../../../../avr/lib/avr51/libc.a(printf.o): in function `printf':
printf.c:(.text.avr-libc+0x8): undefined reference to `__prologue_saves__'
/opt/local/lib/gcc/avr/10.3.0/../../../../bin/avr-ld: printf.c:(.text.avr-libc+0x24): undefined reference to `__epilogue_restores__'

What's happened here is that printf(), part of avr-libc, was compiled with GCC. As a code size optimization, GCC replaced printf's function prologue and epilogue with calls to a deduplicated prologue and epilogue.

Aside: what's a function prologue?

Part of a platform's ABI defines which registers a function call is allowed to "clobber" (change without regard for their previous contents) and which registers must preserve their contents through the call. (I've cited parts of this ABI in previous entries, but the entire thing is spelled out on this wiki page.) A function may make use of preserved registers, but it has to save their old values and restore them before returning control.

The AVR architecture uses a lot (32) of small (8-bit) registers. Pushing each register on the stack is a two-byte instruction, which can easily turn into a lot of code for register-heavy functions. Functions are also responsible for adjusting the stack pointer to reserve any stack space they require; since the stack pointer is 16 bits wide but can only be written one byte at a time, this is a complicated maneuver in the face of potential interrupts.

To prevent this work from dominating the actual work of a function, libgcc provides a deduplicated prologue (and epilogue) that functions can call in only a few bytes of code.^[1]

Disassembly of section .text.libgcc.prologue:

00000000 <__prologue_saves__>:
   0:   2f 92           push    r2              ; Save callee-saved registers
   2:   3f 92           push    r3
   4:   4f 92           push    r4
   6:   5f 92           push    r5
   8:   6f 92           push    r6
   a:   7f 92           push    r7
   c:   8f 92           push    r8
   e:   9f 92           push    r9
  10:   af 92           push    r10
  12:   bf 92           push    r11
  14:   cf 92           push    r12
  16:   df 92           push    r13
  18:   ef 92           push    r14
  1a:   ff 92           push    r15
  1c:   0f 93           push    r16
  1e:   1f 93           push    r17
  20:   cf 93           push    r28
  22:   df 93           push    r29
  24:   cd b7           in      r28, 0x3d       ; Read the stack pointer
  26:   de b7           in      r29, 0x3e       ;   into r28:r29
  28:   ca 1b           sub     r28, r26        ; Create r26:r27 bytes of
  2a:   db 0b           sbc     r29, r27        ;   stack space
  2c:   0f b6           in      r0, 0x3f        ; Disable interrupts to atomically
  2e:   f8 94           cli                     ;   write stack pointer
  30:   de bf           out     0x3e, r29       ; Write the stack pointer (high byte)
  32:   0f be           out     0x3f, r0        ; Restore interrupts flag (takes effect after following instruction)
  34:   cd bf           out     0x3d, r28       ; Write the stack pointer (low byte)
  36:   09 94           ijmp                    ; Return (through Z register)

A function invokes the prologue like this:

some_function:
    ldi r26, lo8(16)          ; Reserve 16 bytes of
    ldi r27, hi8(16)          ;   stack space (r27:r26 = 16)
    ldi r30, lo8(gs(1f))      ; Load return address (label "1")
    ldi r31, hi8(gs(1f))      ;   into the Z register (r31:r30)
    jmp __prologue_saves__    ; Jump to prologue code
1:                            ; Will return here

A similar dance is performed for the epilogue.

Why does this matter, again?

libgcc (which provides __prologue_saves__) precedes libc (which provides printf, among many other functions that make use of the optimized prologue) in the linker's input files list. And __prologue_saves__ lives in its own object file _prologue.o inside libgcc, so that the linker can select it without pulling in other code. Therefore, unless some earlier object requires __prologue_saves__, the linker is unable to link in printf and fails.

I mentioned earlier that GCC and Clang provide the linker the same input files in the same order. How, then, is GCC able to avoid this problem?

The GNU linker provides arguments --start-group and --end-group, which, when added to the input file list, create a group from the intervening files. Unlike normal linking, which only considers an archive once, the archives in a group are "searched repeatedly until no new undefined references are created" (per the manual).
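To make the difference concrete, here is a minimal sketch in C that simulates the selection process described above. Everything in it is made up for illustration: symbols are modeled as bits, each "object" just lists what it defines and what it needs, and demo_unresolved() sets up a tiny stand-in for the main.o / libgcc / libc situation. Rescanning within a single archive is modeled as a simple loop, which is an assumption about the linker's behavior rather than something taken from its source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbols for illustration, one bit each. */
enum { SYM_MAIN = 1 << 0, SYM_PRINTF = 1 << 1, SYM_PROLOGUE = 1 << 2 };

struct object {
    unsigned defines;   /* symbols this object provides */
    unsigned needs;     /* symbols this object references */
    bool selected;      /* already pulled into the link? */
};

struct archive {
    struct object *members;
    size_t count;
};

struct link_state {
    unsigned defined;   /* symbols with a definition so far */
    unsigned undefined; /* referenced but not yet defined */
};

/* Link one object: record its definitions and its new references. */
static void link_object(struct link_state *st, const struct object *obj)
{
    st->defined |= obj->defines;
    st->undefined |= obj->needs;
    st->undefined &= ~st->defined;
}

/* Pull in only the members that resolve a currently undefined symbol.
 * Returns true if at least one member was selected. */
static bool link_archive(struct link_state *st, struct archive *ar)
{
    bool pulled_any = false;
    bool progress;
    do {
        progress = false;
        for (size_t i = 0; i < ar->count; i++) {
            struct object *m = &ar->members[i];
            if (!m->selected && (m->defines & st->undefined) != 0) {
                m->selected = true;
                link_object(st, m);
                pulled_any = progress = true;
            }
        }
    } while (progress);
    return pulled_any;
}

/* Links main.o, then "libgcc", then "libc". Returns the set of symbols
 * still undefined at the end. With use_group, the archive list is
 * rescanned until no new members are pulled in. */
unsigned demo_unresolved(bool use_group)
{
    struct object main_o = { SYM_MAIN, SYM_PRINTF, true };
    struct object libgcc_members[] = { { SYM_PROLOGUE, 0, false } };
    struct object libc_members[] = { { SYM_PRINTF, SYM_PROLOGUE, false } };
    struct archive libs[] = { { libgcc_members, 1 }, { libc_members, 1 } };

    struct link_state st = { 0, 0 };
    link_object(&st, &main_o);

    bool progress;
    do {
        progress = false;
        for (size_t i = 0; i < 2; i++)
            if (link_archive(&st, &libs[i]))
                progress = true;
    } while (use_group && progress);

    return st.undefined;
}
```

In this model, a single pass leaves SYM_PROLOGUE undefined, because the only object that defines it was skipped before anything asked for it. The grouped variant picks it up on the second trip through the archives.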

GCC simply surrounds the "system libraries" with --start-group and --end-group. Clang uses the same archive order but does not supply these arguments. So I fixed that here. Clang's driver code is nice and modular, so it was dead simple to make this change without worrying that I was breaking any of the many other platforms Clang supports. Most of my time was spent learning about and updating Clang's somewhat cumbersome test suite.

Support for constructor routines

avr-objc makes use of the GNU language extension __attribute__((constructor)), which is a qualifier that can be applied to a function definition to make it a constructor. (Clang also recognizes the attribute syntax.) A constructor^[2] is a function taking no arguments that is automatically executed prior to main.
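For readers who haven't used the attribute, here is a minimal sketch of what a constructor looks like. The names runtime_ready, prepare_runtime, and runtime_is_ready are hypothetical and are not taken from avr-objc; the only real piece is the attribute itself, which both GCC and Clang accept.

```c
#include <stdbool.h>

/* Hypothetical flag standing in for "runtime setup is done". */
static bool runtime_ready = false;

/* Runs automatically before main(), provided the C runtime actually
 * walks the constructor list. */
__attribute__((constructor))
static void prepare_runtime(void)
{
    runtime_ready = true;
}

/* By the time main() can call this, the constructor should have run. */
bool runtime_is_ready(void)
{
    return runtime_ready;
}
```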

Support for running constructor routines is provided by the C runtime. When the microcontroller boots up, code from libgcc and crtatmega1284.o (built, somewhat confusingly, by avr-libc) runs to set up an environment suitable for executing a C program. For example, static-duration variables need to be initialized in SRAM (either by copying their contents from flash or by zeroing them). An initial stack must be established. And main must be called with suitable arguments.

This process is nicely reflected by the standard AVR linker script, which specifically orders the bits of code that carry out these functions:

*(.init0)  /* Start here after reset.  */
KEEP (*(.init0))
*(.init1)
KEEP (*(.init1))
*(.init2)  /* Clear __zero_reg__, set up stack pointer.  */
KEEP (*(.init2))
*(.init3)
KEEP (*(.init3))
*(.init4)  /* Initialize data and BSS.  */
KEEP (*(.init4))
*(.init5)
KEEP (*(.init5))
*(.init6)  /* C++ constructors.  */
KEEP (*(.init6))
*(.init7)
KEEP (*(.init7))
*(.init8)
KEEP (*(.init8))
*(.init9)  /* Call main().  */
KEEP (*(.init9))

At step .init6, libgcc code is inserted to call constructor routines. The code is pretty simple: it looks for a symbol __ctors_start, which is assumed to be an array of function pointers that must be called. The end of the array is denoted with the symbol __ctors_end. This is a very similar scheme to what I used to walk the Objective-C class list in my previous entry—although in this case the code walking the list is written directly in assembly.
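In C terms, the walk might look roughly like the sketch below. This is not libgcc's code (which is assembly); run_ctors and the demo functions are hypothetical, and the start and end pointers are passed as parameters rather than taken from the real __ctors_start and __ctors_end linker symbols so that the example can run anywhere. The sketch walks from the end of the array toward the start, which is my reading of what the libgcc routine does; treat the direction as an assumption.

```c
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* Call every function pointer in [start, end), last entry first. */
void run_ctors(const ctor_fn *start, const ctor_fn *end)
{
    const ctor_fn *p = end;
    while (p != start) {
        --p;
        (*p)();
    }
}

/* Demo: two hypothetical constructors that record the call order. */
static int call_trace;
static void ctor_a(void) { call_trace = call_trace * 10 + 1; }
static void ctor_b(void) { call_trace = call_trace * 10 + 2; }

/* Returns 21 if ctor_b ran first and ctor_a second. */
int demo_ctor_trace(void)
{
    static const ctor_fn table[] = { ctor_a, ctor_b };
    call_trace = 0;
    run_ctors(table, table + 2);
    return call_trace;
}
```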

As described earlier, libgcc splits its functionality into a number of small object files. The code in section .init6 is in an object named _ctors.o, consisting of a single symbol __do_global_ctors. You might think that specifying *(.init6) in the linker script would cause the linker to include _ctors.o. But it doesn't work that way: *(.init6) just means, "Place any stuff from sections named .init6 in selected objects here." Something else needs to explicitly require _ctors.o.

If nothing does, then—frustratingly—the build succeeds, but your constructor routines just never run. You're then left to fiddle with disassembler output to figure out why.

So, how does this work for GCC? Simply: GCC emits a .globl __do_global_ctors into the generated assembly; the resulting assembler output contains an undefined symbol named __do_global_ctors, which causes _ctors.o to be included at link time.

Clang didn't do this, so I needed to wire it up (PR here). In Clang, this kind of thing is actually handled at the LLVM level. LLVM has first-class support for constructor functions through the llvm.global_ctors variable. So the change was just a matter of emitting the LLVM equivalent of .globl __do_global_ctors when the llvm.global_ctors variable was non-empty. Luckily, LLVM already did something similar for the .init4 steps that initialize static variables (__do_copy_data and __do_clear_bss), so I had a bit of a model to work with.

One final quirk here was that there's a second scheme for building constructor routines, where the array of function pointers is placed in a section called .init_array.^[3] But since the standard AVR script looks for .ctors, it was required to pass the argument -fno-use-init-array to the compiler frontend. I changed the Clang driver to pass this argument automatically for the AVR target (PR here).

Finding avr-libc

This one's not very exciting. Like GCC, Clang depends on knowing where libgcc and avr-libc live to produce a linked product. Both are required for, if nothing else, providing the .init* code discussed in the previous section.

I keep my avr-libc in /opt/local/avr/lib/avr51/libc.a, for a few reasons. First, this is where the avr-libc port on MacPorts installs it.

Second, and more importantly, this path conforms to the standard cross-compilation directory structure, namely:

$PREFIX/$TRIPLE/{include,lib}/$MULTILIB

where:

$PREFIX is /opt/local (as specified when building the compiler), a standard "user supplied stuff" root directory on macOS
$TRIPLE is avr
$MULTILIB is avr51, the category of my ATmega1284 chip
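As a small sanity check on that layout, here is a sketch that assembles the library directory from its three parts. The function name cross_lib_dir is hypothetical; this is not how Clang's driver builds its search paths, just the scheme above written out.

```c
#include <stdio.h>
#include <string.h>

/* Builds "$PREFIX/$TRIPLE/lib/$MULTILIB" into buf.
 * Returns snprintf's result (the length that was needed). */
int cross_lib_dir(char *buf, size_t size, const char *prefix,
                  const char *triple, const char *multilib)
{
    return snprintf(buf, size, "%s/%s/lib/%s", prefix, triple, multilib);
}

/* Returns nonzero if the scheme reproduces the directory that holds
 * libc.a in the MacPorts install described above. */
int demo_matches_macports(void)
{
    char path[64];
    int n = cross_lib_dir(path, sizeof path, "/opt/local", "avr", "avr51");
    return n > 0 && (size_t)n < sizeof path &&
           strcmp(path, "/opt/local/avr/lib/avr51") == 0;
}
```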

The best documentation I can find of this "standard" is this snippet of Thomas Petazzoni's presentation on cross toolchains. I'd be interested to know if there's a more formal specification, but so far I haven't found one.

Anyway, Clang only looked in the following places:

$PREFIX/usr/avr/lib/$MULTILIB
$PREFIX/usr/lib/avr/lib/$MULTILIB

The first seems somewhat reasonable since usr is sometimes a subdirectory of the $PREFIX—though, to my knowledge, that's only ever the case when $PREFIX is /. I'd sort of expect /usr to be part of the $PREFIX itself in these cases.

The second one makes absolutely no sense to me. /usr/lib/avr/lib/? And, correspondingly, /usr/lib/avr/include/? These make no sense. But apparently this is where Ubuntu installs stuff.

Regardless, neither one fits my (rather reasonable) use case, so I fixed that.

Objective-C ABI discrepancies

I discussed Objective-C compiler metadata at length in my previous entry. The format of the metadata is rather informally standardized. There's no ABI spec for Objective-C; rather, the correct format is whatever is produced by the compiler and consumed by the runtime. Since I'm writing a runtime, that means the compiler is my only spec. This, uh, has its pros (code don't lie) and cons (well, read on).

It turns out there are some subtle differences between the formats generated by GCC and Clang. For example, here is what GCC generates for the class_ro structure:

struct class_ro {
    unsigned int flags;
    unsigned int instance_start;
    unsigned int instance_size;
    unsigned int reserved;
    unsigned char *ivarLayout;
    /* [snip] */
};

And here is what Clang generates:

struct class_ro {
    unsigned int flags;
    unsigned int instance_start;
    unsigned int instance_size;
    unsigned char *ivarLayout;
    /* [snip] */
};

What's behind the missing reserved field? Consider that this structure is only used by the objc4 modern ABI, and that, apart from some weirdo trying to make it run on AVR micros, it's only used on Darwin on x86_64 and arm64.^[4] Both platforms operate under an LP64 data model: long and pointers are 64 bits wide, but int is 32 bits wide. Furthermore, all three types are naturally aligned to their respective sizes.

In such a data model, if ivarLayout (a 64-bit pointer) is preceded by an odd number of unsigned int (32-bit) members (as in the Clang definition), the compiler implicitly inserts 32 bits of padding before ivarLayout to align it properly. The GCC definition, by contrast, simply makes that padding explicit by adding a 32-bit unsigned int member in the right place. Under this data model, therefore, the definitions are technically different—but practically identical.

Now consider the AVR data model. Here, both unsigned int and pointers are 16 bits wide and 16-bit aligned. Clang's definition, therefore, needs no implicit padding. (This is great, since it means the structure doesn't waste any memory in the name of alignment.) What about GCC's definition? Since it explicitly adds a reserved field, ivarLayout (and all subsequent members) are pushed out by 16 bits to make space for the unused unsigned int.
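The two cases can be checked with offsetof on any ordinary host by using fixed-width stand-ins for int and pointers. This is only a sketch: the struct names are hypothetical, the 64-bit "pointer" is modeled as uint64_t and the 16-bit one as uint16_t, and it assumes a host (such as x86_64 or arm64) where uint64_t is 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* LP64 model: 32-bit unsigned int, 64-bit pointer. */
struct lp64_gcc {
    uint32_t flags, instance_start, instance_size, reserved;
    uint64_t ivarLayout;
};
struct lp64_clang {
    uint32_t flags, instance_start, instance_size;
    uint64_t ivarLayout;    /* compiler pads 4 bytes before this */
};

/* AVR model: 16-bit unsigned int, 16-bit pointer. */
struct avr_gcc {
    uint16_t flags, instance_start, instance_size, reserved;
    uint16_t ivarLayout;
};
struct avr_clang {
    uint16_t flags, instance_start, instance_size;
    uint16_t ivarLayout;    /* no padding needed */
};

size_t lp64_gcc_offset(void)   { return offsetof(struct lp64_gcc, ivarLayout); }
size_t lp64_clang_offset(void) { return offsetof(struct lp64_clang, ivarLayout); }
size_t avr_gcc_offset(void)    { return offsetof(struct avr_gcc, ivarLayout); }
size_t avr_clang_offset(void)  { return offsetof(struct avr_clang, ivarLayout); }
```

Under the LP64 stand-ins, both definitions put ivarLayout at the same offset. Under the AVR stand-ins, the two offsets differ by the two bytes of the reserved field.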

This is bad: it means that a runtime would have to know which compiler built a particular class_ro to interpret it—at least, to interpret any data past instance_size. This includes the pointer to the method list! Since there is no marker telling us which style of class_ro is used for a particular class, a runtime would not get very far trying to figure this out.

Luckily, the fix is simple: remove the explicit reserved field from GCC's definition. Under the LP64 data models described above, the 32 bits of padding will be implicitly added to preserve natural alignment, just like in Clang. And under the AVR data model, no unused padding is added. I posted a patch here.

A similar problem cropped up for protocol_list_t (I'll discuss implementing protocols in a future entry); I've posted a (decidedly more involved but spiritually similar) patch for that one here, though it still needs a little work.

Address space handling in Clang's Objective-C frontend

AVR microcontrollers implement a Harvard architecture, in which code and data live in completely separate address spaces. On AVRs, this is for a very practical reason: the code lives in flash memory, and data lives in SRAM. Which memory is accessed depends on whether instructions are being fetched or data is being accessed. (There are also instructions you can use to load bytes from or store bytes to the flash; this can be useful for read-only data.)

LLVM supports targets with multiple address spaces. Address spaces are numbered, starting at zero, and all pointers are defined to be in some particular address space. Values in LLVM IR are explicitly (and repeatedly) typed; it is invalid to use a pointer from address space X where a pointer from address space Y is expected.

LLVM has the concept of a program address space, in which all functions implicitly live. Function pointers then implicitly point to addresses in the program address space.

When Clang's Objective-C frontend emits a class's method list, it generates an array of structures like this:

struct method {
    SEL name;
    char *types;
    IMP imp;
};

Recall, however, that IMP is a typedef for id (*)(id, SEL, ...). That is, it's a pointer to a function returning id and accepting an id (self), a SEL (_cmd), and some variadic parameters.

This definition is technically incorrect. Not all Objective-C methods need return an id.^[5] And of course most Objective-C methods do not take any variadic parameters.^[6]

Casting one function pointer to another function pointer type is sometimes fine. On x86_64, for example, as long as both return values come back in the %rax register (i.e., they're integers or pointers), the calling conventions are identical. However, as I discussed at length in a previous entry, if the return type is a struct, the calling convention is radically different.

On AVR, there's another source of incompatibility: the way arguments are passed for variadic functions is completely different from non-variadic functions, even for the non-variadic arguments. (Apple's arm64 platforms also have special handling of variadic arguments.)

Anyway, to sidestep all this complexity, the Clang Objective-C frontend instead types imp as char *—or, rather, as its LLVM IR equivalent, i8*. The LLVM instruction bitcast .. to is used to effect the cast.

For single-address-space targets, this is fine. While a function pointer is not really a valid pointer to i8, there's an implicit understanding that the Objective-C runtime will (effectively) cast the pointer into the correct type before using it.
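Here is a rough sketch of that "store it untyped, cast it back before calling" pattern in plain C. All names are hypothetical, and one detail differs from what Clang emits: the sketch stores the implementation as a generic function pointer (void (*)(void)) rather than a data pointer like char *, since that round trip is the one standard C guarantees.

```c
/* Generic, deliberately "wrong" function pointer type used for storage. */
typedef void (*untyped_imp)(void);

struct method {
    const char *name;   /* stand-in for SEL */
    const char *types;  /* type encoding string */
    untyped_imp imp;    /* implementation, stored untyped */
};

struct counter { int value; };

/* A method implementation with its real signature: self, _cmd, one int. */
static int counter_add(struct counter *self, const char *cmd, int amount)
{
    (void)cmd;
    self->value += amount;
    return self->value;
}

/* The caller casts back to the true type before making the call. */
int send_add(const struct method *m, struct counter *receiver, int amount)
{
    int (*typed)(struct counter *, const char *, int) =
        (int (*)(struct counter *, const char *, int))m->imp;
    return typed(receiver, m->name, amount);
}

/* Builds one method entry and sends it; returns 42. */
int demo_send_add(void)
{
    struct method add = { "add:", "i@:i", (untyped_imp)counter_add };
    struct counter c = { 40 };
    return send_add(&add, &c, 2);
}
```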

But on Harvard architectures, function pointers live in a different address space from normal data. Specifically, on AVR, function pointers live in addrspace(1). But i8* is (implicitly, by omission) in addrspace(0). Using bitcast .. to to convert to another address space is illegal.

There is an addrspacecast .. to instruction that allows cross-address-space conversions. But its semantics are that both the input and output pointers are valid pointers to the same object. Since AVR's two address spaces are non-overlapping, this won't work either.

There's an easy fix: type imp not as i8* but as i8 addrspace(1)*. Or, more generally, tag the type with the target's program address space.

That's the change I've proposed here.

Invalid `zext` when emitting synthesized property accessors

I haven't yet discussed adding runtime support for properties, and I'll save a more general discussion of it for a later entry. For now, suffice it to say that @synthesized properties emit calls to the runtime-provided functions objc_getProperty and objc_setProperty.

One of the arguments to those functions is the offset of the property's backing ivar; the offset is typed as ptrdiff_t.

Conflictingly, ivar metadata stores the offsets as long. (I touched on ivar metadata in a previous entry.) So the compiler has to emit a conversion of the ivar offset value (which, you'll recall, might have been adjusted at runtime and therefore must be loaded from memory) to ptrdiff_t.

On targets with LP64 data models, long and ptrdiff_t are the same size. But on AVR, long is 32 bits wide, while ptrdiff_t is 16 bits wide.

The Clang Objective-C frontend currently attempts to use the LLVM zext instruction to convert the loaded long value to ptrdiff_t. When ptrdiff_t is smaller than long, this fails, aborting the compiler. I've written a simple fix here that explicitly allows for either type to be larger and uses the trunc instruction when ptrdiff_t is the narrower of the two.
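The decision boils down to comparing the two widths. The sketch below models that choice in C; it is not the code from the patch, and choose_int_cast and narrow_ivar_offset are hypothetical names. The second function shows the AVR-shaped case, where a 32-bit offset is narrowed to 16 bits (well defined as long as the offset fits, which it should for any object that fits in SRAM).

```c
#include <stdint.h>

enum int_cast { INT_CAST_NONE, INT_CAST_EXTEND, INT_CAST_TRUNCATE };

/* Pick the kind of integer conversion needed to go from a value
 * src_bits wide to one dst_bits wide. */
enum int_cast choose_int_cast(unsigned src_bits, unsigned dst_bits)
{
    if (dst_bits > src_bits)
        return INT_CAST_EXTEND;     /* destination is wider */
    if (dst_bits < src_bits)
        return INT_CAST_TRUNCATE;   /* destination is narrower */
    return INT_CAST_NONE;           /* same width, nothing to do */
}

/* AVR-like case: a 32-bit "long" offset narrowed to a 16-bit
 * "ptrdiff_t". Assumes the offset fits in 16 bits. */
int16_t narrow_ivar_offset(int32_t offset)
{
    return (int16_t)offset;
}
```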

That's enough noodling on compilers for now. There's likely more to say in this area, but this entry is too long as is.

In my next entry, I'll go back to discussing functionality I've added to the runtime.

[1] For a fascinating read on why the interrupts flag is restored before the low byte of the stack pointer is set, read this Stack Overflow answer.
[2] The name derives from its use of the same runtime mechanism as used to initialize static-duration C++ objects: by calling their constructors.
[3] There are other differences, too, described nicely in this Stack Overflow answer.
[4] It was also previously used on arm32 (i.e., pre-A7 iPhones). But, as far as I can tell, mainline GCC never (correctly) supported them as a target. The Apple GCC fork did conditionalize the reserved field on TARGET_64BIT (see build_v2_class_template here), though even that isn't technically correct.
[5] Historically, it was common for methods to return self (an id) if they had nothing else to return, but this is considered highly outmoded. I can, however, attest that there are remnants of this legacy in modern AppKit source.
[6] As far as I'm aware, it's syntactically not possible to define an Objective-C method that takes (in addition to the implicit self and _cmd) only variadic parameters. I'd be interested to know if that's wrong though.