Object file formats

Recently I had the chance to go through a PowerPoint presentation of the software development process inside the company where I am currently working. In order to understand the code and the project structure there are some know-how requirements, I would say: you need to know how the code is implemented or developed (which rules it has to fulfill; in automotive companies this mainly has to do with the MISRA guidelines), how the code is structured, how the files are organized, how the binary is created, how the code is executed, and some post-execution phases, which are primarily related to simulation and debugging activities.

One of the most important things in this know-how chain is, in my opinion (and not only mine), what the binary looks like. If you already have some experience in the embedded world you may know that compiling a source file results in an object file which, with the help of a linker, is linked together with other object files or .lib files (some explanations about this maybe later…) in order to generate an executable image.

What does this executable image represent? Before answering this I would like to introduce you to a book (you probably already know about it) called Real-Time Concepts for Embedded Systems by a Chinese author, Qing Li, currently a senior architect at Wind River Systems. I consider it very useful and also very intelligible for a wide range of embedded programmers. I mean it will help a beginner a lot, because it describes almost every phase of the embedded software development process, starting from the tools and going into the details of some RTOS terms like deadlocks, memory management, the OS scheduler and services, and many others.

The second chapter of this book is called Basics of Developing for Embedded Systems and it roughly describes the flow that a program has to follow from writing the source code to deploying the executable on the target. Sub-chapter 2.3 is exactly about the current topic: object file formats. The driving idea of a common object file format is, as the author says:

to allow development tools which might be produced by different vendors - such as a compiler, assembler, linker, and debugger that conform to the well-defined standard - to interoperate with each other.

Why have a common binary file standard? To allow different compilers to produce code for different machines and (an argument which is more embedded-related) to allow debuggers from different vendors to understand the binaries those compilers produce.

Just a little bit of history …

When did it all appear? As you may have guessed already, this concept is strongly related to the advent of portable code, which in turn is related to the UNIX phenomenon. The very first object file format was a.out (the name stands for assembler output; if no name is explicitly provided for the output file, the compiler/linker will use the default name a.out). This was the executable format of the first UNIX editions. In time it was superseded by COFF (Common Object File Format) with the advent of UNIX System V in 1983, and by Mach-O in the case of operating systems based on the Mach kernel, such as NeXTSTEP or Mac OS X.

Before going further, one reminder: do not get confused by so many binary formats. Even nowadays many microprocessor suppliers use their own “slightly customized” binary format, but in the end all of them had to keep compatibility with ELF (a more evolved binary format; the acronym comes from Executable and Linkable Format).

Coming back to the topic, why was the a.out format abandoned? Here you may find a very good article about the differences between ELF and a.out, and also about the shortcomings of a.out. What are those? There are mainly two: first, the header of an a.out file, named struct exec, contains limited information and allows only a fixed number of sections to exist (we’ll discuss sections later); second, it does not provide built-in support for shared libraries (this was, as far as I read on Wikipedia, one of the reasons that a.out was abandoned by the Linux project).
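
To make the "limited header" point a bit more concrete, here is a minimal Python sketch that decodes a struct exec. It assumes the classic 32-bit layout of eight 32-bit words with the traditional a.out.h field names, little-endian byte order, and the magic number in the low 16 bits of the first word; real systems differed in these details, so treat it as an illustration rather than a general a.out parser. The sample bytes are synthetic, not taken from a real file.

```python
import struct

# Classic 32-bit a.out header ("struct exec"): eight 32-bit words.
# Assumptions: little-endian byte order, magic in the low 16 bits of a_info.
EXEC_FIELDS = ("a_info", "a_text", "a_data", "a_bss",
               "a_syms", "a_entry", "a_trsize", "a_drsize")
EXEC_FORMAT = "<8I"

# Traditional magic numbers (octal, as in a.out.h).
MAGIC_NAMES = {0o407: "OMAGIC", 0o410: "NMAGIC", 0o413: "ZMAGIC"}


def parse_aout_header(raw):
    """Decode the first 32 bytes of an a.out file into a dict."""
    size = struct.calcsize(EXEC_FORMAT)
    if len(raw) < size:
        raise ValueError("need at least %d bytes" % size)
    header = dict(zip(EXEC_FIELDS, struct.unpack(EXEC_FORMAT, raw[:size])))
    header["magic"] = MAGIC_NAMES.get(header["a_info"] & 0xFFFF, "unknown")
    return header


# Synthetic header: 4 KiB of text, 1 KiB of data, 512 bytes of bss.
sample = struct.pack(EXEC_FORMAT, 0o413, 4096, 1024, 512, 0, 0x1000, 0, 0)
print(parse_aout_header(sample))
```

Notice that the header only has room for the sizes of text, data and bss (plus symbols and relocations): there is no table where additional sections could be described, which is the limitation mentioned above.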

As we were discussing, COFF replaced a.out on UNIX systems; COFF was in its turn replaced by ELF, which is close to becoming a standard executable format.

I won’t go into the implementation details of an ELF file right now, I will reserve this pleasure for a later post, but I would like to stress that it is important not to treat an object or an executable file as a black box, and to have a really clear and meaningful idea about what’s hidden inside.

So… if we do not enjoy the pleasure of ELF implementation details, at least we should continue with a little bit of history, or as I like to call it, computer science culture. One thing which I missed so far in this post is that the ELF binary format specification was first published in the System V Release 4 Application Binary Interface.

What is System V R4? A commercial UNIX release developed mainly by AT&T (together with Sun Microsystems).

What is an Application Binary Interface? Much the same idea as an API, but at the binary level: it defines how compiled code interfaces with the software layer it runs on (most probably the OS). An API, by contrast, usually exposes OS functions to the application at source level, so it sits one (or a few) levels above the ABI.

But what other binary formats exist besides ELF? As you noticed, Microsoft Windows and Apple Mac OS were not counted among the ELF-compatible OSes.

In the case of Mac OS X, Mach-O is the object file format used. It has generally the same structure as an ELF file: a header followed by several sections.

Why did the ELF format become so popular? Honestly I cannot give you an accurate answer, since I am also still studying binary/object file formats, but as far as I know it is one of the most portable and it is not bound to specific hardware; actually this is why it was so widely embraced by many UNIX-like OSes (Linux, Solaris, FreeBSD, NetBSD). You can find some meaningful information about it here; the idea in this post is to provide some historical background and to answer why? more than how?.

Well… why did Mac OS X stick with Mach-O instead of shifting to ELF too? One of the main reasons has to do with supporting many architectures: a plain binary supports only one type of machine per file (so a plain Mach-O binary will run only on a single type of processor; it does not have to be PowerPC, but it has to be only one). Here you can find a very interesting post about Apple binaries and the way they were adapted on the fly in order to support many types of architectures.

What’s the idea? Ideally a binary should be built in such a way that it can run on as many hardware platforms as possible. When I write a future post about ELF files (hopefully very soon) you will see that in the ELF header there is a field called e_machine which indicates the type of CPU that the binary can run on. Well… the “problem” is that only one CPU type is supported per file.

The point is that Apple was already using the Mach-O format, inherited from NeXTSTEP, on their PowerPC-based computers. With the advent of the worldwide Microsoft-Intel success story they decided to make their applications Intel-compatible as well, and to build them directly in native Intel code (the alternative would be emulating the application, but this implies a serious speed overhead). Apple decided to create a new binary format, an enhancement of Mach-O, which supports both Intel and PowerPC. This became the notorious Multiple-Architecture or Universal or Fat Binary format.
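
As a rough illustration of what "fat" means in practice, the sketch below lists the architectures packed into a fat binary. It assumes the layout from Apple's mach-o/fat.h header (a big-endian magic 0xCAFEBABE and slice count, followed by one fat_arch record per architecture); the function name list_fat_archs and the sample bytes are made up for this example, and only a handful of CPU types are mapped to names.

```python
import struct

FAT_MAGIC = 0xCAFEBABE   # all fat header fields are stored big-endian
FAT_HEADER = ">II"       # magic, nfat_arch
FAT_ARCH = ">IIIII"      # cputype, cpusubtype, offset, size, align

CPU_ARCH_ABI64 = 0x01000000
# A few cputype values; anything else is shown as a hex number.
CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    18: "ppc",
    18 | CPU_ARCH_ABI64: "ppc64",
}


def list_fat_archs(raw):
    """Return (cpu name, file offset, size) for each slice of a fat binary."""
    header_size = struct.calcsize(FAT_HEADER)
    arch_size = struct.calcsize(FAT_ARCH)
    if len(raw) < header_size:
        raise ValueError("too short for a fat header")
    magic, count = struct.unpack_from(FAT_HEADER, raw, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    if len(raw) < header_size + count * arch_size:
        raise ValueError("truncated fat_arch table")
    archs = []
    for i in range(count):
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            FAT_ARCH, raw, header_size + i * arch_size)
        archs.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return archs


# Synthetic two-slice header (not a real executable): PowerPC + Intel.
sample = struct.pack(FAT_HEADER, FAT_MAGIC, 2)
sample += struct.pack(FAT_ARCH, 18, 0, 4096, 100, 12)
sample += struct.pack(FAT_ARCH, 7, 3, 8192, 200, 12)
print(list_fat_archs(sample))
```

Each slice found this way is an ordinary single-architecture Mach-O image; the fat header is only a small table of contents placed in front of them.
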
But the question that the author of the quoted post asks still remains:
Why didn’t Apple use ELF on Intel CPUs like the others? OK, they created the fat binary format just to keep compatibility between different PowerPC types, or between PowerPC and Motorola 68k, but that question still stands…
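
For comparison, this is roughly how the single e_machine field mentioned earlier can be read from an ELF header. It is a minimal sketch, assuming the standard header layout (16 bytes of e_ident, then the 16-bit e_type and e_machine fields) and covering only a few machine codes; the function name elf_machine and the sample bytes are invented for the example.

```python
import struct

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "Intel 80386", 20: "PowerPC", 40: "ARM",
            62: "x86-64", 183: "AArch64"}


def elf_machine(raw):
    """Return (e_machine, name) from the first 20 bytes of an ELF file."""
    if len(raw) < 20 or raw[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) gives the byte order of the remaining fields.
    if raw[5] == 1:
        order = "<"   # little-endian
    elif raw[5] == 2:
        order = ">"   # big-endian
    else:
        raise ValueError("unknown data encoding")
    # e_type sits at offset 16, e_machine right after it at offset 18.
    (e_machine,) = struct.unpack_from(order + "H", raw, 18)
    return e_machine, EM_NAMES.get(e_machine, "unknown")


# Synthetic header: 32-bit, big-endian, executable, PowerPC.
sample = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9) + struct.pack(">HH", 2, 20)
print(elf_machine(sample))
```

There is exactly one e_machine value per file, which is the "only one CPU" limitation discussed above; on a Linux system the same function could be tried on the first 20 bytes of any installed program.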

Enough for now, we’ll talk more about binaries later on.
All the best!


2 Responses to Object file formats

  1. kellogs says:

    >>Why Apple didn’t used ELF on Intel CPU’s like the others?

    Why did Apple not announce when the iPhone 3G issue would be solved? Why do they get the final word on whether or not to allow an application in their App Store? Why have they chosen Objective-C instead of C++ like everyone else has done? How can a company possibly hold a patent over the multi-touch design paradigm? Why does an Apple computer cost twice as much as its IBM-compatible counterpart?

    Frankly, I would nuke them out of the IT scene in the blink of an eye.

    • :))
      I do not think you’re right here, those questions are not inter-related
      what I (along with many others, because I was definitely not the first one) said is that they had the chance not to complicate their life in a useless way
      this has nothing to do with the way they manage their App Store ( Why do they get the final word on whether or not to allow an application in their App Store? ) it’s just their choice, but currently they rule in terms of iPhone apps; the patent for the multi-touch screen …. they won’t lock this forever, it is almost like holding a patent for the LCD screen
      Why have they chosen Objective-C instead of C++ like everyone else has done? – maybe this is a valid question, but taking into account that Objective-C came to them with NeXTSTEP and dates back to roughly the same years as C++ …
      but … when it comes to “why on earth did they decide to use a binary format which was more inflexible and which had to be completely adjusted, ending up with a Mac application twice as big as the corresponding one running on a PC?” – yes, that’s a valid question
      I won’t ever throw them out of the IT scene and for sure this scene would be far poorer without them, but I’m just wondering about some things….
      uuuff… too many questions related to Apple and too few answers (says the philosopher within myself :)))
