Memory pools or best practice in dynamic allocation in embedded systems

Before going into any presentation it is worthwhile to remember one important thing which I missed right from the beginning: only pointers (here I’m including arrays also) can be used in conjunction with dynamic memory allocation.

In C you cannot have something like this:

int n = (*int) malloc (sizeof (int));
// because malloc is returning a (void *) it has to be
// assigned as an r-value to a pointer

I read in many places that the cast to (*int) is not so safe and is better to be avoided but this is more subject of a footer note.

Malloc implementation presented in my previous post had many drawbacks. Just to name few of them:

    – there is no memory overflow handling, only the presumption of a success is considered
    – it cannot be used in multitasking environments because it has no locking mechanism implemented (mutex or semaphore)
    – is_available flag occupies an integer even if should be only one bit
    – it does not handle free space in order to avoid, or at least to limit fragmentation
    – it is non-deterministic, the algorithm has to iterate through each heap location in order to find the right one

Let’s just reacap a little bit how things are passing in the standard way when allocating dynamically memory. At the beginning the operating system granted a chunk of 32 KB memory to the current process (this is the thing that brk and sbrk handled), from which I would like to use (to allocate with malloc) 2 Kbytes:

I want to grab some more, let’s say … 3 KB:

Then, some more, we are far from running out of memory:

At the end we reach this point

And we suddently decide that we do not need the first 3 KB which was allocated so we free it:

Then we free also another 2 KB dynamically allocated piece of memory:

Now we would like to allocate 5 KB, but where would it fit? We have to step through each free location and see if it fits.
First one which is free ? NO
Second one? NO
Third one? NO …oops in total there are 9KB of free memory but we cannot allocate 5 KB, so we got an external fragmentation.

Let’s see what happens if we try to allocate 4 KB. We will have to iterate again through the complete free list and find the third position which fits, so those are the time and space (fragmentation) costs that we are talking about.

One method to deal with the two main shortcomings that “standard malloc” has (unpredictable time execution and fragmentation) is to use memory pools or partitions of memory having a fixed size. This is somehow a compromise between static allocation (the one with global variables or static variables) and malloc or the one that I just outlined it.

A pool is a block of memory of a fixed size, so the heap (or the memory pool) will be split into a number of blocks having different sizes (all of them power of two, just to cope with ordinary C data types). This brings a great benefit in terms of time execution, by knowing which block size is needed the allocation is faster, actually any non-deterministic behavior is excluded.

Another aspect of memory pool method consists in its robustness. The allocation scenario can be tuned to work perfectly for a certain application but it will, for sure, miserably fail for other applications, so robustness is traded for portability.

So how could we approach our allocation problem with this implementation?

As I already have told you the OS can be tuned for the application that will run. Memory pools sizes will be chosen just to match the chunk requested by the application. It will be needed one slice of 16 KB, two of 4 KB and four of 2 KB, so three different pools, one containing block of 2 KB, one containing blocks of 4 KB and another with 16 KB. The memory layout will look as below

you can remark two internal fragmentation, when allocating two 4 KB blocks when only 3 KB were needed.
After freeing the two blocks it will look like:

the main point is that when a new block is needed to be allocated this won’t take so much time as it was in the “standard” case. Memory allocator (malloc replacer) will find which type of pool does it need it, whether is 2 KB, 4 KB or 16 KB, and will go directly there to search for free space. The allocator is not forced to iterate through each free location.

A good approach of this fast-free-block-search algorithm I found in Real-Time concepts for embedded systems, a book by Qing Li and Caroline Yao. This was about creating a separate structure handling the free block in the free list. It is about a heap data structure. This is in fact a simple linked list where on top one may find the element having the greatest key value and then, its children are nodes having smaller key values.

Many RTOSes (real-time-operating-systems) provide mechanisms for allocating fixed-size partitions. Instead of using classical malloc or instead of creating your own custom memory allocator, you just have to call those “APIs” and they will do “the dirty job for you”. Those are indeed some black-boxes, but despite malloc, after running some benchmark code you know for sure their upper execution time boundary.

Before ending it is important to mention that this approach is very helpful for programs which are going through specific stages and each of it has memory allocated for a certain period of time. This is often the case in networking applications – embedded networking code and embedded protocol stack implementation.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: