2010年9月2日 星期四

Implementation of aligned memory allocation

Thanks to evolution in CPU architecture, you have super computers in your house. However, to get benefit from it, it requires some techniques. There are something new, while there are something very traditional.
One of such traditional approach is to use aligned memory allocation. Because CPU access memory most efficiently when accessing in certain unit, it is better if the allocated memory lies in the certain unit boundary. For example, let’s assume a CPU architecture is most efficient when memory is aligned every 4 bytes. When it accesses memory located at multiples of 4 in its address space, it is much faster than when the accessed memory is at 0×03 or 0×05.
The Unix malloc functions usually return aligned memory space, while Windows version doesn’t. Instead, the Windows provide _aligned_malloc().
Then, how to create a aligned malloc() function? There can be some special cases that you want to implement your own aligned malloc, although I don’t imagine such a case. Let’s figure out how to by looking at one of existing implementations. ( You can search one using the Google, and you will find out that they are similar.)
01// size : the size of allocated memory
02//        The actual size of allocation will be greater than this size.
03// alignment : the alignment boundary
04void *aligned_memory_alloc( size_t size, size_t alignment )
05{
06    void *pa, *ptr;
07 
08    //pa=malloc(((size+alignment-1)&~(alignment-1))+sizeof(void *)+alignment-1);
09 
10    // 1
11    pa=malloc((size+alignment-1)+sizeof(void *));
12    if(!pa)
13        return NULL;
14 
15    // 2
16    ptr=(void*)( ((ULONG_PTR)pa+sizeof(void *)+alignment-1)&~(alignment-1) );
17 
18    // 3
19    *((void **)ptr-1)=pa;
20 
21    printf("CAlignedAlloc::new(%d,%d)=%x\n", (ULONG)size, alignment, (ULONG)ptr);
22 
23    return ptr;
24}

Point is to allocate more space than required and make it point to some position in the allocated memory. The pointed position is aligned location.
At 1, the total space to allocate is :
size ; the space a user want to allocate
+ (alignment-1) ; additional space due to the alignment
+ sizeof (void *) ; The head location of the newly allocated memory
; contains an address where the aligned memory block starts
The size part is obvious. The sizeof (void *) part follows the design of aligned memory allocation. Without this part, it will not know from where to free the aligned memory space, and from where to access the memory to read/write from/to the space.
For the (alignment-1), please take a look at this picture.

The red arrows show what the destination address should be if the allocated memory is 1, 2, 3 or 5, 6, 7. They are relocated to 4 and 8, respectively. So, it can be shifted up to 3 slots to the right. So, it is (alignment-1)
At 2, the ptr points to the location of aligned place. After reserving the space for storing where the whole allocated memory block is, i.e. pa, it calculates the aligned location. If you look at the picture above, you will see why (alignment-1) is added and the address is “AND”ed with 1′s complement of the (alignment-1). For example, 4-bytes alignment means masking out the last 2bits. It is like removing the last 2 bits.
At 3, “address length” bytes before, it saves the address which points where the whole allocated memory starts. Why it uses (void **) instead of (void *) is because the pointer (ptr-1) is points to an address, which is a pointer to pointer.

And finally it returns the aligned memory location, ptr.
How about the free() function? You can’t free the address to the aligned position. The whole memory space should be freed. That is why the starting address of the whole memory space was saved above.
1void aligned_free( void *ptr )
2{
3    printf("CAlignedAlloc::free(%x)\n", (ULONG)ptr);
4    if(ptr)
5        free(*((void **)ptr-1));
6}   

At Just 1 address-width, i.e. 4bytes for 32bit CPU and 8bytes for 64bit CPU, before the location of the aligned space, it contains the starting address of the whole block.
It is better to use (void **) or (void *) to calculate how much space is required to save a pointer, because it works for 64bit architecture as well as 32bit architecture. Actually it works for any architecture.

台灣網際網路連線頻寬圖

How to use simple speedtest in RaspberryPi CLI

  pi@ChunchaiRPI2:/tmp $  wget -O speedtest-cli https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py --2023-06-26 10:4...