Memory alignment problem of structure members


Before discussing memory alignment, let’s first introduce a related concept—offset.

The distance between the actual address of a storage unit and the base address of the segment it belongs to is called the intra-segment offset, also known as the effective address or simply the offset.

In simple terms, in a structure, the offset refers to the difference between the address of a member in the structure variable and the address of the structure itself.

Consider the following code:

#include <stdio.h>

// Define a structure Foo with two members
struct Foo{
    char flag;
    int i;
};
int main(int argc,char* argv[])
{
    Foo foo;
    foo.flag='T';
    // Attempt to get the address of foo.i by adding 1 to the address of the foo.flag member
    // (deliberately incorrect: this is the mistake discussed below)
    int *pi=(int*)(&foo.flag+1);
    // Assign a value through the address obtained in the previous step
    *pi=0x01020304;
    // Output the values of foo.flag and foo.i
    printf("flag=%c,i=%x\n",foo.flag,foo.i);
    return 0;
}

In the code above, a structure with a character member flag and an integer member i is defined. The main function tries to assign 0x01020304 to the integer member through a pointer, but the program prints 0x01 for i, so the assignment did not do what was intended. Strictly speaking, the misaligned write is undefined behavior and i is never properly initialized; the value described here is what a typical little-endian x86 build produces.
The layout of the members of the struct in the IR code generated by Clang is:

%struct.Foo = type { i8, i32 }

Output result, as reported above: flag=T,i=1

The issue with the program lies in the pointer assignment, in the statement int *pi=(int*)(&foo.flag+1); which assumes that adding 1 to the address of the character member flag gives the address of the integer member i. The program then writes through that address, expecting i to receive the value, but i does not end up with the intended value. The root of this problem is memory byte alignment.

Memory byte alignment refers to the requirement that data of each type be stored at addresses that follow certain rules, so that the CPU can access it efficiently, rather than being packed byte after byte. The default alignment of each data type depends on the compiler and the target platform, so it may differ between implementations. In most cases, the alignment of a basic data type is simply its size (its sizeof value).

For instance, the char type occupies 1 byte, so its alignment is 1 byte; the int type typically occupies 4 bytes, so its alignment is 4 bytes; the double type occupies 8 bytes, so its alignment is usually 8 bytes.

For structure data members, the default byte alignment generally satisfies the following criteria:

  1. The address of the structure variable must be divisible by the alignment of its most strictly aligned member (usually the size of its widest basic-type member).
  2. The offset of each member relative to the structure’s starting address is an integer multiple of that member’s alignment (usually its size); if necessary, padding bytes are inserted between members. (0 is considered an integer multiple of any number.)
  3. The total size of the structure must be an integer multiple of the alignment of its widest basic-type member. If necessary, padding bytes are added after the last member to ensure this.
  4. The size of a union is determined by its largest member, rounded up if necessary to a multiple of the union’s alignment.
  5. Because structure types must take byte alignment into account, the order of member declarations can affect the size of the structure.

In the code example at the beginning of this article, the integer member i in structure Foo occupies 4 bytes and is the widest member, so foo must reside at a memory address that is a multiple of 4. The starting address of the character member flag is the same as the starting address of foo, and flag occupies 1 byte. The offset of the integer member i must be a multiple of 4, so i cannot be stored at &flag+1 (flag occupies only offset 0, so &flag+1 is at offset 1, which is not a multiple of 4); it is stored at &flag+4 instead. Consequently, the 3 bytes following flag are padding, and foo occupies a total of 8 bytes of memory, not 5 bytes (the sum of sizeof(char) and sizeof(int)).

The layout of foo’s members in memory is as follows, where each cell represents 1 byte and the number is the offset:

| 0: flag | 1: padding | 2: padding | 3: padding | 4: i | 5: i | 6: i | 7: i |

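In the code above, the 4-byte integer 0x01020304 is written starting at &flag+1, that is, at offsets 1 to 4. On a little-endian machine the bytes 04, 03, and 02 land in the three padding bytes, and only the last byte, 01, lands in the first byte of i. The three bytes written into the padding have no effect on i, so i reads back as 1 (assuming its remaining bytes happen to be zero).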

Therefore, the correct code to assign a value to member i would be:

foo.flag='T';
// Do not use the address of flag + 1 to assign a value to member i; instead, use the address of foo.i directly.
int *pi=&foo.i;
// Assigning a value to the member stored at the address obtained in the previous step
*pi=0x01020304;
// Output the values of foo.flag and foo.i
printf("flag=%c,i=%x\n",foo.flag,foo.i);

The details of byte alignment depend on the specific implementation of the compiler, and different platforms may vary. Some compilers allow for changing the default memory alignment conditions in code through the preprocessor directive #pragma pack(n) or type attribute __attribute__((packed)).

Next, let’s analyze the following code:

#include <stdio.h>

struct student{
    int year;
    double math;
    // Union
    union hold{
        char GPA;
        int degit;
        char letter;
    }hold;
};

int main(void){
    // Print the size of the structure
    printf("%d\n",(int)sizeof(student));
    return 0;
}
// The output is 24, compiled on windows7x64,GCC5.2.0

For the student structure, Clang generates the following IR:

%struct.student = type { i32, double, %"union.student::hold" }
%"union.student::hold" = type { i32 }

Member                   Size (bytes)  Offset (bytes)
year (int)               4             0
padding                  4             4
math (double)            8             8
hold (union, int-sized)  4             16
padding                  4             20
Total size: 24 bytes

The offset of year is 0. The offset of math is sizeof(year) plus 4 bytes of padding, which is 8, because a double must start at a multiple of 8. The offset of hold is the offset of math (8) plus the size of math (8), which yields 16. The size of the union is determined by its largest member (int degit, occupying 4 bytes). The offset of the last member plus its size is 16+4=20, but 20 is not an integer multiple of the widest member’s alignment (8, for double), so an additional 4 bytes are padded after hold. Therefore, sizeof(student) is 24 bytes.


The preprocessor directive #pragma pack(ALIGN_NUM) can be used to specify alignment:

// sizeof(A) == 16
struct A{
    int ival;
    bool b;
    double dval;
};

Using custom alignment (a pack value of 1 means members are packed with no padding):

// sizeof(A) == 13
#pragma pack(1)
struct A{
    int ival;
    bool b;
    double dval;
};
#pragma pack()

The alignof keyword, introduced in C++11, can be used to obtain the alignment requirement of a type.

For example, the following structure:

#include <stdio.h>

struct A
{
    char c;
    int ival;
    double dval;
    short sival;
};
int main()
{
    // alignof yields a size_t; cast it to match the %d format
    printf("%d\n", (int)alignof(A));
}
// output: 8

Title: Memory alignment problem of structure members
Author: LIPENGZHA
Publish Date: 2016/06/02 10:12
Word Count: 4.5k Words
Link: https://en.imzlp.com/posts/61962/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.