Pointers to Class Members in C++ Are Not Pointers

“Pointers to members” are a rarely used C++ feature, and the word “pointer” is somewhat misleading here: they do not hold an address, and they do not behave like ordinary pointers. This article analyzes how Clang implements pointers to class members by looking at the LLVM-IR it generates, together with the relevant definitions from the C++14 standard and the LLVM-IR syntax involved.

First, the C++ standard does not specify how pointers to class members are implemented; that is left to the compiler. The standard does, however, make clear that pointers to members are a distinct kind of type from ordinary pointers:

The type “pointer to member” is distinct from the type “pointer”, that is, a pointer to member is declared only by the pointer to member declarator syntax, and never by the pointer declarator syntax. There is no “reference-to-member” type in C++.

An ordinary pointer holds an address and can be dereferenced to access the pointed-to object indirectly:

int x=123;
int *xp=&x;
*xp=456;

A pointer to member, by contrast, does not point to the memory address of any particular object. It designates a member of a class, not the member of a specific object.

Pointers to Class Data Members

Let's look at what pointers to members actually are in Clang's implementation. Assume we have the following class type:

#include <iostream>

struct A{
A(int x=0,double y=0.0,char z='\0'):a{x},b{y},c{z}{}
void func(){
std::cout<<"void A::func()"<<std::endl;
}
int a;
double b;
char c;
};

Then we declare pointers to its data members:

// without initializers
int A::*ap;
double A::*bp;
char A::*cp;

These are declared without initializers. Here is the initialized version; comparing the LLVM-IR of the two shows what pointers to data members are actually initialized to.

// with initializers
int A::*ap=&A::a;
double A::*bp=&A::b;
char A::*cp=&A::c;

Comparing the LLVM-IR of the two versions with diff shows the difference.

The initialized data member pointers do not hold any kind of address; they are plain integers (i64). In Clang's implementation, each one equals the offset of the corresponding data member within the class. For details on member offsets within a class, see my other article: Memory Alignment Issues with Structure Members.

In other words, a pointer to data member is implemented as the offset of that member within the class. This also explains why a pointer to data member cannot be used on its own: it is only an offset, and a specific object is needed to reach the subobject at that offset.

Let's access a data member of a specific object through a class object and a pointer to data member:

A x{123};
int A::*ap=&A::a;
x.*ap=888;

Again, let's check the LLVM-IR:

; Create class object x and call A's constructor
%2 = alloca %struct.A, align 8
call void @_ZN1AC2Eidc(%struct.A* %2, i32 123, double 0.000000e+00, i8 0)
; Initialize the pointer to data member ap (its value is the offset of A::a, i.e. 0)
%3 = alloca i64, align 8
store i64 0, i64* %3, align 8

%4 = load i64, i64* %3, align 8
%5 = bitcast %struct.A* %2 to i8*
; Access the subobject (data member) through the object's address plus the offset
%6 = getelementptr inbounds i8, i8* %5, i64 %4
%7 = bitcast i8* %6 to i32*
store i32 888, i32* %7, align 4

You can view the LLVM-IR syntax in the LLVM Language Reference Manual.

A relatively complex part here is the syntax of getelementptr inbounds:

; Syntax
<result> = getelementptr <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr <ty>, <ptr vector> <ptrval>, [inrange] <vector index type> <idx>

The getelementptr instruction is used to get the address of a subelement of an aggregate data structure. It performs address calculation only and does not access memory. The instruction can also be used to calculate a vector of such addresses.

  • The first argument is always a type used as the basis for the calculations.
  • The second argument is always a pointer or a vector of pointers, and is the base address to start from.
  • The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed.

The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the second argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.

We won't go further into LLVM-IR here; let's return to pointers to data members.

Initializing a pointer to data member therefore yields the offset of that data member within the class. Applying & to a qualified non-static data member such as &A::a produces an offset, not an address; only when & is applied to a static data member does it produce a real address, and the result is then an ordinary pointer (int*), not a pointer to member.

This is likely also why the C++ standard stipulates that pointers to members cannot point to static members: a static member is not part of any object, so there is no offset to store.

[ISO/IEC 14882:2014] A pointer to member shall not point to a static member of a class (9.4), a member with reference type, or “cv void.”

As mentioned earlier, a pointer to data member cannot be used on its own; it requires an object of class A. Because it is merely the offset of the member within the class, reaching the subobject (data member) at that offset requires the address of an actual A object.

When .* or ->* is applied to an object or to a pointer to an object, what executes is the getelementptr inbounds part of the LLVM-IR shown above: the object's address plus the stored offset.

Pointers to Class Member Functions

The previous section covered pointers to data members and how Clang implements them. Pointers to member functions work differently.

An implementation of pointers to member functions has to store more information: whether the target function is virtual or non-virtual, how to find it through the virtual table if it is virtual, and how to adjust the this pointer. Implementations therefore usually represent a pointer to member function as a small structure holding this information.

Note: there is no separate “pointer to virtual function” type. Being virtual is a property of the function itself, not of the pointer to member function that refers to it.

Following the same approach as the previous section, we first declare a pointer to member function. (Unlike an ordinary function name, a member function does not implicitly convert to a pointer; a pointer to member function can only be formed by explicitly applying & to a qualified name such as &A::func.)

void (A::*funcp)();

Its LLVM-IR code would be:

%3 = alloca { i64, i64 }, align 8

So in Clang a pointer to member function is represented differently from a pointer to data member: it is a structure of two i64 values. Let's initialize it:

void (A::*funcp)()=&A::func;

We can check its LLVM-IR code:

%3 = alloca { i64, i64 }, align 8
store { i64, i64 } { i64 ptrtoint (void (%struct.A*)* @_ZN1A4funcEv to i64), i64 0 }, { i64, i64 }* %3, align 8

Here the address of A::func (a function that takes a pointer to A and returns void) is converted to i64 and stored as the first field of the member pointer structure. The second field is an adjustment applied to the this pointer before the call: with inheritance, and especially multiple inheritance, a base class subobject does not necessarily start at the same address as the derived object, so this has to be shifted to the right subobject. The adjustment can be represented as a ptrdiff_t, as the following example shows:

#include <iostream>

class A{
public:
void func()
{
std::cout<<"A::func"<<", this ptr address is "<<this<<std::endl;
}
char pad16[16];
};

class B{
public:
void bar(){
std::cout<<"B::Bar"<<", this ptr address is "<<this<<std::endl;
}
char pad8[8];
};
class C:public A,public B{};

int main()
{
C cobj;
void(C::*Afunc)()=&C::func;
void(C::*Bbar)()=&C::bar;

(cobj.*Afunc)();
(cobj.*Bbar)();

return 0;
}
// output
A::func, this ptr address is 0x61fe30
B::Bar, this ptr address is 0x61fe40

The two this values differ by 0x10 = 16 bytes, which is the size of the A subobject: inside C, the B subobject is laid out right after the A subobject. Now let's look at the IR for the member function pointer assignments:

define i32 @main() #4 {
; ...
%2 = alloca %class.C, align 1
%3 = alloca { i64, i64 }, align 8
%4 = alloca { i64, i64 }, align 8
; &C::func: function address, this adjustment 0
store { i64, i64 } { i64 ptrtoint (void (%class.A*)* @_ZN1A4funcEv to i64), i64 0 }, { i64, i64 }* %3, align 8
; &C::bar: function address, this adjustment 16
store { i64, i64 } { i64 ptrtoint (void (%class.B*)* @_ZN1B3barEv to i64), i64 16 }, { i64, i64 }* %4, align 8
; ...
}

The code above allocates a { i64, i64 } structure for each member function pointer. The first field holds the function address and the second holds the this adjustment: 0 for A::func, because the A subobject starts at offset 0 in C, and 16 for B::bar, because the B subobject starts 16 bytes into C.

Knowing this layout, we can extract the actual function address from a member function pointer by reading the first field of the structure:

#include <cstddef>
#include <iostream>

class A{
public:
void func(int ival)
{
std::cout<<"HelloWorld,"<<"ival = "<<ival<<std::endl;
}
};

int main()
{
union{
void(A::*Afunc)(int);
void(*realAfunc[2])(A*,int);
}AfuncAddr;
AfuncAddr.Afunc=&A::func;

AfuncAddr.realAfunc[0](NULL,123);

}

// Output
HelloWorld,ival = 123

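This is undefined behavior: reading a union member other than the one last written is not allowed in C++, and the this pointer passed here is NULL, so any member function that actually uses this (accessing data members, calling virtual functions) will typically crash. It only appears to work because func never touches this.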

In normal use, a member function pointer is called with an object of class A (since .* and ->* have lower precedence than the function call operator (), x.*funcp must be parenthesized):

A x(123);
void (A::*funcp)()=&A::func;
(x.*funcp)();

Its LLVM-IR code would be:

%2 = alloca %struct.A, align 8
call void @_ZN1AC2Eidc(%struct.A* %2, i32 123, double 0.000000e+00, i8 0)
%3 = alloca { i64, i64 }, align 8
; Store the member function pointer { ptr, adj } into the allocated space (%3)
store { i64, i64 } { i64 ptrtoint (void (%struct.A*)* @_ZN1A4funcEv to i64), i64 0 }, { i64, i64 }* %3, align 8

%4 = load { i64, i64 }, { i64, i64 }* %3, align 8
; Extract the second i64 (adj, the this adjustment) into %5
%5 = extractvalue { i64, i64 } %4, 1
%6 = bitcast %struct.A* %2 to i8*
; Adjust this: object address + adj
%7 = getelementptr inbounds i8, i8* %6, i64 %5
%8 = bitcast i8* %7 to %struct.A*
; Extract the first i64 (ptr) into %9
%9 = extractvalue { i64, i64 } %4, 0
; Test the lowest bit of ptr: 1 means the function is virtual
%10 = and i64 %9, 1
; ne = "not equal"
%11 = icmp ne i64 %10, 0
; Branch: virtual path (%12) or non-virtual path (%19)
br i1 %11, label %12, label %19

; <label>:12: ; preds = %0
; Virtual: load the vptr from the adjusted object
%13 = bitcast %struct.A* %8 to i8**
%14 = load i8*, i8** %13, align 8
; ptr - 1 is the byte offset of the function's slot in the vtable
%15 = sub i64 %9, 1
%16 = getelementptr i8, i8* %14, i64 %15
%17 = bitcast i8* %16 to void (%struct.A*)**
%18 = load void (%struct.A*)*, void (%struct.A*)** %17, align 8
br label %21

; <label>:19: ; preds = %0
; Non-virtual: ptr is the function address itself
%20 = inttoptr i64 %9 to void (%struct.A*)*
br label %21

; <label>:21: ; preds = %19, %12
; Pick the function pointer computed on whichever path was taken
%22 = phi void (%struct.A*)* [ %18, %12 ], [ %20, %19 ]

; Call through the function pointer, passing the adjusted object address as this.
; This is why a member function pointer must be called with an object or a pointer:
; the object supplies the this argument that the bare function needs.
call void %22(%struct.A* %8)

Moreover, pointers to member functions convert in the opposite direction from object pointers: a pointer to a base class member function converts implicitly to a pointer to a derived class member function, whereas the reverse conversion is not implicit.

For example, we have the following class:

#include <iostream>
using namespace std;

struct base{
virtual void func(){
cout<<"base::func()"<<endl;
}
};
struct A:public base{
void func(){
cout<<"A::func"<<endl;
}
};

A pointer to a base class member function converts implicitly to a pointer to a derived class member function. Because base::func is virtual, calling through the converted pointer still dispatches dynamically:

void (A::*AfuncP)()=&base::func;
A aobj;
(aobj.*AfuncP)(); // output: A::func

However, the reverse results in a compilation error:

// error: cannot initialize a variable of type 'void (base::*)()' with an rvalue of type 'void (A::*)()':different classes ('base' vs 'A')
void (base::*baseFuncp)()=&A::func;

Combining with STL

With the standard library's mem_fun adapters, pointers to member functions can be wrapped into function objects (functors) so they can be passed to algorithms. Use std::mem_fun when the container holds pointers and std::mem_fun_ref when it holds objects (the adapter then receives each element by reference). Both adapters were deprecated in C++11 and removed in C++17.

For instance, calling a member function on every object stored in a container:

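#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

struct C{
void func(){
std::cout<<"C::func"<<std::endl;
}
};

int main()
{
// std::mem_fun_ref was removed in C++17; build with -std=c++14 or earlier
std::vector<C> x;
x.resize(5);
std::for_each(x.begin(), x.end(),std::mem_fun_ref(&C::func));
}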
// Output
/*
C::func
C::func
C::func
C::func
C::func
*/

Here is a simple implementation of std::mem_fun_ref from SGI STL (somewhat dated):

template <class _Ret, class _Tp>
class mem_fun_ref_t : public unary_function<_Tp,_Ret> {
public:
explicit mem_fun_ref_t(_Ret (_Tp::*__pf)()) : _M_f(__pf) {}
_Ret operator()(_Tp& __r) const { return (__r.*_M_f)(); }
private:
_Ret (_Tp::*_M_f)();
};

template <class _Ret, class _Tp>
inline mem_fun_ref_t<_Ret,_Tp> mem_fun_ref(_Ret (_Tp::*__f)())
{ return mem_fun_ref_t<_Ret,_Tp>(__f); }

SGI STL's implementation thus wraps the member function pointer in a function object whose operator() receives an instance of the class (by reference for mem_fun_ref; std::mem_fun is analogous but takes a pointer, and is not shown here) and invokes the member function on it.

The SGI STL adapter shown above only wraps member functions that take no parameters. What if we want to call member functions with parameters? We can use std::bind. The C++98 binders std::bind1st and std::bind2nd were clumsy, but C++11's std::bind is a real gem: it places no restriction on the number of parameters.

Let's modify class C slightly so that its member function takes a parameter:

struct C{
C(const int& x):ival{x}{}
void addNum(const int& iArg){
this->ival+=iArg;
std::cout<<this->ival<<std::endl;
};
private:
int ival;
};

Adapting addNum with std::mem_fun_ref for std::for_each now fails to compile (no matching function). We can use std::bind instead:

using namespace std::placeholders; // brings _1 into scope
std::vector<C> x{1,2,3,4,5};
std::for_each(x.begin(), x.end(),std::bind(&C::addNum,_1,3));

Because a member function pointer can only be called through an object or a pointer to one, the first argument bound to it in std::bind must supply that object: here the placeholder _1 receives each element of the vector, and 3 is passed as iArg.

Conversion from Member Function Pointer to Ordinary Function Pointer

From the discussion above, Clang represents a member function pointer as a structure whose first element stores the function address and whose second element stores the this adjustment. And as you may have realized, a member function is essentially a function that receives this as an extra, hidden first parameter:

class A{
public:
void func(int,double,void*){}
};

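So, at the calling-convention level (at least under the Itanium ABI that Clang uses on x86-64), the member function func behaves like an ordinary function of type: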

void(*)(A*,int,double,void*);

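Therefore, to turn a member function pointer into an ordinary function pointer, it is enough to take the first element of the member function pointer structure. This only holds when that element really is a function address, that is, for a non-virtual function with a this adjustment of 0: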

#include <cstdio>

class A{
public:
void func(int,double,void*){
printf("A::func\n");
}
};

void gfunc(void(*pfunc)(A*,int,double,void*))
{
A obj;
// (obj.*pfunc)(); // would not compile: pfunc is an ordinary function pointer
pfunc(&obj,0,1.1,0);
}

union U{
void(A::*func)(int,double,void*);
void(*pure_func)(A*,int,double,void*);
};

int main()
{
U unionObj;
unionObj.func=&A::func;
gfunc(unionObj.pure_func);
return 0;
}

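This gets a member function pointer into an ordinary function pointer. A direct conversion is rejected by the compiler; the union sidesteps that check. Keep in mind that the trick depends entirely on Clang's representation: reading a union member other than the one last written is undefined behavior in C++, and it breaks for virtual functions or whenever a this adjustment is needed.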

Update Log

2017.05.11

  • Added content on the use of standard library with pointers to member functions.

2018.09.25

  • Supplemented content on the struct initialization of member function pointers.

2018.11.06

  • Added content on converting from member function pointers to ordinary function pointers.

2019.04.22

  • Enhanced information on extracting raw function pointers from the member function pointer structure.

Title: Pointers to Class Members in C++ Are Not Pointers
Author:LIPENGZHA
Publish Date:2017/04/29 21:28
Update Date:2019/04/22 00:38
Word Count: 9.3k Words
Link:https://en.imzlp.com/posts/27615/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.