UE Reflection Implementation Analysis: C++ Features

In the previous article, the basic concepts of UE’s reflection were introduced. This article begins to explore the specific implementation of UE’s reflection mechanism.

Before introducing UE’s code, some C++ features need to be highlighted. Although UE’s reflection implementation relies heavily on UHT’s code generation, it also requires support from C++ syntax features. Only by understanding these features and their underlying meanings can we better grasp UE’s reflection mechanism.

The C++ features and standards described in this article are based on ISO/IEC 14882:2014, which is the C++14 standard.

What the reflection implementation ultimately depends on is how C++ type information can be stored so that it is accessible at runtime.

class/struct

For C++ classes, special attention needs to be paid to memory layout, because an instance of a class is essentially a block of memory, and how to interpret that block is determined by the class's type information. In C++, struct and class both define classes and differ only in their default access control. In UE, however, USTRUCT and UCLASS are strictly distinguished; a USTRUCT cannot contain reflected functions.

Data Members

Due to compatibility with C and the features of the language, C++ also has the concept of POD. Introducing POD requires understanding the standard-layout class from the C++ standard ([ISO/IEC 14882:2014 §9, paragraph 7]):
A standard-layout class is a class that:

has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.

Because a C++ implementation may insert hidden data or reorder members when implementing language features, the standard places the restrictions above on standard-layout classes.

POD can be understood as a class that contains only data and whose data members have a fixed order (it cannot mix access control levels, because the standard does not guarantee the relative order of members with different access control, so the compiler may reorder them). Apart from alignment padding, no extra memory is inserted.
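
The restrictions above can be checked at compile time with the standard `std::is_standard_layout` trait. The following is a minimal sketch; the three type names are hypothetical and exist only for illustration:

```cpp
#include <type_traits>

// All data members share one access level and there are no virtuals.
struct FPlainData
{
    int Count;
    double Ratio;
};

// Non-static data members with different access control.
class FMixedAccess
{
public:
    int Visible;
private:
    int Hidden;
};

// A virtual function forces hidden data (typically a vtable pointer) into the object.
struct FWithVirtual
{
    virtual void Tick() {}
    int Count;
};

static_assert(std::is_standard_layout<FPlainData>::value,
              "plain data is standard-layout");
static_assert(!std::is_standard_layout<FMixedAccess>::value,
              "mixed access control breaks standard-layout");
static_assert(!std::is_standard_layout<FWithVirtual>::value,
              "virtual functions break standard-layout");
```

If any of these assumptions about the types were wrong, the translation unit would fail to compile, which makes the trait a cheap guard before relying on a fixed layout.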

In a previous article of mine: Memory Alignment Issues with Struct Members, I introduced memory alignment. Due to the existence of memory alignment, data members within a class are not necessarily stored one after the other; there can be some gaps in the memory layout, leading to two issues:

The offset address of data members within a class depends on memory alignment.
Different declaration orders can lead to changes in class layout (the compiler may also rearrange based on different access control permissions).
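
Both issues can be observed with `offsetof` and `sizeof`. The sketch below uses two hypothetical structs with the same members in a different order; the concrete numbers depend on the target ABI, so the code only checks relationships that should hold on common platforms:

```cpp
#include <cassert>
#include <cstddef>

// Same three members, declared in two different orders.
struct FLooseLayout
{
    char Tag;      // followed by padding so that Value is aligned
    double Value;
    int Count;
};

struct FTightLayout
{
    double Value;
    int Count;
    char Tag;
};

// Bytes in T that are padding rather than member data.
template <typename T>
constexpr std::size_t PaddingBytes()
{
    return sizeof(T) - (sizeof(char) + sizeof(double) + sizeof(int));
}

void CheckLayouts()
{
    // Every member starts at a multiple of its own alignment.
    assert(offsetof(FLooseLayout, Value) % alignof(double) == 0);
    assert(offsetof(FTightLayout, Count) % alignof(int) == 0);
    // Ordering members from large to small does not need more padding.
    assert(PaddingBytes<FTightLayout>() <= PaddingBytes<FLooseLayout>());
}
```

On a typical 64-bit target, `sizeof(FLooseLayout)` is 24 and `sizeof(FTightLayout)` is 16, but these values are not guaranteed by the standard.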

To obtain the offset values of data members within a class without considering the aforementioned details, C++ introduced a feature called Pointers to Members, which is detailed in ISO/IEC 14882:2014 §8.3.3.

I have also provided a more detailed introduction to pointers to members in a previous article: Pointers to Class Members in C++ Are Not Pointers.

Pointers to class members are crucial for obtaining data member and function descriptions relative to their type information, such as the offset of data members in the class layout, the offset value of this for member functions, and the raw function pointer of member functions. With this information, specific members can be accessed through class instances.

Consider the following class:

class ClassExample
{
public:
  	bool bBoolean;
  	int ival;
  	double dval;
    void func(int){}
};

To retrieve the internal offset values of its three members:

bool ClassExample::*bBoolean = &ClassExample::bBoolean;
int ClassExample::*ival = &ClassExample::ival;
double ClassExample::*dval = &ClassExample::dval;
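
A pointer to a data member is not an address by itself; it is applied to an instance with `.*` (or `->*`). The following is a minimal sketch of that usage; `ReadMember`, `WriteMember`, and `DemoMemberPointers` are hypothetical helper names, not part of UE:

```cpp
#include <cassert>

class ClassExample
{
public:
    bool bBoolean;
    int ival;
    double dval;
    void func(int) {}
};

// Reads whichever data member the pointer to member designates.
template <typename T>
T ReadMember(const ClassExample& Object, T ClassExample::*Member)
{
    return Object.*Member;
}

// Writes through the same kind of pointer.
template <typename T>
void WriteMember(ClassExample& Object, T ClassExample::*Member, const T& Value)
{
    Object.*Member = Value;
}

// The same two helpers work for members of different types.
double DemoMemberPointers()
{
    ClassExample Object{};
    WriteMember(Object, &ClassExample::ival, 42);
    WriteMember(Object, &ClassExample::dval, 1.5);
    assert(ReadMember(Object, &ClassExample::ival) == 42);
    return ReadMember(Object, &ClassExample::dval);
}
```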

Compiling this to LLVM IR shows the stored offset values directly:

store i64 0, i64* %1, align 8
store i64 4, i64* %2, align 8
store i64 8, i64* %3, align 8

What if we change func to be virtual in the class definition?

store i64 8, i64* %1, align 8
store i64 12, i64* %2, align 8
store i64 16, i64* %3, align 8

We can see that the memory layout has changed: introducing a virtual function causes a virtual table pointer to be inserted into the class layout, which occupies 8 bytes on this 64-bit target.
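
The same shift can be measured without reading IR. The sketch below measures a member's offset from the address of a live object, which also works for classes that are not standard-layout; the type and function names are hypothetical, and the position of the virtual table pointer is an ABI detail rather than something the standard specifies:

```cpp
#include <cassert>
#include <cstddef>

struct FPlainExample
{
    bool bBoolean;
    int ival;
    double dval;
    void func(int) {}
};

struct FVirtualExample
{
    bool bBoolean;
    int ival;
    double dval;
    virtual void func(int) {}
};

// Distance in bytes from the start of an object to the designated member.
template <typename C, typename M>
std::ptrdiff_t MeasureOffset(M C::*Member)
{
    C Object{};
    const char* Base = reinterpret_cast<const char*>(&Object);
    const char* Field = reinterpret_cast<const char*>(&(Object.*Member));
    return Field - Base;
}

void CheckVirtualShift()
{
    // A standard-layout class keeps its first member at offset 0.
    assert(MeasureOffset(&FPlainExample::bBoolean) == 0);
    // The hidden pointer makes the object larger.
    assert(sizeof(FVirtualExample) > sizeof(FPlainExample));
}
```

On the common Itanium and MSVC ABIs the hidden pointer sits at the start of the object, so every member of `FVirtualExample` moves back by `sizeof(void*)`.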

To summarize: through pointers to data members and related features, we can determine the offsets of data members in the class layout at compile time. With that offset plus the size of the data member (sizeof), we can correctly access the memory where a given member is located.
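
As a rough illustration of "offset + size", the sketch below builds a tiny field table for ClassExample and reads a member from an untyped pointer. `FFieldInfo`, `FindField`, and `ReadField` are hypothetical names for this example and are not UE's property classes; `offsetof` is used here because ClassExample is a standard-layout class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class ClassExample
{
public:
    bool bBoolean;
    int ival;
    double dval;
    void func(int) {}
};

// Minimal description of one data member: name, byte offset, byte size.
struct FFieldInfo
{
    const char* Name;
    std::size_t Offset;
    std::size_t Size;
};

static const FFieldInfo GClassExampleFields[] = {
    {"bBoolean", offsetof(ClassExample, bBoolean), sizeof(ClassExample::bBoolean)},
    {"ival", offsetof(ClassExample, ival), sizeof(ClassExample::ival)},
    {"dval", offsetof(ClassExample, dval), sizeof(ClassExample::dval)},
};

// Looks up a field description by name; returns nullptr if there is none.
const FFieldInfo* FindField(const char* Name)
{
    for (const FFieldInfo& Field : GClassExampleFields)
    {
        if (std::strcmp(Field.Name, Name) == 0)
        {
            return &Field;
        }
    }
    return nullptr;
}

// Copies the member's bytes out of the object using only offset and size.
template <typename T>
T ReadField(const void* Object, const FFieldInfo& Field)
{
    assert(sizeof(T) == Field.Size);
    T Value;
    std::memcpy(&Value, static_cast<const unsigned char*>(Object) + Field.Offset, sizeof(T));
    return Value;
}
```

The caller only needs a `void*` and a name, which is the essence of accessing a member through stored type information.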

However, there is a restriction in C++: it is not possible to take the address of a bit-field:

The address-of operator & shall not be applied to a bit-field, so there are no pointers to bit-fields.

Bit-fields are widely used in UE for boolean values, and C++ guarantees this usage:

A bool value can successfully be stored in a bit-field of any nonzero size.

Moreover, the allocation and alignment of bit-fields are implementation-defined:

[ISO/IEC 14882:2014 9.6 Bit-fields] Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined.

Since it is not possible to take the address of a bit-field, special handling is needed in the reflection implementation for uint8 bEnable:1; to retrieve bits within the bit-field.

For example, in FBoolProperty, the implementation for retrieving the bool value is as follows:

FORCEINLINE bool GetPropertyValue(void const* A) const
{
	check(FieldSize != 0);
	uint8* ByteValue = (uint8*)A + ByteOffset;
	return !!(*ByteValue & FieldMask);
}

This single implementation handles both a native bool and a bit-field: in either case the value is read by masking one byte.

To distinguish between the reflection information generated for uint8 bit_field:1; and bool native_bool;, the reflection information uses UE4CodeGen_Private::EPropertyGenFlags::NativeBool. Those with this flag represent bool native_bool;.
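
Because the address of a bit-field cannot be taken and its placement is implementation-defined, the byte offset and mask have to be discovered some other way. One possible approach, sketched below under the assumption that the type is trivially copyable, is to set the bit on a zeroed object and scan for the byte that changed. `FFlagsExample`, `FBitLocation`, `LocateBit`, and `ReadBit` are hypothetical names for this illustration, not UE's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct FFlagsExample
{
    int Counter;
    std::uint8_t bFirst : 1;
    std::uint8_t bEnable : 1;
};

// Where one bit-field lives: which byte, and which bits inside that byte.
struct FBitLocation
{
    std::size_t ByteOffset;
    std::uint8_t FieldMask;
};

// Sets the bit on a zeroed object and reports the first non-zero byte.
template <typename T, typename SetBitFn>
FBitLocation LocateBit(SetBitFn SetBit)
{
    T Object;
    std::memset(&Object, 0, sizeof(T));
    SetBit(Object);

    unsigned char Bytes[sizeof(T)];
    std::memcpy(Bytes, &Object, sizeof(T));
    for (std::size_t Index = 0; Index < sizeof(T); ++Index)
    {
        if (Bytes[Index] != 0)
        {
            return {Index, Bytes[Index]};
        }
    }
    return {0, 0}; // the setter did not change any byte
}

// Same masked read as in GetPropertyValue above.
bool ReadBit(const void* Object, const FBitLocation& Location)
{
    const std::uint8_t* ByteValue =
        static_cast<const std::uint8_t*>(Object) + Location.ByteOffset;
    return (*ByteValue & Location.FieldMask) != 0;
}
```

A location would be obtained once, for example with `LocateBit<FFlagsExample>([](FFlagsExample& O) { O.bEnable = 1; })`, and then reused for every instance.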

Member Functions

UE’s reflected functions are all member functions, and the class that declares them must derive from UObject.

The implementation of reflection for member functions in UE does not rely on C++ pointers to member functions; it completely relies on UHT to generate a unified prototype Thunk function, which calls the actual executing function (including operations like retrieving data from the stack).

Additionally, it generates reflection information for the parameters used to pass to the ProcessEvent function, along with the memory offsets of each parameter and return value relative to the parameter structure generated by UHT.

For the following function:

UFUNCTION()
bool func(int32 InIval);

The Thunk function generated by UHT is:

DEFINE_FUNCTION(URefObject::execfunc)
{
    P_GET_PROPERTY(FIntProperty,Z_Param_InIval);
    P_FINISH;
    P_NATIVE_BEGIN;
    *(bool*)Z_Param__Result=P_THIS->func(Z_Param_InIval);
    P_NATIVE_END;
}

In this way, all reflection functions unify their call prototypes to:

void execfunc( UObject* Context, FFrame& Stack, RESULT_DECL )

Each reflected function can retrieve its parameters from this prototype, allowing for the execution of the actual function call (or executing Blueprint bytecode).
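
The idea of a unified call prototype can be illustrated outside the engine. The sketch below is a simplified stand-in, not UE code: `FMiniFrame` plays the role of FFrame as a cursor over a packed parameter buffer, and `FRefObject`, `ThunkRegistry`, and `CallByName` are hypothetical names:

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>

// Simplified stand-in for FFrame: reads parameters from a byte buffer in order.
struct FMiniFrame
{
    const unsigned char* Cursor;

    template <typename T>
    T Read()
    {
        T Value;
        std::memcpy(&Value, Cursor, sizeof(T));
        Cursor += sizeof(T);
        return Value;
    }
};

// Every thunk shares this one prototype, whatever the wrapped function's signature.
using FNativeThunk = void (*)(void* Context, FMiniFrame& Stack, void* Result);

struct FRefObject
{
    int Threshold = 10;

    bool func(int InIval) { return InIval > Threshold; }

    // Thunk: unpack the parameter, call the real function, store the result.
    static void execfunc(void* Context, FMiniFrame& Stack, void* Result)
    {
        const int Param = Stack.Read<int>();
        *static_cast<bool*>(Result) = static_cast<FRefObject*>(Context)->func(Param);
    }
};

// Name to thunk table, so a function can be invoked from its name alone.
std::map<std::string, FNativeThunk>& ThunkRegistry()
{
    static std::map<std::string, FNativeThunk> Registry = {
        {"func", &FRefObject::execfunc},
    };
    return Registry;
}

bool CallByName(FRefObject& Object, const std::string& Name, int Argument)
{
    const auto It = ThunkRegistry().find(Name);
    assert(It != ThunkRegistry().end());

    unsigned char Buffer[sizeof(int)];
    std::memcpy(Buffer, &Argument, sizeof(int));
    FMiniFrame Frame{Buffer};

    bool Result = false;
    It->second(&Object, Frame, &Result);
    return Result;
}
```

Because the caller only ever sees `FNativeThunk`, it does not need to know the parameter list of `func` at compile time.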

enum

Enumeration types are not integral types, but enumeration values can be converted to integral types.

[ISO/IEC 14882:2014] Therefore, enumerations (7.2) are not integral; however, enumerations can be promoted to integral types as specified in 4.5.

In UE’s UEnum, enumeration values are stored using int64, so as long as we know the name of the enumeration and the corresponding integer value, we can perform conversions between the names, integer values, and enums.

UHT generates this information for enumerations with UENUM, constructing UEnum instances to store it at runtime.

The syntax for Enum in UE is:

UENUM()
enum class EClassEnum: uint8
{
	A =0,
	B,
	C = 11,
	D
};

Partial reflection code generated by UHT:

static const UE4CodeGen_Private::FEnumeratorParam Enumerators[] = {
    { "EClassEnum::A", (int64)EClassEnum::A },
    { "EClassEnum::B", (int64)EClassEnum::B },
    { "EClassEnum::C", (int64)EClassEnum::C },
    { "EClassEnum::D", (int64)EClassEnum::D },
};
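
To show why a name/value table is sufficient, here is a minimal engine-independent sketch of the lookup in both directions. `FEnumeratorEntry`, `FindValueByName`, and `FindNameByValue` are hypothetical names and do not correspond to UEnum's API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class EClassEnum : std::uint8_t
{
    A = 0,
    B,
    C = 11,
    D
};

// One row per enumerator: full name plus its value widened to 64 bits.
struct FEnumeratorEntry
{
    const char* Name;
    std::int64_t Value;
};

static const FEnumeratorEntry GEnumerators[] = {
    {"EClassEnum::A", static_cast<std::int64_t>(EClassEnum::A)},
    {"EClassEnum::B", static_cast<std::int64_t>(EClassEnum::B)},
    {"EClassEnum::C", static_cast<std::int64_t>(EClassEnum::C)},
    {"EClassEnum::D", static_cast<std::int64_t>(EClassEnum::D)},
};

// Name to value; returns false when the name is unknown.
bool FindValueByName(const char* Name, EClassEnum& OutValue)
{
    for (const FEnumeratorEntry& Entry : GEnumerators)
    {
        if (std::strcmp(Entry.Name, Name) == 0)
        {
            OutValue = static_cast<EClassEnum>(Entry.Value);
            return true;
        }
    }
    return false;
}

// Value to name; returns nullptr when no enumerator has that value.
const char* FindNameByValue(EClassEnum Value)
{
    for (const FEnumeratorEntry& Entry : GEnumerators)
    {
        if (Entry.Value == static_cast<std::int64_t>(Value))
        {
            return Entry.Name;
        }
    }
    return nullptr;
}
```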

Based on the reflection UEnum, we can convert between strings and enumeration values:

template<typename ENUM_TYPE>
static bool GetEnumValueByName(const FString& InEnumValueName, ENUM_TYPE& OutEnumValue)
{
    bool bStatus = false;
    UEnum* FoundEnum = StaticEnum<ENUM_TYPE>();
    if (FoundEnum)
    {
        // Read CppType only after the null check.
        FString EnumTypeName = FoundEnum->CppType;
        // Enumerator names are stored in the form "EnumType::Value".
        FString EnumValueFullName = EnumTypeName + TEXT("::") + InEnumValueName;
        int32 EnumIndex = FoundEnum->GetIndexByName(FName(*EnumValueFullName));
        if (EnumIndex != INDEX_NONE)
        {
            // UEnum stores values as int64, so avoid truncating to int32 here.
            int64 EnumValue = FoundEnum->GetValueByIndex(EnumIndex);
            OutEnumValue = (ENUM_TYPE)EnumValue;
            bStatus = true;
        }
    }
    return bStatus;
}

We can also look up enums defined in Blueprints (UUserDefinedEnum):

FString UFlibAppHelper::GetEnumNameByValue(TSoftObjectPtr<UUserDefinedEnum> EnumPath, int32 value)
{
    FString result;
    UUserDefinedEnum* Enumer = LoadObject<UUserDefinedEnum>(nullptr, *EnumPath.ToString());
    if (Enumer)
    {
        result = Enumer->GetDisplayNameTextByValue(value).ToString();
    }
    return result;
}

Scoped Enum

A little side note about C++ scoped enums.

C++11 introduced scoped enum:

enum class EClassEnum{
	A = 0,
	B,
	C
};

Why was such syntax introduced? Because the enumerators of a traditional (unscoped) enum are declared in the scope that contains the enum itself. The C++ standard describes:

[ISO/IEC 14882:2014 §7.2] The enumeration type declared with an enum-key of only enum is an unscoped enumeration, and its enumerators are unscoped enumerators.

The following code will lead to a redefinition error:

enum ENormalEnum{
	A = 0
};
enum EOtherEnum{
	A = 0
};

Thus, developers often wrap an enum in a namespace by hand to keep its enumerators from colliding:

namespace ENamespaceEnum
{
  enum Type
  {
    A = 0,
    B,
    C
  };
};

Since the enumerators of Type above are injected into the enclosing ENamespaceEnum namespace, they can be used in the following way:

ENamespaceEnum::A;
ENamespaceEnum::B;
ENamespaceEnum::C;

This is effectively a weakly typed enumeration: the enumerators are not scoped by the enumeration's name, and their values convert implicitly to integers. Therefore, C++11 introduced Scoped Enum, which can be understood as a strongly typed enumeration:

enum class EScopedEnum{
	A = 0,
	B,
	C
};

Using it has a similar effect to the aforementioned namespace approach.

The values of Scoped Enumeration can also be explicitly converted to numeric types:

[ISO/IEC 14882:2014 §5.2.9] A value of a scoped enumeration type (7.2) can be explicitly converted to an integral type.

Moreover, if the underlying type of a scoped enum is not explicitly specified, its default underlying type is int:

[ISO/IEC 14882:2014 §7.2] Each enumeration also has an underlying type. The underlying type can be explicitly specified using enum-base; if not explicitly specified, the underlying type of a scoped enumeration type is int.
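
These two properties can be verified with standard type traits. The sketch below is only an illustration; `ESmallEnum` and `ToUnderlying` are hypothetical names:

```cpp
#include <type_traits>

enum class EScopedEnum
{
    A = 0,
    B,
    C
};

enum class ESmallEnum : unsigned char
{
    X = 1
};

// Without an enum-base, the underlying type of a scoped enum is int.
static_assert(std::is_same<std::underlying_type<EScopedEnum>::type, int>::value,
              "default underlying type is int");

// There is no implicit conversion to an integer...
static_assert(!std::is_convertible<EScopedEnum, int>::value,
              "scoped enums do not convert implicitly");

// ...but an explicit conversion is always available.
template <typename E>
constexpr typename std::underlying_type<E>::type ToUnderlying(E Value)
{
    return static_cast<typename std::underlying_type<E>::type>(Value);
}

static_assert(ToUnderlying(EScopedEnum::C) == 2, "explicit conversion yields the value");
static_assert(sizeof(ESmallEnum) == sizeof(unsigned char), "enum-base controls the size");
```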

In LLVM, scoped enums are handled entirely in the compiler front end. Consider the IR generated from the following code:

enum ENormalEnum{
	A = 0,
	B,
	C
};
enum class EScopedEnum{
	A = 0,
	B,
	C
};

int main()
{
	printf("A:%d,B:%d,C:%d\n",A,B,C);
	printf("A:%d,B:%d,C:%d\n",EScopedEnum::A,EScopedEnum::B,EScopedEnum::C);
}

The main function’s LLVM-IR:

; Function Attrs: uwtable
define i32 @main() #3 {
entry:
  %call = call i32 (i8*, ...) @_Z6printfPKcz(i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.str, i32 0, i32 0), i32 0, i32 1, i32 2)
  %call1 = call i32 (i8*, ...) @_Z6printfPKcz(i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.str, i32 0, i32 0), i32 0, i32 1, i32 2)
  ret i32 0
}

The generated IR contains no symbol information for the enumerators; only the constant values remain.

Static Initialization Timing

UE's reflection also relies on the timing of C++ static initialization, as seen in the UHT-generated .gen.cpp file:

static FCompiledInDefer Z_CompiledInDefer_UClass_URefObject(Z_Construct_UClass_URefObject, &URefObject::StaticClass, TEXT("/Script/RefExample"), TEXT("URefObject"), false, nullptr, nullptr, nullptr);

The C++ standard discusses the timing of static initialization:

It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main.

Even though the standard leaves this implementation-defined, nearly all compilers in common use perform this initialization before main begins.
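
The registration pattern itself can be sketched in a few lines of plain C++. The names below (`FMiniCompiledInDefer`, `GetDeferredClassNames`, `IsClassRegistered`) are hypothetical and only mimic the shape of the generated code; the registry is a function-local static so that it is constructed before the first registrar uses it, regardless of initialization order across translation units:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Constructed on first use, so it exists before any registrar touches it.
std::vector<std::string>& GetDeferredClassNames()
{
    static std::vector<std::string> Names;
    return Names;
}

// The constructor runs during static initialization and records the class name.
struct FMiniCompiledInDefer
{
    explicit FMiniCompiledInDefer(const char* ClassName)
    {
        GetDeferredClassNames().push_back(ClassName);
    }
};

// One static object per reflected class, as generated code would emit.
static FMiniCompiledInDefer Z_MiniDefer_URefObject("URefObject");
static FMiniCompiledInDefer Z_MiniDefer_UOtherObject("UOtherObject");

bool IsClassRegistered(const std::string& Name)
{
    for (const std::string& Registered : GetDeferredClassNames())
    {
        if (Registered == Name)
        {
            return true;
        }
    }
    return false;
}
```

By the time ordinary code asks `IsClassRegistered("URefObject")`, the static objects have already run their constructors, so no explicit registration call is needed.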