UE Performance Analysis: Memory Optimization

In game development, performance deserves high priority. To reach the largest possible user base, a game has to run acceptably on mid-range and low-end devices and remain compatible with a wide range of hardware configurations. Analyzing and optimizing performance bottlenecks is therefore key.

Loading more resources into memory at runtime is essentially trading space for time. Frequent disk I/O is slow; preloading resources into memory makes reads fast. Memory is limited, however, and cannot be used without restraint. This matters most on mid-range and low-end mobile devices, where devices with 4GB of memory or less still hold a considerable market share. Memory usage must therefore be optimized, because a process that consumes too much memory can be terminated by the system.

Memory optimization is fundamentally about seeking a balance between loading efficiency and memory usage. The goal is to satisfy the need for compatibility with more low-end devices while also maximizing the use of available memory to improve program efficiency without triggering out-of-memory (OOM) situations.

I plan to write several articles on performance optimization, beginning with memory analysis in UE. This article introduces commonly used memory analysis tools and methods and collects memory optimization techniques applicable to UE projects. This material was previously recorded as notes at notes/ue, and future memory-related content will be added to this article.

Memory optimization primarily focuses on the following four aspects:

  1. Identifying memory leaks
  2. Trimming unnecessary modules
  3. Optimizing memory usage of existing modules
  4. Lossy optimization: reducing content (asset quality, etc.)

In short, three steps: fix bugs (leaks), squeeze out waste (trim and optimize modules), and cut requirements (lossy optimization).

Memory Analysis Tools

Before performing memory optimization, it’s essential to gain a general understanding of memory distribution in the UE project. You can use the memory analysis tools provided by UE as well as some native platform analysis tools.

Memory analysis resources:

Some commonly used console commands:

stat memory # Displays memory usage of various subsystems in the engine
stat MemoryAllocator # Displays memory allocation information
stat MemoryPlatform # Displays platform memory information
stat MemoryStaticMesh # Displays memory information of static meshes

Enable the Low-Level Memory Tracker (LLM) by adding the -LLM parameter at startup:

-LLM # Enable LLM
-LLMCSV # Continuously write all values to a CSV file. Automatically enables -LLM.
-llmtagsets=Assets # Experimental feature. Displays the total memory allocated for each resource.
-llmtagsets=AssetClasses # Experimental feature. Displays the total for each UObject class type.

Then you can use the following console commands at runtime:

stat llm # Displays LLM summary. All lower-level engine statistics are aggregated into a single engine statistic.
stat llmfull # Displays all LLM statistics
stat LLMPlatform # Displays all memory information allocated from the OS
stat LLMOverhead # Displays memory used internally by LLM

Memory analysis can also utilize the following tools:

  • memreport
  • MemoryProfiler
  • Heapprofd (Android)
  • Instruments (iOS)

Entering memreport (optionally with -full) in the game console creates a directory and a .memreport file under Saved/Profiling/Memreports. The file can be opened in a text editor to inspect memory usage in various parts of the game.

Specific usage of memory analysis tools and the analysis process of memory allocation in the UE engine will be detailed later when time allows. For information about LLM, please refer to the UE documentation.

Memory Optimization Strategies

The optimization methods listed below are optional; it is not necessary to implement all of them for the best results. Since memory optimization must balance efficiency, features can be selectively optimized based on project needs across different devices, ensuring compatibility while minimizing resource consumption on low-end devices.

Here are the main areas in UE that can be optimized, along with suggestions on how to do so. Specific optimization data will be analyzed and supplemented as time permits.

Disable Unnecessary Feature Support

Engine module support can be trimmed based on needs:

  • APEX: If the Nvidia APEX destruction system is not in use, APEX support can be removed when compiling the engine. Set bCompileAPEX=false in BuildSetting or TargetRules.
  • Recast (NavMesh): If the client does not need Recast at runtime and performs no local NavMesh navigation, NavMesh support can be compiled out by setting bCompileRecast=false in BuildSetting or TargetRules.
  • FreeType: If FreeType font support is not needed, set bCompileFreeType=false in BuildSetting or TargetRules.
  • ICU (unicode/i18n): ICU provides the unicode/i18n support in the engine’s Core module. If it is not needed, set bCompileICU=false in BuildSetting or TargetRules.
  • CompileForSize: A UE optimization option that strictly controls binary size during compilation at the cost of performance. Set bCompileForSize=true in BuildSetting or TargetRules to enable it.
  • CEF3: Optional support for the Chromium Embedded Framework, Google’s embedded browser. Set bCompileCEF3=false in BuildSetting or TargetRules.
  • bUsesSteam: Whether or not to use Steam; this can be disabled for mobile games, controlled through bUsesSteam in TargetRules.
  • SpeedTree: If SpeedTree for vegetation modeling is not required, compilation of SpeedTree can be disabled; control via bOverrideCompileSpeedTree in TargetRules.
  • Audio module: If the project uses WWise or others as the audio playback interface, and the built-in Audio module is not needed, this functionality is redundant and can be trimmed.
  • Internationalization module: If the game’s multilingual support does not rely on UE’s text collection and translation features, this module can be trimmed.

This can reduce the size of the statically compiled program and eliminate unnecessary execution logic.

Control AssetRegistry Serialization

AssetRegistry is mainly used in the Editor to facilitate resource lookup and filtering operations, primarily used by ContentBrowser, as described in the UE documentation: Asset Registry.

For projects, there might not be a need for it at Runtime, but as soon as the AssetRegistry module is started, AssetRegistry.bin is loaded into memory. If not needed, this part of the memory is wasted.

Fortunately, UE provides methods to avoid or partially serialize AssetRegistry data. The constructor of UAssetRegistryImpl calls the InitializeSerializationOptionsFromIni function to read configuration from DefaultEngine.ini and constructs an FAssetRegistrySerializationOptions structure to store this. It is then used in a subsequent Serialize function to control which data gets serialized into AssetRegistry.

// Code omitted for brevity

This control mechanism can determine whether to generate AssetRegistry.bin at packaging time and control which AssetRegistry data is deserialized at runtime (but will not affect DevelopmentAssetRegistry.bin, which can be used for asset auditing).

Its deserialization process is as follows:

  1. Check bSerializeAssetRegistry; if true, load AssetRegistry.bin into memory in binary form.
  2. Use the Serialize function to deserialize the binary data.
  3. Release the memory occupied by the loaded AssetRegistry.bin.

Thus, AssetRegistry’s memory consumption relates to the data serialized, while FAssetRegistrySerializationOptions controls what data is serialized.
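The three-step flow above can be sketched in standalone C++. This is not engine code: SerializationOptions, RawRecord, RegistryState, and LoadRegistry are hypothetical stand-ins, and the record layout is invented purely for illustration. It only shows why the resident memory depends on which options are enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for FAssetRegistrySerializationOptions.
struct SerializationOptions {
    bool bSerializeAssetRegistry = true;
    bool bSerializeDependencies = true;
    bool bSerializePackageData = true;
};

// Hypothetical record as it would appear in the loaded file contents.
struct RawRecord {
    std::string AssetName;
    std::vector<std::string> Dependencies;
    std::uint64_t PackageDiskSize;
};

// Data that stays resident after loading.
struct RegistryState {
    bool bLoaded = false;
    std::vector<std::string> AssetNames;
    std::vector<std::vector<std::string>> Dependencies;
    std::vector<std::uint64_t> PackageSizes;
};

// FileContents is taken by value: it plays the role of the temporary
// buffer that holds the file while it is being deserialized.
RegistryState LoadRegistry(const SerializationOptions& Options,
                           std::vector<RawRecord> FileContents) {
    RegistryState State;

    // Step 1: when the registry is disabled, nothing is loaded at all.
    if (!Options.bSerializeAssetRegistry) {
        return State;
    }

    // Step 2: materialize only the parts the options ask for.
    for (const RawRecord& Record : FileContents) {
        State.AssetNames.push_back(Record.AssetName);
        if (Options.bSerializeDependencies) {
            State.Dependencies.push_back(Record.Dependencies);
        }
        if (Options.bSerializePackageData) {
            State.PackageSizes.push_back(Record.PackageDiskSize);
        }
    }
    assert(State.AssetNames.size() == FileContents.size());

    // Step 3: release the temporary buffer; only State survives.
    FileContents.clear();
    FileContents.shrink_to_fit();

    State.bLoaded = true;
    return State;
}
```

With bSerializeDependencies set to false in this sketch, the Dependencies vector stays empty, which is the kind of saving the configuration options are meant to provide.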

// Code omitted for brevity

Creating an AssetRegistry Section in Config/DefaultEngine.ini using the above names can control the serialization of AssetRegistry, reducing package size and memory consumption (since AssetRegistry is loaded into memory when the engine starts).

[AssetRegistry]
bSerializeAssetRegistry=false
bSerializeDependencies=false
bSerializeNameDependencies=false
bSerializeManageDependencies=false
bSerializePackageData=false

You can also specify it for a specific platform by modifying the platform-related INI files:

Config/Windows/WindowsEngine.ini
Config/Android/AndroidEngine.ini
Config/IOS/IOSEngine.ini

Load Only Quality Levels of Shaders Used

By default, the engine loads shaders for all quality levels into memory. When the quality level never changes at runtime, the unused levels can be discarded to reduce shader memory consumption.

In Project Settings - Engine - Rendering - Materials - Game Discards Unused Material Quality Levels:

Alternatively, add the following configuration in DefaultEngine.ini:

[/Script/Engine.RendererSettings]
r.DiscardUnusedQuality=True

This setting controls whether, when running in game mode, shaders for all quality levels are kept in memory or only those needed for the current quality level.

  • Unchecked: Keep all quality levels in memory allowing a runtime quality level change. (default)
  • Checked: Discard unused quality levels when loading content for the game, saving some memory.
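The effect of the checked state can be pictured with a small standalone sketch. It is not how the engine stores shaders: Quality, MaterialShaders, and DiscardUnused are hypothetical names used only to show that dropping the unused levels also removes the ability to switch quality later.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class Quality { Low, Medium, High };

// Hypothetical per-material shader storage, grouped by quality level.
struct MaterialShaders {
    std::map<Quality, std::vector<std::string>> ByQuality;

    // Total bytes of shader code currently kept in memory.
    std::size_t ResidentBytes() const {
        std::size_t Total = 0;
        for (const auto& Level : ByQuality) {
            for (const std::string& Code : Level.second) {
                Total += Code.size();
            }
        }
        return Total;
    }

    // Keep only the shaders of the current level. After this call a
    // runtime quality switch is no longer possible without reloading.
    void DiscardUnused(Quality Current) {
        assert(ByQuality.count(Current) == 1 && "current level must be loaded");
        for (auto It = ByQuality.begin(); It != ByQuality.end();) {
            if (It->first != Current) {
                It = ByQuality.erase(It);
            } else {
                ++It;
            }
        }
    }
};
```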

Reduce Shader Variants

You can reduce the number of material instances and enable the following options in Project Settings - Engine - Rendering:

Share Material Shader Code

When packaging, you can reduce package size and memory usage by configuring Share Material Shader Code and Shared Material Native Libraries in Project Settings - Packaging (this may increase loading time).

/** 
* By default, shader code gets saved inline inside material assets,
* enabling this option will store only shader code once as individual files
* This will reduce overall package size but might increase loading time
*/
UPROPERTY(config, EditAnywhere, Category=Packaging)
bool bShareMaterialShaderCode;

/**
* By default, shader code gets saved into individual platform-agnostic files,
* enabling this option will use the platform-specific library format if and only if one is available
* This will reduce overall package size but might increase loading time
*/
UPROPERTY(config, EditAnywhere, Category=Packaging, meta = (EditCondition = "bShareMaterialShaderCode", ConfigRestartRequired = true))
bool bSharedMaterialNativeLibraries;
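The comment on bShareMaterialShaderCode says the shader code is stored only once instead of inline in each material. A minimal standalone sketch of that idea follows; SharedLibrary and InlineSize are hypothetical names, the library is keyed by the bytecode itself rather than by a hash to keep the sketch short, and none of this reflects the engine's actual file format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Bytecode = std::string;  // raw shader bytes, kept as a string for brevity

// Inline layout: every material carries its own copy of each shader.
std::size_t InlineSize(const std::vector<std::vector<Bytecode>>& Materials) {
    std::size_t Total = 0;
    for (const auto& Shaders : Materials) {
        for (const Bytecode& Code : Shaders) {
            Total += Code.size();
        }
    }
    return Total;
}

// Shared layout: identical bytecode is stored once and referenced by index.
struct SharedLibrary {
    std::map<Bytecode, std::size_t> IndexByCode;  // lookup helper only
    std::vector<Bytecode> Blobs;                  // the stored shader code

    // Returns the index of Code, adding it only if it is not stored yet.
    std::size_t Add(const Bytecode& Code) {
        assert(!Code.empty() && "shader bytecode should not be empty");
        const auto It = IndexByCode.find(Code);
        if (It != IndexByCode.end()) {
            return It->second;
        }
        Blobs.push_back(Code);
        IndexByCode.emplace(Code, Blobs.size() - 1);
        return Blobs.size() - 1;
    }

    // Bytes of unique shader code held by the library.
    std::size_t Size() const {
        std::size_t Total = 0;
        for (const Bytecode& Code : Blobs) {
            Total += Code.size();
        }
        return Total;
    }
};
```

The extra indirection (materials look their shaders up in the library) is one plausible reading of why the option's comment warns about increased loading time.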

After enabling Share Material Shader Code, the packaged output will include the following files:

ShaderArchive-Blank425-PCD3D_SM5.ushaderbytecode
ShaderCode-Global-PCD3D_SM5.ushaderbytecode

However, if the shader code changes in a later cook while the base package still holds the old shader bytecode, materials can go missing at runtime.

There are three ways to handle this:

  1. During later packaging, include the Shader bytecode files in the pak and load on mount;
  2. Package shader bytecode with resources during hot updating;
  3. Create a Shader Patch and load it after updates.

For detailed processes on updating shader bytecode, refer to my previous article UE4 Hot Update: Create Shader Patch.

Disable UMG Template Creation

The engine has a feature that caches widget blueprint templates to speed up widget creation, but it can waste memory. It can be disabled through configuration:

Alternatively, modify the engine code to provide default values using class-level initialization:

UPROPERTY(EditAnywhere, AdvancedDisplay, Category=WidgetBlueprintOptions, AssetRegistrySearchable)
bool bForceSlowConstructionPath;

This variable is checked and utilized in the following code:

// Code omitted for brevity

Disable PakCache

The engine by default enables the PakCache mechanism, which reads files from Pak while utilizing additional memory for caching. The memory usage can be significant (viewed via stat memory):

At startup, the game logs PakCache details:

[2021.03.23-10.49.21:354][445]LogPakFile: Precache HighWater 16MB
[2021.03.23-10.49.21:382][447]LogPakFile: Precache HighWater 32MB
[2021.03.23-10.49.21:442][450]LogPakFile: Precache HighWater 48MB
[2021.03.23-10.49.21:470][452]LogPakFile: Precache HighWater 64MB

It can be disabled with the following configuration:

[ConsoleVariables]
pakcache.Enable=0

Disabling PakCache may lead to frequent I/O issues, but specific performance impacts will require further analysis when time permits.

Unload pakentry filenames

Starting from UE4.23, the engine offers memory optimization configurations for mounted PakFiles:

[Pak]
UnloadPakEntryFilenamesIfPossible=true
DirectoryRootsToKeepInMemoryWhenUnloadingPakEntryFilenames="*/Config/Tags/"
+DirectoryRootsToKeepInMemoryWhenUnloadingPakEntryFilenames="*/Content/Localization/*"
ShrinkPakEntriesMemoryUsage=true

When FPakPlatformFile executes Initialize, it binds a handler to FCoreDelegates::OnOptimizeMemoryUsageForMountedPaks. Invoking this delegate notifies PakPlatformFile to optimize the memory usage of mounted Paks.

// Code omitted for brevity

The configuration options are:

  • UnloadPakEntryFilenamesIfPossible: Allows the memory occupied by PakEntry filenames to be released.
  • DirectoryRootsToKeepInMemoryWhenUnloadingPakEntryFilenames: Directories whose filenames are kept in memory when PakEntry filenames are unloaded.
  • ShrinkPakEntriesMemoryUsage: Shrinks the memory used by PakEntry data.

After this is invoked, if UnloadPakEntryFilenamesIfPossible is enabled, memory will be saved by hashing the filename list in Pak, but wildcard matching for paths will no longer be available.

// Code omitted for brevity
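The trade-off described above, less memory in exchange for losing wildcard path matching, can be illustrated with a standalone sketch. It is not the engine's implementation: PakEntry here is a simplified stand-in, HashedIndex is a hypothetical name, and FNV-1a is used only because it is short. Once only hashes are kept, an exact path can still be found, but a pattern cannot be matched because the original strings are gone.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in for a pak entry: where the file lives in the pak.
struct PakEntry {
    std::uint64_t Offset;
    std::uint64_t Size;
};

// FNV-1a 64-bit hash, chosen for brevity; an assumption, not the engine's hash.
std::uint64_t HashPath(const std::string& Path) {
    std::uint64_t Hash = 0xcbf29ce484222325ULL;
    for (unsigned char C : Path) {
        Hash ^= C;
        Hash *= 0x100000001b3ULL;
    }
    return Hash;
}

// Index that stores one 64-bit hash per file instead of the full path string.
class HashedIndex {
public:
    explicit HashedIndex(const std::vector<std::pair<std::string, PakEntry>>& Named) {
        for (const auto& Item : Named) {
            const bool bInserted =
                Entries.emplace(HashPath(Item.first), Item.second).second;
            // This sketch simply refuses inputs whose hashes collide.
            assert(bInserted && "hash collision between two paths");
            (void)bInserted;
        }
    }

    // Exact lookups still work: hash the requested path and probe the map.
    const PakEntry* Find(const std::string& Path) const {
        const auto It = Entries.find(HashPath(Path));
        return It == Entries.end() ? nullptr : &It->second;
    }

private:
    std::unordered_map<std::uint64_t, PakEntry> Entries;
};
```

A query such as "Game/Content/Maps/*" is hashed like any other string and finds nothing, which mirrors the loss of wildcard matching.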

Compress Textures

Texture compression is lossy and can reduce both package size and memory size upon loading. Even with lossy compression, the quality reduction on mobile is often not noticeable and can be set based on project needs.

In previous notes, it was mentioned you can adjust the default resource quality and size level in Project Settings - Cooker - Texture - ASTC Compression vs Size:

0=12x12 
1=10x10
2=8x8
3=6x6
4=4x4
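To put these block sizes in perspective: ASTC stores every block in 128 bits (16 bytes) regardless of its footprint, so a larger block means fewer bits per texel. The helper below is a small standalone sketch (AstcLevelBytes is a hypothetical name) that computes the size of a single mip level; it ignores the mip chain and any container overhead.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one mip level of an ASTC texture.
// Each block covers BlockW x BlockH texels and always occupies 16 bytes.
std::size_t AstcLevelBytes(std::size_t Width, std::size_t Height,
                           std::size_t BlockW, std::size_t BlockH) {
    assert(BlockW > 0 && BlockH > 0);
    // Partial blocks at the edges still cost a full block, so round up.
    const std::size_t BlocksX = (Width + BlockW - 1) / BlockW;
    const std::size_t BlocksY = (Height + BlockH - 1) / BlockH;
    return BlocksX * BlocksY * 16;
}
```

For a 1024x1024 level this gives 1,048,576 bytes at 4x4, 262,144 bytes at 8x8, and 118,336 bytes at 12x12, which is why moving from quality level 4 toward 0 cuts both package size and memory so sharply.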

In the Texture resource editor, you can also set this for individual Textures:

Lowest->Highest corresponds to values 0-4, using Default will apply project settings.

Also, the type set in Compression Settings will affect the compression type of the resource. Default implies project settings; if set as NormalMap type, it will be ASTC_4x4.

This article will be updated continuously; feedback and discussion are welcome.

Title: UE Performance Analysis: Memory Optimization
Author: LIPENGZHA
Publish Date: 2021/03/30 10:59
Word Count: 9.6k Words
Link: https://en.imzlp.com/posts/19135/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.