A runtime reorganization scheme for Pak in Unreal Engine

Pak is part of UFS (Unreal File System) in UE, a virtual file system implemented at the application layer. It packages game resources and files into a single Pak file, which avoids opening a large number of file handles when accessing game resources at runtime and enables read caching (PakCache) to improve loading efficiency.

Moreover, UFS lets you control the priority of each Pak, and thereby the priority of files in the file system. When a file is loaded through UFS, the copy in the higher-priority Pak is found first, so it replaces the copy in lower-priority Paks. This is also the key to UE’s implementation of hot updates, as detailed in the previous hot update series articles.

However, by default Paks are packaged on the development side: PakUtilities and UnrealPak are developer-only functionality that does not exist at runtime, so Pak files cannot be created at runtime out of the box. Nevertheless, a Pak is itself an Archive, which in theory can be reorganized at runtime.

This article explores how Pak files can be created at runtime and where that is useful, covering Pak creation, the file format, how UFS mounts and reads Paks, and the feasibility of runtime reorganization.

File Access in UFS

When files are loaded from Paks, the priority (mount order) of the mounted Paks determines which Pak’s copy of a file is treated as the latest.

/**
 * Mounts a pak file at the specified path.
 *
 * @param InPakFilename Pak filename.
 * @param PakOrder Mount order (priority) of the pak.
 * @param InPath Path to mount the pak at.
 */
bool Mount(const TCHAR* InPakFilename, uint32 PakOrder, const TCHAR* InPath = NULL, bool bLoadIndex = true);
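
To illustrate how mount order decides which copy of a file wins, here is a minimal standalone C++ sketch. It does not use engine types: MountedPak and ResolvePak are hypothetical names invented for this example, it assumes that a larger order value means higher priority, and the real lookup inside UFS is more involved.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a mounted Pak (not an engine type).
struct MountedPak {
    std::string Name;             // Pak file name
    unsigned Order;               // mount order; assumed: larger value = higher priority
    std::set<std::string> Files;  // virtual paths stored in this Pak
};

// Returns the name of the highest-priority Pak containing Path,
// or an empty string when no mounted Pak contains it.
std::string ResolvePak(std::vector<MountedPak> Paks, const std::string& Path) {
    // Check higher-order Paks first so their files shadow lower-order ones.
    std::stable_sort(Paks.begin(), Paks.end(),
                     [](const MountedPak& A, const MountedPak& B) { return A.Order > B.Order; });
    for (const MountedPak& Pak : Paks) {
        if (Pak.Files.count(Path) != 0) {
            return Pak.Name;
        }
    }
    return std::string();
}
```

With a base Pak at order 0 and a patch Pak at order 1 that both contain the same ini file, ResolvePak returns the patch Pak for that file and the base Pak for files that only the base Pak contains.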

Create Pak

Generally, a Pak file in UE is created through ExecuteUnrealPak in PakUtilities. UE wraps it in a standalone command line program, UnrealPak, which packs the files listed in a ResponseFile description file.

ResponseFile file format:

# Absolute Path Relative Path Arguments
C:/TestPak/A.txt ../../../PROJECT_NAME/A.txt -compress

Each line contains the absolute path of the file, its virtual path in UFS, and parameters (such as compression, etc.).
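
As a small illustration of that line format, the sketch below builds one ResponseFile line from its three parts. MakeResponseLine is a hypothetical helper written for this article, not part of UnrealPak or PakUtilities, and it assumes paths without spaces, as in the example above.

```cpp
#include <string>

// Builds one ResponseFile line: <absolute path> <virtual path in UFS> [arguments].
// Hypothetical helper; assumes neither path contains spaces.
std::string MakeResponseLine(const std::string& AbsolutePath,
                             const std::string& VirtualPath,
                             bool bCompress) {
    std::string Line = AbsolutePath + " " + VirtualPath;
    if (bCompress) {
        Line += " -compress";  // per-file argument, as in the example line above
    }
    return Line;
}
```

Calling it with C:/TestPak/A.txt, ../../../PROJECT_NAME/A.txt and compression enabled reproduces the example line shown above.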

UnrealPak create command:

# Engine\Binaries\Win64\UnrealPak.exe GENERATE_PAK_FILE.pak -create=RESPONSE_FILE.txt OTHER_ARGS
UnrealPak.exe "D:/TestGeneratePak.pak" -create="D:/TestGeneratePak_ResponseFile.txt" -AlignForMemoryMapping=0 -compress -compressionformats=Zlib

This is the same process UE runs during normal packaging, which packs Cooked files, ini files, Slate image resources, etc. into the Pak.

Pak Formats

File Layout of Pak

To analyze the Pak file format, we package a text file named ABC.txt whose content is ABC.

The binary layout of its Pak file is as follows:

[Figure: binary layout of the Pak file]

By analyzing the code, the layout information of Pak is as follows:

[Figure: Pak layout derived from the code]

Corresponding to the above-packaged Pak file, it is divided into the following four parts:

  1. PakEntry (file description information)
  2. PakEntryContent (file content)
  3. PakIndex (Pak file description area)
  4. PakInfo (Pak information area)

Analysis of Pak Mounting

When the engine mounts a Pak, it reads the PakInfo from the end of the Pak file to determine the Pak version information, and reads the offset, size, and Hash value of the PakIndex in the file, which provides basic information for reading files in the Pak.

Once the offset and size of the PakIndex are known, the index can be read. If the project has bEncryptIndex enabled, the entire PakIndex is stored encrypted in the Pak and must be decrypted at runtime before it can be accessed.

With the PakIndex available, the engine can read the current Pak’s MountPoint, file count, each file’s name, and each file’s offset in the Pak (note that this offset points at the file’s PakEntry, which precedes the file content).

The role of Pak’s MountPoint was already covered in an earlier article: The Role of Mount Point.

The MountPoint is the common directory for all files in Pak, with two main functions:

  1. Reducing storage redundancy of file paths, as only the relative paths to MountPoint are kept in PakIndex.
  2. Speeding up file searches: when looking for a file, a simple path comparison against the MountPoint can rule out the current Pak without actually searching inside it.

For example, when seeking a file from UFS: ../../../FindExample/Content/Database.db, if the query finds the MountPoint for the Pak to be ../../../FindExample/Content/Others, a simple comparison can reveal that the desired file is not in the current Pak, avoiding the need to actually search within the Pak.
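
The comparison in this example can be sketched as a plain prefix test. This is a standalone illustration, not engine code: IsUnderMountPoint is a hypothetical name, and it assumes both paths are already normalized to use forward slashes.

```cpp
#include <string>

// Returns true when Path lies under MountPoint, i.e. the Pak could contain it.
// A false result means the Pak can be skipped without searching its index.
// Hypothetical helper; assumes both paths use '/' separators.
bool IsUnderMountPoint(std::string MountPoint, const std::string& Path) {
    // Make sure "Others" does not accidentally match "OthersX/...".
    if (!MountPoint.empty() && MountPoint.back() != '/') {
        MountPoint += '/';
    }
    // Compare the leading part of Path with the whole mount point.
    return Path.compare(0, MountPoint.size(), MountPoint) == 0;
}
```

For the paths above, the check fails for the Pak mounted at ../../../FindExample/Content/Others, so that Pak is skipped; a Pak mounted at ../../../FindExample/Content/ passes the check and would then be searched.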

When mounting a Pak, it does not actually load the entire Pak into memory; instead, it establishes a virtual file structure in UFS by reading the information from PakInfo and PakIndex. Only when an attempt is made to load a file will it use the information from PakIndex to retrieve the PakEntry, read the file at the specified offset in the Pak, and decompress it with the correct compression algorithm.

Create Pak at Runtime

Having analyzed the Pak format and UE’s process for mounting and loading files, let’s return to the question raised at the start of the article: is it possible to create Pak files at runtime?

The answer is yes. The Pak file format is a typical Archive format, which serializes files and data into a single file. As long as we serialize according to the format and reading process UE uses for Paks, we can create Pak files at runtime that are read correctly.

Referring back to the image of UE’s Pak file format:

Note: In the above image, the Offset value in PakEntry is always 0; this is a protective mechanism. The real offset of each PakEntry is recorded in PakIndex. Writing 0 into the PakEntry itself prevents files from being read directly by bypassing the PakIndex, which may be encrypted, so that file reading only happens after proper decryption.

As an example, we create a Pak at runtime from a file named ABC.txt whose content is ABC.

PakEntry

Each file in a Pak has a PakEntry that describes it and records the following:

  1. Where to start reading from (Offset)
  2. How much to read (Size, post-compression size)
  3. The original size of the file (UncompressedSize)
  4. The Sha1 value of the file data in the Pak (post-compression data)
  5. The offset positions of each compression block
  6. The size of each compression block
  7. The compression algorithm used for the current file (algorithm index in PakInfo; 0 for no compression)
  8. Flags, with three states Flag_None/Flag_Encrypted/Flag_Deleted

This information can be calculated for any given file.
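
A simplified picture of those fields, for the uncompressed case, might look like the sketch below. ToyPakEntry and MakeUncompressedEntry are hypothetical names and this is not the engine's FPakEntry: the numeric flag values are assumptions made for the example, and the hash is left out because the C++ standard library has no SHA-1.

```cpp
#include <cstdint>
#include <vector>

// Flag states named in the list above; the numeric values are assumptions.
enum ToyEntryFlags : std::uint8_t { Flag_None = 0x00, Flag_Encrypted = 0x01, Flag_Deleted = 0x02 };

// One compression block: byte range of the compressed data.
struct ToyCompressionBlock {
    std::int64_t CompressedStart;
    std::int64_t CompressedEnd;
};

// Hypothetical, simplified mirror of the information a PakEntry carries.
struct ToyPakEntry {
    std::int64_t Offset = 0;                   // where to start reading
    std::int64_t Size = 0;                     // size stored in the Pak (post-compression)
    std::int64_t UncompressedSize = 0;         // original file size
    std::uint32_t CompressionMethodIndex = 0;  // 0 means no compression
    std::vector<ToyCompressionBlock> Blocks;   // empty when uncompressed
    std::uint8_t Flags = Flag_None;
    // The Sha1 of the stored data is omitted in this sketch.
};

// Fills an entry for a file that is stored without compression or encryption.
ToyPakEntry MakeUncompressedEntry(const std::vector<std::uint8_t>& Data, std::int64_t OffsetInPak) {
    ToyPakEntry Entry;
    Entry.Offset = OffsetInPak;
    Entry.Size = static_cast<std::int64_t>(Data.size());
    Entry.UncompressedSize = Entry.Size;  // identical when nothing is compressed
    return Entry;
}
```

For ABC.txt, the entry has Size and UncompressedSize both equal to 3, a compression index of 0, and no compression blocks.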

Content

Once we have constructed a PakEntry for a file, the Content is straightforward; if compression is enabled, serialize the compressed data; if uncompressed, directly serialize the entire file into the Pak file.

PakIndex

The PakIndex in a Pak is not a single typed data structure with a fixed layout; it is a hash-based structure whose size varies with several parameters (such as the Crc value of the path of the Pak file being serialized, whether it is encrypted, whether the PathHash and DirectoryIndex are serialized, etc.).

A separate function for PakIndex serialization is provided in newer engine versions:

void FPakFile::EncodePakEntriesIntoIndex(
    int32 InNumEntries,
    const ReadNextEntryFunction& InReadNextEntry,
    const TCHAR* InPakFilename,
    const FPakInfo& InPakInfo,
    const FString& MountPoint,
    int32& OutNumEncodedEntries,
    int32& OutNumDeletedEntries,
    uint64* OutPathHashSeed,
    FDirectoryIndex* OutDirectoryIndex,
    FPathHashIndex* OutPathHashIndex,
    TArray<uint8>& OutEncodedPakEntries,
    TArray<FPakEntry>& OutNonEncodableEntries,
    TMap<uint64, FString>* InOutCollisionDetection,
    int32 PakFileVersion
);

The logic of PakIndex serialization can be imitated based on the Pak creation process in PakUtilities.

PakInfo

Finally, at the end of the Pak file, FPakInfo data is stored, structured as follows:

struct FPakInfo
{
    /** Pak file magic value. */
    uint32 Magic;
    /** Pak file version. */
    int32 Version;
    /** Offset to pak file index. */
    int64 IndexOffset;
    /** Size (in bytes) of pak file index. */
    int64 IndexSize;
    /** Index SHA1 value. */
    FSHAHash IndexHash;
    /** Flag indicating if the pak index has been encrypted. */
    uint8 bEncryptedIndex;
    /** Flag indicating if the pak index has been frozen */
    uint8 bIndexIsFrozen;
    /** Encryption key guid. Empty if we should use the embedded key. */
    FGuid EncryptionKeyGuid;
    /** Compression methods used in this pak file (FNames, saved as FStrings) */
    TArray<FName> CompressionMethods;
};

FPakInfo must store the key information about the PakIndex (offset, size, hash, encryption flag, etc.) so that the index can be located and read correctly after the Pak is mounted, which in turn gives access to the information on each file in the Pak.

FPakInfo also provides a Serialize function:

void FPakInfo::Serialize(FArchive& Ar, int32 InVersion)

FPakInfo::PakFile_Version_Latest can be passed as InVersion:

FPakInfo Info;
// Fill in the PakIndex-related members (IndexOffset, IndexSize, IndexHash, etc.) first.
// Then serialize the FPakInfo to the end of the archive.
Info.Serialize(SerializePak, FPakInfo::PakFile_Version_Latest);

Once the entire Archive is serialized to a file, the result is a Pak file that can be accessed at runtime.
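
The overall write order (entries with content, then the index, then the info block at the end) and the matching read path can be tried out with a toy archive in standard C++. This is only a sketch of the idea under heavy simplification: the layout, the magic value, and the names WriteToyPak and ReadToyPak are invented for this example, it is not the real Pak format, and the reader trusts archives produced by its own writer.

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
const std::uint32_t kToyMagic = 0x544F5950;  // arbitrary value for this toy format

template <typename T>
void Put(Bytes& Out, const T& Value) {  // append raw bytes (host endianness)
    const std::uint8_t* P = reinterpret_cast<const std::uint8_t*>(&Value);
    Out.insert(Out.end(), P, P + sizeof(T));
}

template <typename T>
T Get(const Bytes& In, std::size_t& Pos) {  // read raw bytes and advance Pos
    T Value;
    std::memcpy(&Value, In.data() + Pos, sizeof(T));
    Pos += sizeof(T);
    return Value;
}

void PutString(Bytes& Out, const std::string& S) {
    Put<std::uint32_t>(Out, static_cast<std::uint32_t>(S.size()));
    Out.insert(Out.end(), S.begin(), S.end());
}

std::string GetString(const Bytes& In, std::size_t& Pos) {
    const std::uint32_t Len = Get<std::uint32_t>(In, Pos);
    std::string S(In.begin() + Pos, In.begin() + Pos + Len);
    Pos += Len;
    return S;
}

// Writes: [entry + content] per file, then [index], then [info footer].
Bytes WriteToyPak(const std::string& MountPoint, const std::map<std::string, std::string>& Files) {
    Bytes Out;
    std::map<std::string, std::int64_t> Offsets;
    for (const auto& File : Files) {
        Offsets[File.first] = static_cast<std::int64_t>(Out.size());
        Put<std::int64_t>(Out, static_cast<std::int64_t>(File.second.size()));  // toy entry: size only
        Out.insert(Out.end(), File.second.begin(), File.second.end());          // content
    }
    const std::int64_t IndexOffset = static_cast<std::int64_t>(Out.size());
    PutString(Out, MountPoint);
    Put<std::uint32_t>(Out, static_cast<std::uint32_t>(Offsets.size()));
    for (const auto& Item : Offsets) {
        PutString(Out, Item.first);            // path relative to the mount point
        Put<std::int64_t>(Out, Item.second);   // offset of the entry in the archive
    }
    const std::int64_t IndexSize = static_cast<std::int64_t>(Out.size()) - IndexOffset;
    Put(Out, kToyMagic);
    Put(Out, IndexOffset);
    Put(Out, IndexSize);
    return Out;
}

// Reads one file back: footer -> index -> entry -> content.
bool ReadToyPak(const Bytes& Pak, const std::string& Name, std::string& OutContent) {
    const std::size_t FooterSize = sizeof(std::uint32_t) + 2 * sizeof(std::int64_t);
    if (Pak.size() < FooterSize) return false;
    std::size_t Pos = Pak.size() - FooterSize;
    if (Get<std::uint32_t>(Pak, Pos) != kToyMagic) return false;
    const std::int64_t IndexOffset = Get<std::int64_t>(Pak, Pos);
    if (IndexOffset < 0 || static_cast<std::size_t>(IndexOffset) > Pak.size() - FooterSize) return false;
    Pos = static_cast<std::size_t>(IndexOffset);
    GetString(Pak, Pos);  // mount point, not needed for this lookup
    const std::uint32_t Count = Get<std::uint32_t>(Pak, Pos);
    for (std::uint32_t i = 0; i < Count; ++i) {
        const std::string EntryName = GetString(Pak, Pos);
        const std::int64_t EntryOffset = Get<std::int64_t>(Pak, Pos);
        if (EntryName == Name) {
            std::size_t EntryPos = static_cast<std::size_t>(EntryOffset);
            const std::int64_t Size = Get<std::int64_t>(Pak, EntryPos);
            OutContent.assign(Pak.begin() + EntryPos, Pak.begin() + EntryPos + Size);
            return true;
        }
    }
    return false;
}
```

Writing ABC.txt with content ABC and reading it back by name returns ABC. As with a real Pak, the reader touches only the footer, the index, and the requested file rather than the whole archive.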

Because the Crc32 value of the Pak’s path is stored in the PakIndex when a Pak is created, Pak files are not binary-stable: Paks created from the same input files can differ byte for byte, yet all of them can be mounted and load files normally.

Application Scenarios

Being able to create Paks at runtime opens up several use cases.

Hot Updates

Previously, an update required downloading Pak files one by one; if a remote package consists of multiple Paks, every one of them has to be downloaded.

With a runtime Pak solution, no matter how many Paks are packaged remotely or how many versions the local client is behind, everything can be downloaded in one pass and stored in a single Pak.

It is even possible to implement version-free updates: the remote end packs the complete project without any differential comparison, and the client compares local and remote file versions to decide which files need to be downloaded, drastically reducing management costs.
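
The client-side comparison could be as simple as diffing two manifests. The sketch below assumes a manifest that maps each virtual path to a content hash; the Manifest type and FilesToDownload are hypothetical names, and how the hashes are computed or transferred is left open.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical manifest: virtual file path -> content hash (any stable hash).
using Manifest = std::map<std::string, std::string>;

// Returns the remote files that are missing locally or whose hash differs.
std::vector<std::string> FilesToDownload(const Manifest& Local, const Manifest& Remote) {
    std::vector<std::string> Result;
    for (const auto& Item : Remote) {
        const auto It = Local.find(Item.first);
        if (It == Local.end() || It->second != Item.second) {
            Result.push_back(Item.first);
        }
    }
    return Result;
}
```

Only the files in the returned list need to be fetched, and they can then be written into a single local Pak.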

Additionally, Paks that are already on the device can be merged and cleaned up, which alleviates the excess file handles and slow lookups caused by a large number of Paks.

UGC

Creating Paks on the client side can also serve as a strategy for UGC (User Generated Content); players’ custom content can be packed into Paks and redistributed.

Conclusion

This article examined how Paks are created and loaded and analyzed UFS, confirming that creating Paks at runtime is feasible and outlining its application scenarios. The technical implementation discussed in this article is not open source at this time, but will later be released as a runtime mod of HotPatcher.

Title:A runtime reorganization scheme for Pak in Unreal Engine
Author:LIPENGZHA
Publish Date:2022/05/23 11:03
Word Count:7.4k Words
Link:https://en.imzlp.com/posts/12188/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.