Computer Science

The Cost of CPU Clock Cycles

messsagepack

msgpack is a binary JSON scheme:

Bitcoin Paper

Satoshi Nakamoto’s paper on decentralized currency:

Download the original English version: Bitcoin: A Peer-to-Peer Electronic Cash System

The mov is Turing Complete Paper

The mov in assembly language is indeed Turing complete:

a/lib Format

.a and .lib are static library formats. They are ar compressed .o and .obj files, and internally have .txt files to record symbol information for each object file.

In iOS and macOS, symbol names in the .txt start with _.

JNI Call Signature Reference Table

Basic type and signature correspondence in Java:

Java Native Signature
byte jbyte B
char jchar C
double jdouble D
float jfloat F
int jint I
short jshort S
long jlong J
boolean jboolean Z
void void V

For the function with the signature ()V:

1
public void AndroidThunkJava_SetFullScreenDisplayForP();

The signature rules for non-built-in types are:

  1. Start with L
  2. End with ;
  3. Separate the package and class names with /

For example, in Java’s String:

1
2
// ()Ljava/lang/String;
public String AndroidThunkJava_GetDeviceId();

Note: The parentheses contain the parameter signatures, and the right side of the parentheses indicates the return value type signature.

For example:

1
2
// (Ljava/lang/String;Ljava/lang/String;)I
int Func(String,String);

Race Conditions of Semaphores

[CSAPP,2E,12.5.2] When multiple threads are waiting on the same semaphore, it is unpredictable which thread the V operation will restart.

The Role of Memory Alignment

Previously, I wrote an article introducing the rules of memory alignment in C++: Memory Alignment Issues in Struct Members

But why?

First, the arrangement of data in memory is known as memory layout. Different arrangements in structures occupy different amounts of memory and can also significantly affect CPU access efficiency (the CPU can process regardless of alignment, but efficiency does matter). To balance space utilization and access efficiency, memory alignment rules were introduced.

The set of binary numbers that the CPU can process in a unit time is called a word, and the number of bits in this set is called the word length. For a 32-bit CPU, a word length of 32 bits corresponds to 4 bytes. Generally, the larger the word length, the faster the computer processes information.

Taking a 32-bit CPU as an example, the CPU can only read 4 bytes of data from memory at a time, so it can only read addresses that are multiples of 4.

Assume we have a 4-byte integer data type, and the starting address is not a multiple of 4, suppose at bit 0x3, so this type is stored in the memory space from 0x3~0x7. Therefore, if the CPU reads this data, it has to make two read operations at 0x1 and 0x5, and it also needs to process the read data to obtain that integer, as shown:

The CPU’s processing speed is much faster than reading data from memory, so reducing CPU access to memory space is key to improving program performance. Therefore, adopting a memory alignment strategy is crucial for enhancing program performance. Since it’s a 32-bit CPU, it just needs to align by 4 bytes, making the above example change to require only one CPU read:

Because alignment is in bytes, memory alignment is also called byte alignment. Memory alignment in C++ is handled by the compiler, and generally does not require manual specification, but understanding the rules of memory alignment helps write programs that save memory and perform optimally.

Reference materials:

  • “The Rust Programming Language” 4.1.3 P101.
  • “Computer Systems: A Programmer’s Perspective, Third Edition” 3.9.3 P189

Check the Runtime Library Type of Binary Files

In previous notes, I mentioned that when compiling C++ projects in VS, you can specify the Runtime Library type for the target: MT and MD. In previous notes, I also mentioned using dumpbin.exe to determine the platform information (x86/x64) of a binary file. Now, I have a similar problem: how do you determine whether a binary file’s Runtime Library is MT or MD? To answer this question, you still need to use the dumpbin.exe tool by checking the DLL dependencies of the binary file to determine its Runtime Library type. However, the downside of this method is that it cannot determine whether it is Debug or Release. Generally, mixing debug/release linked libraries will lead to connection errors. Use the following command:

1
dumpbin.exe BINARY_FILE /dependents | findstr dll

If you have the following output, then it is MD:

1
2
3
4
5
6
7
8
MSVCP140.dll
VCRUNTIME140.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-heap-l1-1-0.dll
KERNEL32.dll

The reason we can determine it as MD is that this binary file depends on CRT DLLs.

After changing the Runtime library type to MT, test again:

1
KERNEL32.dll

Only one KERNEL32.dll without any other CRT DLL dependencies, so it is MT.

Check If a Binary File is x64 or x86 Version

To determine if a dll or lib file is 64-bit or 32-bit, you can use dumpbin from the VS toolchain to inspect the executable file’s segments. The file header includes machine information. If VS is not installed, you can download it here. A GUI version was developed on GitHub (DumpbinGUI), but binary downloads are not provided in the release. I compiled it and you can download it here.
Then use the command:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
$ dumpbin.exe /headers EXECUTABLE_FILE
# for example
$ dumpbin.exe /headers UE4Launcher.exe
Microsoft (R) COFF/PE Dumper Version 14.00.24234.1
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file UE4Launcher.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
8664 machine (x64)
7 number of sections
5D70B753 time date stamp Thu Sep 5 15:20:51 2019
0 file pointer to symbol table
0 number of symbols
F0 size of optional header
22 characteristics
Executable
Application can handle large (>2GB) addresses
// ...

The rest can be ignored; the key detail you need to focus on is the machine item. For a 64-bit file, it will be x64, and for a 32-bit file, it will be x86.

Guide to x86-64

Floating Point Zero

The zero value for floating-point should be 0.0f, because floating-point values are not stored exactly, and there is floating-point rounding, so you shouldn’t compare with a zero of all bits being zero.

Advantages of sendfile

Typically, we read data from one file descriptor fd and write it to another fd:

1
2
while ((n = read(diskfilefd, buf, BUZ_SIZE)) > 0)
write(sockfd, buf, n);

However, frequently transferring in this manner is highly inefficient, requiring two system calls: one to copy file content from the kernel buffer cache to user space, and another to copy from user space buffer back to kernel space for transfer.
But if we just want to transfer without processing the data, this two-step operation is wasteful. The system call sendfile is designed to counteract this inefficiency.
Unix system call interface: sendfile

1
2
3
#include <sys/sendfile.h>
// returns number of bytes transferred, or -1 on error
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

Its purpose is:

transfer data between file descriptors

For example, when we transfer data via a socket descriptor, using sendfile will directly send the data to the socket without going through user space. This technique is called zero-copy transfer.

The system call sendfile transfers file content (in bytes) between the file descriptor representing the input file in_fd and the descriptor representing the output out_fd. The descriptor out_fd must point to a socket. The file pointed to by in_fd must be mmapable. In practice, this typically means a regular file, which somewhat limits the use of sendfile. We can use sendfile to pass data from a file to a socket, but not the other way around. Additionally, we cannot directly transfer data between two sockets using sendfile.

In Linux 2.4 and earlier versions, out_fd can point to a regular file.

Concurrency and Parallelism

The following content is excerpted from CSAPP, 2e (in fact, using streams to describe parallelism and concurrency depends on OS implementation):
A logical flow whose execution overlaps in time with another flow is called a concurrent flow, and the two flows are said to run concurrently. More precisely, flows X and Y are concurrent with respect to each other if and only if X begins after Y begins and before Y finishes, or Y begins after X begins and before X finishes. For example, in Figure 8.12, processes A and B run concurrently, as do A and C. On the other hand, B and C do not run concurrently because the last instruction of B executes before the first instruction of C.
The general phenomenon of multiple flows executing concurrently is known as concurrency. The notion of a process taking turns with other processes is also known as multitasking. Each time period that a process executes a portion of its flow is called a time slice. Thus, multitasking is also referred to as time slicing. For example, in Figure 8.12, the flow for Process A consists of two time slices.
Notice that the idea of concurrent flows is independent of the number of processor cores or computers that the flows are running on. If two flows overlap in time, then they are concurrent, even if they are running on the same processor. However, we will sometimes find it useful to identify a proper subset of concurrent flows known as parallel flows. If two flows are running concurrently on different processor cores or computers, then we say that they are parallel flows, that they are running in parallel, and have parallel execution.

Why is the IP Address of a Socket a Structure?

An IPv4 address is a 32-bit unsigned integer. However, it is often stored in a structure called in_addr.
This structure is defined in the <netinet/in.h> header file:
The <netinet/in.h> header shall define the in_addr structure, which shall include at least the following member:

1
2
3
4
struct in_addr{
in_addr_t s_addr;
};

in_addr_t is equivalent to the type uint32_t as described in <inttypes.h>.

But why use a structure to store the IP address?

[CSAPP,2e] Storing a scalar address in a structure is an unfortunate consequence of early socket API implementation. Defining a scalar type for IP addresses should make more sense, but it’s too late to change now since many applications are based on it.

TCP/IP Protocol Suite

The TCP/IP protocol suite is a set of different protocols combined to form a protocol suite (application layer protocols such as FTP/Telnet, transport layer protocols like TCP/UDP, network layer protocols including IP/ICMP/IGMP, and link layer protocols like device drivers and network interface devices). Although often referred to as TCP/IP, TCP and IP are merely two among many protocols in this suite.

POSIX Data Types

[IEEE Std 1003.1™ 2008] The implementation shall support one or more programming environments in which the widths of blksize_t, pid_t, size_t, ssize_t, and suseconds_t are no greater than the width of type long.

MMU and COW

Let’s look at an example of COW: modifying variables inherited from the parent process in a child process triggers COW. Even though they may get the same address at runtime, they do not actually reside at the same physical address.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
int main(int argc,char* argv[])
{
int x=12;
printf("start before: x = %d\n",x);
setbuf(stdout,NULL);
pid_t child;
if((child=fork())<0){
printf("error!\n");
}else{
if(child>0){
printf("x initial value on parent scope is %d address is %x\n",x,&x);
printf("parent! process ID is %d\n",getpid());
x=1;
printf("x on parent scope is %d address is %x\n\n",x,&x);
}else{
printf("x initial value on child scope is %d address is %x\n",x,&x);
printf("child! process ID is %d\n",getpid());
// x=2;
printf("x on child scope is %d address is %x\n\n",x,&x);
}
}
return 0;
}

fork And COW Run Result
Looking at the running result, we see that the addresses accessed for x in the child and parent processes yield the same result at runtime; intuitively, this seems to contradict COW rules. The write operation in the child process should result in a different address from that in the parent process.
The reason is that both & operators obtain virtual addresses. The same virtual address in different processes is mapped to different physical addresses by the MMU. Thus, although their virtual addresses are the same, they actually point to different physical addresses.
fork And COW