The Cost of CPU Clock Cycles
messsagepack
msgpack is a binary JSON scheme:
Bitcoin Paper
Satoshi Nakamoto’s paper on decentralized currency:
Download the original English version: Bitcoin: A Peer-to-Peer Electronic Cash System
The mov is Turing Complete Paper
The mov
in assembly language is indeed Turing complete:
a/lib Format
.a
and .lib
are static library formats. They are ar
compressed .o
and .obj
files, and internally have .txt
files to record symbol information for each object file.
In iOS and macOS, symbol names in the .txt
start with _
.
JNI Call Signature Reference Table
Basic type and signature correspondence in Java:
Java | Native | Signature |
---|---|---|
byte | jbyte | B |
char | jchar | C |
double | jdouble | D |
float | jfloat | F |
int | jint | I |
short | jshort | S |
long | jlong | J |
boolean | jboolean | Z |
void | void | V |
For the function with the signature ()V
:
1 | public void AndroidThunkJava_SetFullScreenDisplayForP(); |
The signature rules for non-built-in types are:
- Start with
L
- End with
;
- Separate the package and class names with
/
For example, in Java’s String:
1 | // ()Ljava/lang/String; |
Note: The parentheses contain the parameter signatures, and the right side of the parentheses indicates the return value type signature.
For example:
1 | // (Ljava/lang/String;Ljava/lang/String;)I |
Race Conditions of Semaphores
[CSAPP,2E,12.5.2] When multiple threads are waiting on the same semaphore, it is unpredictable which thread the V operation will restart.
The Role of Memory Alignment
Previously, I wrote an article introducing the rules of memory alignment in C++: Memory Alignment Issues in Struct Members
But why?
First, the arrangement of data in memory is known as memory layout. Different arrangements in structures occupy different amounts of memory and can also significantly affect CPU access efficiency (the CPU can process regardless of alignment, but efficiency does matter). To balance space utilization and access efficiency, memory alignment rules were introduced.
The set of binary numbers that the CPU can process in a unit time is called a word, and the number of bits in this set is called the word length. For a 32-bit CPU, a word length of 32 bits corresponds to 4 bytes. Generally, the larger the word length, the faster the computer processes information.
Taking a 32-bit CPU as an example, the CPU can only read 4 bytes of data from memory at a time, so it can only read addresses that are multiples of 4.
Assume we have a 4-byte integer data type, and the starting address is not a multiple of 4, suppose at bit 0x3
, so this type is stored in the memory space from 0x3~0x7
. Therefore, if the CPU reads this data, it has to make two read operations at 0x1
and 0x5
, and it also needs to process the read data to obtain that integer, as shown:
The CPU’s processing speed is much faster than reading data from memory, so reducing CPU access to memory space is key to improving program performance. Therefore, adopting a memory alignment strategy is crucial for enhancing program performance. Since it’s a 32-bit CPU, it just needs to align by 4 bytes, making the above example change to require only one CPU read:
Because alignment is in bytes, memory alignment is also called byte alignment. Memory alignment in C++ is handled by the compiler, and generally does not require manual specification, but understanding the rules of memory alignment helps write programs that save memory and perform optimally.
Reference materials:
- “The Rust Programming Language” 4.1.3 P101.
- “Computer Systems: A Programmer’s Perspective, Third Edition” 3.9.3 P189
Check the Runtime Library Type of Binary Files
In previous notes, I mentioned that when compiling C++ projects in VS, you can specify the Runtime Library
type for the target: MT
and MD
. In previous notes, I also mentioned using dumpbin.exe
to determine the platform information (x86/x64) of a binary file. Now, I have a similar problem: how do you determine whether a binary file’s Runtime Library
is MT
or MD
? To answer this question, you still need to use the dumpbin.exe
tool by checking the DLL dependencies of the binary file to determine its Runtime Library type. However, the downside of this method is that it cannot determine whether it is Debug
or Release
. Generally, mixing debug/release linked libraries will lead to connection errors. Use the following command:
1 | dumpbin.exe BINARY_FILE /dependents | findstr dll |
If you have the following output, then it is MD
:
1 | MSVCP140.dll |
The reason we can determine it as MD is that this binary file depends on CRT DLLs.
After changing the Runtime library
type to MT
, test again:
1 | KERNEL32.dll |
Only one KERNEL32.dll
without any other CRT DLL dependencies, so it is MT.
Check If a Binary File is x64 or x86 Version
To determine if a dll or lib file is 64-bit or 32-bit, you can use dumpbin
from the VS toolchain to inspect the executable file’s segments. The file header includes machine information. If VS is not installed, you can download it here. A GUI version was developed on GitHub (DumpbinGUI), but binary downloads are not provided in the release. I compiled it and you can download it here.
Then use the command:
1 | $ dumpbin.exe /headers EXECUTABLE_FILE |
The rest can be ignored; the key detail you need to focus on is the machine
item. For a 64-bit file, it will be x64
, and for a 32-bit file, it will be x86
.
Guide to x86-64
Floating Point Zero
The zero value for floating-point should be 0.0f
, because floating-point values are not stored exactly, and there is floating-point rounding, so you shouldn’t compare with a zero of all bits being zero.
Advantages of sendfile
Typically, we read data from one file descriptor fd
and write it to another fd
:
1 | while ((n = read(diskfilefd, buf, BUZ_SIZE)) > 0) |
However, frequently transferring in this manner is highly inefficient, requiring two system calls: one to copy file content from the kernel buffer cache to user space, and another to copy from user space buffer back to kernel space for transfer.
But if we just want to transfer without processing the data, this two-step operation is wasteful. The system call sendfile
is designed to counteract this inefficiency.
Unix system call interface: sendfile
1 |
|
Its purpose is:
transfer data between file descriptors
For example, when we transfer data via a socket descriptor, using sendfile
will directly send the data to the socket without going through user space. This technique is called zero-copy transfer.
The system call sendfile
transfers file content (in bytes) between the file descriptor representing the input file in_fd
and the descriptor representing the output out_fd
. The descriptor out_fd
must point to a socket. The file pointed to by in_fd
must be mmapable. In practice, this typically means a regular file, which somewhat limits the use of sendfile
. We can use sendfile
to pass data from a file to a socket, but not the other way around. Additionally, we cannot directly transfer data between two sockets using sendfile
.
In Linux 2.4 and earlier versions,
out_fd
can point to a regular file.
Concurrency and Parallelism
The following content is excerpted from CSAPP, 2e (in fact, using streams to describe parallelism and concurrency depends on OS implementation):
A logical flow whose execution overlaps in time with another flow is called a concurrent flow
, and the two flows are said to run concurrently. More precisely, flows X and Y are concurrent with respect to each other if and only if X begins after Y begins and before Y finishes, or Y begins after X begins and before X finishes. For example, in Figure 8.12, processes A and B run concurrently, as do A and C. On the other hand, B and C do not run concurrently because the last instruction of B executes before the first instruction of C.
The general phenomenon of multiple flows executing concurrently is known as concurrency
. The notion of a process taking turns with other processes is also known as multitasking
. Each time period that a process executes a portion of its flow is called a time slice
. Thus, multitasking is also referred to as time slicing
. For example, in Figure 8.12, the flow for Process A consists of two time slices.
Notice that the idea of concurrent flows is independent of the number of processor cores or computers that the flows are running on. If two flows overlap in time, then they are concurrent, even if they are running on the same processor. However, we will sometimes find it useful to identify a proper subset of concurrent flows known as parallel flows. If two flows are running concurrently on different processor cores or computers, then we say that they are parallel flows
, that they are running in parallel, and have parallel execution.
Why is the IP Address of a Socket a Structure?
An IPv4 address is a 32-bit unsigned integer. However, it is often stored in a structure called in_addr
.
This structure is defined in the <netinet/in.h>
header file:
The <netinet/in.h>
header shall define the in_addr
structure, which shall include at least the following member:
1 | struct in_addr{ |
in_addr_t is equivalent to the type
uint32_t
as described in<inttypes.h>
.
But why use a structure to store the IP address?
[CSAPP,2e] Storing a scalar address in a structure is an unfortunate consequence of early socket API implementation. Defining a scalar type for IP addresses should make more sense, but it’s too late to change now since many applications are based on it.
TCP/IP Protocol Suite
The TCP/IP protocol suite is a set of different protocols combined to form a protocol suite (application layer protocols such as FTP/Telnet, transport layer protocols like TCP/UDP, network layer protocols including IP/ICMP/IGMP, and link layer protocols like device drivers and network interface devices). Although often referred to as TCP/IP, TCP and IP are merely two among many protocols in this suite.
POSIX Data Types
[IEEE Std 1003.1™ 2008] The implementation shall support one or more programming environments in which the widths of
blksize_t
,pid_t
,size_t
,ssize_t
, andsuseconds_t
are no greater than the width of typelong
.
MMU and COW
Let’s look at an example of COW: modifying variables inherited from the parent process in a child process triggers COW. Even though they may get the same address at runtime, they do not actually reside at the same physical address.
1 |
|
Looking at the running result, we see that the addresses accessed for x
in the child and parent processes yield the same result at runtime; intuitively, this seems to contradict COW rules. The write operation in the child process should result in a different address from that in the parent process.
The reason is that both &
operators obtain virtual addresses. The same virtual address in different processes is mapped to different physical addresses by the MMU. Thus, although their virtual addresses are the same, they actually point to different physical addresses.