The logic behind array subscript access

数组下标访问背后隐含的逻辑

For arrays, subscript operations are a way of random reading and writing, and they are also the most common method. However, many textbooks (especially domestic ones) state that the array name is a pointer, which is incorrect. Moreover, there are a set of rules behind array subscript access; familiarizing yourself with these rules can help analyze the actual meaning of the code in some complex semantic situations.

Suppose we now have an array x containing 6 int type elements:

1
int x[6]={1,2,3,4,5,6};

For x, we can access it via subscripting:

1
x[2]==3;

One question is: what exactly happens when we perform the subscript operation? How does it access the element at index 2 (the third element) of array x?
Perhaps you have also seen another unusual way of accessing array subscripts:

1
2[x]==x[2];

ISO/IEC 14882:2014(E): The subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.

This means that array subscript access is implemented through the combinations of + and * (dereferencing).
When we use x[2], it is converted to *(x+2).
By the commutative property of addition:

1
*(x+2)==*(2+x);

That is:

1
x[2]==2[x];

Further generalizing we get:

1
2
const int j=2;
x[j]==*(x[0]+j)==*(x+j)==*(j+x)==j[x]

However, there is another question: what is the array name? It appears that the array name is a pointer to the first element of the array, but this is not correct (or rather not precise).
To start with the conclusion: when the array name appears in an expression, it is converted to a pointer to the first element (or sub-object of a multi-dimensional array).

1
int x[3][5]={{1,2,3,4,5},{6,7,8,9,10},{11,12,13,14,15}}; // Here x is a 3 × 5 array of integers.

We can analyze what happens here from the perspective of intermediate code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Array initialization
@_ZZ4mainE1a = private unnamed_addr constant [3 x [5 x i32]] [[5 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5], [5 x i32] [i32 6, i32 7, i32 8, i32 9, i32 10], [5 x i32] zeroinitializer], align 16
// Allocating storage space
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca [3 x [5 x i32]], align 16
%7 = alloca [5 x i32]*, align 8
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
// Array converted to pointer
%8 = bitcast [3 x [5 x i32]]* %6 to i8*
// Call initialization to write values from the initializer list into the allocated address space
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %8, i8* bitcast ([3 x [5 x i32]]* @_ZZ4mainE1a to i8*), i64 60, i32 16, i1 false)

It can be seen that it is all layers of pointers.
When we assign x to a pointer object:

1
auto xp=x;

The intermediate code is:

1
2
%9 = getelementptr inbounds [3 x [5 x i32]], [3 x [5 x i32]]* %6, i32 0, i32 0
store [5 x i32]* %9, [5 x i32]** %7, align 8

When x appears in an expression, it is converted to a pointer to (the first of three) five-membered arrays of integers.

That means x appears in an expression is converted to a pointer to the x[0] object (which is an array of five integer elements).

In the expression x[i] which is equivalent to *(x+i), x is first converted to a pointer as described; then x+i is converted to the type of x, which involves multiplying i by the length of the object to which the pointer points, namely five integer objects. The results are added and indirection applied to yield an array (of five integers), which in turn is converted to a pointer to the first of the integers.

This means the array name is not a pointer, but represents an array object; however, it can implicitly be converted into a pointer to the first element (object) in an expression.
By the consistent rule, this concept can be generalized to multi-dimensional arrays.

If there is another subscript the same argument applies again; this time the result is an integer.

That is:

1
2
3
4
5
&x==&x[0];
&x[0]==&x[0][0]
x==&x[0][0]; // x is a int **
**x==x[0][0];
&**x==&x[0][0]

If E is an n-dimensional array of rank i×j×…×k, then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a pointer to an (n −1)-dimensional array with rank j×…×k. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (n − 1)-dimensional array, which itself is immediately converted into a pointer.

1
2
3
4
5
const int i=2,j=3,k=4;
int E[i][j][k]; // declares a three-dimensional array of integers, with rank 2×3×4.
// E[1] is an array object pointing to E[1][0][0]~E[1][j-1][k-1];
int *firstElementPtr=&**a;
E[1][2][3]==*(firstElementPtr+(1*j*k)+(2*k)+3);
The article is finished. If you have any questions, please comment and communicate.

Scan the QR code on WeChat and follow me.

Title:The logic behind array subscript access
Author:LIPENGZHA
Publish Date:2017/01/16 00:59
Word Count:2.8k Words
Link:https://en.imzlp.com/posts/20449/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.
Your donation will encourage me to keep creating!