For arrays, subscript operations are a way of random reading and writing, and they are also the most common method. However, many textbooks (especially domestic ones) state that the array name is a pointer, which is incorrect. Moreover, there are a set of rules behind array subscript access; familiarizing yourself with these rules can help analyze the actual meaning of the code in some complex semantic situations.
Suppose we now have an array x
containing 6 int
type elements:
1 | int x[6]={1,2,3,4,5,6}; |
For x
, we can access it via subscripting:
1 | x[2]==3; |
One question is: what exactly happens when we perform the subscript operation? How does it access the element at index 2 (the third element) of array x
?
Perhaps you have also seen another unusual way of accessing array subscripts:
1 | 2[x]==x[2]; |
ISO/IEC 14882:2014(E): The subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.
This means that array subscript access is implemented through the combinations of +
and *
(dereferencing).
When we use x[2]
, it is converted to *(x+2)
.
By the commutative property of addition:
1 | *(x+2)==*(2+x); |
That is:
1 | x[2]==2[x]; |
Further generalizing we get:
1 | const int j=2; |
However, there is another question: what is the array name? It appears that the array name is a pointer to the first element of the array, but this is not correct (or rather not precise).
To start with the conclusion: when the array name appears in an expression, it is converted to a pointer to the first element (or sub-object of a multi-dimensional array).
1 | int x[3][5]={{1,2,3,4,5},{6,7,8,9,10},{11,12,13,14,15}}; // Here x is a 3 × 5 array of integers. |
We can analyze what happens here from the perspective of intermediate code:
1 | // Array initialization |
It can be seen that it is all layers of pointers.
When we assign x
to a pointer object:
1 | auto xp=x; |
The intermediate code is:
1 | %9 = getelementptr inbounds [3 x [5 x i32]], [3 x [5 x i32]]* %6, i32 0, i32 0 |
When x appears in an expression, it is converted to a pointer to (the first of three) five-membered arrays of integers.
That means x
appears in an expression is converted to a pointer to the x[0]
object (which is an array of five integer elements).
In the expression x[i] which is equivalent to *(x+i), x is first converted to a pointer as described; then x+i is converted to the type of x, which involves multiplying i by the length of the object to which the pointer points, namely five integer objects. The results are added and indirection applied to yield an array (of five integers), which in turn is converted to a pointer to the first of the integers.
This means the array name is not a pointer, but represents an array object; however, it can implicitly be converted into a pointer to the first element (object) in an expression.
By the consistent rule
, this concept can be generalized to multi-dimensional arrays
.
If there is another subscript the same argument applies again; this time the result is an integer.
That is:
1 | &x==&x[0]; |
If E is an n-dimensional array of rank i×j×…×k, then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a pointer to an (n −1)-dimensional array with rank j×…×k. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (n − 1)-dimensional array, which itself is immediately converted into a pointer.
1 | const int i=2,j=3,k=4; |