Incompatibilities Between C and C++

Previous posts have mentioned several times that C and C++ are not the same language. Even the part of C++ that is inherited from C differs significantly from ISO C. I will gradually collect some of their incompatible features here.

Function Declaration/Definition

In C (before C23), a function can be declared or defined with an empty parameter list:

#include <stdio.h>

// An empty list leaves the parameters unspecified, so calls are not checked
int func(){
    printf("func()\n");
    return 0;
}

int main(void){
    func(1,2,3,4); // accepted by the compiler; calls func()
}

The closest C++ equivalent is a variadic function:

#include <iostream>

int func(...){
    std::cout<<"func"<<std::endl;
    return 0;
}
int main(void){
    func(1,2,3,4); // calls func(...)
}

In C, to declare that a function takes no parameters, the parameter list must be written explicitly as void:

int func(void);
// Not equivalent
int func();

In C++, an empty parameter list and void both mean that the function takes no parameters.

int func();
// Equivalent
int func(void);

sizeof Operation

In C, because of VLAs (variable-length arrays), sizeof is not always evaluated at compile time:

#include <stdio.h>

void func(unsigned int n){
    int iarr[n];
    printf("%zu\n",sizeof(iarr));
}
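
To make the run-time evaluation visible, here is a minimal sketch. The helper name vla_bytes is hypothetical, and the code assumes a C compiler with VLA support (mandatory in C99, optional since C11).

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size in bytes of a VLA with n int elements.
   The operand of sizeof is a VLA, so the result is computed at run time. */
size_t vla_bytes(unsigned int n)
{
    assert(n > 0);          /* a VLA must have at least one element */
    int iarr[n];
    return sizeof(iarr);
}
```

Calling vla_bytes(3) and vla_bytes(10) gives 3 * sizeof(int) and 10 * sizeof(int), so the same sizeof expression yields different values from one call to the next.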

C++ has no such feature, although some compilers support it as an extension, for example GCC (the GNU Compiler Collection).
If your C++ compiler compiles it successfully, you are relying on a compiler extension.

enum

In C, enumeration constants have type int, whereas in C++ an enumeration is a distinct type, different from its underlying type.

**[ISO/IEC 9899:1999]** The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.

In C, an enum object can be initialized with an integer constant, but this is an error in C++:

enum color { red, blue, green };
enum color c = 1; // valid C, invalid C++
// error: cannot initialize a variable of type 'enum color' with an rvalue of type 'int'

Additionally, because a C++ enumeration and its underlying type are different types, sizeof may give different results for the two.

enum e { A };
/* In C:   sizeof(A) == sizeof(int) */
/* In C++: sizeof(A) == sizeof(enum e) */
/* and sizeof(int) is not necessarily equal to sizeof(enum e) */

void* Cannot Be Implicitly Converted to Other Pointer Types

int *x=malloc(4); // valid C, invalid C++
free(x);

In C++, void* cannot be implicitly converted to other pointer types, while it can be in C.

ISO C will accept this usage of pointer to void being assigned to a pointer to object type. C++ will not.

Type Definition in Parameter/Return Types

In C++, you cannot define a type in the return type or parameter type of a function, but it is allowed in C.

void f( struct S { int a; } arg ) {} // valid C, invalid C++
enum E { A, B, C } f() {} // valid C, invalid C++

Character Array Initialization

In C++, a char array initialized from a string literal must be large enough to hold every character of the literal, including the terminating \0. If it is not, the initialization is an error:

In C++, when initializing an array of characters with a string, the number of characters in the string (including the terminating ’\0’) must not exceed the number of elements in the array.

char x[4]="abcd"; // valid C, invalid C++
// [clang++ 3.9]error: initializer-string for char array is too long

In C, by contrast, this is allowed: the four characters are stored and the terminating \0 is simply left out.

In C, an array can be initialized with a string even if the array is not large enough to contain the string-terminating ’\0’.
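
A short C sketch of what this means in practice (the function name is hypothetical): the array holds exactly the four letters and no terminator, so it has to be treated as a fixed-size buffer rather than as a C string.

```c
#include <string.h>

/* Returns 1 if a 4-element array initialized from "abcd" holds exactly
   'a','b','c','d' and nothing else. Valid C only, not C++. */
int holds_abcd_without_terminator(void)
{
    char x[4] = "abcd";   /* the '\0' does not fit and is left out */
    return sizeof(x) == 4 && memcmp(x, "abcd", 4) == 0;
}
```

Because x is not null-terminated, passing it to strlen or to printf with %s would read past the end of the array, which is why the sketch compares with memcmp and an explicit length.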

Character Literal Base Type

In C, character literals have type int, but in C++ they have type char.

**[ISO/IEC 14882:2014]** An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.

**[ISO/IEC 9899:1999]** An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer.

Thus, the following code behaves differently in C and C++ (using a C compiler will yield no, while a C++ compiler will yield yes):

if(sizeof('a')==sizeof(char)){
    printf("yes\n");
}else{
    printf("no\n");
}

That is:

// true in C; in C++ sizeof('a') == sizeof(char) == 1 instead
sizeof('a') == sizeof(int)
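
With a C11 compiler the type itself can be inspected, not just its size. This sketch uses _Generic for that; the function name is hypothetical.

```c
/* Reports the type of the character constant 'a' as seen by a C11 compiler. */
const char *char_constant_type(void)
{
    return _Generic('a',
                    char: "char",
                    int: "int",
                    default: "other");
}
```

Compiled as C, the function returns "int", which matches the sizeof result above.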

Why?
Imagine that character literals had type int in C++, as they do in C. What would this print?

std::cout<<'x';

Overload resolution could not tell a character from an integer, so the statement would print the numeric character code instead of x. This is why C++ gives character literals the type char.

String Literal Type

In C, a string literal has type array of char, which decays to char*, so the following is allowed:

char *p = "abc";

**[ISO/IEC 9899:1999]** defines p with type “pointer to char” and initializes it to point to an object with type “array of char” with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

In C++, a string literal has type array of const char, which decays to const char*:

A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.

The above rules also apply to char16_t string literal/char32_t string literal.

Because of this difference in the type of string literals, the following code is treated differently by C and C++ compilers:

// valid in C, invalid in C++ (since C++11)
char *str="hello";

If your C++ compiler compiles it successfully, you are relying on a non-standard compiler extension.

C Struct and C++ Class Member Requirements

In C, a struct may contain a member of incomplete type, though only in one specific case:

**[ISO/IEC 9899:1999]** A structure or union shall not contain a member with incomplete or function type (hence, a structure shall not contain an instance of itself, but may contain a pointer to an instance of itself), except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.

This means that the last member of a structure with more than one named member may be an array of incomplete type.

This feature, the flexible array member (informally, an array of length zero), lets you enlarge a struct dynamically:

#include <stdlib.h>

typedef struct A{
    char tag;
    char alz[];
}Astruct;

int main(void){

    Astruct *x=(struct A*)malloc(sizeof(Astruct)+sizeof(char)*10);
    free(x);

    return 0;
}

A pointer member would also work:

#include <stdlib.h>

typedef struct A{
    char tag;
    char *cp;
}Astruct;

int main(void){

    Astruct *x=(Astruct*)malloc(sizeof(Astruct));
    x->cp=(char*)malloc(sizeof(char)*10); // 10 chars, not 10 pointers

    free(x->cp);
    free(x);

    return 0;
}

However, the flexible array member saves the sizeof(char*) overhead of the pointer, and it places the header and the data in one contiguous allocation, which helps reduce memory fragmentation.
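
The single-allocation layout can be sketched as follows. The Record type and the record_new helper are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: a tag followed by a flexible array member. */
typedef struct Record {
    char tag;
    char text[];   /* storage comes from the extra bytes passed to malloc */
} Record;

/* Allocates the header and the string payload in a single block.
   Returns NULL on allocation failure; the caller releases it with free(). */
Record *record_new(char tag, const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text) + 1;            /* include the '\0' */
    Record *r = malloc(sizeof(Record) + len);
    if (r == NULL) {
        return NULL;
    }
    r->tag = tag;
    memcpy(r->text, text, len);
    return r;
}
```

A single free(r) releases both the header and the text, whereas the pointer-based version needs two calls to free.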

Note, however, the last sentence of the quoted passage: a structure with a flexible array member may not itself be a member of another structure or an element of an array. This suggests why the feature cannot carry over to C++, where inheritance means that a derived class contains the members of its base class. The C++ standard does not fully specify the memory layout of a class, so there is no guarantee that the last member of a base class is also the last member of the derived object. Likewise, there is no guarantee that the last member of a class sits at the end of the object in memory (for example, where the virtual table pointer is placed depends on the implementation).

That is, in C++ the non-static data members of a class must have complete types:

**[ISO/IEC 14882:2014]** Non-static (9.4) data members shall not have incomplete types. In particular, a class C shall not contain a non-static member of class C, but it can contain a pointer or reference to an object of class C.

Thus, the flexible array member trick from C is not available in standard C++. Some implementations accept it as an extension, but relying on it is undefined behavior as far as the standard is concerned.

C++ can also use a braced initializer list to give an array of unknown size its length, but the list may not be empty:

**[ISO/IEC 14882:2014]** An array of unknown size initialized with a brace-enclosed initializer-list containing n initializer-clauses, where n shall be greater than zero, is defined as having n elements (8.3.4).

int x[] = { 1, 3, 5 };

declares and initializes x as a one-dimensional array that has three elements since no size was specified and there are three initializers. An empty initializer list {} shall not be used as the initializer-clause for an array of unknown bound.

The syntax provides for empty initializer-lists, but nonetheless C++ does not have zero length arrays.

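Therefore, in standard C++ such code is ill-formed, and with a compiler that accepts it as an extension, the write to x[0] is undefined behavior:

int x[]={};
x[0]=123; // no compile error with the extension, but out of bounds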

The above code compiles successfully in MinGW-W64 G++ 6.2.0 and Clang++ 3.9 x86_64-w64-windows-gnu, but fails to compile in VS2015:

error C2466: cannot allocate an array of constant size 0

Recursive Main Function

C allows main to be called recursively (nothing prohibits it):

Recursive function calls shall be permitted, both directly and indirectly through any chain of other functions.

However, C++ explicitly forbids recursive calls to main:

Recursive calls are permitted, except to the function named main (3.6.1).

If your C++ compiler successfully compiles the following code, you are again relying on a compiler extension:

#include <stdio.h>

static int index=0;
int main(void)
{
    printf("%d\n",index);
    if(index==10){
        return index;
    }else{
        ++index;
        main();
    }
    return 0;
}

const Object File Scope Linkage

The linkage of const objects at file scope differs between C and C++.
First, let’s look at the definition of file scope:

**[ISO/IEC 9899:1999]** If the declarator or type specifier that declares the identifier appears outside of any block or list of parameters, the identifier has file scope, which terminates at the end of the translation unit.

Now suppose we have the following object at file scope:

const int cival=123;

The above code has different linkage when compiled with C and C++ compilers:

  • In C language, it has external linkage.
  • In C++, it has internal linkage.

To test this (for how to drive the compile and link steps by hand, see C/C++ Compilation and Linkage Model Analysis):

// test.c
const int cival=123;

int main(void){}

When compiled with a C compiler:

$ gcc -c test.c -o test.o
$ nm test.o
-------- U __main
00000000 R cival
00000000 T main

Whereas when compiled with a C++ compiler:

$ g++ -c test.c -o test.o
$ nm test.o
-------- U __main
00000000 T main

You can see that the cival symbol is absent from the object file.

As for the reason, the C++ standard explains it clearly:

Change: A name of file scope that is explicitly declared const, and not explicitly declared extern, has internal linkage, while in C it would have external linkage.
Rationale: Because const objects can be used as compile-time values in C++, this feature urges programmers to provide explicit initializer values for each const. This feature allows the user to put const objects in header files that are included in many compilation units.

In summary, in C, global consts default to having external linkage; in C++ this is not the case unless explicitly declared extern.
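
If a declaration has to behave the same way under both compilers, one option is to spell the linkage out instead of relying on the default. A minimal sketch with hypothetical names:

```c
/* Internal linkage in both C and C++: each translation unit gets its own copy. */
static const int local_limit = 123;

/* External linkage in both C and C++: the extern declaration (normally placed
   in a header) comes first, and the definition below inherits its linkage. */
extern const int shared_limit;
const int shared_limit = 456;

/* Uses both constants so that neither is reported as unused. */
int limit_sum(void)
{
    return local_limit + shared_limit;
}
```

With these declarations, nm should list shared_limit as a global symbol whether the file is compiled as C or as C++, while local_limit stays private to the object file.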

Function Definition Parameter List

In C (before C23), the following old-style (K&R) function definition is allowed:

void f(a,b,c)int a;char b;double c;{/*...*/}

In C++, this is not allowed; it must be written as:

void f(int a,char b,double c){/*...*/}

Global Variable Definition

In C, a global data object can be declared multiple times in a single translation unit without the extern specifier. As long as at most one of the declarations provides an initializer, the object is considered to be defined only once (the declarations without an initializer are tentative definitions).

// This compiles in C
int i;
int i=123;
int main(void){}

In C++, this results in a redefinition error:

int i;
int i=123; // error: redefinition of 'i'
int main(void){}

Nested Structure Scope

In C, the names of nested structures are in the same scope as the outer structure.

struct A{
    int a;
    struct B{
        int b;
    } inner; // named member, so the nested declaration is well-formed in C
};

int main(void)
{
    struct A aobj;
    struct B bobj;
}

In C++, due to the concept of class scope, the outer name must be specified:

int main()
{
    struct A aobj;
    struct A::B bobj;
}

Title: Incompatibilities Between C and C++
Author:LIPENGZHA
Publish Date:2017/02/27 15:25
Update Date:2017/05/23 10:46
Word Count:8.4k Words
Link:https://en.imzlp.com/posts/14446/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.