Why is extern "C" needed?

The previous article (C/C++ Compilation Model Analysis) explained why and how C and C++ code is compiled and linked. Picking up where it left off, this article starts from extern "C" and analyzes how the compilation and linking models of C and C++ differ and why, focusing on function overloading, function signatures, and name mangling.

For background on the C compilation model before these three concepts are introduced, see my previous article: C/C++ Compilation Model Analysis.

Function Overload

Function overload is a very useful concept: I can define multiple implementations of the same function for different parameters.

[ISO/IEC 14882:2014] When two or more different declarations are specified for a single name in the same scope, that name is said to be overloaded.

C has no overloading mechanism, so two functions cannot share a name in C. The following code therefore fails to compile as C with a conflicting-types error:

/* error in C: conflicting types for 'customMax' */
int customMax(int x,int y){ return x>=y?x:y; }
double customMax(double l,double r){ return l>=r?l:r; }

This is why large C code bases are full of names like fs__xxx: a naming convention stands in for overloading. The same code is legal in C++, because C++ supports function overloading. C++ has no keyword that marks a function as overloaded. Bjarne Stroustrup, the creator of C++, considered that overloading had to work with existing C libraries without modifying their code (non-intrusively), and that an overload keyword would have made this difficult to handle. C++ therefore implements function overloading in a different way.
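
As a minimal sketch of what C++ accepts here, the two customMax definitions from above can coexist as overloads. The constexpr qualifier and the static_assert checks are additions for illustration, so that the selected overload can be verified at compile time:

```cpp
#include <type_traits>

// Two overloads of one name; the compiler chooses by argument types.
// constexpr is an addition for this sketch so calls can be checked statically.
constexpr int customMax(int x, int y) { return x >= y ? x : y; }
constexpr double customMax(double x, double y) { return x >= y ? x : y; }

// The int overload is chosen for int arguments, the double one for doubles.
static_assert(std::is_same<decltype(customMax(1, 2)), int>::value, "int overload");
static_assert(std::is_same<decltype(customMax(1.0, 2.0)), double>::value, "double overload");
static_assert(customMax(3, 7) == 7, "larger int is returned");
static_assert(customMax(2.5, 1.5) == 2.5, "larger double is returned");
```

A mixed call such as customMax(1, 2.5) is rejected as ambiguous, because each overload would need exactly one argument conversion and neither is a better match.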

C++ Function Overload and Linking

Because C has no overloading, mapping functions to linker symbols is simple: a function named customMax gets the symbol name customMax in the object file. Take this function:

// customMax.c
int customMax(int x,int y){
    return x>=y?x:y;
}

Following the method for generating object code described in the previous article, compile it:

# Generate the object file customMax.o from customMax.c
$ gcc -c customMax.c -o customMax.o

You can use nm (part of GNU binutils, which accompanies the gcc toolchain) to view the symbol information in the object file:

# View the symbol information in customMax.o
$ nm customMax.o
0000000000000000 T customMax

Function symbol types:

  • T The symbol is defined in the text (code) section of the current object file, typically a global non-static function.
  • U The symbol is undefined in the current object file and must be resolved against another object file or library at link time.

As we can see, the C compiler leaves the function name unchanged. This becomes a problem in C++: if symbols were generated as simply as in C, how would the linker tell different overloads of the same function name apart?

// error: conflicting types for 'customMax'
int customMax(int x,int y);
double customMax(double x,double y);

In C, these two functions would compile to the same symbol, so the linker could not tell them apart (one symbol name shared by two functions is inherently ambiguous). To solve this, C++ adopts one basic idea: encode type information in the name that is passed to the linker.

That is, the type information of the function (return type, argument types) is encoded into the symbol name at compile time. The names below are invented for explanation only and are not actual compiler output:

int customMax(int x,int y);
// The compiled symbol name might be
_i_customMax_ii

double customMax(double,double);
// The compiled symbol name might be
_d_customMax_dd

Now, let’s check the actual behavior of the compiler. We compile the customMax code above with the C++ compiler (g++ treats the .c file as C++ source) and view the symbol information for the function customMax:

$ g++ -c customMax.c -o customMax.o
$ nm customMax.o
0000000000000000 T _Z9customMaxii

Now, if we add an overloaded function double customMax(double,double):

// customMax.c
int customMax(int x,int y){/*...*/}
double customMax(double x,double y){
    return x>=y?x:y;
}

Compile to create the object file again and check the symbol information:

$ g++ -c customMax.c -o customMax.o
$ nm customMax.o
000000000000001c T _Z9customMaxdd
0000000000000000 T _Z9customMaxii


You can see that double customMax(double,double) and int customMax(int,int) have different symbol information in the object file. This is the implementation principle that supports C++ overloading.
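
The pattern in these symbols can be sketched in a few lines. Under the Itanium C++ ABI used by g++, a free function in the global namespace is mangled as _Z, then the length of the name, then the name itself, then one code per parameter type (i for int, d for double, v for an empty parameter list). The helper mangleFreeFunction below is hypothetical and covers only this simplest case; it is an illustration, not how the compiler is implemented:

```cpp
#include <cassert>
#include <string>

// Simplified illustration of Itanium-style mangling for a free function in the
// global namespace whose parameters are all builtin types.
// paramCodes holds the ABI's one-letter type codes, e.g. "ii" for (int, int).
std::string mangleFreeFunction(const std::string& name, const std::string& paramCodes) {
    // An empty parameter list is written as "v", so neither part may be empty.
    assert(!name.empty() && !paramCodes.empty());
    // "_Z" prefix + decimal length of the name + the name + parameter codes.
    return "_Z" + std::to_string(name.size()) + name + paramCodes;
}
```

For example, mangleFreeFunction("customMax", "ii") yields _Z9customMaxii, where 9 is the number of characters in customMax. The return type does not appear anywhere in the result.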

c++filt, also part of GNU binutils, converts a mangled symbol back to the original function declaration (the return type is not displayed because of function signatures, which are explained later):

$ c++filt _Z9customMaxii
customMax(int, int)

You can observe that the symbol information generated by the C language compiler and C++ compiler for the same function customMax is different. The compiler’s operation of generating symbols for function names is called name mangling, which will be elaborated on later.
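
The conversion that c++filt performs is also available programmatically. As a sketch, assuming a GCC or Clang toolchain that ships <cxxabi.h> (MSVC does not), the wrapper below calls abi::__cxa_demangle; the name demangle is a hypothetical helper, not a library function:

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang C++ runtime)
#include <string>

// Returns the human-readable form of an Itanium-mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // Passing null for the buffer makes the runtime allocate the result with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);   // the caller owns the malloc'ed buffer
    return result;
}
```

Calling demangle("_Z9customMaxii") gives customMax(int, int), the same text c++filt prints. As with c++filt, no return type appears in the result.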

Compatibility of C++ Function Overloading with C

This, however, raises a problem: compatibility with C. When designing C++, Bjarne Stroustrup treated compatibility with C as a top priority and required that C++ have no arbitrary incompatibilities with C. The incompatibility caused by function overloading has to be accepted, but C++ provides an indirect way to stay compatible.

Consider the following issue:

// C code, compiled with a C compiler
int customMax(int x,int y){/*...*/}

The customMax function above may live in a static or dynamic library, that is, a library distributed in non-source form that we cannot modify. Because a C compiler does not alter the mapping from function names to symbols (the function name is the symbol name in the object file), the symbol generated for the code above is customMax. C++ has to work with such C libraries, so what happens if C++ code calls customMax from that library? In other words, the implementation of customMax is compiled with a C compiler, and the call to customMax is compiled with a C++ compiler.

// customMax.c
int customMax(int x,int y){
    return x>=y?x:y;
}

Using the C compiler to compile:

# Compile customMax.c with the C compiler to produce customMax.o
$ gcc -c customMax.c -o customMax.o
# View the symbol information in the object file
$ nm customMax.o
0000000000000000 T customMax

The C++ code that calls customMax:

// main.c
extern int customMax(int,int);
int main(int argc,char* argv[])
{
    customMax(11, 12);
    return 0;
}

Using the C++ compiler to compile:

$ g++ -c main.c -o main.o
# View the symbol information in the object file
$ nm main.o
---------------- U _Z9customMaxii
0000000000000000 T main

Let’s try to link them:

$ g++ main.o customMax.o
main.o:main.c:(.text+0x1f): undefined reference to `customMax(int, int)'

This happens because the linker cannot find the symbol _Z9customMaxii referenced by main.o. The symbol defined in customMax.o is customMax; the two names do not match, so linking fails.

Introduction of extern "C"

For exactly this reason, C++ object files cannot link directly against object code produced by a C compiler (both .o files and static/dynamic libraries). The problem of calling C libraries from C++ comes down to how to “disguise” a function called from C++ as a C function. The related functions must be explicitly declared to have C linkage; otherwise the C++ compiler assumes they are C++ functions and applies name mangling to them. For this purpose C++ provides a linkage specification:

extern "C"{
    int customMax(int,int);
}

This linkage specification does not change the semantics of the customMax function; it only tells the compiler that the symbol name for customMax in the object code must follow the C compiler's naming rules.
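
A small sketch of what this means in practice: a function with C linkage is called like any other function, and it can coexist with C++ overloads of the same name, as long as at most one function of that name has C linkage. The double overload below is an addition for illustration:

```cpp
#include <type_traits>

// C linkage: the object-file symbol is the unmangled name customMax.
extern "C" int customMax(int x, int y) {
    return x >= y ? x : y;
}

// C++ linkage: mangled as _Z9customMaxdd, so it does not clash with the symbol above.
double customMax(double x, double y) {
    return x >= y ? x : y;
}

// Overload resolution is unaffected by the linkage specification.
static_assert(std::is_same<decltype(customMax(11, 12)), int>::value,
              "int arguments select the C-linkage function");
static_assert(std::is_same<decltype(customMax(1.5, 0.5)), double>::value,
              "double arguments select the C++ overload");
```

Adding a second extern "C" function named customMax with different parameters would be an error, because both would need the same unmangled symbol.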

Just like the code in the previous main.c, we can rewrite it:

// main.c
// extern int customMax(int,int);
extern "C" int customMax(int,int);
int main(int argc,char* argv[])
{
    customMax(11, 12);
    return 0;
}

Now, when we compile this with a C++ compiler and check the symbol information for the customMax function:

$ g++ -c main.c -o main.o
$ nm main.o
---------------- U customMax
0000000000000000 T main

We can see that the symbol for customMax is no longer _Z9customMaxii. Now link it with the customMax.o that was compiled earlier with the C compiler:

# OK
$ g++ main.o customMax.o
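
In practice the declaration usually lives in a header shared by C and C++ code. A common idiom, sketched here under the assumption that customMax.h is such a shared header, wraps the declarations in extern "C" only when __cplusplus is defined, so a C compiler never sees the linkage specification. The three parts are shown in one listing only to keep the sketch self-contained, and exampleUse is a hypothetical caller:

```cpp
#include <assert.h>

/* ==== customMax.h (sketch): usable from both C and C++ ==== */
#ifndef CUSTOM_MAX_H
#define CUSTOM_MAX_H

#ifdef __cplusplus
extern "C" {            /* only a C++ compiler sees this line */
#endif

int customMax(int x, int y);

#ifdef __cplusplus
}                       /* closes extern "C" */
#endif

#endif /* CUSTOM_MAX_H */

/* ==== customMax.c: normally a separate file compiled as C ==== */
int customMax(int x, int y) { return x >= y ? x : y; }

/* ==== caller: no extern "C" is needed at the call site ==== */
void exampleUse(void) {
    assert(customMax(11, 12) == 12);
}
```

With this header, main.c from above only needs #include "customMax.h" instead of its own extern "C" declaration.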

Rules of Name Mangling

In practice, name mangling rules differ between compilers. For the scheme used by g++, see the Itanium C++ ABI. Function overloading in C++ relies on signatures. The C++ standard (this article references C++14) defines what makes up the signature of each kind of entity, functions included. Linker symbols are derived from these rules, which tell the compiler what information name mangling has to encode. Using signatures for function matching and overload resolution is not the main focus of this article:

Note: The following English sections are quoted from [ISO/IEC 14882:2014]

  • function: name, parameter type list (8.3.5), and enclosing namespace (if any)

Note that a function’s signature does not include the return type, which is why c++filt does not print the return type when converting a symbol name back to a function. It also means that in C++, functions that differ only in their return type are not overloads but conflicting declarations:

// error: functions that differ only in return type cannot be overloaded
int func(int x);
double func(int x);

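Moreover, top-level cv-qualifiers on parameter types are not part of the function signature, which means that functions differing only in whether a by-value parameter is const are not overloads but redefinitions. A qualifier behind a pointer or reference, as in const int*, is part of the parameter type and does distinguish overloads.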

// error: redefinition of 'func'
int func(int x){ return x; }
int func(const int x){ return x; }

  • function template: name, parameter type list (8.3.5), enclosing namespace (if any), return type, and template parameter list.
  • function template specialization: signature of the template of which it is a specialization and its template arguments (whether explicitly specified or deduced).
  • class member function: name, parameter type list (8.3.5), class of which the function is a member, cv-qualifiers (if any), and ref-qualifier (if any).
  • class member function template: name, parameter type list (8.3.5), class of which the function is a member, cv-qualifiers (if any), ref-qualifier (if any), return type, and template parameter list.
  • class member function template specialization: signature of the member function template of which it is a specialization and its template arguments (whether explicitly specified or deduced).
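
The rule about cv-qualifiers on function parameters can be checked directly. In this sketch (same and kind are hypothetical names), a top-level const on a by-value parameter merely redeclares the same function, while const behind a pointer produces a distinct overload with its own mangled name:

```cpp
// Top-level const is dropped from the signature: these declare ONE function.
int same(int x);
int same(const int x);                 // redeclaration, not an overload
int same(const int x) { return x; }    // the single definition

// const behind the pointer is part of the parameter type: TWO overloads.
// Under the Itanium ABI these mangle to _Z4kindPi and _Z4kindPKi.
constexpr int kind(int*) { return 1; }
constexpr int kind(const int*) { return 2; }

static_assert(kind(static_cast<int*>(nullptr)) == 1, "non-const pointer overload");
static_assert(kind(static_cast<const int*>(nullptr)) == 2, "const pointer overload");
```

Adding a second definition of same with a body would reproduce the redefinition error shown above.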

Calling C++ Functions from C

With the reasons for extern "C" and the rules of name mangling covered, one question remains: can C code call functions that were compiled by a C++ compiler with C++ name mangling?

First, let’s look at a straightforward but crude method:

Directly Specify the Symbol Name in the Object File

// customMax.cpp
#include "customMax.h"
int customMax(int x,int y){
    return x>=y?x:y;
}

Compile it to generate the object file and then view its symbol information:

$ g++ -c customMax.cpp -o customMax.o
$ nm customMax.o
00000050 T _Z9customMaxii

Once you have the symbol for customMax from the object file, you can add a conditional declaration to customMax.h:

// customMax.h
#ifndef CUSTOM_MAX_H
#define CUSTOM_MAX_H

#ifdef __cplusplus
int customMax(int,int);
#else
/* mangled name copied from the nm output of customMax.o */
int _Z9customMaxii(int,int);
#endif
#endif

When a C++ compiler processes code that includes customMax.h, it sees int customMax(int,int);. Any other compiler sees int _Z9customMaxii(int,int);.

Next, write C code that calls customMax in customMax.o, and compile it with the C compiler:

// main.c
#include "customMax.h"
int main(int argc,char* argv[])
{
    _Z9customMaxii(12,14);
    return 0;
}

Compile main.c with the C compiler to generate main.o:

$ gcc -c main.c -o main.o
$ nm main.o
-------- U _Z9customMaxii
00000000 T main

Then link it with customMax.o, the object file generated by the C++ compiler:

# Successful linking
$ gcc main.o customMax.o -o main.exe

The drawback is that this is tedious: you must first extract the symbol name from the object file and then edit the declaration in customMax.h by hand. Because mangling rules differ between compilers, the copied name may also stop matching if the library is rebuilt with a different compiler. This method is best reserved for cases where you cannot access the source code of the library you want to link to and have to export the symbol manually.

Wrapper Function

If we can access the source code of the library we want to link to, we can add a wrapper function: using extern "C" to wrap the target function (still using customMax.cpp as an example):

// customMax.cpp
#include "customMax.h"

int customMax(int x,int y){
    return x>=y?x:y;
}

// C-callable wrapper with C linkage
extern "C"{
    int customMaxii(int x,int y){
        return customMax(x,y);
    }
}

Compiling this code with a C++ compiler now produces two symbols in the object file:

$ g++ -c customMax.cpp -o customMax.o
$ nm customMax.o
00000000 T _Z9customMaxii
00000040 T customMaxii

Then, similarly create a conditional compilation in customMax.h:

// customMax.h
#ifndef CUSTOM_MAX_H
#define CUSTOM_MAX_H

#ifdef __cplusplus
int customMax(int,int);
#else
int customMaxii(int,int);
#endif
#endif

Unlike the previous method, we do not need to manually retrieve the symbol information for customMax.o; instead, we are exporting a specific C language interface from the library.

Then once again use main.c to call:

// main.c
#include "customMax.h"
int main(int argc,char* argv[])
{
    customMaxii(12,14);
    return 0;
}

This should compile and link successfully with the object file compiled from C++.
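
The same wrapper technique extends to overloaded functions: since C has no overloading, each overload is exported under its own unmangled name. A sketch, in which the wrapper names customMax_int and customMax_double and the caller exampleUse are hypothetical:

```cpp
#include <cassert>

// C++ side: two overloads, mangled as _Z9customMaxii and _Z9customMaxdd.
int customMax(int x, int y) { return x >= y ? x : y; }
double customMax(double x, double y) { return x >= y ? x : y; }

// C side: one distinct, unmangled symbol per overload.
extern "C" {
    int customMax_int(int x, int y) { return customMax(x, y); }
    double customMax_double(double x, double y) { return customMax(x, y); }
}

// What a C caller would do, written as a function to keep the sketch self-contained.
void exampleUse() {
    assert(customMax_int(12, 14) == 14);
    assert(customMax_double(1.5, 0.5) == 1.5);
}
```

The header seen by C code would then declare only int customMax_int(int,int); and double customMax_double(double,double);.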

Note: If customMax.cpp uses C++ libraries, the corresponding object files or static/dynamic libraries must also be supplied at link time; otherwise you will get undefined reference errors. The C and C++ compiler drivers link against different default libraries. Linking with the C++ driver causes no problem, because it implicitly links the C++ standard library.

For instance, if I use the C++ standard library in the customMax function in customMax.cpp:

// customMax.cpp
#include "customMax.h"
#include <iostream>

int customMax(int x,int y){
    std::cout<<x<<std::endl<<y<<std::endl;
    return x>=y?x:y;
}

// C-callable wrapper with C linkage
extern "C"{
    int customMaxii(int x,int y){
        return customMax(x,y);
    }
}

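Compiling with a C++ compiler raises no errors, but linking the result with main.o using the C compiler driver (gcc) fails with undefined reference errors for the C++ standard library symbols. Linking with g++ instead, or adding -lstdc++ to the gcc command line, resolves them.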

Update Log

2017.04.12

  • Added code examples
  • Optimized some wording and modified function names
  • Updated images to correspond to the above modifications

2017.04.24

  • Added “Calling C++ Functions from C”

Title:Why is extern "C" needed?
Author:LIPENGZHA
Publish Date:2017/04/11 22:20
Update Date:2017/04/24 01:14
Word Count:10k Words
Link:https://en.imzlp.com/posts/5392/
License: CC BY-NC-SA 4.0
Reprinting of the full article is prohibited.