Deep Dive into C/C++ Compilation and Linking Techniques 6 — A2: Dynamic Library Design Basics — ABI Interface Design
Preface
In this blog post, we attempt to summarize and categorize some of the key technical points in dynamic library design, such as the design and export of binary interfaces.
So, Why Bring Up Binary Interfaces?
Fundamentally, the goal of designing a dynamic library, which we believe must be kept in mind at all times, is to let other people reuse our code. The details of how code collaborates are therefore what we must consider. In a much earlier post, we reduced the abstract concept of a dynamic library to a set of exported symbols, declared in a header file or a dedicated export file, that tells users how to call the target functionality, plus the machine code hidden behind those symbols.
However, what is written in human-readable files, such as the names of functions, classes, and global variables in a header, is an interface but obviously not a binary interface. We tend to assume that as long as we export the chosen symbols and ship the machine code that implements them, everything is fine. But because C++ leaves so much to the implementation (note that we are not talking about C here; this problem shows up predominantly in reusable libraries written in C++), different compiler vendors translate the same human-readable API into different machine-level ABIs. This causes problems that are no laughing matter. Below, we enumerate why and under what circumstances C++ symbol exports become inconsistent at the ABI level and cause headaches when building software.
More Complex Naming Rules
The mapping from C++ functions to linker symbols is decided by each compiler vendor. Conventions do exist that push compilers toward common symbol names (g++, for example, follows the Itanium C++ ABI on most platforms), but g++ and MSVC still use different schemes. As a result, a project built with MSVC cannot directly link against a C++ library built with g++: the two compilers look for the same function under different symbol names. Without taking certain measures, we would have to obtain the library's source code and recompile it with our own compiler; the methods discussed later avoid that.
Readers might ask: how does this happen? It is easy to come up with code like this:
// In C++, we like to put methods inside classes;
// OOP encourages us to do exactly that!
class Foo {
public:
    void someFunc(int a, const char* b);
};
// Or we put utility functions into a dedicated namespace
namespace charlies_tools {
    std::vector<std::string_view> split(const std::string& waited_splits, const char ch);
    std::vector<std::string_view> split(const std::string& waited_splits, const std::string_view sp_view);
};
As C++ programmers, we naturally use these features to avoid symbol-level conflicts and improve readability in software engineering.
Let's look at what the symbol names generated by g++ compilation look like:
0000000000000012 T _ZN14charlies_tools5splitERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEc
0000000000000022 T _ZN14charlies_tools5splitERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt17basic_string_viewIcS3_E
0000000000000000 T _ZN3Foo8someFuncEiPKc
Then let's look at what MSVC produces:
00C 00000000 SECT4 notype () External | ?someFunc@Foo@@QAEXHPBD@Z (public: void __thiscall Foo::someFunc(int,char const *))
00D 00000010 SECT4 notype () External | ?split@charlies_tools@@YAXABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@D@Z (void __cdecl charlies_tools::split(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,char))
00E 00000020 SECT4 notype () External | ?split@charlies_tools@@YAXABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$basic_string_view@DU?$char_traits@D@std@@@3@@Z (void __cdecl charlies_tools::split(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,class std::basic_string_view<char,struct std::char_traits<char> >))
In fact, the symbols written into the object files look completely different, so a symbol produced by one compiler cannot be resolved by the other. (The decoded signatures show a void return type because the test file described under Reference declares split as returning void.) Overloading adds to the problem: because the same function name may coexist with different parameter lists in a single object file, the compiler has to encode the parameter types into each symbol to keep the overloads apart.
This decoration is called name mangling. Great, now we have to deal with these annoying problems.
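To make the g++ names above less opaque, here is a minimal sketch that turns a mangled symbol back into a readable signature. It assumes a toolchain that follows the Itanium C++ ABI (g++ or clang++ on Linux, for example) and uses abi::__cxa_demangle from <cxxabi.h>; the helper name demangle is our own and purely illustrative. It does not work with MSVC, which uses a different scheme.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI toolchains only)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: convert an Itanium-mangled symbol into a readable
// signature. Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so we must free it.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

Calling demangle("_ZN3Foo8someFuncEiPKc") gives back Foo::someFunc(int, char const*), which shows that the class name and the parameter types are all packed into the symbol. On the command line, the c++filt tool or nm -C does the same job.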
Static Data Initialization Issues
In C, our data can usually be considered trivial (ah, we prefer C too, at least it's controllable): static data is initialized with constants, so the values are fixed at compile and link time and are already in place when the program is loaded. But in C++, static data can be objects, which means constructors have to run at startup. If these objects do not depend on each other (that is, we do not absolutely have to initialize static object A before static object B), all is well. The trouble starts when static objects have order dependencies. C++ does not define the relative initialization order of static objects defined in different translation units, so B's constructor may run before the A it relies on has been constructed, which easily leads to crashes that seem random.
Of course, this problem is easy to handle. The initialization order of objects at namespace scope is not under our control, but a static object declared inside a function is initialized only when execution first reaches its declaration. Thus, if static object A needs to be initialized before static object B, we can do this:
static void init_a_and_b() {
    // Function-local statics are constructed in the order execution reaches them.
    static A network_instance;
    static B authentic_networks;
}
// Run the initializer once while this translation unit's statics are initialized.
auto dummy = [](){
    init_a_and_b();
    return 0;
}();
So, How to Design a Binary Interface with Fewer Headaches
Design C-Style Export Interfaces
Of course, you do not need to avoid name conflicts the way a C programmer would, or adopt C naming conventions wholesale. What we mean is: avoid exporting symbols that carry C++'s compiler-specific name mangling. The solution is to declare the functions you decide to export with extern "C" linkage, which tells the compiler to emit their names the way a C compiler would. The price is that such functions cannot be overloaded.
#ifdef __cplusplus
extern "C"{
#endif
int functional_a(int a, int b);
#ifdef __cplusplus
}
#endif
This way, the interface seen by the linker looks much cleaner.
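A C-style export does not force us to give up classes inside the library. One common approach is to hide the class behind an opaque handle and export only plain functions. The sketch below illustrates this under our own assumptions: the names foo_handle, foo_create, foo_some_func, foo_call_count, and foo_destroy are hypothetical, and the body of Foo is a stand-in that merely counts calls.

```cpp
#include <new>   // std::nothrow

// C++ implementation detail: never visible in the exported interface.
class Foo {
public:
    void someFunc(int /*a*/, const char* /*b*/) { ++calls_; }
    int callCount() const { return calls_; }
private:
    int calls_ = 0;
};

// Exported interface: unmangled names, an opaque handle, C-compatible types only.
extern "C" {
    typedef struct foo_handle foo_handle;   // opaque: callers never see its layout

    foo_handle* foo_create(void) {
        // nothrow: report failure as a null handle instead of throwing across the boundary.
        return reinterpret_cast<foo_handle*>(new (std::nothrow) Foo());
    }
    int foo_some_func(foo_handle* h, int a, const char* b) {
        if (h == nullptr) return -1;        // error code instead of an exception
        reinterpret_cast<Foo*>(h)->someFunc(a, b);
        return 0;
    }
    int foo_call_count(const foo_handle* h) {
        return h ? reinterpret_cast<const Foo*>(h)->callCount() : -1;
    }
    void foo_destroy(foo_handle* h) {
        delete reinterpret_cast<Foo*>(h);   // deleting a null handle is a no-op
    }
}
```

A caller creates a handle with foo_create, passes it to the other functions, and releases it with foo_destroy. Because the object is allocated and freed inside the library, the caller never needs to know sizeof(Foo) or how Foo is laid out.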
Provide a Header File with Complete ABI Declarations
Here, "providing a header file with complete ABI declarations" refers to a header file (.h) that contains all the necessary declarations, enabling the compiler to fully understand the interface of a library or module, thereby allowing it to:
- Correctly compile code that calls the library.
- Correctly generate machine code that interacts with the functions in the library.
The core of a "complete ABI declaration" is that it covers not only function names but every detail that affects binary-level interaction. Below, we discuss what a header file providing complete ABI declarations contains:
Function Declarations
This is the most basic part. It tells the compiler the function's name, return type, and parameter types.
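// Incomplete declaration: gives the name and types, but can hide problems
int do_something(int a, int b);
// More complete declaration (use it instead of the one above, not alongside it):
// adds extern "C" and an exception specification
extern "C" int do_something(int a, int b) noexcept;
Type Definitions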
If custom structs or classes are used in the interface, their memory layout must be explicit.
// Complete struct definition: the compiler can determine its size and memory layout
struct MyData {
    int id;
    double value;
    char name[32];
};
// A function that uses this struct
extern "C" void process_data(const MyData* data);
Because process_data takes a pointer, a caller could compile the call itself with only a forward declaration of MyData. But without the complete definition in the header, the compiler does not know sizeof(MyData) or the member offsets, so the caller cannot create a MyData object or fill in its fields before passing it to process_data.
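Since the library and its callers must agree on this layout byte for byte, it can help to state the expected layout in the header so that a mismatching compiler or packing setting fails at compile time rather than at run time. The following is a minimal sketch, not part of the original interface: the offsets and size are what we would expect on a typical 64-bit target (4-byte int, 8-byte-aligned double) and are an assumption; on other targets, such as 32-bit x86 Linux, the numbers differ.

```cpp
#include <cstddef>   // offsetof

struct MyData {
    int id;          // 4 bytes, followed by 4 bytes of padding (assumed)
    double value;    // assumed to be 8-byte aligned
    char name[32];
};

// Assumed layout for a typical 64-bit target; adjust for your platform.
static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");
static_assert(offsetof(MyData, id) == 0, "unexpected offset of id");
static_assert(offsetof(MyData, value) == 8, "unexpected offset of value");
static_assert(offsetof(MyData, name) == 16, "unexpected offset of name");
static_assert(sizeof(MyData) == 48, "unexpected size of MyData");
```

If a caller builds with a different struct packing option, the static_assert lines stop the build immediately instead of letting process_data read fields from the wrong offsets.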
Macro and Constant Definitions
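Macros and constants give names to the limits, version numbers, and configuration values used in the interface, so callers do not have to hard-code magic numbers.
#define MAX_BUFFER_SIZE 1024
#define LIB_VERSION 0x00010002
// The default argument is a C++-only feature; a header shared with C code would have to drop it.
extern "C" int initialize_lib(int buffer_capacity = MAX_BUFFER_SIZE);
Including Other Header Files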
If the declarations depend on other types (such as the standard library's size_t or custom types), the corresponding header files need to be included.
#include <stddef.h> // for size_t
extern "C" void* allocate_buffer(size_t size);
Reference
Verifying the Names
If you want to see the symbol differences produced by the MSVC and g++ compilers firsthand, this section explains how the results above were generated.
The MSVC compiler version we used is 19.44.35217, and the g++ version is 15.2.1.
We write the sample code above into test.cpp.
#include <string>
#include <string_view>
class Foo {
public:
void someFunc(int a, const char* b);
};
namespace charlies_tools {
void split(const std::string& waited_splits, const char ch);
void split(const std::string& waited_splits, const std::string_view sp_view);
};
void Foo::someFunc(int a, const char* b) { }
void charlies_tools::split(const std::string& waited_splits, const char ch) { }
void charlies_tools::split(const std::string& waited_splits, const std::string_view sp_view) { }
Then, on a Linux machine, we use the -c flag to compile test.cpp into an object file without linking:
g++ -c test.cpp -o test_name
Next, we use the nm command to view the symbol table:
[charliechen@Charliechen runaable_dynamic_library]$ nm test_name
0000000000000012 T _ZN14charlies_tools5splitERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEc
0000000000000022 T _ZN14charlies_tools5splitERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt17basic_string_viewIcS3_E
0000000000000000 T _ZN3Foo8someFuncEiPKc
This yields the results listed in the main text.
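For MSVC, you need to open the Visual Studio Developer Command Prompt so that the MSVC toolchain environment is set up. Then, assuming you have saved the code to test.cpp, run the cl compiler with the compile-only flag (/c) and the latest C++ standard flag (/std:c++latest) to get the following output (translated here from the localized compiler messages):
D:\DownloadFromInternet>cl /c /std:c++latest test.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35217 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
/std:c++latest is provided as a preview of language features from the latest C++
working draft, and we're eager to hear about bugs and suggestions for improvements.
However, note that these features are provided as-is without support, and subject
to changes or removal as the working draft evolves. See
https://go.microsoft.com/fwlink/?linkid=2045807 for details.
test.cpp
Afterward, using the dumpbin tool, we get: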
D:\DownloadFromInternet>dumpbin /SYMBOLS test.obj
Microsoft (R) COFF/PE Dumper Version 14.44.35217.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file test.obj
File Type: COFF OBJECT
COFF SYMBOL TABLE
000 01058991 ABS notype Static | @comp.id
001 80010191 ABS notype Static | @feat.00
002 00000003 ABS notype Static | @vol.md
003 00000000 SECT1 notype Static | .drectve
Section length 178, #relocs 0, #linenums 0, checksum 0
005 00000000 SECT2 notype Static | .debug$S
Section length 74, #relocs 0, #linenums 0, checksum 0
007 00000000 SECT3 notype Static | .bss
Section length 4, #relocs 0, #linenums 0, checksum 0, selection 2 (pick any)
009 00000000 SECT3 notype External | __Avx2WmemEnabledWeakValue
00A 00000000 SECT4 notype Static | .text$mn
Section length 25, #relocs 0, #linenums 0, checksum E54AE742
00C 00000000 SECT4 notype () External | ?someFunc@Foo@@QAEXHPBD@Z (public: void __thiscall Foo::someFunc(int,char const *))
00D 00000010 SECT4 notype () External | ?split@charlies_tools@@YAXABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@D@Z (void __cdecl charlies_tools::split(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,char))
00E 00000020 SECT4 notype () External | ?split@charlies_tools@@YAXABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V?$basic_string_view@DU?$char_traits@D@std@@@3@@Z (void __cdecl charlies_tools::split(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,class std::basic_string_view<char,struct std::char_traits<char> >))
00F 00000000 SECT5 notype Static | .chks64
Section length 28, #relocs 0, #linenums 0, checksum 0
String Table Size = 0x123 bytes
Summary
4 .bss
28 .chks64
74 .debug$S
178 .drectve
25 .text$mn
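The same procedure can be used to check the effect of extern "C". The sketch below is a hypothetical second test file (call it test_c.cpp) containing one function with C linkage and one with ordinary C++ linkage; the function bodies are our own placeholders, since the main text only declares functional_a.

```cpp
// test_c.cpp (hypothetical file name)

// C linkage: the symbol name is not mangled.
extern "C" int functional_a(int a, int b) {
    return a + b;   // placeholder body
}

namespace charlies_tools {
// C++ linkage: same base name, but this symbol is still mangled.
int functional_a(int a, int b) {
    return a - b;   // placeholder body
}
}
```

After g++ -c test_c.cpp, nm should list the first function simply as functional_a, while the second follows the mangling scheme seen above and appears as _ZN14charlies_tools12functional_aEii. With the x86 MSVC toolchain used here, we would expect dumpbin /SYMBOLS to show the C-linkage function as _functional_a, since __cdecl functions on 32-bit x86 get only a leading underscore.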