Structs and Memory Alignment

If you have been writing C up to this point and have only used basic types—int, float, char, and the like—it is probably because you have not yet encountered a scenario where you need to bundle related data together for passing around. Once you start writing more substantial programs, such as a sensor data packet, a configuration table, or a communication protocol frame, you will find that loose variables are simply unmanageable. The struct is C's answer to this: it lets us combine data of different types into a single whole, which we can then pass, store, and operate on as one value.

But structs are far more than just "bundling data." The moment we place a struct in memory, the compiler does something behind the scenes that you might never have considered—memory alignment. It silently inserts padding bytes between your fields so that each field lands on an address the processor "prefers." If you are unaware of this, there will come a day when you are designing binary protocol frames, doing DMA transfers, or writing serialization code by hand, and those phantom bytes will drive you to question your sanity.

So in this chapter, we will not only learn how to define and use structs, but also thoroughly understand what a struct truly looks like in memory.

Learning Objectives
After completing this chapter, you will be able to:
[ ] Proficiently define, initialize, and operate on structs and their pointers
[ ] Understand the principles of memory alignment and the distribution rules for padding bytes
[ ] Use _Alignas, alignof, and offsetof for alignment control and verification
[ ] Master the use of designated initializers and flexible array members
[ ] Understand the evolution from C structs to C++ classes

Environment Setup

All of our following experiments will be conducted in this environment:

Platform: Linux x86_64 (WSL2 is also fine)
Compiler: GCC 13+ or Clang 17+
Compiler flags: -Wall -Wextra -std=c17

Step One — Mastering Struct Definition and Basic Operations

Defining a Struct

In C, we define a struct using the struct keyword followed by a pair of curly braces:

struct SensorReading {
    uint32_t timestamp;
    float temperature;
    float humidity;
    uint8_t status;
};

Note the semicolon at the end—forgetting it is one of the most common compilation errors for beginners, and the error message usually points to the next line, leaving you baffled. struct SensorReading is now a type name, but writing struct SensorReading every time is indeed a bit verbose, so we usually pair it with typedef to simplify:

typedef struct {
    uint32_t timestamp;
    float temperature;
    float humidity;
    uint8_t status;
} SensorReading;

This way we can directly write SensorReading reading; to declare variables, which is much cleaner. The two approaches are functionally equivalent; the only difference is how the type name is used: the former requires the struct prefix, while the latter does not. In real-world projects, the typedef approach is far more common, especially in embedded development—if you look at any MCU vendor's SDK, it is full of typedef struct.

Initialization and Assignment

There are several ways to initialize a struct, starting with the most basic. The first is sequential initialization—providing values in the order the fields are defined:

SensorReading r1 = {1700000000, 23.5f, 60.0f, 1};

This works, but readability is not great—you have to remember which position corresponds to which field, and if the struct definition changes the order, all initialization code must be updated accordingly. C99 gave us a better solution: the designated initializer, which allows initializing arbitrary fields by name:

SensorReading r2 = {
    .timestamp = 1700000000,
    .temperature = 23.5f,
    .humidity = 60.0f,
    .status = 1
};

// 不需要按定义顺序，也可以只初始化部分字段
SensorReading r3 = {
    .humidity = 45.0f,
    .status = 0
    // timestamp 和 temperature 自动初始化为 0
};

The benefits of designated initializers are obvious: the code is self-documenting, independent of field order, and unspecified fields are automatically zeroed. Frankly, in modern C code, as long as your compiler supports C99 (which practically all of them do), you should prefer designated initializers.

Struct assignment and initialization are two different things. Initialization happens at declaration, while assignment happens after declaration. C allows direct assignment between structs of the same type, which performs a byte-by-byte copy:

SensorReading r4;
r4 = r2;  // 把 r2 的所有字段复制到 r4

But be aware that struct assignment in C is a shallow copy—if the struct contains pointer members, the pointer fields of both structs will point to the same memory after assignment. This is a classic pitfall when dealing with structs that contain dynamically allocated memory.

Struct Pointers and the Arrow Operator

When a struct is large, or when we need to modify the caller's struct inside a function, passing a pointer is the only reasonable approach. This is where the difference between . and -> comes in:

SensorReading reading = {
    .timestamp = 1700000000,
    .temperature = 25.0f,
    .humidity = 50.0f,
    .status = 1
};

// 通过变量名直接访问——用点号
reading.temperature = 26.0f;

// 通过指针访问——用箭头
SensorReading* ptr = &reading;
ptr->humidity = 55.0f;
// 等价于 (*ptr).humidity = 55.0f

The -> operator is simply syntactic sugar for (*ptr)., nothing mysterious. But this syntactic sugar is so commonly used that you would almost never write (*ptr).—in C, whenever a function parameter is a struct pointer, you are almost certainly using ->.

Passing a struct pointer rather than the struct itself as a function parameter not only avoids expensive copy overhead but also allows the function to modify the caller's data. If you do not want the function to modify the data, just add const:

/// @brief 打印传感器读数（只读访问）
void print_reading(const SensorReading* r) {
    printf("T=%.1fC H=%.1f%% status=%u\n",
           r->temperature, r->humidity, r->status);
}

/// @brief 更新传感器状态（可修改）
void update_status(SensorReading* r, uint8_t new_status) {
    r->status = new_status;
}

This distinction between const SensorReading* and SensorReading* is inherited in C++ into const member functions and reference semantics, forming a more complete "read-only vs. mutable" interface design.

Step Two — Understanding Memory Alignment and Padding Bytes

Now we are entering the most core and potentially confusing part of this tutorial. Let's start with a question: how many bytes does the following struct occupy?

typedef struct {
    uint8_t  a;   // 1 字节
    uint32_t b;   // 4 字节
    uint8_t  c;   // 1 字节
} WeirdLayout;

Intuitively, 1 + 4 + 1 = 6 bytes, right? But in reality, on most 32-bit and 64-bit platforms, sizeof(WeirdLayout) is 12 bytes. Where did the extra 6 bytes go? The answer is that the compiler inserted them as padding bytes into the struct.

Why Alignment Is Necessary

When a processor accesses memory, it does not read one byte at a time. Most CPU architectures prefer to access data on 2-, 4-, or 8-byte boundaries—this is what we call alignment. An uint32_t placed at an address that is a multiple of 4 can be read in a single operation; but if it straddles a 4-byte boundary (for example, placed at address 3), the CPU might need to read twice and stitch the results together, which incurs a performance penalty. Some architectures are even more extreme—they will throw a hardware exception directly (for instance, ARM triggers a fault when accessing unaligned addresses in certain modes).

So, for performance and correctness, the compiler inserts padding bytes between struct members to ensure that each member lands on its naturally aligned address.

Rules for Alignment and Padding

There are essentially two alignment rules, but understanding them requires a bit of patience. The first rule: the starting address of each member must be an integer multiple of that member's alignment requirement. The alignment requirement for uint8_t is 1 (any address works), uint16_t is 2, uint32_t is 4, and double and uint64_t are 8, and so on—the alignment requirement of a basic type usually equals its size. The second rule: the total size of the struct itself must be an integer multiple of its largest alignment requirement—this ensures that in an array of structs, every element satisfies the alignment requirement.

Now let's return to the WeirdLayout example and draw it out byte by byte:

text

偏移  0  1  2  3  4  5  6  7  8  9  10  11
     [a ][pad pad pad][b         ][c ][pad pad pad]
      ^              ^           ^
      |              |           b: 偏移 4（4 的倍数，满足）
      |              填充 3 字节让 b 对齐到 4
      a: 偏移 0（1 的倍数，满足）

a is at offset 0, occupying 1 byte. The alignment requirement of b is 4, but the next available offset is 1, which is not a multiple of 4, so the compiler inserts 3 bytes of padding, letting b start at offset 4. c is at offset 8, with an alignment requirement of 1, which is fine. Finally, the struct's maximum alignment requirement is 4 (from uint32_t b), so the total size must be a multiple of 4—the current size is 9, so it is padded to 12.

This is why data that is only 6 bytes actually occupies 12 bytes—50% of the space is wasted on padding.

Reordering Fields to Reduce Padding

The solution to this problem is surprisingly simple: place fields with larger alignment requirements first, and smaller ones last. Let's reorder the fields of WeirdLayout:

typedef struct {
    uint32_t b;   // 4 字节，偏移 0
    uint8_t  a;   // 1 字节，偏移 4
    uint8_t  c;   // 1 字节，偏移 5
    // 填充 2 字节（偏移 6-7），使总大小为 4 的倍数
} BetterLayout;

Now sizeof(BetterLayout) is 8 bytes—saving one-third compared to the previous 12. b is at offset 0 (naturally aligned), a and c are packed right after it, and only 2 bytes of trailing padding are needed at the end. This technique is extremely useful in real-world engineering, especially on memory-constrained embedded systems—building the habit of ordering fields from largest to smallest alignment requirement is well worth it.

Verifying Offsets with offsetof

The C standard library provides the offsetof macro (defined in <stddef.h>), which can precisely tell you the offset of a specific field within a struct. We frequently use it when debugging alignment issues or designing binary protocols:

#include <stddef.h>
#include <stdio.h>

printf("offset of a: %zu\n", offsetof(WeirdLayout, a));  // 0
printf("offset of b: %zu\n", offsetof(WeirdLayout, b));  // 4
printf("offset of c: %zu\n", offsetof(WeirdLayout, c));  // 8
printf("total size: %zu\n", sizeof(WeirdLayout));         // 12

Make it a habit to print the offsets with offsetof right after defining a struct, especially when designing communication protocol frames—you will find that some fields' offsets differ from what you expected, and this usually indicates an alignment issue.

C11 Alignment Control: _Alignas and alignof

In the C99 era, if you needed to manually control alignment, you could only rely on compiler extensions—GCC's __attribute__((aligned(n))), MSVC's __declspec(align(n)), and the like. C11 finally standardized this capability, providing the _Alignas and _Alignof keywords, along with the more friendly macro aliases alignas and alignof (defined in <stdalign.h>).

alignof: Querying Alignment Requirements

alignof can query the alignment requirement of any type:

#include <stdalign.h>
#include <stdio.h>

printf("alignof(uint8_t)  = %zu\n", alignof(uint8_t));   // 1
printf("alignof(uint32_t) = %zu\n", alignof(uint32_t));  // 4
printf("alignof(double)   = %zu\n", alignof(double));    // 通常 8
printf("alignof(WeirdLayout) = %zu\n", alignof(WeirdLayout)); // 4

A struct's alignment requirement equals the largest alignment requirement among its members. WeirdLayout contains uint32_t, so the overall alignment requirement is 4.

alignas: Forcing Alignment

alignas can be used to force a variable or struct member to be allocated on a specified alignment boundary. This is very useful in embedded development—for example, DMA transfers typically require the buffer's starting address to be 4-byte or even 32-byte aligned:

#include <stdalign.h>

// 强制 DMA 缓冲区 32 字节对齐
alignas(32) uint8_t dma_buffer[256];

// 在结构体中强制某个字段的对齐
typedef struct {
    uint8_t header;
    alignas(4) uint32_t payload;  // 即使前面有 header，也保证 payload 4 字节对齐
} AlignedFrame;

The argument to alignas must be a power of two and cannot be less than the type's natural alignment requirement. If you write alignas(2) for an uint32_t, the compiler will ignore it or report an error—because uint32_t inherently requires 4-byte alignment, and you cannot reduce it to 2.

Designated Initializers in Detail

We briefly mentioned designated initializers earlier; now let's take a deeper look at their full capabilities. Designated initializers are a feature introduced in C99 that allows you to use the .成员名 = 值 syntax to specify which fields to initialize when initializing structs, unions, and arrays.

In addition to the basic usage shown earlier, there are some noteworthy details. For example, you can mix sequential initialization with designated initializers:

typedef struct {
    uint16_t x;
    uint16_t y;
    uint16_t z;
    uint16_t flags;
} Point3D;

Point3D p1 = {
    10, 20,        // x=10, y=20（顺序初始化）
    .flags = 0xFF  // 指定初始化 flags
    // z 自动为 0
};

You can also use designated initializers in arrays:

// 稀疏初始化——只初始化需要的下标
uint8_t lookup[256] = {
    ['A'] = 1,
    ['B'] = 2,
    ['C'] = 3,
    // 其余全部为 0
};

This approach is particularly convenient when creating ASCII character mapping tables or command dispatch tables—it is much clearer than manually writing an initializer list with 256 elements. Unspecified elements are automatically initialized to zero (just like global variables).

Step Three — Understanding Flexible Array Members

A flexible array member (FAM) is a feature introduced in C99 that allows placing an array of unspecified size at the end of a struct. It sounds a bit odd, but its use cases are highly practical—when you need a struct with "variable-length trailing data," FAM is the cleanest approach.

typedef struct {
    uint16_t length;
    uint8_t  type;
    uint8_t  data[];  // 柔性数组成员，不占结构体大小
} Packet;

data[] is an incomplete array type—it occupies no space within the struct (sizeof(Packet) does not include the size of data), but it tells the compiler "this struct might be followed by a contiguous block of memory." When using it, we need to manually allocate enough memory to hold the struct itself plus the data:

#include <stdlib.h>
#include <string.h>

/// @brief 创建一个指定长度的数据包
Packet* create_packet(uint8_t type, const uint8_t* payload, uint16_t len) {
    // 分配：结构体大小 + 数据长度
    Packet* pkt = malloc(sizeof(Packet) + len);
    if (pkt == NULL) {
        return NULL;
    }
    pkt->type = type;
    pkt->length = len;
    memcpy(pkt->data, payload, len);
    return pkt;
}

// 使用
uint8_t payload[] = {0x01, 0x02, 0x03};
Packet* pkt = create_packet(0x42, payload, sizeof(payload));
// 访问 pkt->data[0], pkt->data[1], pkt->data[2]
free(pkt);

Flexible array members are heavily used in communication protocols, variable-length message handling, and packet parsing. In the early days of C, people used a trick called the "struct hack" to achieve similar functionality—placing an array of length 1 (or 0) at the end of a struct and then allocating extra space. But that was undefined behavior; C99's FAM is the standard approach.

One thing to note: structs containing a flexible array member cannot be passed or copied by value—because sizeof does not know how large the trailing data is. You can only operate on them through pointers.

Arrays of Structs

Combining structs with arrays is a very common way to organize data. A configuration table, a set of sensor readings, or a message queue are essentially all arrays of structs:

typedef struct {
    uint8_t  id;
    uint16_t timeout_ms;
    uint8_t  retry_count;
    uint8_t  priority;
} TaskConfig;

// 初始化一个结构体数组
TaskConfig config_table[] = {
    {.id = 1, .timeout_ms = 100, .retry_count = 3, .priority = 2},
    {.id = 2, .timeout_ms = 200, .retry_count = 5, .priority = 1},
    {.id = 3, .timeout_ms = 50,  .retry_count = 1, .priority = 3},
};

// 获取数组元素个数
size_t task_count = sizeof(config_table) / sizeof(config_table[0]);

Iterating over an array of structs is the same as with a regular array—you can use subscripts or pointers:

/// @brief 按优先级查找最高优先级任务的 ID
uint8_t find_highest_priority(const TaskConfig* tasks, size_t count) {
    uint8_t max_priority = 0;
    uint8_t result_id = 0;

    for (size_t i = 0; i < count; i++) {
        if (tasks[i].priority > max_priority) {
            max_priority = tasks[i].priority;
            result_id = tasks[i].id;
        }
    }
    return result_id;
}

The memory layout of a struct array is tightly packed—the size of each element is sizeof(TaskConfig) (including padding), and the address of the i-th element is base + i * sizeof(TaskConfig). This is also why trailing padding is needed at the end of a struct—if there were no padding, the fields of the second element in the array might be misaligned.

`attribute((packed))`: Removing Padding

In some scenarios, we genuinely need a struct without any padding—the most typical case being binary communication protocols. The data received by an MCU via UART/SPI/I2C is a tightly packed byte stream; if the struct has padding, directly casting a pointer to interpret it will yield incorrect values. GCC and Clang provide __attribute__((packed)) to remove padding:

typedef struct __attribute__((packed)) {
    uint8_t  header;
    uint16_t length;
    uint8_t  command;
    uint32_t parameter;
} PackedFrame;

With this attribute, sizeof(PackedFrame) is a pure 1 + 2 + 1 + 4 = 8 bytes, with no padding at all. But be aware of the cost—accessing unaligned fields on some architectures will cause performance degradation or even a hardware exception. Therefore, packed should only be used when you genuinely need a compact layout, not sprinkled everywhere. The ARM Cortex-M series can handle unaligned access in most cases (with a performance penalty), but some older architectures (like the ARM7TDMI) will fault directly.

A safer approach is: use a packed struct at the communication layer to parse raw bytes, then immediately convert it to an aligned internal struct for use. This separates parsing from business logic, letting each do what it does best.

C++ Connections

Evolution from struct to class

In C, a struct can only contain data members—no member functions, no access control, no inheritance. C++ retains the struct keyword but gives it nearly the same capabilities as class. The only difference is the default access specifier: members of struct are public by default, while members of class are private by default. Beyond that, a C++ struct can have constructors, destructors, member functions, inheritance, virtual functions—it can do anything.

cpp

// C++ 中的 struct——可以有成员函数
struct SensorReading {
    uint32_t timestamp;
    float temperature;
    float humidity;

    // 成员函数
    bool is_overheating() const {
        return temperature > 85.0f;
    }

    void print() const {
        printf("T=%.1fC H=%.1f%%\n", temperature, humidity);
    }
};

So when you see struct in C++ code, do not assume it is the same as a C struct—it is simply a class with public default access.

POD Types and Trivially Copyable

C++ has a specific concept for "simple structs compatible with C": POD types (Plain Old Data). Simply put, if a struct has no virtual functions, no non-trivial constructors or destructors, and all members are POD types, then it is itself a POD. POD types can be safely copied with memcpy, zeroed with memset, and safely binary-serialized and deserialized—because their memory layout is completely consistent with C.

After C++11, the POD concept was refined into several more precise type traits: is_trivially_copyable, is_standard_layout, and so on. Understanding these concepts is very important in cross-language interaction (C/C++ mixed programming), binary serialization, and shared memory communication.

std::aligned_storage

The C++ standard library provides std::aligned_storage (from C++11 onward, replaced by alignas in C++23), which is a type trait tool for manually controlling the alignment of a block of raw memory. It is used in advanced scenarios such as implementing type-erased containers, memory pools, and placement new:

cpp

#include <type_traits>

// 分配一块 64 字节对齐的原始内存
alignas(64) std::byte storage[sizeof(MyStruct)];

// 或者使用 std::aligned_storage（C++23 前的做法）
using AlignedStorage = std::aligned_storage_t<sizeof(MyStruct), alignof(MyStruct)>;

These concepts will be discussed in detail in later C++ chapters. For now, you just need to know that the approach to alignment control in C is implemented more systematically and safely in C++.

Summary

In this tutorial, we thoroughly broke down structs from "how to use them" to "what they look like in memory." Structs are the most core composite type in C, and understanding their memory layout—especially alignment and padding—is the foundation for writing efficient, correct, and portable code.

Key Takeaways

[ ] Structs are defined with typedef struct { ... } Name;, and members are accessed via pointers using ->
[ ] C99 designated initializers (.field = value) are safer and more readable than sequential initialization
[ ] The compiler inserts padding bytes between members and at the end of the struct to ensure each member is aligned
[ ] Ordering fields from largest to smallest alignment requirement reduces padding and saves memory
[ ] The offsetof macro can precisely verify the offset of each field
[ ] C11's alignas/alignof provide standardized alignment control capabilities
[ ] Flexible array members are used for variable-length trailing data and must be used via pointers and dynamic allocation
[ ] __attribute__((packed)) removes padding for binary protocol parsing, but incurs performance and portability costs
[ ] C++'s struct is a class with default public access; POD types maintain a C-compatible memory layout

Exercises

Exercise: Design a Communication Protocol Frame with Manual Alignment Control

Please design a binary protocol frame structure for embedded device communication. The requirements are as follows:

The frame header contains a 1-byte start flag 0xAA, a 1-byte frame type, a 2-byte payload length, and a 4-byte timestamp
The payload section is variable-length data (use a flexible array member)
The frame footer contains a 2-byte CRC16 checksum
Use _Alignas to ensure the timestamp field is 4-byte aligned
Use __attribute__((packed)) to ensure the frame structure is compact (suitable for directly casting and parsing a byte stream)
Write a function that uses offsetof to print the offset of each field to verify the layout

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// TODO: 定义 Frame 结构体
// typedef struct __attribute__((packed)) {
//     ...
// } Frame;

// TODO: 实现 print_frame_layout() 函数
// 使用 offsetof 打印每个字段的偏移量

// TODO: 实现 create_frame() 函数
// 分配内存并填充帧数据（含柔性数组成员）

int main(void) {
    print_frame_layout();

    // TODO: 创建一个测试帧并验证偏移
    return 0;
}

Hint: You need to be careful when using alignas inside a packed struct—packed removes automatic padding, but alignas can force the alignment of a specific field. Think about this: in a packed struct, if the offset from the frame header to the timestamp is not a multiple of 4, how would you handle it?

进阶专题

Structs and Memory Alignment

Environment Setup

Step One — Mastering Struct Definition and Basic Operations

Defining a Struct

Initialization and Assignment

Struct Pointers and the Arrow Operator

Step Two — Understanding Memory Alignment and Padding Bytes

Why Alignment Is Necessary

Rules for Alignment and Padding

Reordering Fields to Reduce Padding

Verifying Offsets with offsetof

C11 Alignment Control: _Alignas and alignof

alignof: Querying Alignment Requirements

alignas: Forcing Alignment

Designated Initializers in Detail

Step Three — Understanding Flexible Array Members

Arrays of Structs

`attribute((packed))`: Removing Padding

C++ Connections

Evolution from struct to class

POD Types and Trivially Copyable

std::aligned_storage

Summary

Key Takeaways

Exercises

Exercise: Design a Communication Protocol Frame with Manual Alignment Control

References

Structs and Memory Alignment ​

Environment Setup ​

Step One — Mastering Struct Definition and Basic Operations ​

Defining a Struct ​

Initialization and Assignment ​

Struct Pointers and the Arrow Operator ​

Step Two — Understanding Memory Alignment and Padding Bytes ​

Why Alignment Is Necessary ​

Rules for Alignment and Padding ​

Reordering Fields to Reduce Padding ​

Verifying Offsets with offsetof ​

C11 Alignment Control: _Alignas and alignof ​

alignof: Querying Alignment Requirements ​

alignas: Forcing Alignment ​

Designated Initializers in Detail ​

Step Three — Understanding Flexible Array Members ​

Arrays of Structs ​

__attribute__((packed)): Removing Padding ​

C++ Connections ​

Evolution from struct to class ​

POD Types and Trivially Copyable ​

std::aligned_storage ​

Summary ​

Key Takeaways ​

Exercises ​

Exercise: Design a Communication Protocol Frame with Manual Alignment Control ​

References ​

Structs and Memory Alignment

Environment Setup

Step One — Mastering Struct Definition and Basic Operations

Defining a Struct

Initialization and Assignment

Struct Pointers and the Arrow Operator

Step Two — Understanding Memory Alignment and Padding Bytes

Why Alignment Is Necessary

Rules for Alignment and Padding

Reordering Fields to Reduce Padding

Verifying Offsets with offsetof

C11 Alignment Control: _Alignas and alignof

alignof: Querying Alignment Requirements

alignas: Forcing Alignment

Designated Initializers in Detail

Step Three — Understanding Flexible Array Members

Arrays of Structs

`attribute((packed))`: Removing Padding

C++ Connections

Evolution from struct to class

POD Types and Trivially Copyable

std::aligned_storage

Summary

Key Takeaways

Exercises

Exercise: Design a Communication Protocol Frame with Manual Alignment Control

References