
The Cost of Dynamic Memory: Fragmentation and Non-Determinism (Memory Layout, Fragmentation, and Alignment)

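Preface

In embedded systems, dynamic memory looks convenient, but its costs are often underestimated: fragmentation, unpredictable timing, and alignment and struct-padding issues quietly eat away at resources and reliability.

On an embedded target resources are tight, so even small memory-allocation decisions affect stability, real-time behavior, and power consumption. Understanding what dynamic memory costs lets you avoid catastrophic mistakes at design time, or keep the risk as low as possible when dynamic memory is unavoidable.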


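A quick review of memory layout: static, heap, stack

Before we start, a quick recap of the concepts:

  • Static storage (.data/.bss/.rodata): the size is fixed at compile or link time; it holds global variables, constants, and read-only data. Its lifetime matches the program's, and the fragmentation risk is essentially zero, but flexibility is low.
  • Stack: local variables and automatic objects of function calls. Allocation and release are very fast (usually a pointer increment or decrement), highly regular, and the lifetime is controlled by scope. The drawbacks are limited capacity, no sharing across tasks, and a poor fit for large objects or objects with variable lifetimes.
  • Heap: allocated dynamically at run time (malloc / new / operator new, and so on). Flexible, but with clear costs: allocation and release times are not deterministic, fragmentation builds up, and the resulting memory layout is non-linear.

In embedded code, the usual order of preference is: stack (if the size allows) → static (pre-allocated) → heap (used with care, ideally under tight control).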


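Fragmentation: what it is, why it happens, and how it affects the system

Internal fragmentation

When the allocator hands out a block larger than the request in order to satisfy alignment or a minimum allocation unit, the unused part of that block is internal fragmentation. For example:

  • With a 16-byte allocation granularity, a 20-byte object occupies 32 bytes (16×2); the extra 12 bytes are internal fragmentation.
  • Frequent allocation of small objects with a large allocation unit lowers memory utilization.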
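
The rounding in the 16-byte example can be sketched as follows. The helper names `rounded_size` and `internal_waste` are made up for illustration, and the sketch assumes the granularity is a power of two; a real allocator typically adds a per-block header on top of this.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the size a block-granular allocator would reserve.
// Assumption of this sketch: `granularity` is a non-zero power of two.
constexpr std::size_t rounded_size(std::size_t requested, std::size_t granularity) {
    return (requested + granularity - 1) & ~(granularity - 1);
}

// Bytes lost to internal fragmentation for a single request.
constexpr std::size_t internal_waste(std::size_t requested, std::size_t granularity) {
    return rounded_size(requested, granularity) - requested;
}

// The example from the text: 20 bytes at 16-byte granularity.
static_assert(rounded_size(20, 16) == 32, "20 bytes occupy two 16-byte units");
static_assert(internal_waste(20, 16) == 12, "12 bytes are internal fragmentation");
```

Because both helpers are constexpr, the waste for a known set of object sizes can be checked at compile time, before the code ever runs on the target.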

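External fragmentation

The heap contains many free blocks, but they are scattered and non-contiguous, so they cannot be merged into one contiguous region large enough for a bigger request. The total amount of free memory may be sufficient and the allocation still fails. The symptoms are:

  • As uptime grows, fewer large free blocks remain, and new/malloc fails occasionally.
  • The system shows intermittent crashes, symptoms that look like a memory leak, and declining stability after long runs.
  • Real-time tasks see long-tail latency (an occasional allocation or release that takes far longer than usual).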
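
A toy model makes the "enough memory in total, but no block large enough" situation concrete. This is only a sketch: the heap is modeled as a vector of allocation units rather than a real allocator, and `HeapMap`, `total_free`, `largest_free_run`, and `checkerboard_heap` are names invented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap model: one element per allocation unit, true = in use.
using HeapMap = std::vector<bool>;

// Total number of free units, regardless of where they are.
std::size_t total_free(const HeapMap& heap) {
    return static_cast<std::size_t>(std::count(heap.begin(), heap.end(), false));
}

// Length of the longest run of adjacent free units, which is the largest
// request this heap can still satisfy.
std::size_t largest_free_run(const HeapMap& heap) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (bool used : heap) {
        run = used ? 0 : run + 1;
        best = std::max(best, run);
    }
    return best;
}

// Worst case: every unit was allocated, then every other unit was freed.
HeapMap checkerboard_heap(std::size_t units) {
    HeapMap heap(units, true);
    for (std::size_t i = 1; i < units; i += 2) {
        heap[i] = false;
    }
    return heap;
}
```

For a 16-unit checkerboard heap, half of the memory is free, yet a request for two contiguous units cannot be served because the largest free run is a single unit.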

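Alignment and padding

Why alignment is needed

CPUs generally expect data to sit on its natural boundary (for example 4-byte or 8-byte alignment); otherwise the access is slower or, on some architectures, raises a hardware exception. Alignment also affects DMA, peripheral access, and cache coherency.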

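Struct padding example

// Assumes sizeof(char) == 1 and sizeof(int32_t) == 4
struct A {
    char c;      // offset 0
    int32_t x;   // with 4-byte alignment, x typically sits at offset 4
};              // sizeof(A) is typically 8 (including 3 bytes of padding)

char takes 1 byte and int32_t needs 4-byte alignment, so the compiler inserts 3 bytes of padding after c, and the total struct size is rounded up to a multiple of 4 (8 here).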

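Putting the members with the largest alignment requirement first can reduce padding:

struct B {
    int32_t x;
    char c;
}; // sizeof(B) is typically still 8, but additional small members pack more tightly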

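Alternatively, #pragma pack or __attribute__((packed)) forces the padding out, but note:

  • Once the padding is gone, reading a misaligned member is much slower on some architectures and raises a hardware exception on others.
  • Use it only when you know the consequences and the space saving is really required.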
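
When the byte layout is fixed by a protocol or file format, one common alternative to a packed struct is to keep the data in a plain byte buffer and copy fields in and out with std::memcpy. The sketch below assumes a made-up 5-byte record (a 1-byte tag followed by a 32-bit value in host byte order); `kValueOffset`, `kRecordSize`, `read_value`, and `write_value` are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record layout: byte 0 is a tag, bytes 1..4 hold a 32-bit
// value that is NOT naturally aligned. Host byte order is assumed.
constexpr std::size_t kValueOffset = 1;
constexpr std::size_t kRecordSize = 5;

// Read the value without ever forming a misaligned uint32_t pointer.
std::uint32_t read_value(const unsigned char* record) {
    std::uint32_t value;
    std::memcpy(&value, record + kValueOffset, sizeof value);
    return value;
}

// Write the value byte-wise into the record.
void write_value(unsigned char* record, std::uint32_t value) {
    std::memcpy(record + kValueOffset, &value, sizeof value);
}
```

This keeps the compact layout while leaving it to the compiler to pick instructions that are safe for the target; whether the copy is as fast as a direct aligned load depends on the architecture and the optimizer.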

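Relationship to DMA and cache lines

  • DMA requires buffers aligned to what the peripheral demands (for example 32 bytes). A misaligned buffer may be rejected by the hardware or cause a severe performance drop.
  • Aligning to the cache line (typically 32 or 64 bytes) helps avoid false sharing and cache thrashing, which matters most on multi-core parts or when DMA accesses the memory concurrently.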
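
A minimal sketch of a statically allocated buffer that meets such a requirement, using standard alignas. The 32-byte figure in `kLine` is only the example value from the text; the real alignment and cache-line size must come from the datasheet of your part. `DmaBuffer` and `is_aligned` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed value for illustration: both the DMA alignment requirement and the
// cache-line size are taken to be 32 bytes.
constexpr std::size_t kLine = 32;

// Buffer aligned to kLine whose size is rounded up to whole lines, so no
// unrelated variable can share a cache line with the buffer.
template <std::size_t Bytes>
struct alignas(kLine) DmaBuffer {
    static constexpr std::size_t kSize = (Bytes + kLine - 1) / kLine * kLine;
    std::uint8_t data[kSize];
};

// Run-time check, useful for pointers received from elsewhere.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

static_assert(alignof(DmaBuffer<100>) == kLine, "buffer starts on a line boundary");
static_assert(sizeof(DmaBuffer<100>) == 128, "100 bytes round up to four 32-byte lines");
```

Rounding the size as well as the start address matters when the cache must be invalidated or cleaned around a transfer: those operations work on whole lines, so a neighbor sharing the last line could be affected.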

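Non-determinism of dynamic memory: timing and repeatability

  • Non-deterministic allocation and release times: general-purpose heap implementations use complex data structures (free lists, trees, bitmaps), so the execution time of malloc/free is unpredictable and may show long-tail latency.
  • Concurrency and lock contention: in a multithreaded environment the heap usually needs a lock or thread-local caches; lock contention hurts real-time behavior.
  • Fragmentation that cannot be undone: on an ordinary C/C++ heap, live objects cannot be moved because raw pointers refer to them, so once fragmentation has formed it is hard to recover at run time. The fix is a restart or a dedicated compaction scheme, which is usually not realistic.

Embedded systems are especially sensitive: long-tail latency can cause dropped frames, control-loop timeouts, or safety problems.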


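Common embedded alternatives and hybrid strategies

So what can you do? Here is a quick tour of the common strategies:

Memory pool (Pool / Slab)

  • Split memory into fixed-size blocks (for example 32 B, 64 B, 256 B). Allocation returns a block index or pointer; release puts the block back on the free list.
  • Pros: allocation and release run in constant time (O(1)), and there is no external fragmentation as long as every object size matches one of the pools.
  • Cons: objects of different sizes need multiple pools, memory utilization depends on the block granularity, and internal fragmentation occurs.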
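
A minimal sketch of such a fixed-size block pool with an intrusive free list. `FixedPool`, `acquire`, and `release` are names chosen for this example. The sketch is not thread-safe, does not detect double frees, and assumes the block size is a multiple of the maximum fundamental alignment so every block is usable for any ordinary type.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Sketch of a fixed-size block pool. Free blocks store the link to the next
// free block inside themselves, so the pool needs no extra bookkeeping memory.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
    static_assert(BlockSize >= sizeof(void*), "a free block must hold a link pointer");
    static_assert(BlockSize % alignof(std::max_align_t) == 0,
                  "keeps every block max-aligned");

    struct FreeNode { FreeNode* next; };

    alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
    FreeNode* free_list_ = nullptr;
    std::size_t free_count_ = 0;

public:
    FixedPool() {
        // Thread every block onto the free list once, at start-up.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            release(storage_ + i * BlockSize);
        }
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // O(1): pop the head of the free list; nullptr when the pool is exhausted.
    void* acquire() {
        if (free_list_ == nullptr) return nullptr;
        FreeNode* node = free_list_;
        free_list_ = node->next;
        --free_count_;
        return node;
    }

    // O(1): push the block back. The caller must pass a block from this pool.
    void release(void* block) {
        if (block == nullptr) return;
        free_list_ = new (block) FreeNode{free_list_};
        ++free_count_;
    }

    std::size_t free_blocks() const { return free_count_; }
};
```

Both operations touch only the list head, which is where the constant-time bound comes from. Exhaustion is reported with nullptr rather than an exception, so the caller decides how to degrade.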

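Bump / Arena allocator (one-way allocator)

  • Allocates linearly from one contiguous buffer; release is usually all at once (the whole arena is reset).
  • Very fast and free of fragmentation; a good fit for objects that share a lifetime (for example temporaries of one task or of one initialization phase).
  • Not suitable for objects that must be freed individually.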
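
A bump allocator fits in a few lines. The sketch below works on a caller-supplied buffer; `Arena`, `allocate`, `reset`, and `used` are illustrative names. It assumes power-of-two alignments and does not run destructors, so it suits trivially destructible data unless the caller destroys objects before the reset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of a bump/arena allocator over a buffer owned by the caller.
class Arena {
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t offset_ = 0;  // bytes handed out so far, including padding

public:
    Arena(void* buffer, std::size_t capacity)
        : base_(static_cast<unsigned char*>(buffer)), capacity_(capacity) {}

    // Returns nullptr when the request does not fit.
    // Assumption: `alignment` is a non-zero power of two.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(base_) + offset_;
        const std::uintptr_t aligned =
            (current + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - current);
        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) return nullptr;
        offset_ += padding + size;
        return base_ + (offset_ - size);
    }

    // Releases everything at once; previously returned pointers become invalid.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }
};
```

A typical use is one arena per frame, request, or initialization phase: allocate freely while the phase runs, then call reset() at the phase boundary.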

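Slab allocation (Linux style)

  • Suited to caching objects of the same type (kernel objects); already-initialized objects can be reused on release, which cuts construction and destruction overhead.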

Object pool + RAII (C++ style)

  • Combine std::unique_ptr<T, Deleter> or a custom smart pointer with a memory pool to get exception safety and automatic release.
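
One way this combination might look is sketched below. `ObjectPool`, `make`, and the nested `Deleter` are hypothetical names for this example. To keep it short the pool finds a free slot with a linear scan, which is O(N); a free list would restore O(1). The pool must outlive every pointer it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Sketch: a typed pool with N slots that hands out std::unique_ptr handles.
template <typename T, std::size_t N>
class ObjectPool {
    alignas(T) unsigned char storage_[N * sizeof(T)];
    bool in_use_[N] = {};
    std::size_t live_ = 0;

public:
    // Custom deleter: destroys the object and returns its slot to the pool.
    struct Deleter {
        ObjectPool* pool;
        void operator()(T* obj) const {
            obj->~T();
            pool->release_slot(obj);
        }
    };
    using Ptr = std::unique_ptr<T, Deleter>;

    ObjectPool() = default;
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Constructs a T in the first free slot; returns an empty Ptr when full.
    template <typename... Args>
    Ptr make(Args&&... args) {
        for (std::size_t i = 0; i < N; ++i) {
            if (!in_use_[i]) {
                // The slot is marked used only after construction succeeded,
                // so a throwing constructor leaves the pool unchanged.
                T* obj = new (storage_ + i * sizeof(T)) T(std::forward<Args>(args)...);
                in_use_[i] = true;
                ++live_;
                return Ptr(obj, Deleter{this});
            }
        }
        return Ptr(static_cast<T*>(nullptr), Deleter{this});
    }

    std::size_t live() const { return live_; }

private:
    void release_slot(T* obj) {
        const std::size_t index = static_cast<std::size_t>(
            reinterpret_cast<unsigned char*>(obj) - storage_) / sizeof(T);
        in_use_[index] = false;
        --live_;
    }
};
```

Because the handle is a unique_ptr, a slot returns to the pool on every exit path, including early returns and exceptions, without any manual release call.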

Code examples

Complete, compilable examples follow: three independent programs, each with its own main().
#include <iostream>
#include <cstdint>
#include <cstddef>

// Program 1: demonstrates memory layout (static storage, stack, heap)

// Static storage: a global variable
static int global_var = 42;

// .rodata: a read-only constant
static const char rodata_str[] = "This is in read-only memory";

void stack_allocation_demo() {
    // Allocated on the stack
    int stack_var = 100;
    std::cout << "Stack variable address: " << &stack_var << "\n";
    std::cout << "Stack variable value: " << stack_var << "\n";

    // An array on the stack
    uint8_t stack_buffer[128];
    std::cout << "Stack buffer address: " << static_cast<void*>(stack_buffer) << "\n";

    // Danger: a large stack allocation can overflow the stack
    // uint8_t big_buffer[64 * 1024];  // Don't do this!
}

void heap_allocation_demo() {
    // Allocated on the heap: use with care in embedded systems
    int* heap_var = new int(200);
    std::cout << "Heap variable address: " << heap_var << "\n";
    std::cout << "Heap variable value: " << *heap_var << "\n";

    // Must be released explicitly
    delete heap_var;

    // Array allocation
    const size_t n = 10;
    int* heap_array = new int[n];
    for (size_t i = 0; i < n; ++i) {
        heap_array[i] = static_cast<int>(i * i);
    }

    // Use the array...
    for (size_t i = 0; i < n; ++i) {
        std::cout << "heap_array[" << i << "] = " << heap_array[i] << "\n";
    }

    delete[] heap_array;
}

// Static allocation example
void static_allocation_demo() {
    // static local variable: lives in static storage and is initialized only once
    static int static_counter = 0;
    static_counter++;
    std::cout << "Static counter (preserved across calls): " << static_counter << "\n";
    std::cout << "Static variable address: " << &static_counter << "\n";
}

struct AlignmentDemo {
    char c;      // offset 0
    // 3 bytes padding
    int32_t x;   // offset 4 (4-byte aligned)
    char d;      // offset 8
    // 3 bytes padding to make sizeof 12
}; // sizeof(AlignmentDemo) = 12

// Note: despite the name, this struct is only reordered; no packing attribute is applied.
struct PackedDemo {
    int32_t x;   // offset 0
    char c;      // offset 4
    char d;      // offset 5
    // 2 bytes padding to make sizeof 8
}; // sizeof(PackedDemo) = 8

void alignment_demo() {
    std::cout << "sizeof(AlignmentDemo) = " << sizeof(AlignmentDemo) << "\n";
    std::cout << "alignof(AlignmentDemo) = " << alignof(AlignmentDemo) << "\n";

    std::cout << "sizeof(PackedDemo) = " << sizeof(PackedDemo) << "\n";
    std::cout << "alignof(PackedDemo) = " << alignof(PackedDemo) << "\n";

    AlignmentDemo a;
    std::cout << "Address of a.c: " << static_cast<void*>(&a.c) << "\n";
    std::cout << "Address of a.x: " << &a.x << "\n";
    std::cout << "Address of a.d: " << static_cast<void*>(&a.d) << "\n";
}

int main() {
    std::cout << "=== Memory Layout Demo ===\n\n";

    std::cout << "Global variable address: " << &global_var << "\n";
    std::cout << "ROdata string address: " << static_cast<const void*>(rodata_str) << "\n\n";

    std::cout << "--- Stack Allocation ---\n";
    stack_allocation_demo();
    std::cout << "\n";

    std::cout << "--- Heap Allocation ---\n";
    heap_allocation_demo();
    std::cout << "\n";

    std::cout << "--- Static Allocation (multiple calls) ---\n";
    static_allocation_demo();
    static_allocation_demo();
    static_allocation_demo();
    std::cout << "\n";

    std::cout << "--- Alignment and Padding ---\n";
    alignment_demo();

    return 0;
}
#include <iostream>
#include <cstdlib>
#include <vector>
#include <random>
#include <iomanip>

// Program 2: demonstrates memory fragmentation

class AllocationTracker {
    static size_t total_allocations;
    static size_t total_deallocations;
    static size_t current_bytes;
    static size_t peak_bytes;

public:
    static void* allocate(size_t size) {
        void* ptr = malloc(size);
        if (ptr) {
            total_allocations++;
            current_bytes += size;
            if (current_bytes > peak_bytes) {
                peak_bytes = current_bytes;
            }
        }
        return ptr;
    }

    static void deallocate(void* ptr, size_t size) {
        if (ptr) {
            total_deallocations++;
            current_bytes -= size;
            free(ptr);
        }
    }

    static void print_stats() {
        std::cout << "=== Allocation Statistics ===\n";
        std::cout << "Total allocations: " << total_allocations << "\n";
        std::cout << "Total deallocations: " << total_deallocations << "\n";
        std::cout << "Current bytes: " << current_bytes << "\n";
        std::cout << "Peak bytes: " << peak_bytes << "\n";
        std::cout << "============================\n";
    }
};

size_t AllocationTracker::total_allocations = 0;
size_t AllocationTracker::total_deallocations = 0;
size_t AllocationTracker::current_bytes = 0;
size_t AllocationTracker::peak_bytes = 0;

void fragmentation_scenario_1() {
    std::cout << "\n--- Scenario 1: Mixed Sizes ---\n";
    std::vector<void*> allocations;

    // Allocate blocks of different sizes
    size_t sizes[] = {16, 32, 64, 128, 256, 512, 1024};

    for (int round = 0; round < 3; ++round) {
        for (size_t size : sizes) {
            void* ptr = AllocationTracker::allocate(size);
            if (ptr) {
                allocations.push_back(ptr);
                std::cout << "Allocated " << size << " bytes at " << ptr << "\n";
            }
        }
    }

    AllocationTracker::print_stats();

    // Free some (not all) of the blocks to create external fragmentation
    std::cout << "\nFreeing every other allocation...\n";
    for (size_t i = 1; i < allocations.size(); i += 2) {
        // Recover the size from the allocation order, for the statistics only
        // (a real application would have to track it)
        size_t idx = (i % 7);
        size_t sizes_arr[] = {16, 32, 64, 128, 256, 512, 1024};
        size_t size = sizes_arr[idx];
        AllocationTracker::deallocate(allocations[i], size);
        std::cout << "Freed at " << allocations[i] << "\n";
    }

    AllocationTracker::print_stats();

    // Free the remaining blocks
    for (size_t i = 0; i < allocations.size(); ++i) {
        if (i % 2 == 0 && allocations[i]) {
            size_t idx = (i % 7);
            size_t sizes_arr[] = {16, 32, 64, 128, 256, 512, 1024};
            size_t size = sizes_arr[idx];
            AllocationTracker::deallocate(allocations[i], size);
        }
    }
}

void fragmentation_scenario_2() {
    std::cout << "\n--- Scenario 2: Allocation/Deallocation Pattern ---\n";
    std::vector<void*> pool;

    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> size_dist(64, 512);
    std::uniform_int_distribution<> action_dist(0, 1);

    for (int i = 0; i < 50; ++i) {
        if (action_dist(gen) == 0 || pool.empty()) {
            // Allocate
            size_t size = size_dist(gen);
            void* ptr = malloc(size);
            if (ptr) {
                pool.push_back(ptr);
                std::cout << "Alloc " << size << " bytes at " << ptr << "\n";
            }
        } else {
            // Free a random block
            size_t idx = std::uniform_int_distribution<>(0, static_cast<int>(pool.size()) - 1)(gen);
            void* ptr = pool[idx];
            pool.erase(pool.begin() + idx);
            free(ptr);
            std::cout << "Freed at " << ptr << "\n";
        }
    }

    // Report what is still allocated before it is freed
    std::cout << "Pool size: " << pool.size() << " allocations remaining\n";

    // Clean up
    for (void* ptr : pool) {
        free(ptr);
    }
    pool.clear();
}

struct Block {
    void* ptr;
    size_t size;
};

void fragmentation_scenario_3() {
    std::cout << "\n--- Scenario 3: Internal Fragmentation ---\n";

    // Assume the allocator works with a 16-byte granularity
    constexpr size_t granularity = 16;

    struct Alloc {
        size_t requested;
        size_t actual;
    };

    Alloc allocs[] = {
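        {4, 16},   // 4 bytes requested, 16 allocated (12 bytes of internal fragmentation)
        {20, 32},  // 20 bytes requested, 32 allocated (12 bytes of internal fragmentation)
        {33, 48},  // 33 bytes requested, 48 allocated (15 bytes of internal fragmentation)
        {65, 80},  // 65 bytes requested, 80 allocated (15 bytes of internal fragmentation)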
    };

    size_t total_requested = 0;
    size_t total_actual = 0;
    size_t total_internal_fragmentation = 0;

    for (const auto& alloc : allocs) {
        total_requested += alloc.requested;
        total_actual += alloc.actual;
        size_t frag = alloc.actual - alloc.requested;
        total_internal_fragmentation += frag;
        std::cout << "Requested: " << std::setw(3) << alloc.requested
                  << " bytes, Allocated: " << std::setw(3) << alloc.actual
                  << " bytes, Wasted: " << std::setw(3) << frag << " bytes\n";
    }

    std::cout << "\nTotal requested: " << total_requested << " bytes\n";
    std::cout << "Total allocated: " << total_actual << " bytes\n";
    std::cout << "Internal fragmentation: " << total_internal_fragmentation << " bytes\n";
    std::cout << "Efficiency: " << (100.0 * total_requested / total_actual) << "%\n";
}

int main() {
    std::cout << "=== Memory Fragmentation Demo ===\n";

    fragmentation_scenario_1();
    fragmentation_scenario_2();
    fragmentation_scenario_3();

    std::cout << "\n=== Key Takeaways ===\n";
    std::cout << "1. Mixed allocation sizes cause external fragmentation\n";
    std::cout << "2. Frequent alloc/dealloc patterns make fragmentation worse\n";
    std::cout << "3. Internal fragmentation wastes memory due to alignment/rounding\n";
    std::cout << "4. Use fixed-size pools or arena allocators to avoid fragmentation\n";

    return 0;
}
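#include <iostream>
#include <cstddef>   // offsetof, size_t
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Program 3: demonstrates alignment and struct padding

// Example 1: default alignment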
struct DefaultAlignment {
    char a;      // offset 0, size 1
                 // 3 bytes padding
    int b;       // offset 4, size 4
    char c;      // offset 8, size 1
                 // 3 bytes padding
};               // total: 12 bytes

// Example 2: layout after reordering
struct OptimizedAlignment {
    int b;       // offset 0, size 4
    char a;      // offset 4, size 1
    char c;      // offset 5, size 1
                 // 2 bytes padding
};               // total: 8 bytes

// Example 3: using #pragma pack (not recommended unless necessary)
#pragma pack(push, 1)
struct PackedStruct {
    char a;      // offset 0, size 1
    int b;       // offset 1, size 4
    char c;      // offset 5, size 1
};               // total: 6 bytes
#pragma pack(pop)

// Example 4: a DMA alignment requirement
struct __attribute__((aligned(32))) AlignedBuffer {
    uint8_t data[256];  // 32-byte aligned, suitable for DMA
};

// Demonstrates the problem with misaligned access
void misaligned_access_demo() {
    std::cout << "\n--- Misaligned Access Demo ---\n";

    // A buffer used to build a deliberately misaligned address
    char buffer[32] = {0};

    // Place an int at offset 1 (misaligned!)
    buffer[0] = static_cast<char>(0xFF);
    uint32_t* misaligned = reinterpret_cast<uint32_t*>(&buffer[1]);

    std::cout << "Buffer address: " << static_cast<void*>(buffer) << "\n";
    std::cout << "Misaligned address: " << misaligned << "\n";
    std::cout << "Is aligned to 4 bytes? " << (reinterpret_cast<uintptr_t>(misaligned) % 4 == 0) << "\n";

    // On x86 this usually works but can be slower; on some architectures it faults.
    // It is also undefined behavior in C++; prefer std::memcpy for such accesses.
    // *misaligned = 0x12345678;  // uncomment to test
}

// Alignment helper (alignment must be a power of two)
template<typename T>
T* align_up(T* ptr, size_t alignment) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(ptr);
    uintptr_t aligned = (addr + alignment - 1) & ~(alignment - 1);
    return reinterpret_cast<T*>(aligned);
}

void custom_alignment_demo() {
    std::cout << "\n--- Custom Alignment Demo ---\n";

    // The raw buffer
    alignas(8) char buffer[128];

    std::cout << "Buffer address: " << static_cast<void*>(buffer) << "\n";

    // Align manually to 32 bytes
    void* aligned_32 = align_up(buffer + 1, 32);  // +1 makes the start deliberately misaligned
    std::cout << "32-byte aligned address: " << aligned_32 << "\n";
    std::cout << "Is aligned? " << (reinterpret_cast<uintptr_t>(aligned_32) % 32 == 0) << "\n";

    // Align to 64 bytes
    void* aligned_64 = align_up(buffer + 3, 64);
    std::cout << "64-byte aligned address: " << aligned_64 << "\n";
    std::cout << "Is aligned? " << (reinterpret_cast<uintptr_t>(aligned_64) % 64 == 0) << "\n";
}

void struct_size_demo() {
    std::cout << "\n--- Struct Size and Alignment ---\n";

    std::cout << "DefaultAlignment:\n";
    std::cout << "  sizeof: " << sizeof(DefaultAlignment) << " bytes\n";
    std::cout << "  alignof: " << alignof(DefaultAlignment) << " bytes\n";
    std::cout << "  offset of a: " << offsetof(DefaultAlignment, a) << "\n";
    std::cout << "  offset of b: " << offsetof(DefaultAlignment, b) << "\n";
    std::cout << "  offset of c: " << offsetof(DefaultAlignment, c) << "\n";

    std::cout << "\nOptimizedAlignment:\n";
    std::cout << "  sizeof: " << sizeof(OptimizedAlignment) << " bytes\n";
    std::cout << "  alignof: " << alignof(OptimizedAlignment) << " bytes\n";
    std::cout << "  offset of a: " << offsetof(OptimizedAlignment, a) << "\n";
    std::cout << "  offset of b: " << offsetof(OptimizedAlignment, b) << "\n";
    std::cout << "  offset of c: " << offsetof(OptimizedAlignment, c) << "\n";

    std::cout << "\nPackedStruct:\n";
    std::cout << "  sizeof: " << sizeof(PackedStruct) << " bytes\n";
    std::cout << "  alignof: " << alignof(PackedStruct) << " bytes\n";
    std::cout << "  offset of a: " << offsetof(PackedStruct, a) << "\n";
    std::cout << "  offset of b: " << offsetof(PackedStruct, b) << "\n";
    std::cout << "  offset of c: " << offsetof(PackedStruct, c) << "\n";

    std::cout << "\nMemory saved: " << sizeof(DefaultAlignment) - sizeof(OptimizedAlignment) << " bytes\n";
}

// Demonstrates the effect of cache-line alignment
struct __attribute__((aligned(64))) CacheLineAligned {
    int data;
};

void cache_line_demo() {
    std::cout << "\n--- Cache Line Alignment ---\n";

    CacheLineAligned a;
    CacheLineAligned b;

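    // Compare the addresses as integers: subtracting pointers to unrelated
    // objects is undefined, and b may be placed below a on the stack.
    const uintptr_t addr_a = reinterpret_cast<uintptr_t>(&a);
    const uintptr_t addr_b = reinterpret_cast<uintptr_t>(&b);
    const uintptr_t distance = addr_a > addr_b ? addr_a - addr_b : addr_b - addr_a;

    std::cout << "Address of a: " << &a << "\n";
    std::cout << "Address of b: " << &b << "\n";
    std::cout << "Distance: " << distance << " bytes\n";
    std::cout << "On different cache lines? " << (distance >= 64) << "\n";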
}

int main() {
    std::cout << "=== Alignment and Padding Demo ===\n";

    struct_size_demo();
    misaligned_access_demo();
    custom_alignment_demo();
    cache_line_demo();

    std::cout << "\n=== Key Takeaways ===\n";
    std::cout << "1. Arrange struct members by size (largest first) to minimize padding\n";
    std::cout << "2. Alignment affects DMA, cache performance, and atomic operations\n";
    std::cout << "3. Use alignas() for specific alignment requirements\n";
    std::cout << "4. Avoid #pragma pack unless necessary (performance impact)\n";

    return 0;
}