
Embedded C++ Tutorial: Static Storage and Stack Allocation Strategies

I caught a cold recently and took quite a long break...

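In embedded systems, memory is scarce and unevenly distributed (Flash, SRAM, special high-speed SRAM, and so on). Whether data lives in static storage (globals, static variables, constants) or on the stack (function locals, temporary objects) directly affects reliability, startup time, maintainability, and real-time behavior. This post covers the concepts, how they are implemented, and common problems. It finishes with practical advice, giving strategies and example code you can use in real projects.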


What are static storage and stack allocation? (quick definitions)

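Static storage: locations assigned at compile/link time. It includes three sections:
- .text holds code, usually together with the read-only data from .rodata.
- .data holds initialized globals/statics; their initial values are copied into RAM at startup.
- .bss holds uninitialized (or zero-initialized) globals/statics and is cleared to zero at startup.
These objects exist for the entire lifetime of the program and keep their values until explicitly changed.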

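Stack allocation: memory obtained by moving the stack pointer on each function call. It holds local variables, return addresses, saved registers, and so on, and is released when the function returns.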


Why choose carefully in embedded systems?

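  • Predictability: the size of static storage is known at link time. Stack growth depends on the execution path, so it is hard to prove statically that the stack never overflows.
  • Real-time behavior: dynamic allocation and large stack frames can introduce unpredictable latency. Stack usage in interrupt context needs special attention.
  • Memory layout: ROM/Flash and the different tiers of SRAM (on-chip/external) differ greatly in speed and capacity. Static data can be placed in the appropriate region, for example large read-only tables in Flash.
  • Reentrancy and thread safety: global/static variables are not thread-safe by default and need extra synchronization under an RTOS. Data on the stack is private to the current thread (each thread has its own stack), as long as no pointer to it escapes.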
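The sketch below makes the reentrancy point concrete; the helper names are hypothetical.
- `to_hex_static` keeps its result in a function-local static buffer. That buffer is static storage shared by every caller, so a second call from another task or an ISR overwrites the first result.
- `to_hex` writes into storage the caller provides, which usually sits on the caller's own stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: format a byte as two hex digits.
// NOT reentrant: every caller shares the same static buffer (in .bss).
const char* to_hex_static(uint8_t v) {
    static char buf[3];
    const char* digits = "0123456789ABCDEF";
    buf[0] = digits[v >> 4];
    buf[1] = digits[v & 0x0F];
    buf[2] = '\0';
    return buf;
}

// Reentrant variant: the caller supplies the storage (typically a local on its
// own stack). Returns false if the buffer cannot hold two digits plus '\0'.
bool to_hex(uint8_t v, char* out, size_t out_len) {
    assert(out != nullptr);
    if (out_len < 3) return false;
    const char* digits = "0123456789ABCDEF";
    out[0] = digits[v >> 4];
    out[1] = digits[v & 0x0F];
    out[2] = '\0';
    return true;
}
```

If you call `to_hex_static` twice and keep both pointers, they point at the same buffer, and the first string has been overwritten. The caller-buffer version costs one extra parameter but is safe to call from several contexts at once.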

So what lives in static storage?

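  • Read-only constants (const): with typical ARM/GCC toolchains these go into .rodata in Flash and take no RAM at runtime, unless something forces a copy. Using const for lookup tables, firmware version strings, and similar data is a good way to save RAM.
  • Initialized static variables (.data): the compiler emits the initial values into Flash and the startup code copies them into RAM, so they occupy both Flash and RAM.
  • Uninitialized static variables (.bss): cleared to zero at startup. They occupy RAM but need no large block of initial data in Flash.
  • Placement control: linker scripts and __attribute__((section("..."))) let you put data into special sections, such as fast SRAM or a non-initialized .noinit section.
  • Pitfalls to avoid
  • Making large arrays and buffers static reserves that memory permanently. Without careful planning this wastes memory or leaves too little for everything else.
  • Mutable static variables must be protected against concurrent access from interrupts and threads, using atomics, mutexes, or interrupt masking. volatile alone only stops the compiler from caching the value; it does not make a read-modify-write atomic.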
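Here is a sketch of a static variable shared between an interrupt handler and the main loop. All names are hypothetical. On a real MCU, `adc_isr` would be registered in the vector table; here it is an ordinary function so the logic runs on a host.

The whole "mailbox" is one `std::atomic<uint32_t>`, so the value and its "new data" flag can never tear apart. `exchange` is a read-modify-write operation. On cores without exclusive-access instructions (for example ARMv6-M), it may compile to a library call or not be lock-free. Check `std::atomic<uint32_t>::is_always_lock_free`, or mask the interrupt around the read instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Static-storage mailbox shared with an ISR:
// bit 16 = "new sample present", bits 0..15 = sample value.
static std::atomic<uint32_t> g_adc_mailbox{0};
constexpr uint32_t kValidBit = 1u << 16;

// Producer (interrupt context): publish the latest sample, replacing any unread one.
void adc_isr(uint16_t raw) {
    g_adc_mailbox.store(kValidBit | raw);
}

// Consumer (main loop): take the sample if one is pending.
// Reading and clearing happen in one atomic step, so no sample is half-read.
bool poll_sample(uint16_t& out) {
    uint32_t v = g_adc_mailbox.exchange(0);
    if ((v & kValidBit) == 0) return false;
    out = static_cast<uint16_t>(v & 0xFFFFu);
    return true;
}
```

The flag bit lets a sample value of 0 be distinguished from "nothing new". This design keeps only the latest value; use a queue if every sample matters.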

Example: placing a large lookup table in Flash

// foo.cpp
#include <cstdint>
static const uint16_t sine_table[256] = {
    // ... 256 entries ...
};

To place it explicitly in a specific .rodata / Flash section:

const uint16_t lookup[] __attribute__((section(".rodata.lookup"))) = { ... };
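Hand-typed tables are easy to get wrong. If the compiler can compute the table, a `constexpr` object in static storage gives the same Flash-resident result.

The sketch below builds a 256-entry CRC-8 table at compile time. It uses polynomial 0x07; CRC-8 is only an illustration and is not part of the original text. It assumes C++17, where the non-const `std::array::operator[]` is usable in constant expressions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compute the CRC-8 table (polynomial 0x07, MSB-first) entirely at compile time.
constexpr std::array<uint8_t, 256> make_crc8_table() {
    std::array<uint8_t, 256> table{};
    for (size_t i = 0; i < 256; ++i) {
        uint8_t crc = static_cast<uint8_t>(i);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                               : static_cast<uint8_t>(crc << 1);
        }
        table[i] = crc;
    }
    return table;
}

// Constant-initialized, read-only static object: emitted as .rodata,
// which a typical GCC/ARM linker script maps to Flash.
static constexpr std::array<uint8_t, 256> kCrc8Table = make_crc8_table();

// Table-driven CRC-8 with init 0x00 and no final XOR (the CRC-8/SMBUS parameters).
uint8_t crc8(const uint8_t* data, size_t len) {
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; ++i) {
        crc = kCrc8Table[crc ^ data[i]];
    }
    return crc;
}
```

With these parameters, the standard check string "123456789" gives 0xF4. You can verify where `kCrc8Table` ended up in the linker map file.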

Linker script example

In embedded projects we usually adjust the linker script to map sections onto the right memory regions:

MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
  FASTRAM(rwx) : ORIGIN = 0x20020000, LENGTH = 32K
}

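SECTIONS
{
  /* code and read-only data stay in Flash */
  .text : { *(.text*) *(.rodata*) } > FLASH

  /* .data runs from RAM, but its initial values are stored in Flash (AT > FLASH);
     startup code copies them from __data_load to __data_start..__data_end */
  .data : {
    __data_start = .;
    *(.data*)
    __data_end = .;
  } > RAM AT > FLASH
  __data_load = LOADADDR(.data);

  /* .bss occupies RAM only and is zeroed by the startup code */
  .bss (NOLOAD) : {
    __bss_start = .;
    *(.bss*)
    *(COMMON)
    __bss_end = .;
  } > RAM

  /* custom section placed in FASTRAM; initialized data here also needs
     a Flash copy and its own copy loop in the startup code */
  .fastdata : {
    __fastdata_start = .;
    *(.fastdata*)
    __fastdata_end = .;
  } > FASTRAM AT > FLASH
  __fastdata_load = LOADADDR(.fastdata);

  /* neither loaded nor cleared: contents can survive a warm reset */
  .noinit (NOLOAD) : { *(.noinit*) } > RAM
}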

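This technique is very common in U-Boot. In code, __attribute__((section(".fastdata"))) places performance-sensitive data in FASTRAM. If that data has initial values, the startup code must copy them from Flash, just as it does for .data.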
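Sections like `.data` and `.bss` hold correct values only after the startup code has run. The same is true of any initialized custom RAM section such as `.fastdata`. Below is a minimal sketch of that step.

The linker symbols are replaced by ordinary parameters so the logic can run on a host. On hardware they would be the symbols your linker script defines; the exact names depend on the script. `init_static_sections` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// What a Cortex-M style reset handler does before main():
// 1) copy the initial values of .data from Flash (load address) into RAM,
// 2) zero-fill .bss.
// A section like .fastdata would get a second copy loop with its own bounds.
void init_static_sections(const uint32_t* data_load,
                          uint32_t* data_start, uint32_t* data_end,
                          uint32_t* bss_start, uint32_t* bss_end) {
    assert(data_start <= data_end && bss_start <= bss_end);  // host-side sanity check only

    // .data: word-by-word copy from the Flash image into RAM
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    // .bss: zero-fill
    while (bss_start < bss_end) {
        *bss_start++ = 0;
    }
}
```

Real startup code usually avoids library calls here, because nothing in the C/C++ runtime is initialized yet. That is why the copy is a plain loop rather than `memcpy`.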


Risks and proper use of stack allocation

  • Large local variables easily overflow the stack. For example:
void foo() {
    uint8_t big_buf[64*1024]; // 64 KB: very likely larger than a whole thread or interrupt stack
    // ...
}
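  • Recursion: most embedded systems should avoid it, because the maximum depth is hard to bound.
  • Variable-length arrays (VLAs) / alloca: features whose stack usage changes at runtime are very risky in embedded code. Disable them (GCC can warn with -Wvla) or use them only with great care.
  • Temporary objects inside functions: prefer the stack for small objects. Put large ones in static storage, or on the heap if a heap is allowed.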
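A common replacement for a VLA or `alloca(n)` is a compile-time upper bound plus a runtime length check. The frame size is then fixed, and tools such as GCC's `-fstack-usage` can report it. `kMaxFrame` and `process_frame` are hypothetical names, and the reversal is just a stand-in for real processing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kMaxFrame = 64;  // hypothetical protocol limit for one frame

// Reverse a frame through a scratch buffer whose size is fixed at compile time.
// Oversized input is rejected instead of silently growing the stack.
bool process_frame(const uint8_t* in, size_t n, uint8_t* out) {
    assert(in != nullptr && out != nullptr);
    if (n > kMaxFrame) return false;
    std::array<uint8_t, kMaxFrame> scratch;  // constant frame size, unlike uint8_t tmp[n]
    for (size_t i = 0; i < n; ++i) scratch[i] = in[n - 1 - i];
    for (size_t i = 0; i < n; ++i) out[i] = scratch[i];
    return true;
}
```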

Alternative: make large buffers static, or take them from a per-task memory pool.
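Wherever buffers end up, each stack still has to be sized, and measuring beats guessing. A widely used technique is stack painting:
1. Fill the stack region with a known byte pattern at startup.
2. Let the system run.
3. Count how much of the pattern is still intact.

This sketch simulates a stack with a plain array and assumes a downward-growing stack. On a real MCU the region would come from linker symbols or from the RTOS; FreeRTOS, for example, provides `uxTaskGetStackHighWaterMark`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint8_t kPaint = 0xA5;  // arbitrary fill pattern

// Paint the whole stack region, starting at its lowest address.
void paint_stack(uint8_t* stack_low, size_t size) {
    assert(stack_low != nullptr);
    for (size_t i = 0; i < size; ++i) stack_low[i] = kPaint;
}

// Stack grows downward, so untouched bytes remain at the low end.
// A pushed value that happens to equal kPaint at the boundary overstates this slightly.
size_t stack_unused(const uint8_t* stack_low, size_t size) {
    size_t n = 0;
    while (n < size && stack_low[n] == kPaint) ++n;
    return n;
}

// Peak usage ("high-water mark") in bytes.
size_t stack_high_water(const uint8_t* stack_low, size_t size) {
    return size - stack_unused(stack_low, size);
}
```

Comparing the high-water mark against the configured stack size, after exercising the worst-case paths, gives a measured safety margin instead of an estimate.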


C++-specific details (construction, destruction, placement new)

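  • Construction order of static objects: across translation units, the order in which global static objects are constructed is unspecified (the "static initialization order fiasco"). During embedded startup, perform critical initialization explicitly in main() or an init function.
  • placement new: constructs an object explicitly in static storage, on the stack, or in a specific memory region (common in heap-free systems):
#include <new>      // placement new
#include <cstdint>  // uint8_t
alignas(MyType) static uint8_t buffer[sizeof(MyType)];
MyType* p = new (buffer) MyType(args...);  // construct the object inside buffer
p->~MyType();  // explicit destructor call; buffer itself is never freed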

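This is very useful when there is no malloc, but you must manage object lifetimes yourself. Call the destructor exactly once, and never construct a second object in the buffer while the first is still alive.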
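Placement new and the init-order advice can be combined in a small wrapper. It reserves static storage for a single object but constructs the object only when your init code says so. `StaticSlot` is a hypothetical name; this is a sketch, not a library type.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Static storage for one T, constructed and destroyed explicitly.
// Both members have constant initializers, so a namespace-scope StaticSlot is
// constant-initialized and does not depend on dynamic initialization order.
template <typename T>
class StaticSlot {
    alignas(T) unsigned char storage_[sizeof(T)]{};
    T* ptr_ = nullptr;  // non-null only while an object is alive

public:
    template <typename... Args>
    T& emplace(Args&&... args) {
        assert(ptr_ == nullptr && "already constructed");
        ptr_ = new (storage_) T(std::forward<Args>(args)...);  // placement new
        return *ptr_;
    }

    void destroy() {
        if (ptr_ != nullptr) {
            ptr_->~T();  // explicit destructor call; storage_ stays reserved
            ptr_ = nullptr;
        }
    }

    bool constructed() const { return ptr_ != nullptr; }
    T& get() { assert(ptr_ != nullptr); return *ptr_; }
};
```

A typical use is to declare `static StaticSlot<Uart> g_uart;` at namespace scope (`Uart` is a hypothetical driver class). Then call `g_uart.emplace(...)` from an init function that `main()` invokes, at the point where its dependencies are known to be ready. The slot never runs T's destructor on its own, which matches firmware that never exits.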


Strategies when malloc is unavailable (a requirement in many embedded projects)

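  • Replace the heap with fixed-size object pools or ring buffers.
  • Provide type-safe allocation interfaces through templates or hand-written pools.
  • Allocate long-lived buffers (such as network packet buffers) statically wherever possible, and place them in suitable sections.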
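A fixed-size object pool with a type-safe interface can be sketched as follows:
- N slots of raw storage that can live in static memory.
- An index-based free list, giving O(1) create/destroy with no heap and no fragmentation.

`ObjectPool` and its member names are hypothetical. As written, it is not thread- or ISR-safe, and it does not detect a double `destroy`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

template <typename T, size_t N>
class ObjectPool {
    alignas(T) unsigned char storage_[N * sizeof(T)];  // raw slots; no T exists yet
    size_t next_[N];        // free-list links: next_[i] = next free slot after i
    size_t free_head_ = 0;  // first free slot, or N when the pool is exhausted
    size_t in_use_ = 0;

    unsigned char* slot(size_t i) { return storage_ + i * sizeof(T); }

public:
    ObjectPool() {
        for (size_t i = 0; i < N; ++i) next_[i] = i + 1;  // chain 0 -> 1 -> ... -> N
    }

    // Construct a T in a free slot; returns nullptr when the pool is full.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_head_ == N) return nullptr;
        size_t idx = free_head_;
        free_head_ = next_[idx];
        ++in_use_;
        return new (slot(idx)) T(std::forward<Args>(args)...);
    }

    // Destroy an object obtained from create() and push its slot back on the free list.
    void destroy(T* p) {
        if (p == nullptr) return;
        auto offset = static_cast<size_t>(reinterpret_cast<unsigned char*>(p) - storage_);
        size_t idx = offset / sizeof(T);
        assert(idx < N && offset % sizeof(T) == 0 && "pointer not from this pool");
        p->~T();
        next_[idx] = free_head_;
        free_head_ = idx;
        --in_use_;
    }

    size_t in_use() const { return in_use_; }
    static constexpr size_t capacity() { return N; }
};
```

Returning `nullptr` instead of blocking or aborting makes "pool exhausted" an explicit case the caller must handle. The pool size then becomes a design parameter you can review, rather than a heap failure that appears at runtime.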

A simple ring buffer (sketch):

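#include <cstddef>
#include <cstdint>

// Holds at most N-1 bytes: one slot stays empty to tell "full" from "empty".
template<size_t N>
class RingBuffer {
  uint8_t buf[N];
  size_t head = 0, tail = 0;  // head: next write position, tail: next read position
public:
  bool push(uint8_t v) { size_t n = (head+1)%N; if (n==tail) return false; buf[head]=v; head=n; return true; }
  bool pop(uint8_t &out) { if (head==tail) return false; out = buf[tail]; tail=(tail+1)%N; return true; }
};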
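The ring buffer above is fine while only one context touches it. If an ISR pushes while the main loop pops, the indices need atomic access and ordering, so that a byte is fully written before the consumer sees the new `head`.

The usual answer is a single-producer/single-consumer ring:
- Only the producer writes `head_`.
- Only the consumer writes `tail_`.
- acquire/release ordering publishes the data before the index update.

This is a sketch that assumes `std::atomic<size_t>` is lock-free on your target; `SpscRing` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot is always left empty");
    uint8_t buf_[N];
    std::atomic<size_t> head_{0};  // next write position, written only by the producer
    std::atomic<size_t> tail_{0};  // next read position, written only by the consumer

public:
    // Producer side only (e.g. the ISR).
    bool push(uint8_t v) {
        size_t h = head_.load(std::memory_order_relaxed);
        size_t n = (h + 1) % N;
        if (n == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = v;
        head_.store(n, std::memory_order_release);  // publish the byte
        return true;
    }

    // Consumer side only (e.g. the main loop).
    bool pop(uint8_t& out) {
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);  // hand the slot back
        return true;
    }
};
```

Each index has exactly one writer, so no read-modify-write operation is needed. Only atomic loads and stores are used, and those are cheap on typical 32-bit MCUs.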

Final thoughts

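In embedded C++, static storage provides predictability and controlled long-lived memory use, while the stack provides locality and per-thread isolation. When choosing between them, weigh:
- buffer size,
- access patterns (concurrency, interrupts),
- performance (speed, access latency),
- measurability (whether stack usage can actually be measured).

In practice:
- Put large objects, lookup tables, and DMA buffers in static storage or dedicated RAM.
- Keep short, scope-limited temporaries on the stack.
- Tightly restrict dynamic allocation, using object pools or placement new to manage memory where needed.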


Code examples

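// Example 1: static storage (standalone program; the section attributes assume GCC or Clang on an ELF target such as Linux)
#include <iostream>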
#include <cstdint>
#include <array>

// Demonstrates the different forms of static storage

// 1. Initialized global (.data segment)
int global_initialized = 100;

// 2. Global without an initializer (.bss segment, zeroed at startup)
int global_uninitialized;

// 3. Read-only constants (.rodata segment, usually in Flash)
//    16-point sine table, entry k = round(32767 * sin(k * 22.5 deg));
//    int16_t because half of the values are negative
static const int16_t sine_table[16] = {
         0,  12539,  23170,  30273,
     32767,  30273,  23170,  12539,
         0, -12539, -23170, -30273,
    -32767, -30273, -23170, -12539
};

// 4. Variable in a custom section
__attribute__((section(".rodata.lookup"))) const int lookup_table[8] = {0, 1, 2, 3, 4, 5, 6, 7};

// 5. Variable meant for fast RAM (same section name as .fastdata in the linker script)
__attribute__((section(".fastdata"))) int fast_var;

// 6. Variable the startup code does not clear; needs a NOLOAD .noinit output section
//    in the firmware linker script (on a hosted build it simply gets its own section)
__attribute__((section(".noinit"))) int noinit_var;

void print_addresses() {
    std::cout << "=== Static Storage Addresses ===\n\n";

    std::cout << "Global initialized (.data):    " << &global_initialized << "\n";
    std::cout << "Global uninitialized (.bss):   " << &global_uninitialized << "\n";
    std::cout << "Const table (.rodata):         " << sine_table << "\n";
    std::cout << "Lookup table (.rodata.lookup): " << lookup_table << "\n";
    std::cout << "Fast var (.fastdata):          " << &fast_var << "\n";
    std::cout << "Noinit var (.noinit):          " << &noinit_var << "\n";
}

void static_local_demo() {
    // static local: initialized only once, on the first call
    static int counter = 0;
    counter++;

    std::cout << "Static local counter: " << counter
              << " (address: " << &counter << ")\n";
}

void constexpr_static_demo() {
    // static constexpr local: computed at compile time, stored as read-only data
    static constexpr int fib[] = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34};

    std::cout << "Compile-time Fibonacci: ";
    for (int v : fib) {
        std::cout << v << " ";
    }
    std::cout << "\n";
}

// Table lookup backed by static storage
// `index` selects one of 16 points per period: 0 -> 0 deg, 4 -> 90 deg, 8 -> 180 deg
int16_t fast_sin(uint8_t index) {
    return sine_table[index % 16];
}

void lookup_table_demo() {
    std::cout << "\n--- Lookup Table Demo ---\n";
    std::cout << "sin(0) = " << fast_sin(0) << "\n";
    std::cout << "sin(90) = " << fast_sin(4) << "\n";
    std::cout << "sin(180) = " << fast_sin(8) << "\n";
}

// Compile-time static constants via templates
template<int N>
struct Factorial {
    static constexpr int value = N * Factorial<N - 1>::value;
};

template<>
struct Factorial<0> {
    static constexpr int value = 1;
};

void template_static_demo() {
    std::cout << "\n--- Template Static Constant Demo ---\n";
    std::cout << "Factorial<5>::value = " << Factorial<5>::value << "\n";
    std::cout << "Factorial<10>::value = " << Factorial<10>::value << "\n";
}

// static_assert: compile-time size check (16 entries x 2 bytes)
static_assert(sizeof(sine_table) == 32, "sine_table size mismatch");

int main() {
    std::cout << "=== Static Allocation Demo ===\n\n";

    print_addresses();

    std::cout << "\n--- Static Local Variable (multiple calls) ---\n";
    static_local_demo();
    static_local_demo();
    static_local_demo();

    constexpr_static_demo();
    lookup_table_demo();
    template_static_demo();

    std::cout << "\n=== Key Takeaways ===\n";
    std::cout << "1. Static storage lifetime = program lifetime\n";
    std::cout << "2. .data: initialized globals (copied from Flash to RAM)\n";
    std::cout << "3. .bss: uninitialized globals (zeroed at startup)\n";
    std::cout << "4. .rodata: constants (stay in Flash, saving RAM)\n";
    std::cout << "5. Use sections to control memory placement\n";

    return 0;
}
// Example 2: stack allocation (a separate program; build it on its own)
#include <iostream>
#include <cstdint>
#include <cstring>

// Demonstrates stack allocation

void simple_stack_allocation() {
    std::cout << "--- Simple Stack Allocation ---\n";

    // Fundamental types
    int x = 42;
    double y = 3.14;
    char c = 'A';

    std::cout << "int x at:    " << &x << ", value: " << x << "\n";
    std::cout << "double y at: " << &y << ", value: " << y << "\n";
    std::cout << "char c at:   " << static_cast<void*>(&c) << ", value: " << c << "\n";

    // Array
    int arr[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    std::cout << "array at:    " << arr << "\n";

    // Struct
    struct Point { int x, y; } p = {10, 20};
    std::cout << "struct at:   " << &p << ", p.x=" << p.x << ", p.y=" << p.y << "\n";
}

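// Nested blocks are scopes inside ONE stack frame, not new frames: the compiler
// lays all of these locals out in the same frame (and may reuse slots), so the
// addresses need not change monotonically. A new frame needs a function call,
// as in recursive_demo() below.
void stack_frame_growth_demo() {
    std::cout << "\n--- Scopes Within One Stack Frame ---\n";

    int a;
    std::cout << "Scope level 1, a at: " << &a << "\n";

    {
        int b;
        std::cout << "Scope level 2, b at: " << &b << "\n";

        {
            int c;
            std::cout << "Scope level 3, c at: " << &c << "\n";
        }
    }

    int d;
    std::cout << "Scope level 1 again, d at: " << &d << "\n";
}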

// Dangerous: large stack allocation
void dangerous_stack_allocation() {
    std::cout << "\n--- Dangerous: Large Stack Allocation ---\n";
    std::cout << "WARNING: This can cause stack overflow!\n";

    // Never do this in real code!
    // uint8_t big_buffer[64 * 1024];  // 64KB on stack!

    // Safer alternative: static allocation (reserved once in .bss, not on every call)
    static uint8_t safe_buffer[64 * 1024];
    std::cout << "Static buffer at: " << static_cast<void*>(safe_buffer) << "\n";
}

void stack_vs_heap() {
    std::cout << "\n--- Stack vs Heap Comparison ---\n";

    // Stack allocation
    int stack_var = 100;
    std::cout << "Stack var at: " << &stack_var << "\n";

    // Heap allocation (often restricted or banned in embedded code)
    int* heap_var = new int(200);
    std::cout << "Heap var at:  " << heap_var << "\n";

    delete heap_var;
}

void recursive_demo(int depth) {
    if (depth <= 0) {
        std::cout << "Recursion depth reached\n";
        return;
    }

    int local = depth;
    std::cout << "Depth " << depth << ", local at: " << &local << "\n";

    recursive_demo(depth - 1);
}

void recursion_demo() {
    std::cout << "\n--- Recursion and Stack Usage ---\n";
    std::cout << "WARNING: Deep recursion can overflow stack!\n";

    // Keep the recursion depth small and bounded
    recursive_demo(5);
}

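// VLA (Variable Length Array): dangerous!
// Note: standard C++ has no VLAs; GCC supports them as an extension
void vla_demo() {
    std::cout << "\n--- Variable Length Arrays (VLA) ---\n";
    std::cout << "WARNING: VLA is non-standard and dangerous in embedded!\n";

    int n = 10;
    // int vla[n];  // Do not use! The frame size would depend on n at runtime
    (void)n;        // silence the unused-variable warning

    // Safer alternative: std::array or a fixed maximum size
    int safe_arr[100];  // fixed maximum size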
    std::cout << "Fixed array at: " << safe_arr << "\n";
}

void alloca_demo() {
    std::cout << "\n--- alloca() Usage ---\n";
    std::cout << "WARNING: alloca is dangerous in embedded systems!\n";

    // void* ptr = alloca(1024);  // Do not use!

    // Safer alternative: a static buffer, or a fixed-size array on the stack
    char buffer[1024];
    std::cout << "Fixed buffer at: " << static_cast<void*>(buffer) << "\n";
}

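// Stack usage measurement (simplified, illustrative only).
// Real firmware compares the stack pointer against linker-provided stack bounds,
// or uses stack painting / RTOS high-water marks. This demo only measures the
// distance between two frames, assuming a downward-growing stack (ARM Cortex-M, x86).
__attribute__((noinline)) size_t stack_distance_from(uintptr_t outer) {
    char c;  // a local in this deeper frame
    uintptr_t inner = reinterpret_cast<uintptr_t>(&c);
    return outer > inner ? outer - inner : 0;
}

void stack_usage_demo() {
    std::cout << "\n--- Stack Usage Estimation ---\n";
    char marker;  // a local in the caller's frame
    uintptr_t outer = reinterpret_cast<uintptr_t>(&marker);
    std::cout << "Stack distance to a callee's local: "
              << stack_distance_from(outer) << " bytes\n";
}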

int main() {
    std::cout << "=== Stack Allocation Demo ===\n\n";

    simple_stack_allocation();
    stack_frame_growth_demo();
    dangerous_stack_allocation();
    stack_vs_heap();
    recursion_demo();
    vla_demo();
    alloca_demo();
    stack_usage_demo();

    std::cout << "\n=== Key Takeaways ===\n";
    std::cout << "1. Stack allocation is fast (pointer arithmetic)\n";
    std::cout << "2. Stack size is limited (typically KB range)\n";
    std::cout << "3. Avoid large allocations on stack\n";
    std::cout << "4. Avoid deep recursion in embedded\n";
    std::cout << "5. Use static allocation for large buffers\n";

    return 0;
}
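// Example 3: ring buffers (a separate program; build it on its own)
#include <iostream>
#include <cstdint>
#include <array>
#include <string>  // std::to_string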

// Ring buffer implementation: a classic application of static (fixed-size) allocation

template<typename T, size_t N>
class RingBuffer {
    static_assert(N > 0 && (N & (N - 1)) == 0,
                  "Size must be power of 2 for efficient masking");

    std::array<T, N> buffer_;
    size_t head_ = 0;
    size_t tail_ = 0;
    static constexpr size_t mask_ = N - 1;  // wrap mask (N is a power of two); no per-instance storage

public:
    RingBuffer() = default;

    // Non-blocking push
    bool push(const T& value) {
        size_t next = (head_ + 1) & mask_;
        if (next == tail_) {
            return false;  // Full
        }
        buffer_[head_] = value;
        head_ = next;
        return true;
    }

    // Non-blocking pop
    bool pop(T& out) {
        if (head_ == tail_) {
            return false;  // Empty
        }
        out = buffer_[tail_];
        tail_ = (tail_ + 1) & mask_;
        return true;
    }

    bool empty() const { return head_ == tail_; }
    bool full() const { return ((head_ + 1) & mask_) == tail_; }
    size_t size() const { return (head_ - tail_) & mask_; }
    size_t capacity() const { return N - 1; }

    void clear() { head_ = tail_ = 0; }
};

// Byte-oriented ring buffer (any N >= 2; uses % instead of a mask)
template<size_t N>
class ByteRingBuffer {
    std::array<uint8_t, N> buffer_;
    size_t head_ = 0;
    size_t tail_ = 0;

public:
    // Write data; returns the number of bytes actually written
    size_t write(const uint8_t* data, size_t len) {
        size_t written = 0;
        for (size_t i = 0; i < len; ++i) {
            size_t next = (head_ + 1) % N;
            if (next == tail_) break;  // Full
            buffer_[head_] = data[i];
            head_ = next;
            ++written;
        }
        return written;
    }

    // Read data; returns the number of bytes actually read
    size_t read(uint8_t* data, size_t len) {
        size_t read_count = 0;
        for (size_t i = 0; i < len; ++i) {
            if (head_ == tail_) break;  // Empty
            data[i] = buffer_[tail_];
            tail_ = (tail_ + 1) % N;
            ++read_count;
        }
        return read_count;
    }

    size_t size() const {
        if (head_ >= tail_) return head_ - tail_;
        return N - tail_ + head_;
    }

    size_t available() const { return N - size() - 1; }
    bool empty() const { return head_ == tail_; }
    bool full() const { return ((head_ + 1) % N) == tail_; }
};

void ring_buffer_demo() {
    std::cout << "=== Ring Buffer Demo ===\n\n";

    RingBuffer<int, 8> rb;

    std::cout << "--- Initial State ---\n";
    std::cout << "Empty: " << rb.empty() << "\n";
    std::cout << "Full: " << rb.full() << "\n";

    std::cout << "\n--- Pushing 7 elements ---\n";
    for (int i = 0; i < 7; ++i) {
        bool ok = rb.push(i);
        std::cout << "Push " << i << ": " << (ok ? "success" : "failed")
                  << ", size: " << rb.size() << "\n";
    }

    std::cout << "\n--- Popping 3 elements ---\n";
    for (int i = 0; i < 3; ++i) {
        int val;
        bool ok = rb.pop(val);
        std::cout << "Pop: " << (ok ? std::to_string(val) : "failed")
                  << ", size: " << rb.size() << "\n";
    }

    std::cout << "\n--- Pushing 3 more elements (fills the buffer) ---\n";
    for (int i = 7; i < 10; ++i) {
        bool ok = rb.push(i);
        std::cout << "Push " << i << ": " << (ok ? "success" : "failed")
                  << ", size: " << rb.size() << "\n";
    }

    std::cout << "\n--- Try to overflow ---\n";
    bool overflow = rb.push(999);
    std::cout << "Push 999: " << (overflow ? "unexpected success" : "correctly rejected") << "\n";
}

void byte_ring_buffer_demo() {
    std::cout << "\n=== Byte Ring Buffer Demo ===\n\n";

    ByteRingBuffer<32> buf;

    const char* msg1 = "Hello, ";
    const char* msg2 = "World!";

    size_t written = buf.write(reinterpret_cast<const uint8_t*>(msg1), 7);
    std::cout << "Written: " << written << " bytes\n";
    std::cout << "Buffer size: " << buf.size() << "\n";

    written = buf.write(reinterpret_cast<const uint8_t*>(msg2), 6);
    std::cout << "Written: " << written << " bytes\n";
    std::cout << "Buffer size: " << buf.size() << "\n";

    std::cout << "\n--- Reading back ---\n";
    uint8_t read_buf[32];
    size_t read_count = buf.read(read_buf, sizeof(read_buf));
    std::cout << "Read: " << read_count << " bytes\n";
    std::cout << "Content: ";
    for (size_t i = 0; i < read_count; ++i) {
        std::cout << static_cast<char>(read_buf[i]);
    }
    std::cout << "\n";
}

// UART receive buffer example
class UARTRxBuffer {
    static constexpr size_t BUFFER_SIZE = 256;
    ByteRingBuffer<BUFFER_SIZE> buffer_;

public:
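    // Simulated ISR entry point. On real hardware the ring indices would need to be
    // std::atomic (or the main loop must briefly mask this interrupt); here they are
    // plain size_t. Bytes are silently dropped when the buffer is full.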
    void isr_receive_byte(uint8_t byte) {
        buffer_.write(&byte, 1);
    }

    // Called from the main loop
    size_t read(uint8_t* data, size_t len) {
        return buffer_.read(data, len);
    }

    size_t available() const { return buffer_.size(); }
};

void uart_buffer_demo() {
    std::cout << "\n=== UART Buffer Example ===\n\n";

    UARTRxBuffer uart;

    // Simulate received data
    for (uint8_t c = 'A'; c <= 'Z'; ++c) {
        uart.isr_receive_byte(c);
    }

    std::cout << "Received: " << uart.available() << " bytes\n";

    // Read the data
    uint8_t data[32];
    size_t read = uart.read(data, sizeof(data));
    std::cout << "Read: " << read << " bytes\n";
    std::cout << "Content: ";
    for (size_t i = 0; i < read; ++i) {
        std::cout << static_cast<char>(data[i]);
    }
    std::cout << "\n";
}

int main() {
    ring_buffer_demo();
    byte_ring_buffer_demo();
    uart_buffer_demo();

    std::cout << "\n=== Key Takeaways ===\n";
    std::cout << "1. Ring buffer is a fixed-size static allocation pattern\n";
    std::cout << "2. Suits ISR-to-main communication (single producer/consumer; use atomic indices on real hardware)\n";
    std::cout << "3. O(1) push/pop; a power-of-2 size lets a mask replace %\n";
    std::cout << "4. No fragmentation, deterministic timing\n";

    return 0;
}