std::function, std::invoke, and Callable Objects

Introduction

When building an event system, we ran into a very practical problem: we needed to store callbacks of many different shapes, including plain functions, member functions, lambdas, and functors. Function pointers can only point to free functions and static member functions, and they cannot carry context. If we try to store lambdas directly, every lambda has its own unique closure type, so we cannot put them in the same container. std::function solves this problem by using type erasure to unify all sorts of callable objects into a single type. But type erasure comes at a cost, so the question becomes: exactly how large is this cost, and is there a way to get the best of both worlds?

In this chapter, we start with the internal mechanisms of std::function, move on to std::invoke as a "universal invoker," and finally discuss zero-overhead callback design patterns—finding the balance between type safety and performance.

Learning Objectives

  • Understand the type erasure mechanism and SBO of std::function
  • Master how std::invoke uniformly invokes callable objects
  • Learn to design zero-overhead callback systems using templates and lambdas

The Callable Object Family in C++

Before diving into the specific mechanisms, let's sort out the different forms of "callable objects" in C++. A callable object is simply something we can invoke using the () syntax (or std::invoke). Plain functions and function pointers are the most basic—called directly or indirectly through a pointer. Functors are class objects that overload operator(). Lambdas are anonymous functors generated by the compiler. Pointers-to-member-functions point to a class's member functions and require an object instance when invoked. In addition, there are objects wrapped by std::function and the results of std::bind.

The problem is that the invocation syntax varies across these callable objects: plain functions, functors, and lambdas are called as f(args), while pointers-to-member-functions need (obj.*ptr)(args) or (objptr->*ptr)(args). Before C++17, writing a generic function that "uniformly invokes" all of them meant writing a set of overloads or template specializations by hand. With std::invoke, one function handles it all.
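
To make the difference concrete, here is a small sketch that calls one callable of each kind by hand. The names square, Scale, and Account are hypothetical and exist only for this illustration.

```cpp
#include <iostream>

int square(int x) { return x * x; }                  // plain function

struct Scale {                                        // functor
    int factor;
    int operator()(int x) const { return x * factor; }
};

struct Account {                                      // class with a member function
    int balance = 100;
    int deposit(int amount) { return balance += amount; }
};

void demo_call_syntax() {
    int (*fp)(int) = square;                          // function pointer
    Scale triple{3};
    auto add_one = [](int x) { return x + 1; };       // lambda
    int (Account::*pmf)(int) = &Account::deposit;     // pointer-to-member-function
    Account acct;
    Account* pacct = &acct;

    std::cout << square(4) << "\n";                   // direct call: 16
    std::cout << fp(4) << "\n";                       // through a pointer: 16
    std::cout << triple(4) << "\n";                   // functor: 12
    std::cout << add_one(4) << "\n";                  // lambda: 5
    std::cout << (acct.*pmf)(50) << "\n";             // object + member pointer: 150
    std::cout << (pacct->*pmf)(50) << "\n";           // pointer + member pointer: 200
}
```

The last two lines are the ones that differ: the extra parentheses around acct.*pmf are required, because the function call operator binds more tightly than .* and ->*.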


std::function—A Type-Erased Function Container

std::function is a general-purpose function wrapper introduced in C++11, defined in the <functional> header. It can store, copy, and invoke any callable object that matches a given signature. Its core capability boils down to one thing: unifying different types of callable objects into a single type.

cpp
#include <functional>
#include <iostream>

int add(int a, int b) { return a + b; }

struct Multiplier {
    int factor;
    int operator()(int x) const { return x * factor; }
};

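void demo_std_function() {
    std::function<int(int, int)> func;

    // Store a plain function
    func = add;
    std::cout << func(3, 4) << "\n";   // 7

    // Store a lambda
    func = [](int a, int b) { return a * b; };
    std::cout << func(3, 4) << "\n";   // 12

    // Store a functor: the signature must match the functor's operator()
    // func = Multiplier{5};           // compile error: signature mismatch,
    //                                 // Multiplier::operator() takes one int,
    //                                 // but func's signature is int(int, int)
    std::function<int(int)> unary = Multiplier{5};
    std::cout << unary(10) << "\n";    // 50
}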

Type Erasure Mechanism

How does std::function manage to put different types of things into the same shell? The answer is type erasure. The simplified principle works like this: std::function internally defines an abstract base class (Concept) holding a pure virtual function invoke. Then, for each specific callable type, it generates a derived class (Model) that implements invoke. std::function holds a pointer to Concept, and invocation dispatches through the virtual function to the concrete implementation.

We can simulate this process with code:

cpp
#include <memory>
#include <type_traits>
#include <utility>

// Simplified sketch of the idea behind std::function
template<typename Signature>
class SimpleFunction;

template<typename R, typename... Args>
class SimpleFunction<R(Args...)> {
    // Abstract interface
    struct ICallable {
        virtual ~ICallable() = default;
        virtual R invoke(Args... args) = 0;
        virtual ICallable* clone() const = 0;
    };

    // Concrete implementation: a templated derived class stores the real callable
    template<typename T>
    struct CallableImpl : ICallable {
        T callable;
        explicit CallableImpl(T c) : callable(std::move(c)) {}

        R invoke(Args... args) override {
            return callable(std::forward<Args>(args)...);
        }

        ICallable* clone() const override {
            return new CallableImpl(callable);
        }
    };

    ICallable* ptr_ = nullptr;

public:
    SimpleFunction() = default;

    template<typename T>
    SimpleFunction(T callable)
        : ptr_(new CallableImpl<std::decay_t<T>>(std::move(callable))) {}

    SimpleFunction(const SimpleFunction& other)
        : ptr_(other.ptr_ ? other.ptr_->clone() : nullptr) {}

    // Copy assignment is omitted in this sketch; deleting it keeps the
    // implicitly generated version from causing a double delete
    SimpleFunction& operator=(const SimpleFunction&) = delete;

    ~SimpleFunction() { delete ptr_; }

    // Precondition: a callable has been stored (ptr_ != nullptr)
    R operator()(Args... args) {
        return ptr_->invoke(std::forward<Args>(args)...);
    }
};

From this simplified implementation, we can see the three elements of type erasure: a unified abstract interface (ICallable), a templated concrete implementation (CallableImpl<T>), and a pointer to the interface (ptr_). During storage, the concrete type is "erased": the outside world only sees ICallable*. During invocation, the virtual function table dispatches back to the code that still knows the concrete type.

Small Buffer Optimization (SBO)

The simplified version above has an obvious problem: every construction uses new to allocate on the heap. For a small lambda that captures one or two int values, the cost of this heap allocation might be higher than the lambda itself. Therefore, real std::function implementations use Small Buffer Optimization (SBO, also called SOO)—a fixed-size buffer (typically 16–32 bytes) is reserved inside the std::function object. If the wrapped callable object is small enough, it is stored directly in this buffer, requiring no heap allocation.
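
The "small enough" decision can be made entirely at compile time. The following is a minimal sketch of such a check, not the rule any particular standard library uses: the 16-byte limit (kInlineBytes), the name fits_inline, and the nothrow-move requirement are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical inline buffer size; real implementations choose their own.
inline constexpr std::size_t kInlineBytes = 16;

// A callable can live in the inline buffer if it fits, is not over-aligned,
// and can be moved without throwing (so moving the wrapper stays noexcept).
template<typename T>
inline constexpr bool fits_inline =
    sizeof(T) <= kInlineBytes &&
    alignof(T) <= alignof(std::max_align_t) &&
    std::is_nothrow_move_constructible_v<T>;

void demo_fits_inline() {
    auto small = [x = 42] { return x * 2; };
    auto large = [data = std::array<int, 100>{}] { return data.size(); };

    static_assert(fits_inline<decltype(small)>, "one captured int fits");
    static_assert(!fits_inline<decltype(large)>, "400 bytes of capture do not fit");
    (void)small;
    (void)large;
}
```

A wrapper built on this check would placement-new the callable into its buffer when fits_inline is true and fall back to heap allocation otherwise.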

cpp
#include <array>
#include <cstddef>
#include <functional>
#include <iostream>

void demo_sbo_size() {
    // Small lambda: usually fits in the SBO buffer
    auto small = [x = 42]() { return x * 2; };
    std::function<int()> f1 = small;
    std::cout << "sizeof(std::function<int()>): "
              << sizeof(f1) << " bytes\n";
    // Typically 32-64 bytes (implementation-dependent)

    // Large lambda: exceeds the SBO buffer and triggers a heap allocation
    auto large = [data = std::array<int, 100>{}]() {
        return data.size();
    };
    std::function<std::size_t()> f2 = large;
    std::cout << "sizeof(std::function<size_t()>): "
              << sizeof(f2) << " bytes\n";
    // Same size, but the callable is stored on the heap

    // For comparison: the size of a function pointer
    std::cout << "sizeof(void(*)()): "
              << sizeof(void(*)()) << " bytes\n";
    // Typically 8 bytes (64-bit systems)
}

Let's test the SBO behavior of libstdc++. On GCC 15.2.1, sizeof(std::function<int()>) is 32 bytes. In our test, a lambda capturing a single int (a 4-byte closure object) did not trigger a heap allocation, while a lambda capturing five int values or one pointer did. This suggests that GCC 15.2's SBO is rather conservative: only part of the 32-byte object is usable as inline storage, and the rest is presumably taken up by the pointers used for invocation and management. The libc++ (Clang) implementation may differ, and the exact behavior varies by version.

Verification code: code/volumn_codes/vol2/ch03-lambda/test_sbo_size.cpp (GCC 15.2.1, -O2)

Important: SBO behavior varies significantly across different compilers and versions. If your code is performance-sensitive, we recommend using template parameters or hand-written type erasure to achieve predictable behavior.
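
Because the behavior differs between implementations, it is worth measuring on your own toolchain. One way to do that, sketched below, is to replace the global operator new with a counting version and compare the counter before and after wrapping a callable. This is not the verification code referenced above; g_alloc_count, g_keep, and allocations_when_wrapping are hypothetical names, and the count for the small lambda is deliberately not annotated because it depends on the standard library.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <new>
#include <utility>

// Global allocation counter, bumped by the replacement operator new below.
static std::size_t g_alloc_count = 0;

void* operator new(std::size_t size) {
    ++g_alloc_count;
    if (void* p = std::malloc(size != 0 ? size : 1)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Keeping the wrapper in a global stops the optimizer from removing
// an allocation whose result would otherwise never be observed.
static std::function<int()> g_keep;

// Returns how many times operator new ran while f was being wrapped.
template<typename F>
std::size_t allocations_when_wrapping(F f) {
    const std::size_t before = g_alloc_count;
    g_keep = std::move(f);
    return g_alloc_count - before;
}

void demo_count_allocations() {
    auto small = [x = 42] { return x * 2; };
    auto large = [data = std::array<int, 100>{}] {
        return static_cast<int>(data.size());
    };
    std::cout << "small lambda: " << allocations_when_wrapping(small)
              << " allocation(s)\n";
    std::cout << "large lambda: " << allocations_when_wrapping(large)
              << " allocation(s)\n";
}
```

The 400-byte capture is far larger than any mainstream inline buffer, so the second line should report at least one allocation; the first line is the interesting one to compare across compilers and library versions.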


Function Pointers—Zero Overhead but Limited Functionality

Before discussing zero-overhead alternatives, let's review function pointers. Function pointers are a mechanism inherited from the C era, pointing directly to a code address, simple and efficient. Their size is just one pointer (8 bytes on a 64-bit system), and invocation is a single call instruction (call *%rax) with no extra indirection layer.

Performance benchmark: In our tests (GCC 15.2.1, -O2), function pointer invocation is about 30% slower than direct invocation (1.29x). This is because direct invocation can be fully inlined into computation instructions, while function pointers still require an indirect call. In unoptimized code, however, both require call instructions, so the difference is smaller.

Verification code: code/volumn_codes/vol2/ch03-lambda/test_function_performance.cpp

cpp
// Declaring and assigning a function pointer
int (*func_ptr)(int, int) = [](int a, int b) { return a + b; };

// A using alias simplifies the type name
using BinaryOp = int(*)(int, int);
BinaryOp op = [](int a, int b) { return a + b; };
int result = op(3, 4);   // 7

The biggest limitation of function pointers is the inability to carry context—they can only point to captureless lambdas (or plain functions, static member functions). Any lambda with captures cannot be converted to a function pointer. When you need to pass a this pointer or some state to a callback, function pointers are helpless.

cpp
// A captureless lambda converts to a function pointer
int (*fp1)(int, int) = [](int a, int b) { return a + b; };  // OK

// A lambda with captures does not
int x = 42;
// int (*fp2)(int, int) = [x](int a, int b) { return a + b + x; };  // compile error

| Feature | Function pointer | std::function |
| --- | --- | --- |
| Size | 8 bytes (64-bit) | 32–64 bytes |
| Heap allocation | None | Triggered outside SBO range |
| Indirection layers | 1 (indirect call through the pointer) | 1 (virtual table indirection) |
| Carries context | No | Yes |
| Inline-friendly | Yes | Poor (type erasure prevents it) |
| Performance (relative to direct call) | ~1.3x | ~7–9x |

std::invoke—A Unified Invocation Interface

std::invoke, introduced in C++17 (defined in <functional>), is a "universal invoker." Regardless of your callable object's type—plain function, pointer-to-member-function, lambda, or functor—std::invoke can invoke it with the same syntax. It implements the INVOKE expression semantics defined by the standard:

cpp
#include <functional>
#include <iostream>
#include <string>

struct Widget {
    void greet(const std::string& msg) {
        std::cout << "Widget says: " << msg << "\n";
    }
    int data = 42;
};

void free_func(int x) {
    std::cout << "free_func: " << x << "\n";
}

void demo_invoke() {
    Widget w;

    // Plain function
    std::invoke(free_func, 42);

    // Functor / lambda
    std::invoke([](int x) { std::cout << "lambda: " << x << "\n"; }, 99);

    // Pointer-to-member-function + object
    std::invoke(&Widget::greet, w, "hello");

    // Pointer-to-member-data + object (can be read and modified)
    int val = std::invoke(&Widget::data, w);
    std::cout << "data: " << val << "\n";   // 42
    std::invoke(&Widget::data, w) = 100;
}

Look at that member function invocation: the traditional syntax is (w.*(&Widget::greet))("hello") or (w.*mem_func)("hello"), which we have to look up every time we write it. With std::invoke, we only need std::invoke(mem_func, obj, args...), which is much cleaner.

Underlying Principles of invoke

The implementation principle of std::invoke is not complex; the core is compile-time type judgment and dispatch. For plain callable objects (function pointers, lambdas, functors), it invokes them directly using f(args...). For pointers-to-member-functions, it selects the appropriate invocation syntax based on the object's category (pointer, reference, reference_wrapper). For pointers-to-member-data, it returns the corresponding member reference. All of these judgments happen at compile time, with zero runtime overhead.
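
The dispatch described above can be sketched with if constexpr. This is a deliberately reduced illustration, not the standard library's implementation: mini_invoke is a hypothetical name, it only distinguishes raw pointers from objects, and it omits the std::reference_wrapper and smart-pointer cases that the real INVOKE rules cover. Meter is likewise a made-up type.

```cpp
#include <type_traits>
#include <utility>

// Callable with no arguments: just call it.
template<typename F>
constexpr decltype(auto) mini_invoke(F&& f) {
    return std::forward<F>(f)();
}

// One or more arguments: pick the call syntax from the type of f.
template<typename F, typename First, typename... Rest>
constexpr decltype(auto) mini_invoke(F&& f, First&& first, Rest&&... rest) {
    using Fn = std::decay_t<F>;
    constexpr bool via_pointer = std::is_pointer_v<std::decay_t<First>>;

    if constexpr (std::is_member_function_pointer_v<Fn>) {
        if constexpr (via_pointer) {
            return ((*std::forward<First>(first)).*f)(std::forward<Rest>(rest)...);
        } else {
            return (std::forward<First>(first).*f)(std::forward<Rest>(rest)...);
        }
    } else if constexpr (std::is_member_object_pointer_v<Fn>) {
        static_assert(sizeof...(Rest) == 0, "a data member takes no arguments");
        if constexpr (via_pointer) {
            return (*std::forward<First>(first)).*f;
        } else {
            return std::forward<First>(first).*f;
        }
    } else {
        // Function pointers, lambdas, functors
        return std::forward<F>(f)(std::forward<First>(first),
                                  std::forward<Rest>(rest)...);
    }
}

// Hypothetical type used to exercise each branch.
struct Meter {
    int value = 42;
    int scaled(int k) const { return value * k; }
};
```

Only one branch survives in each instantiation, so after inlining a call such as mini_invoke(&Meter::scaled, m, 2) is the same expression as (m.*(&Meter::scaled))(2).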

std::invoke_result_t

C++17 also provides std::invoke_result_t, which can obtain the return type of an std::invoke invocation at compile time. This tool is extremely practical when writing generic code:

cpp
#include <functional>
#include <iostream>
#include <type_traits>
#include <utility>

template<typename Func, typename... Args>
auto safe_call(Func&& func, Args&&... args)
    -> std::invoke_result_t<Func, Args...>
{
    using Ret = std::invoke_result_t<Func, Args...>;

    if constexpr (std::is_void_v<Ret>) {
        std::invoke(std::forward<Func>(func), std::forward<Args>(args)...);
        std::cout << "(void return)\n";
    } else {
        Ret result = std::invoke(std::forward<Func>(func),
                                 std::forward<Args>(args)...);
        std::cout << "result: " << result << "\n";
        return result;
    }
}
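
As a further illustration, the sketch below checks a few result types at compile time and then uses std::invoke_result_t to deduce a container's element type. Sensor, half, and map_all are hypothetical names introduced only for this example.

```cpp
#include <functional>
#include <string>
#include <type_traits>
#include <vector>

struct Sensor {                       // hypothetical type for illustration
    double reading = 1.5;
    std::string name() const { return "sensor"; }
};

double half(int x) { return x / 2.0; }

// Function pointer: the result is the declared return type
static_assert(std::is_same_v<
    std::invoke_result_t<decltype(&half), int>, double>);
// Pointer-to-member-function plus an object
static_assert(std::is_same_v<
    std::invoke_result_t<decltype(&Sensor::name), const Sensor&>, std::string>);
// Pointer-to-member-data: an lvalue object yields an lvalue reference
static_assert(std::is_same_v<
    std::invoke_result_t<decltype(&Sensor::reading), Sensor&>, double&>);

// Applies f to every element; the output element type is deduced from f.
template<typename F, typename Arg>
auto map_all(const std::vector<Arg>& in, F f) {
    using Out = std::decay_t<std::invoke_result_t<F&, const Arg&>>;
    std::vector<Out> out;
    out.reserve(in.size());
    for (const Arg& x : in) {
        out.push_back(std::invoke(f, x));
    }
    return out;
}
```

Because map_all calls through std::invoke, the same template accepts half and &Sensor::name without any special casing; std::decay_t strips the reference in case the callable returns one.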

Performance of invoke

When std::invoke is used in template code, the compiler can see the complete call chain and inlines it just like a direct call. We tested this: under -O2, std::invoke performs the same as direct invocation, and the small differences we measured, in either direction, are within measurement noise. This is because std::invoke itself is just a thin compile-time dispatch wrapper that is completely inlined and eliminated after optimization.

Verification code: code/volumn_codes/vol2/ch03-lambda/test_invoke_performance.cpp

Assembly verification: By generating assembly (g++ -O2 -S), we can see that in this test, where the call target is visible to the compiler, direct invocation, std::invoke, function pointers, and lambdas are all compiled into exactly the same code: the result is computed directly and returned, with no call instructions.

Of course, if you invoke a callable object stored via std::function, the indirection overhead comes from std::function's type erasure, not from std::invoke.


Zero-Overhead Callback Design—Templates + Lambdas

After understanding the sources of std::function's overhead (type erasure, potential heap allocation, and indirect invocation), the question becomes: in many scenarios, the callback's type is already determined at registration time, so can we avoid type erasure entirely?

The answer is yes. The simplest zero-overhead approach is to pass the lambda directly as a template parameter—the compiler knows the complete closure type, and the invocation is fully inlined:

cpp
#include <iostream>
#include <vector>

// Template parameters accept any callable, with zero overhead.
// The predicate and the action are different closure types, so each
// needs its own template parameter.
template<typename Pred, typename Action>
void for_each_if(std::vector<int>& data, Pred pred, Action action) {
    for (auto& elem : data) {
        if (pred(elem)) {
            action(elem);
        }
    }
}

void demo_template_callback() {
    std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8};

    int threshold = 5;
    int sum = 0;

    // Lambdas are passed straight to the template and fully inlined
    for_each_if(data,
        [threshold](int x) { return x > threshold; },   // predicate
        [&sum](int& x) { sum += x; }                     // action
    );

    std::cout << "Sum of elements > " << threshold << ": " << sum << "\n";
    // Output: Sum of elements > 5: 21 (6+7+8)
}

The problem with this approach is that each different lambda type instantiates a different template function, and we cannot put different types of callbacks into the same container. If our design truly requires runtime polymorphism (for example, storing various types of callbacks in an event queue), then we must introduce some form of type erasure.
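
For contrast, here is a minimal sketch of the event-queue case, where the handlers have unrelated types and must share one container. It uses std::function as the type-erased element; EventBus, log_event, and Counter are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <utility>
#include <vector>

void log_event(int code) { std::cout << "log: " << code << "\n"; }

struct Counter {
    int hits = 0;
    void on_event(int) { ++hits; }
};

class EventBus {
    // One element type for every kind of handler
    std::vector<std::function<void(int)>> handlers_;

public:
    void subscribe(std::function<void(int)> handler) {
        handlers_.push_back(std::move(handler));
    }
    void publish(int code) {
        for (auto& handler : handlers_) handler(code);
    }
    std::size_t size() const { return handlers_.size(); }
};

void demo_event_bus() {
    EventBus bus;
    Counter counter;
    int last = 0;

    bus.subscribe(log_event);                                         // plain function
    bus.subscribe([&last](int code) { last = code; });                // capturing lambda
    bus.subscribe([&counter](int code) { counter.on_event(code); });  // member call

    bus.publish(7);
    bus.publish(9);
    std::cout << "last=" << last << " hits=" << counter.hits << "\n"; // last=9 hits=2
}
```

Each publish pays one type-erased call per handler, which is the price of being able to mix these three handler types in a single vector.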

Manual Type Erasure: Function Pointer Tables Instead of Virtual Functions

If we need type erasure but want to avoid the overhead of std::function, we can write a lightweight type-erased container by hand. The core idea is to use a function pointer table instead of a virtual function table, and a fixed-size buffer inside the object instead of heap allocation:

cpp
#include <cstddef>
#include <iostream>
#include <new>
#include <type_traits>
#include <utility>

template<typename Signature, std::size_t BufSize = 32>
class LightCallback;

template<typename R, typename... Args, std::size_t BufSize>
class LightCallback<R(Args...), BufSize> {
    // Operation table: plain function pointers instead of virtual functions
    struct VTable {
        void (*move)(void* dst, void* src);
        void (*destroy)(void* obj);
        R (*invoke)(void* obj, Args... args);
    };

    // One dedicated VTable is generated per callable type
    template<typename T>
    struct VTableFor {
        static void do_move(void* dst, void* src) {
            new(dst) T(std::move(*static_cast<T*>(src)));
        }
        static void do_destroy(void* obj) {
            static_cast<T*>(obj)->~T();
        }
        static R do_invoke(void* obj, Args... args) {
            return (*static_cast<T*>(obj))(std::forward<Args>(args)...);
        }
        static constexpr VTable value{do_move, do_destroy, do_invoke};
    };

    alignas(std::max_align_t) unsigned char storage_[BufSize];
    const VTable* vtable_ = nullptr;

public:
    LightCallback() = default;

    // Disabled for LightCallback itself so it never hijacks copy or move
    template<typename T,
             typename = std::enable_if_t<
                 !std::is_same_v<std::decay_t<T>, LightCallback>>>
    LightCallback(T&& callable) {
        using Decay = std::decay_t<T>;
        static_assert(sizeof(Decay) <= BufSize, "Callable too large for buffer");
        static_assert(alignof(Decay) <= alignof(std::max_align_t),
                     "Callable alignment too high");
        new(storage_) Decay(std::forward<T>(callable));
        vtable_ = &VTableFor<Decay>::value;
    }

    LightCallback(LightCallback&& other) noexcept : vtable_(other.vtable_) {
        if (vtable_) {
            vtable_->move(storage_, other.storage_);
            // Destroy the moved-from object so its destructor is not skipped
            vtable_->destroy(other.storage_);
            other.vtable_ = nullptr;
        }
    }

    ~LightCallback() {
        if (vtable_) vtable_->destroy(storage_);
    }

    LightCallback(const LightCallback&) = delete;
    LightCallback& operator=(const LightCallback&) = delete;

    // Precondition: a callable has been stored (check with operator bool)
    R operator()(Args... args) {
        return vtable_->invoke(storage_, std::forward<Args>(args)...);
    }

    explicit operator bool() const { return vtable_ != nullptr; }
};

void demo_light_callback() {
    int multiplier = 3;
    LightCallback<int(int), 32> cb = [multiplier](int x) {
        return x * multiplier;
    };

    std::cout << cb(14) << "\n";  // 42
}

This LightCallback is not as general-purpose as std::function (it does not support copying or allocators), but it satisfies the most common use case: storing lambdas with captures, no heap allocation, and single-layer indirect invocation. In embedded or high-performance scenarios, this "good enough" design is usually the most pragmatic choice.

Selection Guide

To summarize the trade-offs between the callback storage options: function pointers suit scenarios that need no context; they have essentially zero overhead, but can only point to plain functions, static member functions, or captureless lambdas. std::function suits scenarios that require runtime polymorphism; it is general-purpose but carries significant overhead: even when the object fits in the SBO buffer, the virtual table indirection prevents inlining, and our benchmarks show it to be 7–9 times slower than direct invocation. Template parameters suit scenarios where the type is known at compile time; they are completely zero-overhead, but callbacks of different types cannot be stored in one container. Manual type erasure suits scenarios that need runtime polymorphism under performance constraints; it takes slightly more code, but the overhead is controllable.

Performance data source: code/volumn_codes/vol2/ch03-lambda/test_function_performance.cpp (GCC 15.2.1, -O2, 100 million invocations)

cpp
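#include <functional>

// Assumes the LightCallback class template defined above is in scope.

// 1. No context, hot path: function pointer
void fast_path(int (*cb)(int)) { cb(42); }

// 2. With context, general-purpose: std::function
void generic_path(std::function<void(int)> cb) { cb(42); }

// 3. Type known at compile time: template parameter
template<typename CB>
void zero_cost_path(CB&& cb) { cb(42); }

// 4. With context, high performance: manual type erasure
void optimized_path(LightCallback<void(int), 24> cb) { cb(42); }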

Summary

In this chapter, we connected the storage and invocation mechanisms for callable objects in C++:

  • std::function unifies the types of various callable objects through type erasure, and SBO avoids heap allocation for small objects
  • Function pointers have zero overhead but cannot carry context, making them suitable for stateless callbacks
  • std::invoke is a unified invocation interface for callable objects, with zero overhead in template code
  • The core idea behind zero-overhead callbacks is "use templates instead of type erasure when possible, and use function pointer tables instead of virtual functions when type erasure is mandatory"
  • Choose the appropriate solution based on the scenario, balancing generality and performance
