mirror of https://gitlab.nic.cz/labs/bird.git synced 2024-12-31 22:21:54 +00:00
bird/doc/threads/04_memory_management.md

BIRD Journey to Threads. Chapter 4: Memory and other resource management.

BIRD is mostly a large specialized database engine, storing megabytes to gigabytes of Internet routing data in memory. To keep account of every byte of allocated data, BIRD has its own resource management system, which must be adapted to the multithreaded environment. The resource system has not changed much, yet it deserves a short chapter.

BIRD is a fast, robust and memory-efficient routing daemon designed and implemented at the end of the 20th century. We're making significant changes to BIRD's internal structure to make it run in multiple threads in parallel.

Resources

Inside BIRD, (almost) every piece of allocated memory is a resource. To achieve this, every such memory block includes a generic struct resource header. Its node links the resource into the linked list of a resource pool (see below), and its class pointer defines the basic operations done on the resource.

typedef struct resource {
  node n;				/* Inside resource pool */
  struct resclass *class;		/* Resource class */
} resource;

struct resclass {
  char *name;				/* Resource class name */
  unsigned size;			/* Standard size of single resource */
  void (*free)(resource *);		/* Freeing function */
  void (*dump)(resource *);		/* Dump to debug output */
  resource *(*lookup)(resource *, unsigned long);	/* Look up address (only for debugging) */
  struct resmem (*memsize)(resource *);	/* Return size of memory used by the resource, may be NULL */
};

void *ralloc(pool *, struct resclass *);

The resource life cycle begins with the allocation of a resource. To do that, you should call ralloc(), passing the parent pool and the appropriate resource class as arguments. BIRD allocates a memory block whose size is given by the size member of that class. The beginning of the block is reserved for struct resource itself and initialized from the given arguments. Therefore, you may sometimes see an idiom where a structure has struct resource r; as its first member, indicating that this item should be allocated as a resource.
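The first-member idiom can be illustrated by a minimal sketch. Everything prefixed toy_ is hypothetical and only mimics the shape of the structures above: the list is singly linked, the memory is zeroed by calloc(), and the timer type is invented for the example. It is not BIRD's actual ralloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for BIRD's types; all names here are hypothetical. */
struct toy_resclass;

typedef struct toy_resource {
  struct toy_resource *next;		/* Link inside the owning pool */
  const struct toy_resclass *class;	/* Resource class */
} toy_resource;

struct toy_resclass {
  const char *name;			/* Resource class name */
  size_t size;				/* Size of the whole object, header included */
  void (*free)(toy_resource *);		/* Class-specific cleanup, may be NULL */
};

typedef struct toy_pool {
  toy_resource *first;			/* Resources owned by this pool */
} toy_pool;

/* Allocate class->size zeroed bytes, fill in the header, link it into the pool. */
static void *toy_ralloc(toy_pool *p, const struct toy_resclass *c)
{
  assert(c->size >= sizeof(toy_resource));
  toy_resource *r = calloc(1, c->size);
  if (!r)
    abort();
  r->class = c;
  r->next = p->first;
  p->first = r;
  return r;
}

/* An invented resource type: the header comes first, so a pointer to
 * the object is also a valid pointer to its resource header. */
struct toy_timer {
  toy_resource r;
  int expires;
};

static const struct toy_resclass toy_timer_class = {
  .name = "Toy timer",
  .size = sizeof(struct toy_timer),
  .free = NULL,
};

static struct toy_timer *toy_timer_new(toy_pool *p, int expires)
{
  struct toy_timer *t = toy_ralloc(p, &toy_timer_class);
  t->expires = expires;
  return t;
}
```

A caller would write toy_timer_new(&pool, 42) and never touch the header directly; the pool finds the object again through the embedded header.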

The counterpart is resource freeing. This may be implicit (by resource pool freeing) or explicit (by rfree()). In both cases, the free() function of the appropriate class is called to clean up the resource before final freeing.

The dump and memsize calls are used by the CLI commands dump resources and show memory, which dump the resources or show the memory usage as perceived by BIRD.

The last one, lookup, is a rather obsolete way to identify a specific pointer from a debug interface. You may call rlookup(pointer) and BIRD should dump that resource to the debug output. This mechanism is probably incomplete, as no developer actively uses it for debugging.

Resources can also be moved between pools by rmove() when needed.

Resource pools

The first internal resource class is a recursive resource: a resource pool. In the singlethreaded version, this is just a simple structure:

struct pool {
  resource r;
  list inside;
  struct birdloop *loop;  /* In multithreaded version only */
  const char *name;
};

Resource pools are used for grouping resources together. There are pools everywhere and it is a common idiom inside BIRD to just rfree the appropriate pool when e.g. a protocol or table is going down. Everything left there is cleaned up.
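How freeing one pool cleans up everything inside can be sketched as follows. This is a simplified model under stated assumptions, not BIRD's implementation: all toy_ names are hypothetical, the list is singly linked, and explicit freeing of a single child (which would need unlinking from its parent) is left out.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_resclass;

typedef struct toy_resource {
  struct toy_resource *next;
  const struct toy_resclass *class;
} toy_resource;

struct toy_resclass {
  const char *name;
  size_t size;
  void (*free)(toy_resource *);	/* Cleanup before the memory is released */
};

typedef struct toy_pool {
  toy_resource r;		/* A pool is itself a resource */
  toy_resource *inside;		/* Resources owned by this pool */
} toy_pool;

static int toy_freed;		/* Demo counter of released resources */

/* Run the class cleanup, then release the memory block itself. */
static void toy_rfree(toy_resource *r)
{
  if (r->class->free)
    r->class->free(r);
  toy_freed++;
  free(r);
}

/* Pool cleanup: free every resource inside, recursing into nested pools. */
static void toy_pool_free(toy_resource *r)
{
  toy_pool *p = (toy_pool *) r;
  while (p->inside) {
    toy_resource *child = p->inside;
    p->inside = child->next;
    toy_rfree(child);
  }
}

static const struct toy_resclass toy_pool_class = { "Toy pool", sizeof(toy_pool), toy_pool_free };
static const struct toy_resclass toy_blob_class = { "Toy blob", sizeof(toy_resource) + 64, NULL };

static void *toy_ralloc(toy_pool *parent, const struct toy_resclass *c)
{
  toy_resource *r = calloc(1, c->size);
  if (!r)
    abort();
  r->class = c;
  if (parent) {
    r->next = parent->inside;
    parent->inside = r;
  }
  return r;
}

/* Build root -> nested pool -> three blobs, free the root,
 * and return how many resources were released. */
static int toy_demo(void)
{
  toy_freed = 0;
  toy_pool *root = toy_ralloc(NULL, &toy_pool_class);
  toy_pool *proto = toy_ralloc(root, &toy_pool_class);
  for (int i = 0; i < 3; i++)
    toy_ralloc(proto, &toy_blob_class);
  toy_rfree(&root->r);
  return toy_freed;
}
```

Freeing the root releases the three blobs, the nested pool and the root itself, so the caller does not have to track the individual allocations.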

There are, however, several classes which must be freed with care. In the singlethreaded version, the slab allocator (see below) must be empty before it may be freed; the multithreaded version keeps this restriction and adds others.

There is also a global pool, root_pool, containing every single resource BIRD knows about, either directly or via another resource pool.

Thread safety in resource pools

In the multithreaded version, every resource pool is bound to a specific IO loop and therefore includes an IO loop pointer. This is important for allocations as the resource list inside the pool is thread-unsafe. All pool operations therefore require the IO loop to be entered, which is checked where possible. (In the case of rfree, the pool data structure is not accessed at all, so no assert is possible; we're currently relying on the caller to ensure proper locking. This may change in the future.)

Each IO loop also has its base resource pool for its allocations. All pools inside the IO loop pool must belong to the same loop or to a loop with a subordinate lock (see the previous chapter for lock ordering). If there is a need for multiple IO loops to access one shared data structure, it must be locked by another lock and allocated in a way that is independent of these accessor loops.

The pool structure should follow the locking order. Any pool should belong to either the same loop as its parent or its loop lock should be after its parent loop lock in the locking order. This is not enforced explicitly, yet it is virtually impossible to write some working code violating this recommendation.
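The recommendation can be expressed as a small check. The sketch below is purely illustrative: BIRD does not enforce this rule, and the numeric lock_order field and all toy_ names are assumptions made for the example, with a lower number meaning a lock that is taken earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loop with an explicit position in the locking order;
 * a lower number means the lock is taken earlier. */
struct toy_loop {
  const char *name;
  int lock_order;
};

struct toy_pool {
  struct toy_pool *parent;	/* NULL for the root pool */
  struct toy_loop *loop;	/* Loop this pool is bound to */
};

/* A pool is well placed if it shares its parent's loop
 * or if its loop lock comes after the parent's loop lock. */
static bool toy_pool_order_ok(const struct toy_pool *p)
{
  if (!p->parent)
    return true;
  if (p->loop == p->parent->loop)
    return true;
  return p->loop->lock_order > p->parent->loop->lock_order;
}

/* Check the whole chain from a pool up to the root. */
static bool toy_pool_chain_ok(const struct toy_pool *p)
{
  for (; p; p = p->parent)
    if (!toy_pool_order_ok(p))
      return false;
  return true;
}
```

With a main loop at order 0 and a table loop at order 1, a table pool under the root passes, while a main-loop pool nested under the table pool fails the check.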

Resource pools in the wilderness

The root pool contains (among others):

  • route attributes and sources
  • routing tables
  • protocols
  • interfaces
  • configuration data

Each table has its IO loop and uses the loop base pool for allocations. The same holds for protocols. Each protocol has its pool; it is either its IO loop base pool or an ordinary pool bound to the main loop.

Memory allocators

BIRD stores data in memory blocks allocated by several allocators. There are 3 of them: simple memory blocks, linear pools and slabs.

Simple memory block

When just a chunk of memory is needed, mb_alloc() or mb_allocz() is used to get it. The first has malloc() semantics; the other also zeroes the memory. There are also mb_realloc() to resize a block, mb_free() to explicitly free it and mb_move() to move it to another pool.

Simple memory blocks consume a fixed amount of overhead memory (32 bytes on systems with 64-bit pointers) so they are suitable mostly for big chunks, taking advantage of the default stdlib allocator which is used by this allocation strategy. There are, however, some parts of BIRD (in all versions) where this allocator is used for little blocks. This will be fixed some day.
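To see why the fixed overhead matters for little blocks, consider a minimal sketch of a malloc()-backed block with a header in front of the data. The header layout (two list pointers, a class pointer and a size field) is an assumption chosen to add up to 32 bytes on 64-bit systems; the toy_ functions are hypothetical and omit the pool linking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed header layout: two list pointers, a class pointer and a size field.
 * With 64-bit pointers this is 8 + 8 + 8 + 4 bytes plus 4 bytes of padding. */
struct toy_mblock {
  struct toy_mblock *next, *prev;
  const void *class;
  unsigned size;
};

/* malloc() semantics: the returned data area is uninitialized. */
static void *toy_mb_alloc(unsigned size)
{
  struct toy_mblock *b = malloc(sizeof(*b) + size);
  if (!b)
    abort();
  memset(b, 0, sizeof(*b));
  b->size = size;
  return b + 1;			/* Data starts right after the header */
}

/* Same as above, but the data area is zeroed. */
static void *toy_mb_allocz(unsigned size)
{
  void *p = toy_mb_alloc(size);
  memset(p, 0, size);
  return p;
}

/* Step back from the data pointer to the header to read the size. */
static unsigned toy_mb_size(const void *data)
{
  return ((const struct toy_mblock *) data - 1)->size;
}

static void toy_mb_free(void *data)
{
  if (data)
    free((struct toy_mblock *) data - 1);
}

/* Share of the allocation consumed by the header, in percent (rounded down). */
static unsigned toy_mb_overhead_percent(unsigned size)
{
  return (unsigned) (100 * sizeof(struct toy_mblock) / (sizeof(struct toy_mblock) + size));
}
```

Under these assumptions, a 16-byte block on a 64-bit system spends about two thirds of its memory on the header, while for a 4 kB block the header is below one percent, not counting the overhead of malloc() itself.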

Linear pools

Sometimes, memory is allocated temporarily. When the data may just sit on the stack, we put it there. However, many tasks need more structured execution where stack allocation is inconvenient or even impossible (e.g. when callbacks from parsers are involved). For such a case, a linpool is the best choice.

This data structure allocates memory blocks of the requested size with negligible overhead in the functions lp_alloc() (uninitialized) or lp_allocz() (zeroed). There is, however, no realloc and no free call; to have a larger chunk, you need to allocate another block. All this memory is freed at once by lp_flush() when it is no longer needed.
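The principle is a bump allocator over a list of chunks. The following sketch assumes a fixed 4 kB chunk and refuses larger requests; the toy_ names are hypothetical, and the flush simply returns the chunks to malloc(), which is a simplification rather than a description of BIRD's lp_flush().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_LP_CHUNK 4096	/* Assumed chunk payload size */

struct toy_lp_chunk {
  struct toy_lp_chunk *next;
  size_t used;			/* Bytes already handed out from data[] */
  _Alignas(max_align_t) unsigned char data[TOY_LP_CHUNK];
};

struct toy_linpool {
  struct toy_lp_chunk *first;	/* All chunks, newest first */
};

/* Hand out the next free bytes of the current chunk; open a new chunk when full. */
static void *toy_lp_alloc(struct toy_linpool *lp, size_t size)
{
  const size_t a = _Alignof(max_align_t);
  if (size > TOY_LP_CHUNK)
    return NULL;		/* Toy limitation: no oversized blocks */
  size = (size + a - 1) & ~(a - 1);	/* Keep every block aligned */

  struct toy_lp_chunk *c = lp->first;
  if (!c || c->used + size > TOY_LP_CHUNK) {
    c = malloc(sizeof(*c));
    if (!c)
      abort();
    c->used = 0;
    c->next = lp->first;
    lp->first = c;
  }

  void *p = c->data + c->used;
  c->used += size;
  return p;
}

static void *toy_lp_allocz(struct toy_linpool *lp, size_t size)
{
  void *p = toy_lp_alloc(lp, size);
  if (p)
    memset(p, 0, size);
  return p;
}

/* There is no per-block free; everything goes away at once. */
static void toy_lp_flush(struct toy_linpool *lp)
{
  while (lp->first) {
    struct toy_lp_chunk *c = lp->first;
    lp->first = c->next;
    free(c);
  }
}
```

An allocation costs one bounds check and one addition, and the per-block bookkeeping is limited to alignment padding, which is where the negligible overhead comes from.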

You may see linpools in parsers (BGP, Linux netlink, config) or in filters.

In the multithreaded version, linpools have received an update, allocating memory pages directly by mmap() instead of calling malloc(). More on memory pages below.

Slabs

To allocate lots of same-sized objects, a slab allocator is an ideal choice. In versions until 2.0.8, our slab allocator used blocks allocated by malloc(), every object included a slab head pointer and free objects were linked into a singly linked list. This led to memory inefficiency and to counterintuitive behavior where a use-after-free bug could do lots of damage before finally crashing.

Versions from 2.0.9 on, as well as all the multithreaded versions, come with slabs using directly allocated memory pages and usage bitmaps instead of singly linking the free objects. This approach, however, relies on the fact that pointers returned by mmap() are always divisible by the page size. Freeing a slab object involves zeroing (mostly) the 13 least significant bits of its pointer to get the page pointer where the slab head resides.

This update helps with memory consumption by about 5% compared to previous versions; exact numbers depend on the usage pattern.
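The pointer masking and the usage bitmap can be sketched as follows. This is a toy model, not BIRD's slab code: the block size of 8192 bytes is assumed to match the 13 bits mentioned above, aligned_alloc() stands in for mmap() to get an aligned block, each block holds at most 32 objects, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 8192u	/* Assumed: 2^13 bytes, hence 13 masked bits */
#define TOY_SLOTS 32u		/* Objects per block in this toy */

struct toy_slab_head {
  uint32_t used_bitmap;		/* One bit per object slot */
  uint32_t obj_size;
};

static struct toy_slab_head *toy_slab_page_new(uint32_t obj_size)
{
  assert(obj_size > 0);
  assert(sizeof(struct toy_slab_head) + TOY_SLOTS * obj_size <= TOY_PAGE_SIZE);
  /* aligned_alloc() stands in for mmap(); the address is a multiple of the block size */
  struct toy_slab_head *h = aligned_alloc(TOY_PAGE_SIZE, TOY_PAGE_SIZE);
  if (!h)
    abort();
  h->used_bitmap = 0;
  h->obj_size = obj_size;
  return h;
}

static unsigned char *toy_slot(struct toy_slab_head *h, unsigned i)
{
  return (unsigned char *) (h + 1) + (size_t) i * h->obj_size;
}

/* Take the first slot whose bit is clear. */
static void *toy_slab_alloc(struct toy_slab_head *h)
{
  for (unsigned i = 0; i < TOY_SLOTS; i++)
    if (!(h->used_bitmap & (UINT32_C(1) << i))) {
      h->used_bitmap |= UINT32_C(1) << i;
      return toy_slot(h, i);
    }
  return NULL;			/* Block is full */
}

/* Recover the head from an object pointer by clearing the low 13 bits. */
static struct toy_slab_head *toy_slab_head_of(void *obj)
{
  return (struct toy_slab_head *) ((uintptr_t) obj & ~(uintptr_t) (TOY_PAGE_SIZE - 1));
}

static void toy_slab_free(void *obj)
{
  struct toy_slab_head *h = toy_slab_head_of(obj);
  unsigned i = (unsigned) (((unsigned char *) obj - (unsigned char *) (h + 1)) / h->obj_size);
  assert(h->used_bitmap & (UINT32_C(1) << i));	/* A clear bit means a double free */
  h->used_bitmap &= ~(UINT32_C(1) << i);
}
```

Because the objects carry no per-object head pointer and no free-list link, a stale pointer cannot corrupt the allocator's own bookkeeping the way it could with an in-object free list, and a double free shows up as an already cleared bit.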

Raw memory pages

Until 2.0.8 (incl.), BIRD allocated all memory by malloc(). This method is suitable for lots of use cases, yet when gigabytes of memory should be allocated in little pieces, BIRD uses its internal allocators to keep track of everything. This brings some inefficiency, as the stdlib allocator has its own overhead and doesn't allocate aligned memory unless asked for.

Slabs and linear pools are backed by blocks of memory of kilobyte sizes. As a typical memory page size is 4 kB, it is a logical step to drop stdlib allocation from these allocators and to use mmap() directly. This however has some drawbacks, most notably the need for a syscall for every memory mapping and unmapping. For allocations, this is not much of an issue and the syscall time is typically negligible compared to computation time. When freeing memory, this is much worse, as BIRD sometimes frees gigabytes of data in the blink of an eye.

To minimize the needed number of syscalls, there is a per-thread page cache, keeping pages for future use:

  • When a new page is requested, first the page cache is tried.
  • When a page is freed, the per-thread page cache keeps it without telling the kernel.
  • When the number of pages in any per-thread page cache leaves a pre-defined range, a cleanup routine is scheduled to free excessive pages or request more in advance.

This method gives the multithreaded BIRD not only faster memory management than ever before but also almost immediate shutdown times as the cleanup routine is not scheduled on shutdown at all.
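The caching scheme from the list above can be sketched like this. It is a minimal model under stated assumptions: the upper bound of four cached pages is arbitrary, requesting pages in advance is left out, the cleanup is called by hand instead of being scheduled, and all toy_ names are hypothetical. The syscall counter exists only to make the effect visible.

```c
#define _DEFAULT_SOURCE		/* For MAP_ANONYMOUS with strict compiler modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define TOY_CACHE_MAX 4u	/* Assumed upper bound of cached pages */

/* A free page stores the link to the next free page in its own memory. */
struct toy_free_page {
  struct toy_free_page *next;
};

static _Thread_local struct toy_free_page *toy_cache;	/* Per-thread cache */
static _Thread_local unsigned toy_cached;		/* Pages in the cache */
static _Thread_local unsigned toy_syscalls;		/* Demo counter only */

static size_t toy_page_size(void)
{
  return (size_t) sysconf(_SC_PAGESIZE);
}

/* Try the cache first; ask the kernel only when the cache is empty. */
static void *toy_page_alloc(void)
{
  if (toy_cache) {
    struct toy_free_page *fp = toy_cache;
    toy_cache = fp->next;
    toy_cached--;
    return fp;
  }

  toy_syscalls++;
  void *p = mmap(NULL, toy_page_size(), PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return (p == MAP_FAILED) ? NULL : p;
}

/* Keep the page in the cache without telling the kernel. */
static void toy_page_free(void *p)
{
  struct toy_free_page *fp = p;
  fp->next = toy_cache;
  toy_cache = fp;
  toy_cached++;
}

/* Cleanup routine: return surplus pages to the kernel.
 * Here the caller runs it explicitly instead of scheduling it. */
static void toy_page_cleanup(void)
{
  while (toy_cached > TOY_CACHE_MAX) {
    struct toy_free_page *fp = toy_cache;
    toy_cache = fp->next;
    toy_cached--;
    toy_syscalls++;
    munmap(fp, toy_page_size());
  }
}
```

Freeing a page is just two pointer writes, and if the cleanup never runs, as on shutdown, no munmap() is issued at all and the kernel reclaims the pages when the process exits.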

Other resources

Some objects are not only a piece of memory; notable items are sockets, owning the underlying mechanism of I/O, and object locks, owning the right to use a specific I/O. This ensures that collisions on e.g. TCP port numbers and addresses are resolved in a predictable way.

All these resources should be used with the same locking principles as the memory blocks. There aren't many checks inside the BIRD code to ensure that yet; nevertheless, violating this recommendation may lead to multiple-access issues.

It's still a long road to version 2.1. This series of texts should document what needs to be changed, why we do it and how. The previous chapter showed the locking system and how the parallel execution is done. The next chapter will explain route sources and route attributes in more detail, and how lockless data structures are employed there. Stay tuned!