mirror of
https://gitlab.nic.cz/labs/bird.git
synced 2024-11-15 15:48:43 +00:00
# BIRD Journey to Threads. Chapter 4: Memory and other resource management.

BIRD is mostly a large specialized database engine, storing megabytes to
gigabytes of Internet routing data in memory. To keep account of every byte of
allocated data, BIRD has its own resource management system, which must be
adapted to the multithreaded environment. The resource system has not changed
much, yet it deserves a short chapter.

BIRD is a fast, robust and memory-efficient routing daemon designed and
implemented at the end of the 20th century. We are making significant changes
to BIRD's internal structure to make it run in multiple threads in parallel.

## Resources

Inside BIRD, (almost) every piece of allocated memory is a resource. To achieve this,
every such memory block includes a generic `struct resource` header. The node
is enlisted in a linked list of a *resource pool* (see below), and the class
pointer defines the basic operations done on resources.

```
typedef struct resource {
  node n;                                /* Inside resource pool */
  struct resclass *class;                /* Resource class */
} resource;

struct resclass {
  char *name;                            /* Resource class name */
  unsigned size;                         /* Standard size of single resource */
  void (*free)(resource *);              /* Freeing function */
  void (*dump)(resource *);              /* Dump to debug output */
  resource *(*lookup)(resource *, unsigned long); /* Look up address (only for debugging) */
  struct resmem (*memsize)(resource *);  /* Return size of memory used by the resource, may be NULL */
};

void *ralloc(pool *, struct resclass *);
```

The resource life cycle begins with the allocation of a resource. To do that,
you call `ralloc()`, passing the parent pool and the appropriate resource class
as arguments. BIRD allocates a memory block of the size given by the class
member `size`. The beginning of the block is reserved for `struct resource`
itself and initialized from the given arguments. Therefore, you may sometimes
see an idiom where a structure has a first member `struct resource r;`,
indicating that this item should be allocated as a resource.

The counterpart is resource freeing. This may be implicit (by freeing the
resource pool) or explicit (by `rfree()`). In both cases, the `free()` function
of the appropriate class is called to clean up the resource before the final
freeing.

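The allocation and freeing cycle described above can be sketched in a few
lines. This is a minimal model under stated assumptions, not BIRD's actual
code: all `toy_` names are hypothetical, the pool uses a singly linked list
instead of BIRD's `node`, and plain `calloc()` stands in for the real backing
allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct resource, struct resclass and pool. */
struct toy_resource;

struct toy_class {
  const char *name;                       /* Resource class name */
  size_t size;                            /* Size of the whole object */
  void (*free)(struct toy_resource *);    /* Cleanup hook, may be NULL */
};

struct toy_resource {
  struct toy_resource *next;              /* Simplified pool linkage */
  const struct toy_class *cls;            /* Resource class */
};

struct toy_pool {
  struct toy_resource *first;             /* Resources owned by the pool */
};

/* Allocate an object of the class size and enlist its header in the pool. */
static void *toy_ralloc(struct toy_pool *p, const struct toy_class *c)
{
  assert(c->size >= sizeof(struct toy_resource));
  struct toy_resource *r = calloc(1, c->size);
  if (!r)
    return NULL;
  r->cls = c;
  r->next = p->first;
  p->first = r;
  return r;
}

/* Implicit freeing: release everything in the pool, cleanup hooks first. */
static void toy_pool_free(struct toy_pool *p)
{
  while (p->first) {
    struct toy_resource *r = p->first;
    p->first = r->next;
    if (r->cls->free)
      r->cls->free(r);
    free(r);
  }
}

/* A user-defined resource: the header must be the first member. */
struct toy_timer {
  struct toy_resource r;
  int ticks;
};

static int toy_timers_freed;              /* Counts cleanup hook calls */

static void toy_timer_free(struct toy_resource *r)
{
  (void) r;
  toy_timers_freed++;
}

static const struct toy_class toy_timer_class = {
  .name = "Timer",
  .size = sizeof(struct toy_timer),
  .free = toy_timer_free,
};
```

Because the header is the first member, the pointer returned by the allocator
can be used both as a `struct toy_timer *` and as a `struct toy_resource *`,
which is what makes the generic pool list possible.
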
The `dump` and `memsize` calls are used by the CLI commands `dump resources`
and `show memory`, which dump resources or show memory usage as perceived by
BIRD.

The last one, `lookup`, is a rather obsolete way to identify a specific pointer
from a debug interface. You may call `rlookup(pointer)` and BIRD should dump
that resource to the debug output. This mechanism is probably incomplete, as no
developer actively uses it for debugging.

Resources can also be moved between pools by `rmove` when needed.

## Resource pools

The first internal resource class is a recursive resource: a resource pool. In
the single-threaded version, this is just a simple structure:

```
struct pool {
  resource r;
  list inside;
  struct birdloop *loop;  /* In multithreaded version only */
  const char *name;
};
```

Resource pools are used for grouping resources together. There are pools
everywhere, and it is a common idiom inside BIRD to just `rfree` the
appropriate pool when e.g. a protocol or table is going down. Everything left
there is cleaned up.

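The recursive nature of pools can be illustrated by a small model where the
pool is itself a resource and its cleanup hook frees everything inside. This is
only a sketch: the `np_` names are hypothetical, the linkage is a singly linked
list, and explicit freeing of a resource that still sits in its parent's list
(which BIRD handles by unlinking the `node`) is left out.

```c
#include <assert.h>
#include <stdlib.h>

struct np_res;

struct np_class {
  const char *name;
  size_t size;
  void (*free)(struct np_res *);    /* Cleanup hook, may be NULL */
};

struct np_res {
  struct np_res *next;              /* Simplified linkage inside the parent pool */
  const struct np_class *cls;
};

struct np_pool {
  struct np_res r;                  /* A pool is a resource itself */
  struct np_res *inside;            /* Resources owned by this pool */
  const char *name;
};

static int np_freed_count;          /* Counts freed resources, for demonstration */

/* Run the cleanup hook and release the memory of one resource. */
static void np_free_one(struct np_res *r)
{
  if (r->cls->free)
    r->cls->free(r);
  np_freed_count++;
  free(r);
}

/* Cleanup hook of the pool class: free everything inside, recursively. */
static void np_pool_cleanup(struct np_res *r)
{
  struct np_pool *p = (struct np_pool *) r;
  while (p->inside) {
    struct np_res *x = p->inside;
    p->inside = x->next;
    np_free_one(x);
  }
}

static const struct np_class np_pool_class = {
  "Pool", sizeof(struct np_pool), np_pool_cleanup
};

/* Allocate a resource of class c; parent may be NULL for the root object. */
static void *np_alloc(struct np_pool *parent, const struct np_class *c)
{
  assert(c->size >= sizeof(struct np_res));
  struct np_res *r = calloc(1, c->size);
  if (!r)
    return NULL;
  r->cls = c;
  if (parent) {
    r->next = parent->inside;
    parent->inside = r;
  }
  return r;
}

static struct np_pool *np_pool_new(struct np_pool *parent, const char *name)
{
  struct np_pool *p = np_alloc(parent, &np_pool_class);
  if (p)
    p->name = name;
  return p;
}

/* Plain data resource with no cleanup hook. */
struct np_blob {
  struct np_res r;
  char data[32];
};

static const struct np_class np_blob_class = {
  "Blob", sizeof(struct np_blob), NULL
};
```

Freeing the top-level pool in this model walks the whole tree, so a caller
shutting down a subsystem needs to remember only one pointer.
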
There are, however, several classes which must be freed with care. In the
single-threaded version, the *slab* allocator (see below) must be empty before
it may be freed. This rule is kept in the multithreaded version, and other
restrictions have been added.

There is also a global pool, `root_pool`, containing every single resource BIRD
knows about, either directly or via another resource pool.

### Thread safety in resource pools

In the multithreaded version, every resource pool is bound to a specific IO
loop and therefore includes an IO loop pointer. This is important for
allocations, as the resource list inside the pool is thread-unsafe. All pool
operations therefore require the IO loop to be entered, and this is checked
where possible. (In the case of `rfree`, the pool data structure is not
accessed at all, so no assert is possible. We currently rely on the caller to
ensure proper locking. In the future, this may change.)

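The kind of check described above might look like the following sketch. It is
an illustration under assumptions, not BIRD's implementation: the `lk_` names
are hypothetical, the loop is reduced to a mutex, and a thread-local pointer
records which single loop the current thread has entered (no nesting). The
pool operation returns `false` instead of asserting, to keep the example easy
to exercise.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IO loop reduced to its lock. */
struct lk_loop {
  pthread_mutex_t lock;
};

struct lk_pool {
  struct lk_loop *loop;   /* Loop this pool is bound to */
  unsigned items;         /* Stand-in for the thread-unsafe resource list */
};

/* Loop currently entered by this thread, or NULL. */
static _Thread_local struct lk_loop *lk_current;

static void lk_enter(struct lk_loop *l)
{
  pthread_mutex_lock(&l->lock);
  lk_current = l;
}

static void lk_leave(struct lk_loop *l)
{
  assert(lk_current == l);        /* Leaving a loop we have not entered is a bug */
  lk_current = NULL;
  pthread_mutex_unlock(&l->lock);
}

/* True if the calling thread is inside the given loop. */
static bool lk_inside(const struct lk_loop *l)
{
  return lk_current == l;
}

/* Touch the pool only when its loop is entered by the caller. */
static bool lk_pool_add(struct lk_pool *p)
{
  if (!lk_inside(p->loop))
    return false;
  p->items++;
  return true;
}
```
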
Each IO loop also has its own base resource pool for its allocations. All pools
inside the IO loop pool must belong to the same loop or to a loop with a
subordinate lock (see the previous chapter for lock ordering). If multiple IO
loops need to access one shared data structure, it must be protected by another
lock and allocated in a way that is independent of these accessor loops.

The pool structure should follow the locking order. Any pool should either
belong to the same loop as its parent, or its loop lock should come after its
parent's loop lock in the locking order. This is not enforced explicitly, yet
it is virtually impossible to write working code that violates this
recommendation.

### Resource pools in the wilderness

The root pool contains (among others):

* route attributes and sources
* routing tables
* protocols
* interfaces
* configuration data

Each table has its own IO loop and uses the loop base pool for allocations.
The same holds for protocols. Each protocol has its own pool; it is either its
IO loop base pool or an ordinary pool bound to the main loop.

## Memory allocators

BIRD stores data in memory blocks allocated by several allocators. There are
three of them: simple memory blocks, linear pools and slabs.

### Simple memory block

When just a chunk of memory is needed, `mb_alloc()` or `mb_allocz()` is used
to get it. The first has `malloc()` semantics, the other also zeroes the
memory. There is also `mb_realloc()`, `mb_free()` to explicitly free such
memory, and `mb_move()` to move it to another pool.

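To see where per-block overhead comes from, here is a hedged sketch of a
header-prefixed block on top of `malloc()`. The `mbk_` names and the header
layout are assumptions made for illustration only; the real header is the
resource header plus bookkeeping, and the pool linkage is not maintained here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored in front of the user data. */
struct mbk_head {
  struct mbk_head *prev, *next;   /* Pool linkage (unused in this sketch) */
  const void *cls;                /* Stand-in for the resource class pointer */
  size_t size;                    /* Payload size in bytes */
};

/* malloc()-like: returns uninitialized payload or NULL. */
static void *mbk_alloc(size_t size)
{
  if (size > SIZE_MAX - sizeof(struct mbk_head))
    return NULL;
  struct mbk_head *h = malloc(sizeof *h + size);
  if (!h)
    return NULL;
  h->prev = h->next = NULL;
  h->cls = NULL;
  h->size = size;
  return h + 1;                   /* Payload starts right after the header */
}

/* Same as mbk_alloc(), but the payload is zeroed. */
static void *mbk_allocz(size_t size)
{
  void *p = mbk_alloc(size);
  if (p)
    memset(p, 0, size);
  return p;
}

/* Step back from the payload to the header to read the stored size. */
static size_t mbk_size(const void *p)
{
  assert(p != NULL);
  return ((const struct mbk_head *) p - 1)->size;
}

static void mbk_free(void *p)
{
  if (p)
    free((struct mbk_head *) p - 1);
}
```

With 8-byte pointers and an 8-byte `size_t`, this hypothetical header takes
32 bytes, which is why such blocks pay off mainly for larger allocations.
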
Simple memory blocks carry a fixed amount of overhead (32 bytes on systems
with 64-bit pointers), so they are suitable mostly for big chunks, taking
advantage of the default *stdlib* allocator which is used by this allocation
strategy. There are nevertheless some parts of BIRD (in all versions) where
this allocator is used for little blocks. This will be fixed some day.

### Linear pools

Sometimes, memory is allocated temporarily. When the data may just sit on the
stack, we put it there. However, many tasks need more structured execution
where stack allocation is inconvenient or even impossible (e.g. when callbacks
from parsers are involved). For such cases, a *linpool* is the best choice.

This data structure allocates memory blocks of the requested size with
negligible overhead by the functions `lp_alloc()` (uninitialized) or
`lp_allocz()` (zeroed). There is, however, no `realloc` and no `free` call; to
get a larger chunk, you need to allocate another block. All this memory is
freed at once by `lp_flush()` when it is no longer needed.

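The allocate-only, flush-at-once pattern is essentially a bump allocator. The
following sketch shows the idea with hypothetical `bump_` names; the chunk
size, the use of `malloc()` for chunks, and the refusal of oversized requests
are simplifications chosen for this example, not properties of BIRD's linpool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BUMP_CHUNK 4096   /* Assumed chunk payload size */

struct bump_chunk {
  struct bump_chunk *next;
  size_t used;                                  /* Bytes handed out so far */
  _Alignas(max_align_t) unsigned char data[BUMP_CHUNK];
};

struct bump_pool {
  struct bump_chunk *chunks;                    /* Newest chunk first */
};

/* Round n up to the strictest fundamental alignment. */
static size_t bump_align(size_t n)
{
  size_t a = _Alignof(max_align_t);
  return (n + a - 1) & ~(a - 1);
}

/* Return uninitialized memory, or NULL if the request cannot be served. */
static void *bump_alloc(struct bump_pool *p, size_t size)
{
  if (size > BUMP_CHUNK)
    return NULL;                                /* Out of scope for this sketch */
  size = bump_align(size);

  struct bump_chunk *c = p->chunks;
  if (!c || c->used + size > BUMP_CHUNK) {
    c = malloc(sizeof *c);                      /* Start a fresh chunk */
    if (!c)
      return NULL;
    c->used = 0;
    c->next = p->chunks;
    p->chunks = c;
  }

  void *out = c->data + c->used;
  c->used += size;                              /* Just bump the offset */
  assert(c->used <= BUMP_CHUNK);
  return out;
}

/* Same as bump_alloc(), but the memory is zeroed. */
static void *bump_allocz(struct bump_pool *p, size_t size)
{
  void *out = bump_alloc(p, size);
  if (out)
    memset(out, 0, size);
  return out;
}

/* Release everything at once; there is no per-block free. */
static void bump_flush(struct bump_pool *p)
{
  while (p->chunks) {
    struct bump_chunk *c = p->chunks;
    p->chunks = c->next;
    free(c);
  }
}
```

An allocation costs one comparison and one addition in the common case, which
is where the negligible overhead comes from.
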
You may see linpools in parsers (BGP, Linux netlink, config) or in filters.

In the multithreaded version, linpools have received an update: they allocate
memory pages directly by `mmap()` instead of calling `malloc()`. More on memory
pages below.

### Slabs

To allocate lots of same-sized objects, a [slab allocator](https://en.wikipedia.org/wiki/Slab_allocation)
is an ideal choice. In versions up to 2.0.8, our slab allocator used blocks
allocated by `malloc()`, every object included a *slab head* pointer, and free
objects were linked into a singly linked list. This led to memory inefficiency
and to counterintuitive behavior where a use-after-free bug could do lots of
damage before finally crashing.

Versions from 2.0.9 on, and also all the multithreaded versions, come with
slabs that use directly allocated memory pages and usage bitmaps instead of
singly linking the free objects. This approach, however, relies on the fact
that pointers returned by `mmap()` are always divisible by the page size.
Freeing a slab object involves zeroing (mostly) the 13 least significant bits
of its pointer to get the page pointer where the slab head resides.

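The pointer-masking trick and the usage bitmap can be sketched as follows. The
`SL_`/`sl_` names, the 8192-byte page (matching the 13 bits mentioned above),
the 128-byte object size and the single 64-bit bitmap are assumptions made for
this example. `aligned_alloc()` stands in for `mmap()` here so that the page is
aligned to its own size regardless of the system page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SL_PAGE  8192u   /* Assumed page size, 2^13 bytes */
#define SL_OBJ   128u    /* Assumed object size */
#define SL_FIRST SL_OBJ  /* Offset of the first object; the head sits before it */
#define SL_COUNT ((SL_PAGE - SL_FIRST) / SL_OBJ)   /* 63 objects per page */

struct sl_head {
  uint64_t used;         /* Bit i set means object i is allocated */
};

/* Get one page aligned to its own size, with an empty bitmap. */
static struct sl_head *sl_page_new(void)
{
  struct sl_head *h = aligned_alloc(SL_PAGE, SL_PAGE);
  if (h)
    h->used = 0;
  return h;
}

/* Hand out the first free object of the page, or NULL if the page is full. */
static void *sl_alloc(struct sl_head *h)
{
  for (unsigned i = 0; i < SL_COUNT; i++)
    if (!(h->used & (UINT64_C(1) << i))) {
      h->used |= UINT64_C(1) << i;
      return (unsigned char *) h + SL_FIRST + i * SL_OBJ;
    }
  return NULL;
}

/* Recover the page head from the object pointer by clearing the low bits. */
static struct sl_head *sl_head_of(void *obj)
{
  return (struct sl_head *) ((uintptr_t) obj & ~(uintptr_t) (SL_PAGE - 1));
}

/* Free an object knowing nothing but its address. */
static void sl_free(void *obj)
{
  struct sl_head *h = sl_head_of(obj);
  size_t off = (size_t) ((unsigned char *) obj - (unsigned char *) h);
  assert(off >= SL_FIRST && (off - SL_FIRST) % SL_OBJ == 0);
  unsigned i = (unsigned) ((off - SL_FIRST) / SL_OBJ);
  assert(h->used & (UINT64_C(1) << i));   /* Catches a double free early */
  h->used &= ~(UINT64_C(1) << i);
}
```

Because the allocation state lives in the bitmap and not inside the free
objects, a double free is detected immediately in this model instead of
silently corrupting a free list.
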
This update reduces memory consumption by about 5% compared to previous
versions; exact numbers depend on the usage pattern.

## Raw memory pages

Up to and including version 2.0.8, BIRD allocated all memory by `malloc()`.
This method is suitable for lots of use cases, yet when gigabytes of memory
are to be allocated in little pieces, BIRD uses its internal allocators to keep
track of everything. This brings some inefficiency, as the stdlib allocator has
its own overhead and doesn't allocate aligned memory unless asked to.

Slabs and linear pools are backed by blocks of memory of kilobyte sizes. As a
typical memory page size is 4 kB, it is a logical step to drop stdlib
allocation from these allocators and to use `mmap()` directly. This, however,
has some drawbacks, most notably the need for a syscall for every memory
mapping and unmapping. For allocations, this is not much of an issue, as the
syscall time is typically negligible compared to computation time. When freeing
memory, this is much worse, as BIRD sometimes frees gigabytes of data in the
blink of an eye.

To minimize the number of syscalls needed, there is a per-thread page cache,
keeping pages for future use:

* When a new page is requested, first the page cache is tried.
* When a page is freed, the per-thread page cache keeps it without telling the kernel.
* When the number of pages in any per-thread page cache leaves a pre-defined range,
  a cleanup routine is scheduled to free excessive pages or request more in advance.

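The three rules above can be sketched as a small per-thread free list of
pages. This is a simplified illustration with hypothetical `pc_` names: the
upper bound is an assumed constant, the cleanup is called explicitly instead of
being scheduled, and requesting pages in advance is left out.

```c
#define _DEFAULT_SOURCE   /* For MAP_ANONYMOUS with strict compiler modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PC_KEEP_MAX 4     /* Assumed upper bound of cached pages per thread */

/* A cached page stores the list linkage in its own first bytes. */
struct pc_page {
  struct pc_page *next;
};

static _Thread_local struct pc_page *pc_list;   /* This thread's cache */
static _Thread_local unsigned pc_count;         /* Pages in the cache */

static size_t pc_page_size(void)
{
  return (size_t) sysconf(_SC_PAGESIZE);
}

/* Rule 1: try the cache first, ask the kernel only when it is empty. */
static void *pc_get(void)
{
  if (pc_list) {
    struct pc_page *pg = pc_list;
    pc_list = pg->next;
    pc_count--;
    return pg;
  }
  void *pg = mmap(NULL, pc_page_size(), PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return (pg == MAP_FAILED) ? NULL : pg;
}

/* Rule 2: a freed page goes to the cache, no syscall involved. */
static void pc_put(void *ptr)
{
  assert(ptr != NULL);
  struct pc_page *pg = ptr;
  pg->next = pc_list;
  pc_list = pg;
  pc_count++;
}

/* Rule 3 (partially): return excessive pages to the kernel. */
static unsigned pc_cleanup(void)
{
  unsigned released = 0;
  while (pc_count > PC_KEEP_MAX) {
    struct pc_page *pg = pc_list;
    pc_list = pg->next;
    pc_count--;
    if (munmap(pg, pc_page_size()) == 0)
      released++;
  }
  return released;
}
```

Skipping `pc_cleanup()` at shutdown costs nothing in this model, since the
kernel reclaims all mappings when the process exits.
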
This method gives the multithreaded BIRD not only faster memory management than
ever before but also almost immediate shutdown times, as the cleanup routine is
not scheduled on shutdown at all.

## Other resources

Some objects are not only a piece of memory; notable items are sockets, owning
the underlying mechanism of I/O, and *object locks*, owning *the right to use a
specific I/O*. This ensures that collisions on e.g. TCP port numbers and
addresses are resolved in a predictable way.

All these resources should be used with the same locking principles as the
memory blocks. There aren't many checks inside the BIRD code to ensure that
yet; nevertheless, violating this recommendation may lead to multiple-access
issues.

*It's still a long road to version 2.1. This series of texts should document
what needs to be changed, why we do it and how. The
[previous chapter](TODO)
showed the locking system and how the parallel execution is done.
The next chapter will give a more detailed explanation of route sources
and route attributes and how lockless data structures are employed there. Stay tuned!*