0
0
mirror of https://gitlab.nic.cz/labs/bird.git synced 2024-12-22 09:41:54 +00:00

Thread documentation: chapters 0, 1 and 2

This commit is contained in:
Maria Matejka 2021-03-04 20:11:26 +01:00
parent 493d45d950
commit f459deee9f
5 changed files with 725 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 300 KiB

View File

@ -0,0 +1,114 @@
# BIRD Journey to Threads. Chapter 0: The Reason Why.
BIRD is a fast, robust and memory-efficient routing daemon designed and
implemented at the end of 20th century. Its concept of multiple routing
tables with pipes between them, as well as a procedural filtering language,
has been unique for a long time and is still one of main reasons why people use
BIRD for big loads of routing data.
## IPv4 / IPv6 duality: Solved
The original design of BIRD has also some drawbacks. One of these was an idea
of two separate daemons one BIRD for IPv4 and another BIRD for IPv6, built from the same
codebase, cleverly using `#ifdef IPV6` constructions to implement the
common parts of BIRD algorithms and data structures only once.
If IPv6 adoption went forward as people thought in that time,
it would work; after finishing the worldwide transition to IPv6, people could
just stop building BIRD for IPv4 and drop the `#ifdef`-ed code.
The history went other way, however. BIRD developers therefore decided to *integrate*
these two versions into one daemon capable of any address family, allowing for
not only IPv6 but for virtually anything. This rework brought quite a lot of
backward-incompatible changes, therefore we decided to release it as a version 2.0.
This work was mostly finished in 2018 and as for March 2021, we have already
switched the 1.6.x branch to a bugfix-only mode.
## BIRD is single-threaded now
The second drawback is a single-threaded design. Looking back to 1998, this was
a good idea. A common PC had one single core and BIRD was targeting exactly
this segment. As the years went by, the manufacturers launched multicore x86 chips
(AMD Opteron in 2004, Intel Pentium D in 2005). This ultimately led to a world
where as of March 2021, there is virtually no new PC sold with a single-core CPU.
Together with these changes, the speed of one single core has not been growing as fast
as the Internet is growing. BIRD is still capable to handle the full BGP table
(868k IPv4 routes in March 2021) with one core, anyway when BIRD starts, it may take
long minutes to converge.
## Intermezzo: Filters
In 2018, we took some data we had from large internet exchanges and simulated
a cold start of BIRD as a route server. We used `linux-perf` to find most time-critical
parts of BIRD and it pointed very clearly to the filtering code. It also showed that the
IPv4 version of BIRD v1.6.x is substantially faster than the *integrated* version, while
the IPv6 version was quite as fast as the *integrated* one.
Here we should show a little bit more about how the filters really work. Let's use
an example of a simple filter:
```
filter foo {
if net ~ [10.0.0.0/8+] then reject;
preference = 2 * preference - 41;
accept;
}
```
This filter gets translated to an infix internal structure.
![Example of filter internal representation](00_filter_structure.png)
When executing, the filter interpreter just walks the filter internal structure recursively in the
right order, executes the instructions, collects their results and finishes by
either rejection or acceptation of the route
## Filter rework
Further analysis of the filter code revealed an absurdly-looking result. The
most executed parts of the interpreter function were the `push` CPU
instructions on its very beginning and the `pop` CPU instructions on its very
end. This came from the fact that the interpreter function was quite long, yet
most of the filter instructions used an extremely short path, doing all the
stack manipulation at the beginning, branching by the filter instruction type,
then it executed just several CPU instructions, popped everything from the
stack back and returned.
After some thoughts how to minimize stack manipulation when everything you need
is to take two numbers and multiply them, we decided to preprocess the filter
internal structure to another structure which is much easier to execute. The
interpreter is now using a data stack and behaves generally as a
postfix-ordered language. We also thought about Lua which showed up to be quite
a lot of work implementing all the glue achieving about the same performance.
After these changes, we managed to reduce the filter execution time by 1040%,
depending on how complex the filter is.
Anyway, even this reduction is quite too little when there is one CPU core
running for several minutes while others are sleeping.
## We need more threads
As a side effect of the rework, the new filter interpreter is also completely
thread-safe. It seemed to be the way to go running the filters in parallel
while keeping everything else single-threaded. The main problem of this
solution is a too fine granularity of parallel jobs. We would spend lots of
time on synchronization overhead.
The only filter parallel execution was also too one-sided, useful only for
configurations with complex filters. In other cases, the major problem is best
route recalculation, OSPF recalculation or also kernel synchronization.
It also turned out to be dirty a lot from the code cleanliness' point of view.
Therefore we chose to make BIRD multithreaded completely. We designed a way how
to gradually enable parallel computation and best usage of all available CPU
cores. Our goals are three:
* We want to keep current functionality. Parallel computation should never drop
a useful feature.
* We want to do little steps. No big reworks, even though even the smallest
possible step will need quite a lot of refactoring before.
* We want to be backwards compatible as much as possible.
*It's still a long road to the version 2.1. This series of texts should document
what is needed to be changed, why we do it and how. In the next chapter, we're
going to describe the structures for routes and their attributes. Stay tuned!*

View File

@ -0,0 +1,159 @@
# BIRD Journey to Threads. Chapter 1: The Route and its Attributes
BIRD is a fast, robust and memory-efficient routing daemon designed and
implemented at the end of 20th century. We're doing a significant amount of
BIRD's internal structure changes to make it possible to run in multiple
threads in parallel. This chapter covers necessary changes of data structures
which store every single routing data.
*If you want to see the changes in code, look (basically) into the
`route-storage-updates` branch. Not all of them are already implemented, anyway
most of them are pretty finished as of end of March, 2021.*
## How routes are stored
BIRD routing table is just a hierarchical noSQL database. On top level, the
routes are keyed by their destination, called *net*. Due to historic reasons,
the *net* is not only *IPv4 prefix*, *IPv6 prefix*, *IPv4 VPN prefix* etc.,
but also *MPLS label*, *ROA information* or *BGP Flowspec record*. As there may
be several routes for each *net*, an obligatory part of the key is *src* aka.
*route source*. The route source is a tuple of the originating protocol
instance and a 32-bit unsigned integer. If a protocol wants to withdraw a route,
it is enough and necessary to have the *net* and *src* to identify what route
is to be withdrawn.
The route itself consists of (basically) a list of key-value records, with
value types ranging from a 16-bit unsigned integer for preference to a complex
BGP path structure. The keys are pre-defined by protocols (e.g. BGP path or
OSPF metrics), or by BIRD core itself (preference, route gateway).
Finally, the user can declare their own attribute keys using the keyword
`attribute` in config.
## Attribute list implementation
Currently, there are three layers of route attributes. We call them *route*
(*rte*), *attributes* (*rta*) and *extended attributes* (*ea*, *eattr*).
The first layer, *rte*, contains the *net* pointer, several fixed-size route
attributes (mostly preference and protocol-specific metrics), flags, lastmod
time and a pointer to *rta*.
The second layer, *rta*, contains the *src* (a pointer to a singleton instance),
a route gateway, several other fixed-size route attributes and a pointer to
*ea* list.
The third layer, *ea* list, is a variable-length list of key-value attributes,
containing all the remaining route attributes.
Distribution of the route attributes between the attribute layers is somehow
arbitrary. Mostly, in the first and second layer, there are attributes that
were thought to be accessed frequently (e.g. in best route selection) and
filled in in most routes, while the third layer is for infrequently used
and/or infrequently accessed route attributes.
## Attribute list deduplication
When protocols originate routes, there are commonly more routes with the
same attribute list. BIRD could ignore this fact, anyway if you have several
tables connected with pipes, it is more memory-efficient to store the same
attribute lists only once.
Therefore, the two lower layers (*rta* and *ea*) are hashed and stored in a
BIRD-global database. Routes (*rte*) contain a pointer to *rta* in this
database, maintaining a use-count of each *rta*. Attributes (*rta*) contain
a pointer to normalized (sorted by numerical key ID) *ea*.
## Attribute list rework
The first thing to change is the distribution of route attributes between
attribute list layers. We decided to make the first layer (*rte*) only the key
and other per-record internal technical information. Therefore we move *src* to
*rte* and preference to *rta* (beside other things). *This is already done.*
We also found out that the nexthop (gateway), originally one single IP address
and an interface, has evolved to a complex attribute with several sub-attributes;
not only considering multipath routing but also MPLS stacks and other per-route
attributes. This has led to a too complex data structure holding the nexthop set.
We decided finally to squash *rta* and *ea* to one type of data structure,
allowing for completely dynamic route attribute lists. This is also supported
by adding other *net* types (BGP FlowSpec or ROA) where lots of the fields make
no sense at all, yet we still want to use the same data structures and implementation
as we don't like duplicating code. *Multithreading doesn't depend on this change,
anyway this change is going to happen soon anyway.*
## Route storage
The process of route import from protocol into a table can be divided into several phases:
1. (In protocol code.) Create the route itself (typically from
protocol-internal data) and choose the right channel to use.
2. (In protocol code.) Create the *rta* and *ea* and obtain an appropriate
hashed pointer. Allocate the *rte* structure and fill it in.
3. (Optionally.) Store the route to the *import table*.
4. Run filters. If reject, free everything.
5. Check whether this is a real change (it may be idempotent). If not, free everything and do nothing more.
6. Run the best route selection algorithm.
7. Execute exports if needed.
We found out that the *rte* structure allocation is done too early. BIRD uses
global optimized allocators for fixed-size blocks (which *rte* is) to reduce
its memory footprint, therefore the allocation of *rte* structure would be a
synchronization point in multithreaded environment.
The common code is also much more complicated when we have to track whether the
current *rte* has to be freed or not. This is more a problem in export than in
import as the export filter can also change the route (and therefore allocate
another *rte*). The changed route must be therefore freed after use. All the
route changing code must also track whether this route is writable or
read-only.
We therefore introduce a variant of *rte* called *rte_storage*. Both of these
hold the same, the layer-1 route information (destination, author, cached
attribute pointer, flags etc.), anyway *rte* is always local and *rte_storage*
is intended to be put in global data structures.
This change allows us to remove lots of the code which only tracks whether any
*rte* is to be freed as *rte*'s are almost always allocated on-stack, naturally
limiting their lifetime. If not on-stack, it's the responsibility of the owner
to free the *rte* after import is done.
This change also removes the need for *rte* allocation in protocol code and
also *rta* can be safely allocated on-stack. As a result, protocols can simply
allocate all the data on stack, call the update routine and the common code in
BIRD's *nest* does all the storage for them.
Allocating *rta* on-stack is however not required. BGP and OSPF use this to
import several routes with the same attribute list. In BGP, this is due to the
format of BGP update messages containing first the attributes and then the
destinations (BGP NLRI's). In OSPF, in addition to *rta* deduplication, it is
also presumed that no import filter (or at most some trivial changes) is applied
as OSPF would typically not work well when filtered.
*This change is already done.*
## Route cleanup and table maintenance
In some cases, the route update is not originated by a protocol/channel code.
When the channel shuts down, all routes originated by that channel are simply
cleaned up. Also routes with recursive routes may get changed without import,
simply by changing the IGP route.
This is currently done by a `rt_event` (see `nest/rt-table.c` for source code)
which is to be converted to a parallel thread, running when nobody imports any
route. *This change is freshly done in branch `guernsey`.*
## Parallel protocol execution
The long-term goal of these reworks is to allow for completely independent
execution of all the protocols. Typically, there is no direct interaction
between protocols; everything is done thought BIRD's *nest*. Protocols should
therefore run in parallel in future and wait/lock only when something is needed
to do externally.
We also aim for a clean and documented protocol API.
*It's still a long road to the version 2.1. This series of texts should document
what is needed to be changed, why we do it and how. In the next chapter, we're
going to describe how the route is exported from table to protocols and how this
process is changing. Stay tuned!*

View File

@ -0,0 +1,437 @@
# BIRD Journey to Threads. Chapter 2: Asynchronous route export
Route export is a core algorithm of BIRD. This chapter covers how we are making
this procedure multithreaded. Desired outcomes are mostly lower latency of
route import, flap dampening and also faster route processing in large
configurations with lots of export from one table.
BIRD is a fast, robust and memory-efficient routing daemon designed and
implemented at the end of 20th century. We're doing a significant amount of
BIRD's internal structure changes to make it possible to run in multiple
threads in parallel.
## How routes are propagated through BIRD
In the [previous chapter](https://en.blog.nic.cz/2021/03/23/bird-journey-to-threads-chapter-1-the-route-and-its-attributes/), you could learn how the route import works. We should
now extend that process by the route export.
1. (In protocol code.) Create the route itself and propagate it through the
right channel by calling `rte_update`.
2. The channel runs its import filter.
3. New best route is selected.
4. For each channel:
1. The channel runs its preexport hook and export filter.
2. (Optionally.) The channel merges the nexthops to create an ECMP route.
3. The channel calls the protocol's `rt_notify` hook.
5. After all exports are finished, the `rte_update` call finally returns and
the source protocol may do anything else.
Let's imagine that all the protocols are running in parallel. There are two
protocols with a route prepared to import. One of those wins the table lock,
does the import and then the export touches the other protocol which must
either:
* store the route export until it finishes its own imports, or
* have independent import and export parts.
Both of these conditions are infeasible for common use. Implementing them would
make protocols much more complicated with lots of new code to test and release
at once and also quite a lot of corner cases. Risk of deadlocks is also worth
mentioning.
## Asynchronous route export
We decided to make it easier for protocols and decouple the import and export
this way:
1. The import is done.
2. Best route is selected.
3. Resulting changes are stored.
Then, after the importing protocol returns, the exports are processed for each
exporting channel. In future, this will be possible in parallel: Some protocols
may process the export directly after it is stored, other protocols will wait
until they finish another job.
This eliminates the risk of deadlocks and all protocols' `rt_notify` hooks can
rely on their independence. There is only one question. How to store the changes?
## Route export modes
To find a good data structure for route export storage, we shall first know the
readers. The exporters may request different modes of route export.
### Export everything
This is the most simple route export mode. The exporter wants to know about all
the routes as they're changing. We therefore simply store the old route until
the change is fully exported and then we free the old stored route.
To manage this, we can simply queue the changes one after another and postpone
old route cleanup after all channels have exported the change. The queue member
would look like this:
```
struct {
struct rte_storage *new;
struct rte_storage *old;
};
```
### Export best
This is another simple route export mode. We check whether the best route has
changed; if not, no export happens. Otherwise, the export is propagated as the
old best route changing to the new best route.
To manage this, we could use the queue from the previous point by adding new
best and old best pointers. It is guaranteed that both the old best and new
best pointers are always valid in time of export as all the changes in them
must be stored in future changes which have not been exported yet by this
channel and therefore not freed yet.
```
struct {
struct rte_storage *new;
struct rte_storage *new_best;
struct rte_storage *old;
struct rte_storage *old_best;
};
```
Anyway, we're getting to the complicated export modes where this simple
structure is simply not enough.
### Export merged
Here we're getting to some kind of problems. The exporting channel requests not
only the best route but also all routes that are good enough to be considered
ECMP-eligible (we call these routes *mergable*). The export is then just one
route with just the nexthops merged. Export filters are executed before
merging and if the best route is rejected, nothing is exported at all.
To achieve this, we have to re-evaluate export filters any time the best route
or any mergable route changes. Until now, the export could just do what it wanted
as there was only one thread working. To change this, we need to access the
whole route list and process it.
### Export first accepted
In this mode, the channel runs export filters on a sorted list of routes, best first.
If the best route gets rejected, it asks for the next one until it finds an
acceptable route or exhausts the list. This export mode requires a sorted table.
BIRD users will know this export mode as `secondary` in BGP.
For now, BIRD stores two bits per route for each channel. The *export bit* is set
if the route has been really exported to that channel. The *reject bit* is set
if the route was rejected by the export filter.
When processing a route change for accepted, the algorithm first checks the
export bit for the old route. If this bit is set, the old route is that one
exported so we have to find the right one to export. Therefore the sorted route
list is walked best to worst to find a new route to export, using the reject
bit to evaluate only routes which weren't rejected in previous runs of this
algorithm.
If the old route bit is not set, the algorithm walks the sorted route list best
to worst, checking the position of new route with respect to the exported route.
If the new route is worse, nothing happens, otherwise the new route is sent to
filters and finally exported if passes.
### Export by feed
To resolve problems arising from previous two export modes (merged and first accepted),
we introduce a way to process a whole route list without locking the table
while export filters are running. To achieve this, we follow this algorithm:
1. The exporting channel sees a pending export.
2. *The table is locked.*
3. The channel counts all the routes for the given destination.
4. The channel stores pointers to that routes to a local array.
5. *The table is unlocked.*
6. The channel processes the local array of route pointers.
7. The pending export is marked as processed by this channel.
After unlocking the table, the pointed-to routes are implicitly guarded by the
sole fact that the pending export has not yet been processed by all channels
and the cleanup routine never frees any resource related to a pending export.
## Pending export data structure
As the two complicated export modes use the export-by-feed algorithm, the
pending export data structure may be quite minimalistic.
```
struct rt_pending_export {
struct rt_pending_export * _Atomic next; /* Next export for the same destination */
struct rte_storage *new; /* New route */
struct rte_storage *new_best; /* New best route in unsorted table */
struct rte_storage *old; /* Old route */
struct rte_storage *old_best; /* Old best route in unsorted table */
_Atomic u64 seq; /* Sequential ID (table-local) of the pending export */
};
```
To allow for squashing outdated pending exports (e.g. for flap dampening
purposes), there is a `next` pointer to the next export for the same
destination. This is also needed for the export-by-feed algorithm: if there are
several exports for one net at once, all of them are processed by one
export-by-feed automatically marked as done.
We should also add several items into `struct channel`.
```
struct coroutine *export_coro; /* Exporter and feeder coroutine */
struct bsem *export_sem; /* Exporter and feeder semaphore */
struct rt_pending_export * _Atomic last_export; /* Last export processed */
struct bmap export_seen_map; /* Keeps track which exports were already processed */
u64 flush_seq; /* Table export seq when the channel announced flushing */
```
To run the exports in parallel, `export_coro` is run and `export_sem` is
used for signalling new exports to it. The exporter coroutine also marks all
seen sequential IDs in its `export_seen_map` to make it possible to skip over
them if seen again.
There is also a table cleaner routine
(see [previous chapter](https://en.blog.nic.cz/2021/03/23/bird-journey-to-threads-chapter-1-the-route-and-its-attributes/))
which must cleanup also the pending exports after all the channels are finished with them.
To signal that, there is `last_export` working as a release point: the channel
guarantees that it doesn't touch the pointed-to pending export, nor any data
from it.
The last tricky point is channel flushing. When any channel stops, all its
routes are automatically freed and withdrawals are exported if appropriate.
Until now, the routes could be flushed synchronously, anyway now flush has
several phases:
1. Flush started (here the channel stores the `seq` of last current pending export).
2. All routes unlinked yet not freed, withdrawals pending.
3. Pending withdrawals processed and cleaned up. Channel may safely stop and free its structures.
Finally, some additional information has to be stored in tables:
```
_Atomic byte export_used; /* Export journal cleanup scheduled */ \
struct rt_pending_export * _Atomic first_export; /* First export to announce */ \
byte export_scheduled; /* Export is scheduled */
list pending_exports; /* List of packed struct rt_pending_export */
struct fib export_fib; /* Auxiliary fib for storing pending exports */
u64 next_export_seq; /* The next export will have this ID */
```
The exports are:
1. Assigned the `next_export_seq` sequential ID, incrementing this item by one.
2. Put into `pending_exports` and `export_fib` for both sequential and by-destination access.
3. Signalled by setting `export_scheduled` and also `first_export` if not `NULL`.
After processing several exports, `export_used` is set and route table maintenance
coroutine is woken up to possibly do cleanup.
The `struct rt_pending_export` seems to be best allocated by requesting a whole
memory page, containing a common list node, a simple header and packed all the
structures in the rest of the page. This may save a significant amount of memory.
In case of congestion, there will be lots of exports and every spare kilobyte
counts. If BIRD is almost idle, the optimization does nothing on the overall performance.
## Export algorithm
As we have explained at the beginning, the current export algorithm is
table-driven. The table walks the channel list and propagates the update.
The now export algorithm is channel-driven. The table just indicates that it
has something new in export queue and the channel decides what to do with that and when.
### Pushing an export
When a table has something to export, it enqueues an appropriate instance of
`struct rt_pending_export`. Then it pings its maintenance coroutine
(`rt_event`) to notify the exporting channels about a new route. This coroutine
runs with only the table locked so the protocol may e.g. prepare the next route inbetween.
The maintenance coroutine, when it wakes up, walks the list of channels and
wakes their export coroutines.
These two levels of asynchronicity are here for two reasons.
1. There may be lots of channels (hundreds of them) and we don't want to
inefficiently iterate the list for each export.
2. The notification is going to wait until the route author finishes, hopefully
importing more routes and therefore allowing more exports to be processed at once.
The table also stores the first and last exports for every destination. These
come handy mostly for cleanup purposes and for channel feeding.
### Processing an export
After these two pings, the channel finally knows that there is an export pending.
The channel checks whether there is a `last_export` stored. If yes, it proceeds
with the next one, otherwise it takes `first_export` from the table (which is
atomic for this reason).
1. The channel checks its `export_seen_map` whether this export has been
already processed. If so, it skips to the next export. No action is needed.
2. The export chain is scanned for the current first and last export. This is
done by following the `next` pointer in the exports as accessing the
auxiliary export fib needs locking.
3. In the best-only and all-routes export mode, the changes are processed
without further table locking.
4. If export-by-feed is used, the current state of routes in table are fetched.
Then the table is unlocked and the changes are processed without further locking.
5. All processed exports are marked as seen.
6. The channel stores the `last_export` and returns to beginning.to wait for next export.
## The full route life-cycle
Until now, we're always assuming that the channels *just exist*. In real life,
any channel may go up or down and we must handle it, flushing the routes
appropriately and freeing all the memory just in time to avoid both
use-after-free and memory leaks. It should be noted here explicitly that BIRD
is written in C which has no garbage collector or other modern features alike.
### Protocols and channels as viewed from a route
BIRD consists effectively of protocols and tables. **Protocols** are active parts,
kind-of subprocesses manipulating routes and other data. **Tables** are passive,
serving as a database of routes. To connect a protocol to a table, a
**channel** is created.
Every route has its `sender` storing the channel which has put the route into
the current table. This comes handy when the channel goes down to know which
routes to flush.
Every route also has its `src`, a route source allocated by the protocol which
originated it first. This is kept when a route is passed through a *pipe*.
Both `src` and `sender` must point to active protocols and channels as inactive
protocols and channels may be deleted in any time.
### Protocol and channel lifecycle
In the beginning, all channels and protocols are down. Until they fully start,
no route from them is allowed to any table. When the protocol and channel is up,
they may originate routes freely. However, the transitions are worth mentioning.
### Channel startup and feed
When protocols and channels start, they need to get the current state of the
appropriate table. Therefore, after a protocol and channel start, also the
export-feed coroutine is initiated.
Tables can contain millions of routes. It may lead to long import latency if a channel
was feeding itself in one step. The table structure is (at least for now) too
complicated to be implemented as lockless, thus even read access needs locking.
To mitigate this, the feeds are split to allow for regular route propagation
with a reasonable latency.
When the exports were synchronous, we simply didn't care and just announced the
exports to the channels from the time they started feeding. When making exports
asynchronous, it was crucial to avoid most of the possible race conditions
which could arise from simultaneous feed and export. As the feeder routines had
to be rewritten, it was a good opportunity to make this precise.
Therefore, when a channel goes up, it also starts exports:
1. Start the feed-export coroutine.
2. *Lock the table.*
3. Store the last export in queue.
4. Read a limited number of routes to local memory together with their pending exports.
5. If there are some routes to process:
1. *Unlock the table.*
2. Process the loaded routes.
3. Set the appropriate pending exports as seen.
4. *Lock the table*
5. Go to 4. to continue feeding.
6. If there was a last export stored, load the next one to be processed. Otherwise take the table's `first_export`.
7. *Unlock the table.*
8. Run the exporter loop.
*Note: There are some nuances not mentioned here how to do things in right
order to avoid missing some events while changing state. Precise description
would be too detailed for this text.*
When the feeder loop finishes, it continues smoothly to process all the exports
that have been queued while the feed was running. Step 5.3 ensures that already
seen exports are skipped, steps 3 and 6 ensure that no export is missed.
### Channel flush
Protocols and channels need to stop for a handful of reasons, All of these
cases follow the same routine.
1. (Maybe.) The protocol requests to go down or restart.
2. The channel requests to go down or restart.
3. The channel requests to stop export.
4. In the feed-export coroutine:
1. At a designated cancellation point, check cancellation.
2. Clean up local data.
3. *Lock main BIRD context locked*
4. If shutdown requested, switch the channel to *flushing* state and request table maintenance.
5. In the table maintenance coroutine:
1. Walk across all channels and check them for *flushing* state.
2. Walk across the table (split to allow for low latency updates) and
generate a withdrawal for each route sent by the flushing channels.
3. Wait until all the withdrawals are processed.
4. Mark the flushing channels as *down* and eventually proceed to the protocol shutdown or restart.
There is also a separate routine that handles bulk cleanup of `src`'s which
contain a pointer to the originating protocol. This routine may get reworked in
future, yet for now it is good enough.
### Route export cleanup
Last but not least is the export cleanup routine. Until now, the withdrawn
routes were exported synchronously and freed directly after the import was
done. This is not possible anymore. The export is stored and the import returns
to let the importing protocol continue its work. We therefore need a routine to
cleanup the withdrawn routes and also the processed exports.
First of all, this routine refuses to cleanup when any export is feeding or
shutting down. This may change in future, anyway for now we aren't sure about
possible race conditions.
Anyway, when all the exports are in a steady state, the routine works as follows:
1. Walk the active exports and find a minimum (oldest export) between their `last_export` values.
2. If there is nothing to clear between the actual oldest export and channels' oldest export, do nothing.
3. Find the table's new `first_export` and set it. Now there is nobody pointing to the old exports.
4. Free the withdrawn routes.
5. Free the old exports, removing them also from the first-last list of exports for the same destination.
## Results of these changes
This step is a first major step to move forward. Using just this version may be
still as slow as the single-threaded version, at least if your export filters are trivial.
Anyway, the main purpose of this step is not an immediate speedup. It is more
of a base for the next steps:
* Unlocking of pipes should enable parallel execution of all the filters on
pipes, limited solely by the principle *one thread for every direction of
pipe*.
* Conversion of CLI's `show route` to the new feed-export coroutines should
enable faster table queries. Moreover, this approach will allow for
better splitting of model and view in CLI with a good opportunity to
implement more output formats, e.g. JSON.
* Unlocking of kernel route synchronization should fix latency issues induced
by long-lasting kernel queries.
* Partial unlocking of BGP packet processing should finally allow for parallel
execution in almost all phases of BGP route propagation.
The development is now being done mostly in the branch `alderney`. If you asked
why such strange branch names like `jersey`, `guernsey` and `alderney`, here is
a kind-of reason. Yes, these names could be named `mq-async-export`,
`mq-async-export-new`, `mq-async-export-new-new`, `mq-another-async-export` and
so on. That's so ugly, isn't it?
Also why so many branches? The development process is quite messy. BIRD's code
heavily depends on single-threaded approach. This is exceptionally good for
performance, as long as you have one thread only. On the other hand, lots of
these assumptions are not documented so in many cases one desired
change yields a chain of other unforeseen changes which must precede. This brings
lots of backtracking, branch rebasing and other Git magic. There is always a
can of worms somewhere in the code.
*It's still a long road to the version 2.1. This series of texts should document
what is needed to be changed, why we do it and how. The
[previous chapter](https://en.blog.nic.cz/2021/03/23/bird-journey-to-threads-chapter-1-the-route-and-its-attributes/)
showed the necessary changes in route storage. In the next chapter, we're going
to describe how the coroutines are implemented and what kind of locking system
are we employing to prevent deadlocks. Stay tuned!*

15
doc/threads/Makefile Normal file
View File

@ -0,0 +1,15 @@
SUFFICES := .pdf -wordpress.html
CHAPTERS := 00_the_name_of_the_game 01_the_route_and_its_attributes 02_asynchronous_export
all: $(foreach ch,$(CHAPTERS),$(addprefix $(ch),$(SUFFICES)))
00_the_name_of_the_game.pdf: 00_filter_structure.png
%.pdf: %.md
pandoc -f markdown -t latex -o $@ $<
%.html: %.md
pandoc -f markdown -t html5 -o $@ $<
%-wordpress.html: %.html Makefile
sed -r 's#</p>#\n#g; s#<p>##g; s#<(/?)code>#<\1tt>#g; s#<pre><tt>#<code>#g; s#</tt></pre>#</code>#g; s#</?figure>##g; s#<figcaption>#<p style="text-align: center">#; s#</figcaption>#</p>#; ' $< > $@