0
0
mirror of https://gitlab.nic.cz/labs/bird.git synced 2025-01-05 08:31:53 +00:00

Thread documentation: Final version of chapter 3

This commit is contained in:
Maria Matejka 2022-02-04 15:18:06 +01:00
parent 765c940094
commit b554457e5e

View File

@ -3,7 +3,7 @@
Parallel execution in BIRD uses an underlying mechanism of dedicated IO loops Parallel execution in BIRD uses an underlying mechanism of dedicated IO loops
and hierarchical locks. The original event scheduling module has been converted and hierarchical locks. The original event scheduling module has been converted
to do message passing in multithreaded environment. These mechanisms are to do message passing in multithreaded environment. These mechanisms are
crucial for understanding what happens inside BIRD and how the protocol API changes. crucial for understanding what happens inside BIRD and how its internal API changes.
BIRD is a fast, robust and memory-efficient routing daemon designed and BIRD is a fast, robust and memory-efficient routing daemon designed and
implemented at the end of 20th century. We're doing a significant amount of implemented at the end of 20th century. We're doing a significant amount of
@ -18,7 +18,7 @@ to lock all the parts which have not been checked and updated yet.
The authors of original BIRD concepts wisely chose a highly modular structure The authors of original BIRD concepts wisely chose a highly modular structure
which allows to create a hierarchy for locks. The main chokepoint was between which allows to create a hierarchy for locks. The main chokepoint was between
protocols and tables which has been solved by implementing asynchronous exports protocols and tables and it has been removed by implementing asynchronous exports
as described in the [previous chapter](https://en.blog.nic.cz/2021/06/14/bird-journey-to-threads-chapter-2-asynchronous-route-export/). as described in the [previous chapter](https://en.blog.nic.cz/2021/06/14/bird-journey-to-threads-chapter-2-asynchronous-route-export/).
Locks in BIRD (called domains, as they always lock some defined part of BIRD) Locks in BIRD (called domains, as they always lock some defined part of BIRD)
@ -72,42 +72,53 @@ The major lock level is still The BIRD Lock, containing not only the
not-yet-converted protocols (like Babel, OSPF or RIP) but also processing CLI not-yet-converted protocols (like Babel, OSPF or RIP) but also processing CLI
commands and reconfiguration. This involves an awful lot of direct access into commands and reconfiguration. This involves an awful lot of direct access into
other contexts which would be unnecessarily complicated to implement by message other contexts which would be unnecessarily complicated to implement by message
passing. Therefore, this lock is simply *"the director"*, sitting on the top. passing. Therefore, this lock is simply *"the director"*, sitting on the top
with its own category.
The lower lock levels are mostly for shared global data structures accessed The lower lock levels under routing tables are mostly for shared global data
from everywhere. We'll address some of these later. structures accessed from everywhere. We'll address some of these later.
## IO Loop ## IO Loop
There has been a protocol, BFD, running in its own thread since 2013. This There has been a protocol, BFD, running in its own thread since 2013. This
separation has a good reason; it needs low latency and the main BIRD loop just separation has a good reason; it needs low latency and the main BIRD loop just
walks round-robin around all the available sockets which may last for a long walks round-robin around all the available sockets and one round-trip may take
time. BFD had its own IO loop implementation and simple message passing a long time (even more than a minute with large configurations). BFD had its
routines. This code could be easily updated for general use so I did it. own IO loop implementation and simple message passing routines. This code could
be easily updated for general use so I did it.
To understand the internal principles, we should say that in the `master` To understand the internal principles, we should say that in the `master`
branch, there is a big loop centered around a `poll()` call, dispatching and branch, there is a big loop centered around a `poll()` call, dispatching and
executing everything as needed. There are several means how to get something dispatched from a loop. executing everything as needed. In the `sark` branch, there are multiple loops
of this kind. BIRD has several means how to get something dispatched from a
loop.
1. Requesting to read from a socket makes the main loop call your hook when there is some data received. 1. Requesting to read from a **socket** makes the main loop call your hook when there is some data received.
The same happens when a socket refuses to write data. Then the data is buffered and you are called when The same happens when a socket refuses to write data. Then the data is buffered and you are called when
the buffer is free. There is also a third callback, an error hook, for obvious reasons. the buffer is free to continue writing. There is also a third callback, an error hook, for obvious reasons.
2. Requesting to be called back after a given amount of time. This is called *timer*. 2. Requesting to be called back after a given amount of time. This is called **timer**.
As is common with all timers, they aren't precise and the callback may be
delayed significantly. This was also the reason to have BFD loop separate
since the very beginning, yet now the abundance of threads may lead to
problems with BFD latency in large-scale configurations. We haven't tested
this yet.
3. Requesting to be called back when possible. This is useful to run anything 3. Requesting to be called back from a clean context when possible. This is
not reentrant which might mess with the caller's data, e.g. when a protocol useful to run anything not reentrant which might mess with the caller's
decides to shutdown due to some inconsistency in received data. This is called *event*. data, e.g. when a protocol decides to shutdown due to some inconsistency
in received data. This is called **event**.
4. Requesting to do some work when possible. These are also events, there is only 4. Requesting to do some work when possible. These are also events, there is only
a difference where this event is enqueued; in the main loop, there is a a difference where this event is enqueued; in the main loop, there is a
special *work queue* with an execution limit, allowing sockets and timers to be special *work queue* with an execution limit, allowing sockets and timers to be
handled with a reasonable latency while still doing all the work needed. handled with a reasonable latency while still doing all the work needed.
Other loops don't have designated work queues (we may add them later).
All these, sockets, timers and events, are tightly bound to some domain. All these, sockets, timers and events, are tightly bound to some domain.
Sockets typically belong to a protocol, timers and events to a protocol or table. Sockets typically belong to a protocol, timers and events to a protocol or table.
With the modular structure of BIRD, the easy and convenient approach to multithreading With the modular structure of BIRD, the easy and convenient approach to multithreading
is to get more IO loops bound to specific domains, running their events, timers and is to get more IO loops, each bound to a specific domain, running their events, timers and
socket hooks in their threads. socket hooks in their threads.
## Message passing and loop entering ## Message passing and loop entering
@ -155,17 +166,20 @@ triple-indirect delayed route announcement is employed:
1. First, when a channel imports a route by entering a loop, it sends an event 1. First, when a channel imports a route by entering a loop, it sends an event
to its own loop (no ping needed in such case). This operation is idempotent, to its own loop (no ping needed in such case). This operation is idempotent,
thus for several routes in a row, only one event is enqueued. This reduces thus for several routes in a row, only one event is enqueued. This reduces
several route imports (even hundreds in case of massive BGP withdrawals) to several route import announcements (even hundreds in case of massive BGP
one single event. withdrawals) to one single event.
2. When the channel is done importing (or at least takes a coffee break and 2. When the channel is done importing (or at least takes a coffee break and
checks its mailbox), the scheduled event in its own loop is run, sending checks its mailbox), the scheduled event in its own loop is run, sending
another event to the table's loop, saying basically *"Hey, table, I've just another event to the table's loop, saying basically *"Hey, table, I've just
imported something."*. This event is also idempotent and further reduces imported something."*. This event is also idempotent and further reduces
route imports from multiple sources to one single event. route import announcements from multiple sources to one single event.
3. The table's announcement event is then executed from its loop, enqueuing export 3. The table's announcement event is then executed from its loop, enqueuing export
events for all connected channels, finally initiating route exports. As we events for all connected channels, finally initiating route exports. As we
already know, imports are done by direct access, therefore if protocols keep already know, imports are done by direct access, therefore if protocols keep
importing, export announcements must wait. importing, export announcements are slowed down.
4. The actual data on what has been updated is stored in a table journal. This
peculiar technique is used only for informing the exporting channels that
*"there is something to do"*.
This may seem overly complicated, yet it should work and it seems to work. In This may seem overly complicated, yet it should work and it seems to work. In
case of low load, all these notifications just come through smoothly. In case case of low load, all these notifications just come through smoothly. In case
@ -191,16 +205,16 @@ as these multiply the exports by propagating them all the way down to other
tables, eventually eating about twice the amount of memory than the single-threaded version. tables, eventually eating about twice the amount of memory than the single-threaded version.
There is therefore a cork to make this stop. Every table is checking how many There is therefore a cork to make this stop. Every table is checking how many
exports it has pending, and when adding a new export to the queue, it may apply exports it has pending, and when adding a new export to the queue, it may request
a cork, saying simply "please stop the flow for a while". When the exports are a cork, saying simply "please stop the flow for a while". When the export buffer
then processed, it uncorks. size is reduced low enough, the table uncorks.
On the other side, there may be events and sockets with a cork assigned. When On the other side, there are events and sockets with a cork assigned. When
trying to enqueue an event and the cork is applied, the event is instead put trying to enqueue an event and the cork is applied, the event is instead put
into the cork's queue and released only when the cork is released. In case of into the cork's queue and released only when the cork is released. In case of
sockets, when `poll` arguments are recalculated, the corked socket is not sockets, when read is indicated or when `poll` arguments are recalculated,
checked for received packets, effectively keeping them in the TCP queue and the corked socket is simply not checked for received packets, effectively
slowing down the flow. keeping them in the TCP queue and slowing down the flow until cork is released.
The cork implementation is quite crude and rough and fragile. It may get some The cork implementation is quite crude and rough and fragile. It may get some
rework while stabilizing the multi-threaded version of BIRD or we may even rework while stabilizing the multi-threaded version of BIRD or we may even