mirror of
https://gitlab.nic.cz/labs/bird.git
synced 2025-01-05 08:31:53 +00:00
Thread documentation: Final version of chapter 3
parent
765c940094
commit
b554457e5e
@ -3,7 +3,7 @@
|
|||||||
Parallel execution in BIRD uses an underlying mechanism of dedicated IO loops
|
Parallel execution in BIRD uses an underlying mechanism of dedicated IO loops
|
||||||
and hierarchical locks. The original event scheduling module has been converted
|
and hierarchical locks. The original event scheduling module has been converted
|
||||||
to do message passing in multithreaded environment. These mechanisms are
|
to do message passing in multithreaded environment. These mechanisms are
|
||||||
crucial for understanding what happens inside BIRD and how the protocol API changes.
|
crucial for understanding what happens inside BIRD and how its internal API changes.
|
||||||
|
|
||||||
BIRD is a fast, robust and memory-efficient routing daemon designed and
|
BIRD is a fast, robust and memory-efficient routing daemon designed and
|
||||||
implemented at the end of 20th century. We're doing a significant amount of
|
implemented at the end of 20th century. We're doing a significant amount of
|
||||||
@ -18,7 +18,7 @@ to lock all the parts which have not been checked and updated yet.
|
|||||||
|
|
||||||
The authors of original BIRD concepts wisely chose a highly modular structure
|
The authors of original BIRD concepts wisely chose a highly modular structure
|
||||||
which allows to create a hierarchy for locks. The main chokepoint was between
|
which allows to create a hierarchy for locks. The main chokepoint was between
|
||||||
protocols and tables which has been solved by implementing asynchronous exports
|
protocols and tables and it has been removed by implementing asynchronous exports
|
||||||
as described in the [previous chapter](https://en.blog.nic.cz/2021/06/14/bird-journey-to-threads-chapter-2-asynchronous-route-export/).
|
as described in the [previous chapter](https://en.blog.nic.cz/2021/06/14/bird-journey-to-threads-chapter-2-asynchronous-route-export/).
|
||||||
|
|
||||||
Locks in BIRD (called domains, as they always lock some defined part of BIRD)
|
Locks in BIRD (called domains, as they always lock some defined part of BIRD)
|
||||||
@ -72,42 +72,53 @@ The major lock level is still The BIRD Lock, containing not only the
|
|||||||
not-yet-converted protocols (like Babel, OSPF or RIP) but also processing CLI
|
not-yet-converted protocols (like Babel, OSPF or RIP) but also processing CLI
|
||||||
commands and reconfiguration. This involves an awful lot of direct access into
|
commands and reconfiguration. This involves an awful lot of direct access into
|
||||||
other contexts which would be unnecessarily complicated to implement by message
|
other contexts which would be unnecessarily complicated to implement by message
|
||||||
passing. Therefore, this lock is simply *"the director"*, sitting on the top.
|
passing. Therefore, this lock is simply *"the director"*, sitting on the top
|
||||||
|
with its own category.
|
||||||
|
|
||||||
The lower lock levels are mostly for shared global data structures accessed
|
The lower lock levels under routing tables are mostly for shared global data
|
||||||
from everywhere. We'll address some of these later.
|
structures accessed from everywhere. We'll address some of these later.
|
||||||
|
|
||||||
## IO Loop
|
## IO Loop
|
||||||
|
|
||||||
There has been a protocol, BFD, running in its own thread since 2013. This
|
There has been a protocol, BFD, running in its own thread since 2013. This
|
||||||
separation has a good reason; it needs low latency and the main BIRD loop just
|
separation has a good reason; it needs low latency and the main BIRD loop just
|
||||||
walks round-robin around all the available sockets which may last for a long
|
walks round-robin around all the available sockets and one round-trip may take
|
||||||
time. BFD had its own IO loop implementation and simple message passing
|
a long time (even more than a minute with large configurations). BFD had its
|
||||||
routines. This code could be easily updated for general use so I did it.
|
own IO loop implementation and simple message passing routines. This code could
|
||||||
|
be easily updated for general use so I did it.
|
||||||
|
|
||||||
To understand the internal principles, we should say that in the `master`
|
To understand the internal principles, we should say that in the `master`
|
||||||
branch, there is a big loop centered around a `poll()` call, dispatching and
|
branch, there is a big loop centered around a `poll()` call, dispatching and
|
||||||
executing everything as needed. There are several means how to get something dispatched from a loop.
|
executing everything as needed. In the `sark` branch, there are multiple loops
|
||||||
|
of this kind. BIRD has several means how to get something dispatched from a
|
||||||
|
loop.
|
||||||
|
|
||||||
1. Requesting to read from a socket makes the main loop call your hook when there is some data received.
|
1. Requesting to read from a **socket** makes the main loop call your hook when there is some data received.
|
||||||
The same happens when a socket refuses to write data. Then the data is buffered and you are called when
|
The same happens when a socket refuses to write data. Then the data is buffered and you are called when
|
||||||
the buffer is free. There is also a third callback, an error hook, for obvious reasons.
|
the buffer is free to continue writing. There is also a third callback, an error hook, for obvious reasons.
|
||||||
|
|
||||||
2. Requesting to be called back after a given amount of time. This is called *timer*.
|
2. Requesting to be called back after a given amount of time. This is called **timer**.
|
||||||
|
As is common with all timers, they aren't precise and the callback may be
|
||||||
|
delayed significantly. This was also the reason to have BFD loop separate
|
||||||
|
since the very beginning, yet now the abundance of threads may lead to
|
||||||
|
problems with BFD latency in large-scale configurations. We haven't tested
|
||||||
|
this yet.
|
||||||
|
|
||||||
3. Requesting to be called back when possible. This is useful to run anything
|
3. Requesting to be called back from a clean context when possible. This is
|
||||||
not reentrant which might mess with the caller's data, e.g. when a protocol
|
useful to run anything not reentrant which might mess with the caller's
|
||||||
decides to shutdown due to some inconsistency in received data. This is called *event*.
|
data, e.g. when a protocol decides to shutdown due to some inconsistency
|
||||||
|
in received data. This is called **event**.
|
||||||
|
|
||||||
4. Requesting to do some work when possible. These are also events, there is only
|
4. Requesting to do some work when possible. These are also events, there is only
|
||||||
a difference where this event is enqueued; in the main loop, there is a
|
a difference where this event is enqueued; in the main loop, there is a
|
||||||
special *work queue* with an execution limit, allowing sockets and timers to be
|
special *work queue* with an execution limit, allowing sockets and timers to be
|
||||||
handled with a reasonable latency while still doing all the work needed.
|
handled with a reasonable latency while still doing all the work needed.
|
||||||
|
Other loops don't have designated work queues (we may add them later).
|
||||||
|
|
||||||
All these, sockets, timers and events, are tightly bound to some domain.
|
All these, sockets, timers and events, are tightly bound to some domain.
|
||||||
Sockets typically belong to a protocol, timers and events to a protocol or table.
|
Sockets typically belong to a protocol, timers and events to a protocol or table.
|
||||||
With the modular structure of BIRD, the easy and convenient approach to multithreading
|
With the modular structure of BIRD, the easy and convenient approach to multithreading
|
||||||
is to get more IO loops bound to specific domains, running their events, timers and
|
is to get more IO loops, each bound to a specific domain, running their events, timers and
|
||||||
socket hooks in their threads.
|
socket hooks in their threads.
|
||||||
|
|
||||||
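To make the dispatch mechanisms listed above more concrete, here is a minimal sketch of one round of such a loop, handling events, timers and readable sockets. It is not BIRD's code: every `toy_*` name, the fixed-size arrays and the order of the three phases are assumptions made for illustration only, and the work queue with its execution limit is not modeled.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* All toy_* names are hypothetical; this is not BIRD's actual API. */
#define TOY_MAX 8

typedef void (*toy_hook)(void *data);

struct toy_socket { int fd; toy_hook rx_hook; void *data; };
struct toy_timer  { long expires_ms; toy_hook hook; void *data; int active; };
struct toy_event  { toy_hook hook; void *data; };

struct toy_loop {
  struct toy_socket socks[TOY_MAX];  int nsocks;
  struct toy_timer  timers[TOY_MAX]; int ntimers;
  struct toy_event  events[TOY_MAX]; int nevents;
};

/* Sample hook used for demonstration: counts how many times it was called. */
void toy_count_hook(void *data) { ++*(int *) data; }

void toy_add_socket(struct toy_loop *l, int fd, toy_hook rx, void *data)
{
  assert(l->nsocks < TOY_MAX);
  l->socks[l->nsocks++] = (struct toy_socket){ fd, rx, data };
}

void toy_add_timer(struct toy_loop *l, long expires_ms, toy_hook hook, void *data)
{
  assert(l->ntimers < TOY_MAX);
  l->timers[l->ntimers++] = (struct toy_timer){ expires_ms, hook, data, 1 };
}

void toy_send_event(struct toy_loop *l, toy_hook hook, void *data)
{
  assert(l->nevents < TOY_MAX);
  l->events[l->nevents++] = (struct toy_event){ hook, data };
}

/* One round of the loop; now_ms is the caller-supplied current time. */
void toy_loop_round(struct toy_loop *l, long now_ms)
{
  /* Events: run what was queued before this round. Hooks may queue
   * more events; those wait for the next round. */
  struct toy_event pending[TOY_MAX];
  int n = l->nevents;
  for (int i = 0; i < n; i++)
    pending[i] = l->events[i];
  l->nevents = 0;
  for (int i = 0; i < n; i++)
    pending[i].hook(pending[i].data);

  /* Timers: fire each active timer whose time has come, once. */
  for (int i = 0; i < l->ntimers; i++)
    if (l->timers[i].active && l->timers[i].expires_ms <= now_ms) {
      l->timers[i].active = 0;
      l->timers[i].hook(l->timers[i].data);
    }

  /* Sockets: ask poll() which ones are readable, without blocking. */
  struct pollfd pfd[TOY_MAX];
  for (int i = 0; i < l->nsocks; i++)
    pfd[i] = (struct pollfd){ .fd = l->socks[i].fd, .events = POLLIN };
  if (poll(pfd, (nfds_t) l->nsocks, 0) > 0)
    for (int i = 0; i < l->nsocks; i++)
      if (pfd[i].revents & POLLIN)
        l->socks[i].rx_hook(l->socks[i].data);
}
```

In this sketch, running one such loop per domain in its own thread would correspond to the approach described above; locking and cross-loop wakeups are left out entirely.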
## Message passing and loop entering
|
## Message passing and loop entering
|
||||||
@ -155,17 +166,20 @@ triple-indirect delayed route announcement is employed:
|
|||||||
1. First, when a channel imports a route by entering a loop, it sends an event
|
1. First, when a channel imports a route by entering a loop, it sends an event
|
||||||
to its own loop (no ping needed in such case). This operation is idempotent,
|
to its own loop (no ping needed in such case). This operation is idempotent,
|
||||||
thus for several routes in a row, only one event is enqueued. This reduces
|
thus for several routes in a row, only one event is enqueued. This reduces
|
||||||
several route imports (even hundreds in case of massive BGP withdrawals) to
|
several route import announcements (even hundreds in case of massive BGP
|
||||||
one single event.
|
withdrawals) to one single event.
|
||||||
2. When the channel is done importing (or at least takes a coffee break and
|
2. When the channel is done importing (or at least takes a coffee break and
|
||||||
checks its mailbox), the scheduled event in its own loop is run, sending
|
checks its mailbox), the scheduled event in its own loop is run, sending
|
||||||
another event to the table's loop, saying basically *"Hey, table, I've just
|
another event to the table's loop, saying basically *"Hey, table, I've just
|
||||||
imported something."*. This event is also idempotent and further reduces
|
imported something."*. This event is also idempotent and further reduces
|
||||||
route imports from multiple sources to one single event.
|
route import announcements from multiple sources to one single event.
|
||||||
3. The table's announcement event is then executed from its loop, enqueuing export
|
3. The table's announcement event is then executed from its loop, enqueuing export
|
||||||
events for all connected channels, finally initiating route exports. As we
|
events for all connected channels, finally initiating route exports. As we
|
||||||
already know, imports are done by direct access, therefore if protocols keep
|
already know, imports are done by direct access, therefore if protocols keep
|
||||||
importing, export announcements must wait.
|
importing, export announcements are slowed down.
|
||||||
|
4. The actual data on what has been updated is stored in a table journal. This
|
||||||
|
peculiar technique is used only for informing the exporting channels that
|
||||||
|
*"there is something to do"*.
|
||||||
|
|
||||||
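The idempotent event sending that the steps above rely on can be sketched as follows. This is a single-threaded illustration under stated assumptions: the `toy_*` names and the `enqueued` flag are hypothetical, and a real multithreaded version would need locking or atomic operations around the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All toy_* names are hypothetical; this is not BIRD's actual API. */
struct toy_event {
  void (*hook)(void *data);
  void *data;
  bool enqueued;              /* true while the event sits in a queue */
  struct toy_event *next;
};

struct toy_queue {
  struct toy_event *head;
  struct toy_event **tail;    /* points at the link to append to */
};

/* Sample hook used for demonstration: counts how many times it was called. */
void toy_count_hook(void *data) { ++*(int *) data; }

void toy_queue_init(struct toy_queue *q)
{
  q->head = NULL;
  q->tail = &q->head;
}

/* Idempotent send: an event that is already enqueued is left alone.
 * Returns true only if the event was newly enqueued. */
bool toy_send(struct toy_queue *q, struct toy_event *e)
{
  assert(e->hook != NULL);
  if (e->enqueued)
    return false;
  e->enqueued = true;
  e->next = NULL;
  *q->tail = e;
  q->tail = &e->next;
  return true;
}

/* Run everything queued so far and return the number of hooks called.
 * The list is detached first, so an event sent again from inside a
 * hook is picked up by the next call, not by this one. */
int toy_run(struct toy_queue *q)
{
  struct toy_event *e = q->head;
  int n = 0;
  toy_queue_init(q);
  while (e) {
    struct toy_event *next = e->next;
    e->enqueued = false;      /* cleared before the hook runs */
    e->hook(e->data);
    n++;
    e = next;
  }
  return n;
}
```

Sending the same event three hundred times before the loop gets to run it results in a single hook call, which is the reduction described in steps 1 and 2.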
This may seem overly complicated, yet it should work and it seems to work. In
|
This may seem overly complicated, yet it should work and it seems to work. In
|
||||||
case of low load, all these notifications just come through smoothly. In case
|
case of low load, all these notifications just come through smoothly. In case
|
||||||
@ -191,16 +205,16 @@ as these multiply the exports by propagating them all the way down to other
|
|||||||
tables, eventually eating about twice the amount of memory than the single-threaded version.
|
tables, eventually eating about twice the amount of memory than the single-threaded version.
|
||||||
|
|
||||||
There is therefore a cork to make this stop. Every table is checking how many
|
There is therefore a cork to make this stop. Every table is checking how many
|
||||||
exports it has pending, and when adding a new export to the queue, it may apply
|
exports it has pending, and when adding a new export to the queue, it may request
|
||||||
a cork, saying simply "please stop the flow for a while". When the exports are
|
a cork, saying simply "please stop the flow for a while". When the export buffer
|
||||||
then processed, it uncorks.
|
size is reduced low enough, the table uncorks.
|
||||||
|
|
||||||
On the other side, there may be events and sockets with a cork assigned. When
|
On the other side, there are events and sockets with a cork assigned. When
|
||||||
trying to enqueue an event and the cork is applied, the event is instead put
|
trying to enqueue an event and the cork is applied, the event is instead put
|
||||||
into the cork's queue and released only when the cork is released. In case of
|
into the cork's queue and released only when the cork is released. In case of
|
||||||
sockets, when `poll` arguments are recalculated, the corked socket is not
|
sockets, when read is indicated or when `poll` arguments are recalculated,
|
||||||
checked for received packets, effectively keeping them in the TCP queue and
|
the corked socket is simply not checked for received packets, effectively
|
||||||
slowing down the flow.
|
keeping them in the TCP queue and slowing down the flow until cork is released.
|
||||||
|
|
||||||
The cork implementation is quite crude and rough and fragile. It may get some
|
The cork implementation is quite crude and rough and fragile. It may get some
|
||||||
rework while stabilizing the multi-threaded version of BIRD or we may even
|
rework while stabilizing the multi-threaded version of BIRD or we may even
|
||||||
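The cork described in this hunk can be sketched roughly as below. This is an illustration under stated assumptions, not BIRD's implementation: the `toy_*` names, the fixed-size hold queue and the `high` and `low` thresholds are all hypothetical, and calling the hook directly stands in for enqueuing the event into its target loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All toy_* names and both thresholds are hypothetical. */
#define TOY_MAX 16

struct toy_event { void (*hook)(void *data); void *data; };

struct toy_cork {
  bool applied;                      /* true while the flow is stopped */
  struct toy_event held[TOY_MAX];    /* events waiting for the uncork */
  int nheld;
};

struct toy_table {
  int pending;                       /* exports waiting in the queue */
  int high;                          /* request the cork at this level */
  int low;                           /* release the cork at this level */
  struct toy_cork *cork;
};

/* Sample hook used for demonstration: counts how many times it was called. */
void toy_count_hook(void *data) { ++*(int *) data; }

/* Send an event that has this cork assigned. */
void toy_send_corked(struct toy_cork *c, struct toy_event ev)
{
  if (!c->applied) {
    ev.hook(ev.data);                /* stand-in for a normal enqueue */
    return;
  }
  assert(c->nheld < TOY_MAX);
  c->held[c->nheld++] = ev;          /* held until the cork is released */
}

/* Release the cork and let all held events through. */
void toy_uncork(struct toy_cork *c)
{
  int n = c->nheld;
  c->applied = false;
  c->nheld = 0;
  for (int i = 0; i < n; i++)
    c->held[i].hook(c->held[i].data);
}

/* The table adds an export and requests the cork if too many pile up. */
void toy_export_added(struct toy_table *t)
{
  if (++t->pending >= t->high)
    t->cork->applied = true;
}

/* An export was processed; uncork once the backlog is low enough. */
void toy_export_done(struct toy_table *t)
{
  assert(t->pending > 0);
  if (--t->pending <= t->low && t->cork->applied)
    toy_uncork(t->cork);
}
```

Using two separate thresholds is a design choice of this sketch: it keeps the cork from flapping when the number of pending exports hovers around a single limit.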
|