Had to fix route source locking inside BGP export table as we need to
keep the route sources properly allocated until even last BGP pending
update is sent out, therefore the export table printout is accurate.
Replacing the old 3.0-alpha0 cork mechanism with another one inside the
routing table. This version should be simpler and also quite clear what
it does, why and when.
These routines detect the export congestion (as defined by configurable
thresholds) and propagate the state to readers. There are no readers for
now, they will be added in following commits.
Some unit tests weren't initializing the birdloop, trying to write the
birdloop ping into stdin. Fixed this and also forced stdin close on
startup of every test just to be sure that CI and local build behave the
same in this. (CI was failing on this while local build not.)
In multithreaded environment, we need to pass messages between workers.
This is done by queuing events to their respective queues. The
double-linked list is not really useful for that as it needs locking
everywhere.
This commit rewrites the event subsystem to use a single-linked list
where events are enqueued by a single atomic instruction and the queue
is processed after atomically moving the whole queue aside.
There were more conflicts that I'd like to see, most notably in route
export. If a bisect identifies this commit with something related, it
may be simply true that this commit introduces that bug. Let's hope it
doesn't happen.
The invalid routes were filtered out before they could ever get
exported, yet some of the routines need them available, e.g. for
display or import reload.
Now the invalid routes are properly exported and dropped in channel
export routines instead.
Introduced by 13ef5e53dd, the CLI was not
properly cleaned up when the command finished, causing BIRD to not parse
any other command after "show route".
For BGP LLGR purposes, there was an API allowing a protocol to directly
modify their stale routes in table before flushing them. This API was
called by the table prune routine which violates the future locking
requirements.
Instead of this, BGP now requests a special route export and reimports
these routes into the table, allowing for asynchronous execution without
locking the table on export.
Until now, we were marking routes as REF_STALE and REF_DISCARD to
cleanup old routes after route refresh. This needed a synchronous route
table walk at both beginning and the end of route refresh routine,
marking the routes by the flags.
We avoid these walks by using a stale counter. Every route contains:
u8 stale_cycle;
Every import hook contains:
u8 stale_set;
u8 stale_valid;
u8 stale_pruned;
u8 stale_pruning;
In base_state, stale_set == stale_valid == stale_pruned == stale_pruning
and all routes' stale_cycle also have the same value.
The route refresh looks like follows:
+ ----------- + --------- + ----------- + ------------- + ------------ +
| | stale_set | stale_valid | stale_pruning | stale_pruned |
| Base | x | x | x | x |
| Begin | x+1 | x | x | x |
... now routes are being inserted with stale_cycle == (x+1)
| End | x+1 | x+1 | x | x |
... now table pruning routine is scheduled
| Prune begin | x+1 | x+1 | x+1 | x |
... now routes with stale_cycle not between stale_set and stale_valid
are deleted
| Prune end | x+1 | x+1 | x+1 | x+1 |
+ ----------- + --------- + ----------- + ------------- + ------------ +
The pruning routine is asynchronous and may have high latency in
high-load environments. Therefore, multiple route refresh requests may
happen before the pruning routine starts, leading to this situation:
| Prune begin | x+k | x+k | x -> x+k | x |
... or even
| Prune begin | x+k+1 | x+k | x -> x+k | x |
... if the prune event starts while another route refresh is running.
In such a case, the pruning routine still deletes routes not fitting
between stale_set and and stale_valid, effectively pruning the remnants
of all unpruned route refreshes from before:
| Prune end | x+k | x+k | x+k | x+k |
In extremely rare cases, there may happen too many route refreshes
before any route prune routine finishes. If the difference between
stale_valid and stale_pruned becomes more than 128 when requesting for
another route refresh, the routine walks the table synchronously and
resets all the stale values to a base state, while logging a warning.
The learnt routes are now pushed all into the connected table, not only
the best one. This shouldn't do any damage in well managed setups, yet
it should be noted that it is a change of behavior.
If anybody misses a feature which they implemented by misusing this
internal learn table, let us know, we'll consider implementing it in a
better way.
Until now, if export table was enabled, Nest was storing exactly the
route before rt_notify() was called on it. This was quite sloppy and
spooky and it also wasn't reflecting the changes BGP does before
sending. And as BGP is storing the routes to be sent anyway, we are
simply keeping the already-sent routes in there to better rule out
unneeded reexports.
Some of the route attributes (IGP metric, preference) make no sense in
BGP, therefore these will be probably replaced by something sensible.
Also the nexthop shown in the short output is the BGP nexthop.
It's now possible to pause iteration through hash. This requires
struct hash_iterator to be allocated somewhere handy.
The iteration itself is surrounded by HASH_WALK_ITER and
HASH_WALK_ITER_END. Call HASH_WALK_ITER_PUT to ask for pausing; it may
still do some more iterations until it comes to a suitable pausing
point. The iterator must be initalized to an empty structure. No cleanup
is needed if iteration is abandoned inbetween.