There were some nasty problems with deferred protocol state updates and
race conditions on BGP startup, shutdown, and also with referencing the
cached states.
Now it looks fixed.
For the upcoming rework of protocol state information propagation,
we need some more eattr types to be defined.
These types are probably not defined completely and before using
them for route attributes, you should check that they don't lack
some crucial methods.
This partially reverts commit d617801c31.
The common lockfree doesn't work well for high-volume structures like
eattr cache because it expects the structure to be cleaned up by a
sweeper routine ... which is very ineffective for >1M records.
OTOH, we need the deferred ea_free in all cases ... so keeping that.
In future, this and rtable's data structures should be probably merged
but it isn't a good idea to do now. The used data structure is similar
to rtable -- an array of pointers to linked lists.
Feed is lockless, as with all tables.
Full export (receiving updates) is not supported yet but we don't have
any method how to use it anyway. Gonna implement it later.
To avoid needs for keeping local temporary references for attributes,
now one can use ea_lookup_tmp() to ensure that the attributes are
valid and stored until the task ends. After that, the attributes are
automatically unref'd and also deallocated if needed.
This merge was particularly difficult. I finally resorted to delete the
symbol scope active flag altogether and replace its usage by other
means.
Also I had to update custom route attribute registration to fit
both the scope updates in v2 and the data model in v3.
The import table feed wasn't resetting the table-specific route values
like REF_FILTERED and thus made the route look like filtered even though
it should have been re-evaluated as accepted.
Had to fix route source locking inside BGP export table as we need to
keep the route sources properly allocated until even last BGP pending
update is sent out, therefore the export table printout is accurate.
For BGP LLGR purposes, there was an API allowing a protocol to directly
modify their stale routes in table before flushing them. This API was
called by the table prune routine which violates the future locking
requirements.
Instead of this, BGP now requests a special route export and reimports
these routes into the table, allowing for asynchronous execution without
locking the table on export.
Until now, we were marking routes as REF_STALE and REF_DISCARD to
cleanup old routes after route refresh. This needed a synchronous route
table walk at both beginning and the end of route refresh routine,
marking the routes by the flags.
We avoid these walks by using a stale counter. Every route contains:
u8 stale_cycle;
Every import hook contains:
u8 stale_set;
u8 stale_valid;
u8 stale_pruned;
u8 stale_pruning;
In base_state, stale_set == stale_valid == stale_pruned == stale_pruning
and all routes' stale_cycle also have the same value.
The route refresh looks like follows:
+ ----------- + --------- + ----------- + ------------- + ------------ +
| | stale_set | stale_valid | stale_pruning | stale_pruned |
| Base | x | x | x | x |
| Begin | x+1 | x | x | x |
... now routes are being inserted with stale_cycle == (x+1)
| End | x+1 | x+1 | x | x |
... now table pruning routine is scheduled
| Prune begin | x+1 | x+1 | x+1 | x |
... now routes with stale_cycle not between stale_set and stale_valid
are deleted
| Prune end | x+1 | x+1 | x+1 | x+1 |
+ ----------- + --------- + ----------- + ------------- + ------------ +
The pruning routine is asynchronous and may have high latency in
high-load environments. Therefore, multiple route refresh requests may
happen before the pruning routine starts, leading to this situation:
| Prune begin | x+k | x+k | x -> x+k | x |
... or even
| Prune begin | x+k+1 | x+k | x -> x+k | x |
... if the prune event starts while another route refresh is running.
In such a case, the pruning routine still deletes routes not fitting
between stale_set and and stale_valid, effectively pruning the remnants
of all unpruned route refreshes from before:
| Prune end | x+k | x+k | x+k | x+k |
In extremely rare cases, there may happen too many route refreshes
before any route prune routine finishes. If the difference between
stale_valid and stale_pruned becomes more than 128 when requesting for
another route refresh, the routine walks the table synchronously and
resets all the stale values to a base state, while logging a warning.
Until now, if export table was enabled, Nest was storing exactly the
route before rt_notify() was called on it. This was quite sloppy and
spooky and it also wasn't reflecting the changes BGP does before
sending. And as BGP is storing the routes to be sent anyway, we are
simply keeping the already-sent routes in there to better rule out
unneeded reexports.
Some of the route attributes (IGP metric, preference) make no sense in
BGP, therefore these will be probably replaced by something sensible.
Also the nexthop shown in the short output is the BGP nexthop.
There were quite a lot of conflicts in flowspec validation code which
ultimately led to some code being a bit rewritten, not only adapted from
this or that branch, yet it is still in a limit of a merge.