There is no real need for storing bucket attributes locally and we may
save some memory by caching the attributes in one central place.
If this becomes a contention problem, we should reduce the lock load
of the central attribute cache.
Introducing a new omnipotent internal API to just pass route updates
from whatever point wherever we want.
From now on, all the exports should be processed by RT_WALK_EXPORTS
macro, and you can also issue a separate feed-only request to just get a
feed and finish.
The exporters can now also stop and the readers must expect that to
happen and recover. Main tables don't stop, though.
To avoid needs for keeping local temporary references for attributes,
now one can use ea_lookup_tmp() to ensure that the attributes are
valid and stored until the task ends. After that, the attributes are
automatically unref'd and also deallocated if needed.
This commit makes the route chains in the tables atomic. This allows not
only standard exports but also feeds and bulk exports to be processed
without ever locking the table.
Design note: the overall data structures are quite brittle. We're using
RCU read-locks to keep track about readers, and we're indicating ongoing
work on the data structures by prepending a REF_OBSOLETE sentinel node
to make every reader go waiting.
All the operations are intended to stay inside nest/rt-table.c and it
may be even best to further refactor the code to hide the routing table
internal structure inside there. Nobody shall definitely write any
routines manipulating live routes in tables from outside.
If cork occurred after some incoming data had been already processed,
BGP incorrectly processed them again after uncorking because it forgot
to store the actual socket state.
Now storing the socket state (done at the end of bgp_rx()) and
therefore the bug is fixed.
BGP route attributes have flags (Optional, Transitive) that are validated
on decode and set to valid value on export. But if such attribute is
modified by filter or set internally by BGP during import, then its flags
would be zero in local tables. That usually does not matter, as they are
not used locally and they were fixed on export, but invalid flags leaked
in BMP and MRT dumps.
Keep route attribute flags set to valid values even when set by filters
or modified by BGP.
We can distinguish BGP sessions if at least one side uses a different IP
address. Extend olock mechanism to handle local IP as a part of key, with
optional wildcard, so BGP sessions could local IP in the olock and not
block themselves.
Increase max length of notification data in error logs from 16 to 128.
There is already enough space in the buffer.
Thanks to Marco d'Itri for the suggestion.
Implement BGP Send hold timer according to draft-ietf-idr-bgp-sendholdtimer.
The Send hold timer drops the session if the neighbor is sending keepalives,
but does not receive our messages, causing the TCP connection to stall.
Some BGP capabilities change the BGP behavior in a significant way, so if
the configuration depends on it, it is better to not establish BGP
session when the capability is not available.
Add several BGP option to require individual BGP capabilities during
session negotiation.
Some [redacted] (yes, myself) had a really bad idea
to rename nest/route.h to nest/rt.h while refactoring
some data structures out of it.
This led to unnecessarily complex problems with
merging updates from v2. Reverting this change
to make my life a bit easier.
At least it needed only one find-sed command:
find -name '*.[chlY]' -type f -exec sed -i 's#nest/rt.h#nest/route.h#' '{}' +
This merge was particularly difficult. I finally resorted to delete the
symbol scope active flag altogether and replace its usage by other
means.
Also I had to update custom route attribute registration to fit
both the scope updates in v2 and the data model in v3.
Conflicts:
conf/cf-lex.l
conf/conf.h
conf/confbase.Y
conf/gen_keywords.m4
conf/gen_parser.m4
filter/config.Y
nest/config.Y
proto/bgp/config.Y
proto/static/config.Y
Keywords and attributes are split to separate namespaces, to avoid
collisions between regular keyword use and attribute overlay.
There are now 3 different pools with specific lifetime. All of these are
available since protocol start, anyway they get freed in different
moments.
First, pool_up gets freed immediately after announcing PS_STOP, to e.g.
stop all timers and events regularly updating the routing table when the
imports are already flushing.
Then, pool_inloop gets freed just before the protocol loop is finally
stopped, after all channels, imports and exports and other hooks are
cleaned up.
And finally, the pool itself is freed the last. Unless you explicitly
need the early free, use this pool.