According to RFC 5882, system should not interpret the local or remote
session state transition to AdminDown as failure. We followed that for
the local session state but not for the remote session state (which
just triggered a transition of the local state to Down). The patch
fixes that.
We do not properly generate AdminDown on our side, so the patch is
relevant just for interoperability with other systems.
Thanks to Sunnat Samadov for the bugreport.
After converting BFD to the new IO loop system, reconfiguration never
really worked. Sadly, we missed this case in our testing suite so it
passed under the radar for quite a while.
Thanks to Andrei Dinu <andrei.dinu@digitalit.ro> for reporting and
isolating this issue.
Memory allocation is a fragile part of BIRD and we need checking that
everybody is using the resource pools in an appropriate way. To assure
this, all the resource pools are associated with locking domains and
every resource manipulation is thoroughly checked whether the
appropriate locking domain is locked.
With transitive resource manipulation like resource dumping or mass free
operations, domains are locked and unlocked on the go, thus we require
pool domains to have higher order than their parent to allow for this
transitive operations.
Adding pool locking revealed some cases of insecure memory manipulation
and this commit fixes that as well.
When several BGPs requested a BFD session in short time, chances were
that the second BGP would file a request while the pickup routine was
still running and it would get enqueued into the waiting list instead of
being picked up.
Fixed this by enforcing pickup loop restart when new requests got added,
and also by atomically moving the unpicked requests to a temporary list
to announce admin down before actually being added into the wait list.
Now sk_open() requires an explicit IO loop to open the socket in. Also
specific functions for socket RX pause / resume are added to allow for
BGP corking.
And last but not least, socket reloop is now synchronous to resolve
weird cases of the target loop stopping before actually picking up the
relooped socket. Now the caller must ensure that both loops are locked
while relooping, and this way all sockets always have their respective
loop.
Instead of propagating interface updates as they are loaded from kernel,
they are enqueued and all the notifications are called from a
protocol-specific event. This change allows to break the locking loop
between protocols and interfaces.
Anyway, this change is based on v2 branch to keep the changes between v2
and v3 smaller.
For active sessions, ignore received packets with zero local id and
mismatched remote id. That forces a session timeout instead of an
immediate session restart. It makes BFD sessions more resilient to
packet spoofing.
Thanks to André Grüneberg for the suggestion.
Changes in internal API:
* Every route attribute must be defined as struct ea_class somewhere.
* Registration of route attributes known at startup must be done by
ea_register_init() from protocol build functions.
* Every attribute has now its symbol registered in a global symbol table
defined as SYM_ATTRIBUTE
* All attribute ID's are dynamically allocated.
* Attribute value custom formatting hook is defined in the ea_class.
* Attribute names are the same for display and filters, always prefixed
by protocol name.
Also added some unit testing code for filters with route attributes.
Add BFD protocol option 'strict bind' to use separate listening socket
for each BFD interface bound to its address instead of using shared
listening sockets.
There is a simple universal IO loop, taking care of events, timers and
sockets. Primarily, one instance of a protocol should use exactly one IO
loop to do all its work, as is now done in BFD.
Contrary to previous versions, the loop is now launched and cleaned by
the nest/proto.c code, allowing for a protocol to just request its own
loop by setting the loop's lock order in config higher than the_bird.
It is not supported nor checked if any protocol changed the requested
lock order in reconfigure. No protocol should do it at all.
In previous versions, every thread used its own time structures,
effectively leading to different time in every thread and strange
logging messages.
The time processing code now uses global atomic variables to keep
current time available for fast concurrent reading and safe updates.
Direct BFD sessions needs to be dispatched not only by IP addresses, but
also by interfaces, in order to avoid collisions between neighbors with
the same IPv6 link-local addresses.
Extend BFD session hash_ip key by interface index to handle that. Use 0
for multihop sessions.
Thanks to Sebastian Hahn for the original patch.
BFD session options are configured per interface in BFD protocol. This
patch allows to specify them also per-request in protocols requesting
sessions (currently limited to BGP).
Most commands like 'show ospf neighbors' fail when protocol is not
specified and there are multiple instances of given protocol type.
This is annoying in BIRD 2, as many protocols have IPv4 and IPv6
instances. The patch changes that by showing output from all protocol
instances of appropriate type.
Note that the patch also removes terminating cli_msg() call from these
commands and moves it to the common iterating code.
The bfd_reconfigure_neighbors() returned after first reconfigured
neighbor instead of continuing with the next one.
Thanks to Winston Chen for the bugreport and a patch.
Protocol can have specified VRF, in such case it is restricted to a set
of ifaces associated with the VRF, otherwise it can use all interfaces.
The patch allows to specify VRF as 'default', in which case it is
restricted to a set of iface not associated with any VRF.