RPKI-To-Router (RTR) sessions seem to be similar security-sensitivity as
IBGP sessions. BIRD already offered a choice of either "plain TCP" (meh)
or "SSH" (secure, albeit a bit more hassle to set up than TCP-MD5).
The patch adds TCP-MD5 as another option. TCP-MD5 for RTR is specified
through RFC 6810 section 7.3 and RFC 8210 section 9.3.
Minor changes by committer.
Some vendors do not fill the checksum for IPv6 UDP packets.
For interoperability with such implementations one can set
UDP_NO_CHECK6_RX socket option on Linux.
Thanks to Ville O for the suggestion.
Minor changes by committer.
This reverts commit d01a7c2bda.
It seems that the performance penalty in global ea cache is actually
very high so returning back to local attribute caches in every BGP.
closes#86
The uncork events are running from mainloop so these should just
dispatch the right event to the right loop. Doing anything long there
is bad for performance and latency as the uncork list may be huge.
Channel is now just subscribing to yet another journal announcing
digested tries from the ROA table.
Creating tries in every channel on-the-fly was too slow to handle
and it ate obnoxious amounts of memory. Instead, the tries are
constructed directly in the table and the channels are notified
with the completed tries.
The delayed export-release mechanism is used to keep the tries allocated
until routes get reloaded.
Originally, this mechanism required to check whether there's enough time to work
and then to send an event. This macro combines all the logic and goes more straightforwardly
to the _end_ of the export processing loop.
One should note that there were two cases where the export processing loop
was deferred at the _beginning_, which led to ignoring some routes on
reimports. This wasn't easily noticeable in the tests until the one-task
limit got a ceiling on 300 ms to keep reasonable latency.
In future, this and rtable's data structures should be probably merged
but it isn't a good idea to do now. The used data structure is similar
to rtable -- an array of pointers to linked lists.
Feed is lockless, as with all tables.
Full export (receiving updates) is not supported yet but we don't have
any method how to use it anyway. Gonna implement it later.
There is no real need for storing bucket attributes locally and we may
save some memory by caching the attributes in one central place.
If this becomes a contention problem, we should reduce the lock load
of the central attribute cache.
Introducing a new omnipotent internal API to just pass route updates
from whatever point wherever we want.
From now on, all the exports should be processed by RT_WALK_EXPORTS
macro, and you can also issue a separate feed-only request to just get a
feed and finish.
The exporters can now also stop and the readers must expect that to
happen and recover. Main tables don't stop, though.
Move bfd_opts grammar inside BFD parser code to avoid dependences between
nest and BFD grammars, which breaks when BFD build is disabled.
Add dummy bfd_opts grammar rule, so protocols can use this nonterminal
even with BFD disabled.
Thanks to Yuri Honegger for the bugreport.
We have now better methods how to measure overall performance
and this obsolete protocol has basically rotten away. If anybody
needs its features, feel free to revive it in future.
To avoid needs for keeping local temporary references for attributes,
now one can use ea_lookup_tmp() to ensure that the attributes are
valid and stored until the task ends. After that, the attributes are
automatically unref'd and also deallocated if needed.
This commit makes the route chains in the tables atomic. This allows not
only standard exports but also feeds and bulk exports to be processed
without ever locking the table.
Design note: the overall data structures are quite brittle. We're using
RCU read-locks to keep track about readers, and we're indicating ongoing
work on the data structures by prepending a REF_OBSOLETE sentinel node
to make every reader go waiting.
All the operations are intended to stay inside nest/rt-table.c and it
may be even best to further refactor the code to hide the routing table
internal structure inside there. Nobody shall definitely write any
routines manipulating live routes in tables from outside.
If cork occurred after some incoming data had been already processed,
BGP incorrectly processed them again after uncorking because it forgot
to store the actual socket state.
Now storing the socket state (done at the end of bgp_rx()) and
therefore the bug is fixed.
In OSPFv3-IPv4 there is no requirement that link-local next hop announced
in Link-LSA must be in interface address range. Therefore, for interfaces
that do not have IPv4 address we can use some loopback IP address and
announce it as a next hop. Also we should accept such address.
BFD requires defined local IP, but for nexthop with onlink there might
not be such address. So we reject this combination of nexthop options.
This prevent crash where such combination of options is used.
Allow to explicitly configure the source IP address for RPKI-To-Router
sessions. Predictable source addresses are useful for minimizing the
holes to be poked in ACLs.
Changed from 'source address' to 'local address' by committer.
BGP route attributes have flags (Optional, Transitive) that are validated
on decode and set to valid value on export. But if such attribute is
modified by filter or set internally by BGP during import, then its flags
would be zero in local tables. That usually does not matter, as they are
not used locally and they were fixed on export, but invalid flags leaked
in BMP and MRT dumps.
Keep route attribute flags set to valid values even when set by filters
or modified by BGP.
Allow to define both nexthop and interface using iproute2-like syntax,
e.g.: route 10.0.0.0/16 via 10.1.0.1 dev "eth0";
Now we can avoid to use link-local scope hack (e.g. 10.1.0.1%eth0)
for cases where both nexthop and interface have to be defined.
Thanks to Marcin Saklak for the suggestion.
We can distinguish BGP sessions if at least one side uses a different IP
address. Extend olock mechanism to handle local IP as a part of key, with
optional wildcard, so BGP sessions could local IP in the olock and not
block themselves.
Increase max length of notification data in error logs from 16 to 128.
There is already enough space in the buffer.
Thanks to Marco d'Itri for the suggestion.
With very busy deployments, RPKI may kill cache connection too early.
Instead of that, we want it to keep loading if any data is waiting to
be read and the reason for delay is just our congestion.
Also, when we kill the session because of actually slow cache, we want
to reload from scratch as the data we have is unreliable and nobody
knows whether the state is still valid.