When rtable is pruned and network fib nodes are removed, we also need to
prune prefix trie. Unfortunately, rebuilding prefix trie takes long time
(got about 400 ms for 1M networks), so must not be atomic, we have to
rebuild a new trie while current one is still active. That may require
some considerable amount of temporary memory, so we do that only if
we expect significant trie size reduction.
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The Validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.
The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.
The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
When output of 'show route' command was generated, the net_format() was
called for each network prematurely, even if the result was not needed.
Fix the code to call net_format() only when needed. This makes queries
that process many networks but show only few (e.g. 'show route where ..',
or 'show route count') much faster (like 5x - 10x faster).
Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and
net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x
speedup for IPv6 of these calls.
TODO:
- Rebuild the trie during rt_prune_table()
- Better way to avoid trie_add_prefix() in net_get() for existing tables
- Make it configurable (?)
One of previous commits added error logging of invalid routes. This
also inadvertently caused error logging of route loops, which should
be ignored silently. Fix that.
Most error messages in attribute processing are in rx/decode step and
these use L_REMOTE log class. But there are few that are in tx/export
step and these should use L_ERR log class.
Use tx-specific macro (REJECT()) in tx/export code and rename field
err_withdraw to err_reject in struct bgp_export_state to ensure that
appropriate error reporting macros are called in proper contexts.
+ ubuntu:21.10 added into the pipeline,
- ubuntu:20.10 removed from the pipeline,
+ misc/docker/ubuntu-21.10-amd64/Dockerfile added,
- misc/docker/ubuntu-20.10-amd64/Dockerfile removed.
Add option 'netlink rx buffer' to specify netlink socket receive buffer
size. Uses SO_RCVBUFFORCE, so it can override rmem_max limit.
Thanks to Trisha Biswas and Michal for the original patches.
Add strict checking for netlink KRT dumps to avoid PMTU cache records
from FNHE table dump along with KRT.
Linux Kernel added FNHE table dump to the netlink API in patch:
8d3b68cd37.1561131177.git.sbrivio@redhat.com/
Therefore, since Linux 5.3 these route cache entries are dumped together
with regular routes during periodic KRT scans, which in some cases may be
huge amount of useless data. This can be avoided by using strict checking
for netlink dumps:
https://lore.kernel.org/netdev/20181008031644.15989-1-dsahern@kernel.org/
The patch mitigates the risk of receiving unknown and potentially large
number of FNHE records that would block BIRD I/O in each sync. There is a
known issue caused by the GRE tunnels on Linux that seems to be creating
one FNHE record for each destination IP address that is routed through
the tunnel, even when the PMTU equals to GRE interface MTU.
Thanks to Tomas Hlavacek for the original patch.
Kernel uses cloned routes to keep route cache entries, but reports them
together with regular routes. They were skipped implicitly as they
do not have rtm_protocol filled. Add explicit check for cloned flag
and skip such routes explicitly.
Also, improve debug logs of skipped routes.