Some [redacted] (yes, myself) had a really bad idea
to rename nest/route.h to nest/rt.h while refactoring
some data structures out of it.
This led to unnecessarily complex problems with
merging updates from v2. Reverting this change
to make my life a bit easier.
At least it needed only one find-sed command:
find -name '*.[chlY]' -type f -exec sed -i 's#nest/rt.h#nest/route.h#' '{}' +
The Kernel protocol, even with the option 'learn' enabled, ignores
direct routes created by the OS kernel (on Linux these are routes
with rtm_protocol == RTPROT_KERNEL).
Implement optional behavior where both OS kernel and third-party routes
are learned, it can be enabled by 'learn all' option.
Minor changes by committer.
Despite not having defined 'master interface', VRF interfaces should be
treated as being inside respective VRFs. They behave as a loopback for
respective VRFs. Treating the VRF interface as inside the VRF allows
e.g. OSPF to pick up IP addresses defined on the VRF interface.
For this, we also need to tell apart VRF interfaces and regular interfaces.
Extend Netlink code to parse interface type and mark VRF interfaces with
IF_VRF flag.
Based on the patch from Erin Shepherd, thanks!
It is necessary for IPv4 over IPv6 nexthop support on FreeBSD,
and RTA_VIA is not really related to MPLS.
It breaks build for some very old systems like Debian 8 and CentOS 7,
but we generally do not support older kernels than 4.14 LTS anyway.
Now sk_open() requires an explicit IO loop to open the socket in. Also
specific functions for socket RX pause / resume are added to allow for
BGP corking.
And last but not least, socket reloop is now synchronous to resolve
weird cases of the target loop stopping before actually picking up the
relooped socket. Now the caller must ensure that both loops are locked
while relooping, and this way all sockets always have their respective
loop.
Netlink support was added to FreeBSD recently. It is not as full-featured
as its Linux counterpart yet, however the added subset is enough to make
a routing daemon work. Specifically, it supports multiple tables,
multipath, nexthops and nexthops groups. No MPLS support yet.
The attached change adds 'bsd-netlink’ sysconf target, allowing to build
both netlink & rtsock versions on FreeBSD.
While onlink flag is meaningful only with explicit next hops, it can be
defined also on direct routes. Parse it also in this case to avoid
periodic updates of the same route.
Thanks to Marcin Saklak for the bugreport.
Seems like the previous patch was too optimistic, as route replace is
still broken even in Linux 4.19 LTS (but fixed in Linux 5.10 LTS) for:
ip route add 2001:db8::/32 via fe80::1 dev eth0
ip route replace 2001:db8::/32 dev eth0
It ends with two routes instead of just the second.
The issue is limited to direct and special type (e.g. unreachable)
routes, the patch restricts route replace for cases when the new route
is a regular route (with a next hop address).
When IPv6 ECMP support first appeared in Linux kernel, it used different
API than IPv4 ECMP. Individual next hops were updated and announced
separately, instead of using RTA_MULTIPATH as in IPv4. This has several
drawbacks and requires complex code to merge received notifications to
one multipath route.
When Linux came with IPv6 RTA_MULTIPATH support, the initial versions
were somewhat buggy, so we kept using the old API for updates (splitting
multipath routes to sequences of route updates), while accepting both
old-style routes and RTA_MULTIPATH routes in scans / notifications.
As IPv6 RTA_MULTIPATH support is here for a long time, this patch fully
switches Netlink to the IPv6 RTA_MULTIPATH API and removes old complex
code for handling individual next hop announces.
The required Linux version is at least 4.11 for reliable operation.
Thanks to Daniel Gröber for the original patch.
Remove compile-time sysdep option CONFIG_ALL_TABLES_AT_ONCE, replace it
with runtime ability to run either separate table scans or shared scan.
On Linux, use separate table scans by default when the netlink socket
option NETLINK_GET_STRICT_CHK is available, but retreat to shared scan
when it fails.
Running separate table scans has advantages where some routing tables are
managed independently, e.g. when multiple routing daemons are running on
the same machine, as kernel routing table modification performance is
significantly reduced when the table is modified while it is being
scanned.
Thanks Daniel Gröber for the original patch and Toke Høiland-Jørgensen
for suggestions.
The learnt routes are now pushed all into the connected table, not only
the best one. This shouldn't do any damage in well managed setups, yet
it should be noted that it is a change of behavior.
If anybody misses a feature which they implemented by misusing this
internal learn table, let us know, we'll consider implementing it in a
better way.
There were quite a lot of conflicts in flowspec validation code which
ultimately led to some code being a bit rewritten, not only adapted from
this or that branch, yet it is still in a limit of a merge.
For now, all route attributes are stored as eattrs in ea_list. This
should make route manipulation easier and it also allows for a layered
approach of route attributes where updates from filters will be stored
as an overlay over the previous version.
As there is either a nexthop or another destination specification
(or othing in case of ROAs and Flowspec), it may be merged together.
This code is somehow quirky and should be replaced in future by better
implementation of nexthop.
Also flowspec validation result has its own attribute now as it doesn't
have anything to do with route nexthop.
This doesn't do anything more than to put the whole structure inside
adata. The overall performance is certainly going downhill; we'll
optimize this later.
Anyway, this is one of the latest items inside rta and in several
commits we may drop rta completely and move to eattrs-only routes.
The route scope attribute was used for simple user route marking. As
there is a better tool for this (custom attributes), the old and limited
way can be dropped.
Changes in internal API:
* Every route attribute must be defined as struct ea_class somewhere.
* Registration of route attributes known at startup must be done by
ea_register_init() from protocol build functions.
* Every attribute has now its symbol registered in a global symbol table
defined as SYM_ATTRIBUTE
* All attribute ID's are dynamically allocated.
* Attribute value custom formatting hook is defined in the ea_class.
* Attribute names are the same for display and filters, always prefixed
by protocol name.
Also added some unit testing code for filters with route attributes.
This commit removes the EAF_TYPE_* namespace completely and also for
route attributes, filter-based types T_* are used. This simplifies
fetching and setting route attributes from filters.
Also, there is now union bval which serves as an universal value holder
instead of private unions held separately by eattr and filter code.
Before this change, fetch-update-write and bitmasking was hardcoded in
attribute access code cased by the attribute type. Several filter
instructions are used to do it instead.
As this is certainly going to be a little bit slower than before, the
switch block in attribute access code should be completely removed in
near future, helping with both performance and code cleanliness.
The user interface should have stayed intact.
Add option 'netlink rx buffer' to specify netlink socket receive buffer
size. Uses SO_RCVBUFFORCE, so it can override rmem_max limit.
Thanks to Trisha Biswas and Michal for the original patches.