Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
Starting from Linux 4.11, IPv6 ECMP routes are now notified using
RTA_MULTIPATH, like IPv4 ones. The patch adds support for RTA_MULTIPATH
parsing for IPv6 routes. This also enables to parse ECMP alien routes
correctly.
Thanks to Vincent Bernat for the original patch.
Add proper support for per-nexthop onlink flag in routes to handle next
hop addresses that are not covered by interface IP ranges. Supported by
kernel and static protocols.
Thanks to Vincent Bernat for the idea.
Anyway, Bird is now capable to insert both MPLS routes and MPLS encap
routes into kernel.
It was (among others) needed to define platform-specific AF_MPLS to 28
as this constant has been assigned in the linux kernel.
No support for BSD now, it may be added in the future.
Dropped struct mpnh and mpnh_*()
Now struct nexthop exists, nexthop_*(), and also included struct nexthop
into struct rta.
Also converted RTD_DEVICE and RTD_ROUTER to RTD_UNICAST. If it is needed
to distinguish between these two cases, RTD_DEVICE is equivalent to
IPA_ZERO(a->nh.gw), RTD_ROUTER is then IPA_NONZERO(a->nh.gw).
From now on, we also explicitely want C99 compatible compiler. We assume
that this 20-year norm should be known almost everywhere.
Add a new route attribute, krt_scope, to expose the Linux kernel route
scope. Constants from /etc/iproute2/rt_scopes (prefixed by "ips_") are
expected to be used with the attribute. Both import and export are
supported.
Also, the patch fixes device route export to the kernel, by setting link
scope automatically.
Kernel routes with different metrics do not clash with each other,
therefore using dedicated metric value is a reliable way to avoid
overwriting routes from other sources (e.g. kernel device routes).
Although kernel route metric could already be set as a route attribute by
filters, that is not consistent with the way how Linux kernel handles
route metric - not just a route attribute, but a part of a route key.
Linux represents IPv6 ECMP routes as a sequence of unipath routes with
the same prefix. We have to translate between our representation (one
route with multipath next hop) and the Linux representation in both
directions.
Proper learning of alien IPv6 ECMP routes still not supported.
Thanks to Mikhail Sennikovskii for the original patch.
Ignore tentative IPv6 addresses and wait until finish of Duplicate
Address Detection (We got notification when an address is no longer
tentative) to avoid problems when protocols try to use interfaces
with tentative link-local addresses.
Based on patch from Jan Moskyto Matejka
The netlink code assumes an order for the members of struct msghdr.
This breaks recvmsg and sendmsg with musl libc on mips64. Fix this by
using designated initializers instead.
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
sysdep/linux/netlink.c:921:10: error: fields must have a constant size:
'variable length array in structure' extension will never be supported
char buf[128 + KRT_METRICS_MAX*8 + nh_bufsize(a->nexthops)];
^
1 error generated.
Also removed the lib-dir merging with sysdep. Updated #include's
accordingly.
Fixed make doc on recent Debian together with moving generated doc into
objdir.
Moved Makefile.in into root dir
Retired all.o and birdlib.a
Linking the final binaries directly from all the .o files.
When a kernel route changed, function krt_learn_scan() noticed that and
replaced the route in internal kernel FIB, but after that, function
krt_learn_prune() failed to propagate the new route to the nest, because
it confused the new route with the (removed) old best route and decided
that the best route did not changed.
Wow, the original code (and the bug) is almost 17 years old.
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
Explicit setting of AF_INET(6|) in IP socket creation. BFD set to listen
on v6, without setting the V6ONLY flag to catch both v4 and v6 traffic.
Squashing and minor changes by Ondrej Santiago Zajicek
Wanted netlink attributes are defined in a table, specifying
their size and neediness. Removing the long conditions that did the
validation before.
Also parsing IPv4 and IPv6 versions regardless on the IPV6 macro.
Since 2.6.19, the netlink API defines RTA_TABLE routing attribute to
allow 32-bit routing table IDs. Using this attribute to index routing
tables at Linux, instead of 8-bit rtm_table field.
New data types net_addr and variants (in lib/net.h) describing
network addresses (prefix/pxlen). Modifications of FIB structures
to handle these data types and changing everything to use these
data types instead of prefix/pxlen pairs where possible.
The commit is WiP, some protocols are not yet updated (BGP, Kernel),
and the code contains some temporary scaffolding.
Comments are welcome.
Unfortunately, some interfaces support multicast but do not have
this flag set, so we use it only as a positive hint.
Thanks to Clint Armstrong for noticing the problem.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.