When a kernel route changed, function krt_learn_scan() noticed that and
replaced the route in internal kernel FIB, but after that, function
krt_learn_prune() failed to propagate the new route to the nest, because
it confused the new route with the (removed) old best route and decided
that the best route did not changed.
Wow, the original code (and the bug) is almost 17 years old.
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
Explicit setting of AF_INET(6|) in IP socket creation. BFD set to listen
on v6, without setting the V6ONLY flag to catch both v4 and v6 traffic.
Squashing and minor changes by Ondrej Santiago Zajicek
Wanted netlink attributes are defined in a table, specifying
their size and neediness. Removing the long conditions that did the
validation before.
Also parsing IPv4 and IPv6 versions regardless on the IPV6 macro.
Since 2.6.19, the netlink API defines RTA_TABLE routing attribute to
allow 32-bit routing table IDs. Using this attribute to index routing
tables at Linux, instead of 8-bit rtm_table field.
New data types net_addr and variants (in lib/net.h) describing
network addresses (prefix/pxlen). Modifications of FIB structures
to handle these data types and changing everything to use these
data types instead of prefix/pxlen pairs where possible.
The commit is WiP, some protocols are not yet updated (BGP, Kernel),
and the code contains some temporary scaffolding.
Comments are welcome.
Unfortunately, some interfaces support multicast but do not have
this flag set, so we use it only as a positive hint.
Thanks to Clint Armstrong for noticing the problem.
Although RFC 3542 allows both cases, Theo de Raadt thinks
he knows better, and msg_controllen without last padding
fails on OpenBSD.
Thanks to Job Snijders for the bugreport.
I/O:
- BSD: specify src addr on IP sockets by IP_HDRINCL
- BSD: specify src addr on UDP sockets by IP_SENDSRCADDR
- Linux: specify src addr on IP/UDP sockets by IP_PKTINFO
- IPv6: specify src addr on IP/UDP sockets by IPV6_PKTINFO
- Alternative SKF_BIND flag for binding to IP address
- Allows IP/UDP sockets without tx_hook, on these
sockets a packet is discarded when TX queue is full
- Use consistently SOL_ for socket layer values.
OSPF:
- Packet src addr is always explicitly set
- Support for secondary addresses in BSD
- Dynamic RX/TX buffers
- Fixes some minor buffer overruns
- Interface option 'tx length'
- Names for vlink pseudoifaces (vlinkX)
- Vlinks use separate socket for TX
- Vlinks do not use fixed associated iface
- Fixes TTL for direct unicast packets
- Fixes DONTROUTE for OSPF sockets
- Use ifa->ifname instead of ifa->iface->name
Interfaces for OSPF and RIP could be configured to use (and request)
TTL 255 for traffic to direct neighbors.
Thanks to Simon Dickhoven for the original patch for RIPng.
Implements support for IPv6 traffic class, sets higher priority for OSPF
and RIP outgoing packets by default and allows to configure ToS/DS/TClass
IP header field and the local priority of outgoing packets.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.
Hostcache is a structure for monitoring changes in a routing table that
is used for routes with dynamic/recursive next hops. This is needed for
proper iBGP next hop handling.
In usual configuration, such export is already restricted
with the aid of the direct protocol but there are some
races that can circumvent it. This makes it harder to
break kernel device routes. Also adds an option to
disable this restriction.
- BSD kernel syncer is now self-conscious and can learn alien routes
- important bugfix in BSD kernel syncer (crash after protocol restart)
- many minor changes and bugfixes in kernel syncers and neighbor cache
- direct protocol does not generate host and link local routes
- min_scope check is removed, all routes have SCOPE_UNIVERSE by default
- also fixes some remaining compiler warnings
Under specific circumstances there might be two mixed-up
netlink sessions (one for scan, the other for route change
request). This patch separates netlink scans and requests
to two fds (and seq counters).
This should fix http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=428865
Bird sometimes reported 'bird: nl_parse_link: Malformed message received'.
The cause is that bird asynchronously received netlink packet from
wireless driver about some wireless event on its link layer. In that
case bird shouldn't complain.
Here is a patch fixing a bug that causes breakage of a local routing
table during shutdown of Bird. The problem was caused by shutdown
of 'device' protocol before shutdown of 'kernel' protocol. When
'device' protocol went down, the route (with local network prefix)
From different protocol (BGP or OSPF) became preferred and installed
to the kernel routing table. Such routes were broken (like
192.168.1.0/24 via 192.168.1.2). I think it is also the cause
of problem reported by Martin Kraus.
The patch disables updating of kernel routing table during shutdown of
Bird. I am not sure whether this is the best way to fix it, I would
prefer to forbid 'kernel' protocol to overwrite routes with
'proto kernel'.
The patch also fixes a problem that during shutdown sometimes routes
created by Bird remained in the kernel routing table.
Please try compiling your code with --enable-warnings to see them. (The
unused parameter warnings are usually bogus, the unused variable ones
are very useful, but gcc is unable to control them separately.)
address, not per interface (hence it's ifa->flags & IA_UNNUMBERED) and
should be set reliably. IF_MULTIACCESS should be fixed now, but it isn't
wise to rely on it on interfaces configured with /30 prefix.
multicast abilities depending on definedness of symbols and use hard-wired
system-dependent configuration defines instead.
Please test whereever you can.
but the core routines are there and seem to be working.
o lib/ipv6.[ch] written
o Lexical analyser recognizes IPv6 addresses and when in IPv6
mode, treats pure IPv4 addresses as router IDs.
o Router ID must be configured manually on IPv6 systems.
o Added SCOPE_ORGANIZATION for org-scoped IPv6 multicasts.
o Fixed few places where ipa_(hton|ntoh) was called as a function
returning converted address.
The changes are just too extensive for lazy me to list them
there, but see the comment at the top of sysdep/unix/krt.c.
The code got a bit more ifdeffy than I'd like, though.
Also fixed a bunch of FIXME's and added a couple of others. :)
addresses per interface (needed for example for IPv6 support).
Visible changes:
o struct iface now contains a list of all interface addresses (represented
by struct ifa), iface->addr points to the primary address (if any).
o Interface has IF_UP set iff it's up and it has a primary address.
o IF_UP is now independent on IF_IGNORED (i.e., you need to test IF_IGNORED
in the protocols; I've added this, but please check).
o The if_notify_change hook has been simplified (only one interface pointer
etc.).
o Introduced a ifa_notify_change hook. (For now, only the Direct protocol
does use it -- it's wise to just listen to device routes in all other
protocols.)
o Removed IF_CHANGE_FLAGS notifier flag (it was meaningless anyway).
o Updated all the code except netlink (I'll look at it tomorrow) to match
the new semantics (please look at your code to ensure I did it right).
Things to fix:
o Netlink.
o Make krt-iface interpret "eth0:1"-type aliases as secondary addresses.
o Now compatible with filtering.
o Learning of kernel routes supported only on CONFIG_SELF_CONSCIOUS
systems (on the others it's impossible to get it semantically correct).
o Learning now stores all of its routes in a separate fib and selects
the ones the kernel really uses for forwarding packets.
o Better treatment of CONFIG_AUTO_ROUTES ports.
o Lots of internal changes.
documented the remaining ones (sysdep/cf/README).
Available configurations:
o linux-20: Old Linux interface via /proc/net/route (selected by default
on pre-2.1 kernels).
o linux-21: Old Linux interface, but device routes handled by the
kernel (selected by default for 2.1 and newer kernels).
o linux-22: Linux with Netlink (I play with it a lot yet, so it isn't
a default).
o linux-ipv6: Prototype config for IPv6 on Linux. Not functional yet.
o Nothing is configured automatically. You _need_ to specify
the kernel syncer in config file in order to get it started.
o Syncing has been split to route syncer (protocol "Kernel") and
interface syncer (protocol "Device"), device routes are generated
by protocol "Direct" (now can exist in multiple instances, so that
it will be possible to feed different device routes to different
routing tables once multiple tables get supported).
See doc/bird.conf.example for a living example of these shiny features.
(via Netlink). Tweaked kernel synchronization rules a bit. Discovered
locking bug in kernel Netlink :-)
Future plans: Hunt all the bugs and solve all the FIXME's.
To build BIRD with Netlink support, just configure it with
./configure --with-sysconfig=linux-21
After it will be tested well enough, I'll probably make it a default
for 2.2 kernels (and rename it to linux-22 :)).
The new kernel syncer is cleanly split between generic UNIX module
and OS dependent submodules:
- krt.c (the generic part)
- krt-iface (low-level functions for interface handling)
- krt-scan (low-level functions for routing table scanning)
- krt-set (low-level functions for setting of kernel routes)
krt-set and krt-iface are common for all BSD-like Unices, krt-scan is heavily
system dependent (most Unices require /dev/kmem parsing, Linux uses /proc),
Netlink substitues all three modules.
We expect each UNIX port supports kernel routing table scanning, kernel
interface table scanning, kernel route manipulation and possibly also
asynchronous event notifications (new route, interface state change;
not implemented yet) and build the KRT protocol on the top of these
primitive operations.
o Interface syncing is now a part of krt and it can have configurable
parameters. Actually, the only one is scan rate now :)
o Kernel routing table syncing is now synchronized with interface
syncing (we need the most recent version of the interface list
to prevent lots of routes to non-existent destinations from
appearing). Instead of its own timer, we just check if it's
route scan time after each iface list scan.
o Syncing of device routes implemented.
o CONFIG_AUTO_ROUTES should control syncing of automatic device routes.
o Rewrote krt_remove_route() to really remove routes :)
o Better diagnostics.
o Fixed a couple of bugs.
the kernel routing table as opposed to modifying it which is approximately
the same on non-netlink systems, I've split the kernel routing table
routines to read and write parts. To be implemented later ;-)