Add a new route attribute, krt_scope, to expose the Linux kernel route
scope. Constants from /etc/iproute2/rt_scopes (prefixed by "ips_") are
expected to be used with the attribute. Both import and export are
supported.
Also, the patch fixes device route export to the kernel, by setting link
scope automatically.
Kernel protocol calls rt_export_merged(), which used @rte_update_pool for
temporary allocations, supposing it is called from other functions from
rt-table.c that handles locking and flushing of the linpool. Therefore,
linpool was not flushed properly and memory leaked.
Add linpool argument to rt_export_merged() and use @krt_filter_lp when
called from kernel protocol.
Thanks to Justin Cattle and Alexander Frolkin for the bugreport.
(Commit squashed and updated by Ondrej Zajicek)
Kernel routes with different metrics do not clash with each other,
therefore using dedicated metric value is a reliable way to avoid
overwriting routes from other sources (e.g. kernel device routes).
Although kernel route metric could already be set as a route attribute by
filters, that is not consistent with the way how Linux kernel handles
route metric - not just a route attribute, but a part of a route key.
Linux represents IPv6 ECMP routes as a sequence of unipath routes with
the same prefix. We have to translate between our representation (one
route with multipath next hop) and the Linux representation in both
directions.
Proper learning of alien IPv6 ECMP routes still not supported.
Thanks to Mikhail Sennikovskii for the original patch.
Ignore tentative IPv6 addresses and wait until finish of Duplicate
Address Detection (We got notification when an address is no longer
tentative) to avoid problems when protocols try to use interfaces
with tentative link-local addresses.
Based on patch from Jan Moskyto Matejka
The netlink code assumes an order for the members of struct msghdr.
This breaks recvmsg and sendmsg with musl libc on mips64. Fix this by
using designated initializers instead.
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
sysdep/linux/netlink.c:921:10: error: fields must have a constant size:
'variable length array in structure' extension will never be supported
char buf[128 + KRT_METRICS_MAX*8 + nh_bufsize(a->nexthops)];
^
1 error generated.
Also removed the lib-dir merging with sysdep. Updated #include's
accordingly.
Fixed make doc on recent Debian together with moving generated doc into
objdir.
Moved Makefile.in into root dir
Retired all.o and birdlib.a
Linking the final binaries directly from all the .o files.
==00:00:00:02.831 2468== Syscall param socketcall.setsockopt(optval) points to uninitialised byte(s)
==00:00:00:02.831 2468== at 0x513BDEA: setsockopt (in /usr/lib/libc-2.23.so)
==00:00:00:02.831 2468== by 0x45C7AF: sk_setup (io.c:1216)
==00:00:00:02.831 2468== by 0x45CDFF: sk_open (io.c:1417)
==00:00:00:02.831 2468== by 0x44B562: rip_open_socket (packets.c:740)
==00:00:00:02.831 2468== by 0x4481A7: rip_iface_locked (rip.c:616)
==00:00:00:02.831 2468== by 0x4133E4: olock_run_event (locks.c:177)
==00:00:00:02.831 2468== by 0x45A6DE: ev_run (event.c:85)
==00:00:00:02.831 2468== by 0x45A7AD: ev_run_list (event.c:142)
==00:00:00:02.831 2468== by 0x45E0FC: io_loop (io.c:2066)
==00:00:00:02.831 2468== by 0x463B56: main (main.c:845)
==00:00:00:02.831 2468== Address 0xffefffd24 is on thread 1's stack
==00:00:00:02.831 2468== in frame #1, created by sk_setup (io.c:1188)
==00:00:00:02.831 2468== Uninitialised value was created by a stack allocation
==00:00:00:02.831 2468== at 0x45C6BB: sk_setup (io.c:1188)
This patch implements the IPv6 subset of the Babel routing protocol.
Based on the patch from Toke Hoiland-Jorgensen, with some heavy
modifications and bugfixes.
Thanks to Toke Hoiland-Jorgensen for the original patch.
Add code for manipulation with TCP-MD5 keys in the IPsec SA/SP database
at FreeBSD systems. Now, BGP MD5 authentication (RFC 2385) keys are
handled automatically on both Linux and FreeBSD.
Based on patches from Pavel Tvrdik.
Many protocols do almost the same when creating a rte_update request
before calling rte_update2(). This commit should simplify the protocol
side of the route-creation routine.
In BIRD, RX has lower priority than TX with the exception of RX from
control socket. The patch replaces heuristic based on socket type with
explicit mark and uses it for both control socket and BGP session waiting
to be established.
This should avoid an issue when during heavy load, outgoing connection
could connect (TX event), send open, but then failed to receive OPEN /
establish in time, not sending notifications between and therefore
got hold timer expired error from the neighbor immediately after it
finally established the connection.
When a kernel route changed, function krt_learn_scan() noticed that and
replaced the route in internal kernel FIB, but after that, function
krt_learn_prune() failed to propagate the new route to the nest, because
it confused the new route with the (removed) old best route and decided
that the best route did not changed.
Wow, the original code (and the bug) is almost 17 years old.
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
Explicit setting of AF_INET(6|) in IP socket creation. BFD set to listen
on v6, without setting the V6ONLY flag to catch both v4 and v6 traffic.
Squashing and minor changes by Ondrej Santiago Zajicek
Wanted netlink attributes are defined in a table, specifying
their size and neediness. Removing the long conditions that did the
validation before.
Also parsing IPv4 and IPv6 versions regardless on the IPV6 macro.
Since 2.6.19, the netlink API defines RTA_TABLE routing attribute to
allow 32-bit routing table IDs. Using this attribute to index routing
tables at Linux, instead of 8-bit rtm_table field.
Symbol lookup by cf_find_symbol() not only did the lookup but also added
new void symbols allocated from cfg_mem linpool, which gets broken when
lookups are done outside of config parsing, which may lead to crashes
during reconfiguration.
The patch separates lookup-only cf_find_symbol() and config-modifying
cf_get_symbol(), while the later is called only during parsing. Also
new_config and cfg_mem global variables are NULLed outside of parsing.
New data types net_addr and variants (in lib/net.h) describing
network addresses (prefix/pxlen). Modifications of FIB structures
to handle these data types and changing everything to use these
data types instead of prefix/pxlen pairs where possible.
The commit is WiP, some protocols are not yet updated (BGP, Kernel),
and the code contains some temporary scaffolding.
Comments are welcome.
If the number of sockets is too much for select(), we should at least
handle it with proper error messages and reject new sockets instead of
breaking the event loop.
Thanks to Alexander V. Chernikov for the patch.
When a new route was imported from kernel and chosen as preferred, then
the old best route was propagated as a withdraw to the kernel protocol.
Under some circumstances such withdraw propagated to the BSD kernel could
remove the new alien route and thus reverting the import.
Unfortunately, some interfaces support multicast but do not have
this flag set, so we use it only as a positive hint.
Thanks to Clint Armstrong for noticing the problem.
When an interface goes down, (Linux) kernel removes routes pointing to
that ifacem but does not send withdraws for them. We rescan the
kernel table to ensure synchronization.
Thanks to Alexander Demenshin for the bugreport.
Although RFC 3542 allows both cases, Theo de Raadt thinks
he knows better, and msg_controllen without last padding
fails on OpenBSD.
Thanks to Job Snijders for the bugreport.
I/O:
- BSD: specify src addr on IP sockets by IP_HDRINCL
- BSD: specify src addr on UDP sockets by IP_SENDSRCADDR
- Linux: specify src addr on IP/UDP sockets by IP_PKTINFO
- IPv6: specify src addr on IP/UDP sockets by IPV6_PKTINFO
- Alternative SKF_BIND flag for binding to IP address
- Allows IP/UDP sockets without tx_hook, on these
sockets a packet is discarded when TX queue is full
- Use consistently SOL_ for socket layer values.
OSPF:
- Packet src addr is always explicitly set
- Support for secondary addresses in BSD
- Dynamic RX/TX buffers
- Fixes some minor buffer overruns
- Interface option 'tx length'
- Names for vlink pseudoifaces (vlinkX)
- Vlinks use separate socket for TX
- Vlinks do not use fixed associated iface
- Fixes TTL for direct unicast packets
- Fixes DONTROUTE for OSPF sockets
- Use ifa->ifname instead of ifa->iface->name
RIP sockets for multiple ifaces collided, because we cannot bind to
a specific iface on BSD. Workarounded by SO_REUSEPORT.
Thanks to Eugene M. Zheganin for the bugreport.
The bug caused that krt_prefsrc attribute was not processed when a route
received from a kernel protocol was exported to another kernel protocol.
Thanks to Sergey Popovich for a bugreport.
Implemented eval command can be used to evaluate expressions.
The patch also documents echo command and allows to use log classes
instead of integer as a mask for echo.
Interfaces for OSPF and RIP could be configured to use (and request)
TTL 255 for traffic to direct neighbors.
Thanks to Simon Dickhoven for the original patch for RIPng.
Implements support for IPv6 traffic class, sets higher priority for OSPF
and RIP outgoing packets by default and allows to configure ToS/DS/TClass
IP header field and the local priority of outgoing packets.
Several new configure command variants:
configure undo - undo last reconfiguration
configure timeout - configure with scheduled undo if not confirmed in timeout
configure confirm - confirm last configuration
configure check - just parse and validate config file
When 'import keep rejected' protocol option is activated, routes
rejected by the import filter are kept in the routing table, but they
are hidden and not propagated to other protocols. It is possible to
examine them using 'show route rejected'.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.
- ROA tables, which are used as a basic part for RPKI.
- Commands for examining and modifying ROA tables.
- Filter operators based on ROA tables consistent with RFC 6483.
Hostcache is a structure for monitoring changes in a routing table that
is used for routes with dynamic/recursive next hops. This is needed for
proper iBGP next hop handling.
Which was, actually, a bug in timers - on older kernel, monotonic timer
is missing and the other implementation started with now == 0, which
collides with usage 0 as a special value in timer->expires field.
In usual configuration, such export is already restricted
with the aid of the direct protocol but there are some
races that can circumvent it. This makes it harder to
break kernel device routes. Also adds an option to
disable this restriction.
- Adds check to deny config file with no specified protocol to prevent
loading of empty config file.
- Moves CLI init before config parse to receive immediate error message
when cannot open control socket.
- Fixes socket name path check and other error handling in CLI init.
When device protocol goes down, interfaces should be flushed
asynchronously (in the same way like routes from protocols are flushed),
when protocol goes to DOWN/HUNGRY.
This fixes the problem with static routes staying in kernel routing
table after BIRD shutdown.
- BSD kernel syncer is now self-conscious and can learn alien routes
- important bugfix in BSD kernel syncer (crash after protocol restart)
- many minor changes and bugfixes in kernel syncers and neighbor cache
- direct protocol does not generate host and link local routes
- min_scope check is removed, all routes have SCOPE_UNIVERSE by default
- also fixes some remaining compiler warnings
It seems that by adding one pipe-specific exception to route
announcement code and by adding one argument to rt_notify() callback i
could completely eliminate the need for the phantom protocol instance
and therefore make the code more straightforward. It will also fix some
minor bugs (like ignoring debug flag changes from the command line).
There is no reak callback scheduler and previous behavior causes
bad things during hard congestion (like BGP hold timeouts).
Smart callback scheduler is still missing, but main loop was
changed such that it first processes all tx callbacks (which
are fast enough) (but max 4* per socket) + rx callbacks for CLI,
and in the second phase it processes one rx callback per
socket up to four sockets (as rx callback can be slow when
there are too many protocols, because route redistribution
is done synchronously inside rx callback). If there is event
callback ready, second phase is skipped in 90% of iterations
(to speed up CLI during congestion).
This also fixes bug that timer->recurrent was not cleared
in tm_new() and unexpected recurrence of startup timer
in BGP confused state machine and caused crash.
If other side of a socket is sending data faster than
BIRD is processing, BIRD does not schedule any other
callbacks (events, timers, rx/tx callbacks).
Under specific circumstances there might be two mixed-up
netlink sessions (one for scan, the other for route change
request). This patch separates netlink scans and requests
to two fds (and seq counters).
This should fix http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=428865
KRF_INSTALLED flag was not cleared during reconfiguration
that lead to not removing routes during reconfigure when
export rules changed.
We also should not try to remove routes we didi not installed,
on Linux this leads to warnings (as kernel checks route source
field and do not allow to remove non-bird routes) but we should
not rely on it.
Bird sometimes reported 'bird: nl_parse_link: Malformed message received'.
The cause is that bird asynchronously received netlink packet from
wireless driver about some wireless event on its link layer. In that
case bird shouldn't complain.
Here is a patch fixing a bug that causes breakage of a local routing
table during shutdown of Bird. The problem was caused by shutdown
of 'device' protocol before shutdown of 'kernel' protocol. When
'device' protocol went down, the route (with local network prefix)
From different protocol (BGP or OSPF) became preferred and installed
to the kernel routing table. Such routes were broken (like
192.168.1.0/24 via 192.168.1.2). I think it is also the cause
of problem reported by Martin Kraus.
The patch disables updating of kernel routing table during shutdown of
Bird. I am not sure whether this is the best way to fix it, I would
prefer to forbid 'kernel' protocol to overwrite routes with
'proto kernel'.
The patch also fixes a problem that during shutdown sometimes routes
created by Bird remained in the kernel routing table.
Also, removed the `if (s)' test, because I believe that as the whole
socket interface doesn't accent NULL pointers, sk_reallocate() shouldn't
be the only exception.
you can delete the socket from anywhere in the hooks and nothing should break.
Also, the receive/transmit buffers are now regular xmalloc()'ed buffers,
not separate resources which would need shuffling around between pools.
sk_close() is gone, use rfree() instead.
for even only medium sized route table output. Fix a strange garbled
output problem in the client. The latter seems to be caused by some
library doing tcflush while there is still command output pending. So
the best fix here is to do fflush and then tcdrain. Note that this
problem occurs only under certain load situations and is not too easy to
reproduce.
(by Andreas)
C includes as they contain substitutions specific to make.
Worked around by creating sysconf/paths.h which is created from
the Makefile instead of by the configure script.
Please try compiling your code with --enable-warnings to see them. (The
unused parameter warnings are usually bogus, the unused variable ones
are very useful, but gcc is unable to control them separately.)
of calling the protocols manually.
Implemented printing of dynamic attributes in `show route all'.
Each protocol can now register its own attribute class (protocol->attr_class,
set to EAP_xxx) and also a callback for naming and formatting of attributes.
The callback can return one of the following results:
GA_UNKNOWN Attribute not recognized.
GA_NAME Attribute name recognized and put to the buffer,
generic code should format the value.
GA_FULL Both attribute name and value put to the buffer.
Please update protocols generating dynamic attributes to provide
the attr_class and formatting hook.
address, not per interface (hence it's ifa->flags & IA_UNNUMBERED) and
should be set reliably. IF_MULTIACCESS should be fixed now, but it isn't
wise to rely on it on interfaces configured with /30 prefix.
(the current version UNIX-specific) anyway, so it's useless to try splitting it
to sysdep and generic part. Instead of this, configure script decides (based on
system type and user's wish) what (if any) client should be built and what
autoconfiguration it requires. Also, the client provides its own die/bug/...
functions.
used for automatic generation of instance names.
protocol->name is the official name
protocol->template is the name template (usually "name%d"),
should be all lowercase.
Updated all protocols to define the templates, checked that their configuration
grammar includes proto_name which generates the name and interns it in the
symbol table.
multicast abilities depending on definedness of symbols and use hard-wired
system-dependent configuration defines instead.
Please test whereever you can.
with protocols wanting to use the same port on the same interface
during reconfiguration time.
How to use locks: In the if_notify hook, just order locks for the
interfaces you want to work with and do the real socket opening after the
lock hook function gets called. When you stop using the socket, close
it and rfree() the lock.
Please update your protocols to use the new locking mechanism.
but the core routines are there and seem to be working.
o lib/ipv6.[ch] written
o Lexical analyser recognizes IPv6 addresses and when in IPv6
mode, treats pure IPv4 addresses as router IDs.
o Router ID must be configured manually on IPv6 systems.
o Added SCOPE_ORGANIZATION for org-scoped IPv6 multicasts.
o Fixed few places where ipa_(hton|ntoh) was called as a function
returning converted address.
The changes are just too extensive for lazy me to list them
there, but see the comment at the top of sysdep/unix/krt.c.
The code got a bit more ifdeffy than I'd like, though.
Also fixed a bunch of FIXME's and added a couple of others. :)
addresses per interface (needed for example for IPv6 support).
Visible changes:
o struct iface now contains a list of all interface addresses (represented
by struct ifa), iface->addr points to the primary address (if any).
o Interface has IF_UP set iff it's up and it has a primary address.
o IF_UP is now independent on IF_IGNORED (i.e., you need to test IF_IGNORED
in the protocols; I've added this, but please check).
o The if_notify_change hook has been simplified (only one interface pointer
etc.).
o Introduced a ifa_notify_change hook. (For now, only the Direct protocol
does use it -- it's wise to just listen to device routes in all other
protocols.)
o Removed IF_CHANGE_FLAGS notifier flag (it was meaningless anyway).
o Updated all the code except netlink (I'll look at it tomorrow) to match
the new semantics (please look at your code to ensure I did it right).
Things to fix:
o Netlink.
o Make krt-iface interpret "eth0:1"-type aliases as secondary addresses.