Many protocols do almost the same when creating a rte_update request
before calling rte_update2(). This commit should simplify the protocol
side of the route-creation routine.
Counter exp_routes is increased during initial route feed after GR
recovery, so it has to start with zero, otherwise BIRD will end with
double value in exp_routes.
When a kernel route changed, function krt_learn_scan() noticed that and
replaced the route in internal kernel FIB, but after that, function
krt_learn_prune() failed to propagate the new route to the nest, because
it confused the new route with the (removed) old best route and decided
that the best route did not changed.
Wow, the original code (and the bug) is almost 17 years old.
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
Explicit setting of AF_INET(6|) in IP socket creation. BFD set to listen
on v6, without setting the V6ONLY flag to catch both v4 and v6 traffic.
Squashing and minor changes by Ondrej Santiago Zajicek
Symbol lookup by cf_find_symbol() not only did the lookup but also added
new void symbols allocated from cfg_mem linpool, which gets broken when
lookups are done outside of config parsing, which may lead to crashes
during reconfiguration.
The patch separates lookup-only cf_find_symbol() and config-modifying
cf_get_symbol(), while the later is called only during parsing. Also
new_config and cfg_mem global variables are NULLed outside of parsing.
New data types net_addr and variants (in lib/net.h) describing
network addresses (prefix/pxlen). Modifications of FIB structures
to handle these data types and changing everything to use these
data types instead of prefix/pxlen pairs where possible.
The commit is WiP, some protocols are not yet updated (BGP, Kernel),
and the code contains some temporary scaffolding.
Comments are welcome.
The new RIP implementation fixes plenty of old bugs and also adds support
for many new features: ECMP support, link state support, BFD support,
configurable split horizon and more. Most options are now per-interface.
In some circumstances during reconfiguration, routes propagated by pipes
to other tables may hang there even after the primary routes are removed.
There is already a workaround for this issue in the code which removes
these stale routes by flush process when source protocols are shut down.
This patch is a cleaner fix and allows to simplify the flush process
Now the order is:
Up -> iface, addr, neigh
Down -> neigh, addr, iface
It fixes the case when an iface appears, related static routes are
activated and exported to OSPF before the iface notification and
therefore forwarding addresses are not encoded in generated external
LSAs.
Implemented eval command can be used to evaluate expressions.
The patch also documents echo command and allows to use log classes
instead of integer as a mask for echo.
When route was propagated to another rtable through a pipe and then the
pipe was reconfigured softly in such a way that any subsequent route
updates are filtered, then the source protocol shutdown didn't clean up
the route in the second rtable which caused stale routes and potential
crashes.
Implements support for IPv6 traffic class, sets higher priority for OSPF
and RIP outgoing packets by default and allows to configure ToS/DS/TClass
IP header field and the local priority of outgoing packets.
Temporary dummy routes created by a kernel protocol during routing table
scan get mixed with real routes propagated from another kernel protocol
through a pipe.
The RAdv protocol could be configured to change its behavior based on
availability of routes, e.g., do not announce router lifetime when a
default route is not available.
Router ID could be automatically determined based of subset of
ifaces/addresses specified by 'router id from' option. The patch also
does some minor changes related to router ID reconfiguration.
Thanks to Alexander V. Chernikov for most of the work.
Several new configure command variants:
configure undo - undo last reconfiguration
configure timeout - configure with scheduled undo if not confirmed in timeout
configure confirm - confirm last configuration
configure check - just parse and validate config file
When 'import keep rejected' protocol option is activated, routes
rejected by the import filter are kept in the routing table, but they
are hidden and not propagated to other protocols. It is possible to
examine them using 'show route rejected'.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.
When a protocol went down, all its routes were flushed in one step, that
may block BIRD for too much time. The patch fixes that by limiting
maximum number of routes flushed in one step.
- ROA tables, which are used as a basic part for RPKI.
- Commands for examining and modifying ROA tables.
- Filter operators based on ROA tables consistent with RFC 6483.
The nest-protocol interaction is changed to better handle multitable
protocols. Multitable protocols now declare that by 'multitable' field,
which tells nest that a protocol handles things related to proto-rtable
interaction (table locking, announce hook adding, reconfiguration of
filters) itself.
Filters and stats are moved to announce hooks, a protocol could have
different filters and stats to different tables.
The patch is based on one from Alexander V. Chernikov, thanks.
Hostcache is a structure for monitoring changes in a routing table that
is used for routes with dynamic/recursive next hops. This is needed for
proper iBGP next hop handling.
In usual configuration, such export is already restricted
with the aid of the direct protocol but there are some
races that can circumvent it. This makes it harder to
break kernel device routes. Also adds an option to
disable this restriction.
When device protocol goes down, interfaces should be flushed
asynchronously (in the same way like routes from protocols are flushed),
when protocol goes to DOWN/HUNGRY.
This fixes the problem with static routes staying in kernel routing
table after BIRD shutdown.
- BSD kernel syncer is now self-conscious and can learn alien routes
- important bugfix in BSD kernel syncer (crash after protocol restart)
- many minor changes and bugfixes in kernel syncers and neighbor cache
- direct protocol does not generate host and link local routes
- min_scope check is removed, all routes have SCOPE_UNIVERSE by default
- also fixes some remaining compiler warnings
It seems that by adding one pipe-specific exception to route
announcement code and by adding one argument to rt_notify() callback i
could completely eliminate the need for the phantom protocol instance
and therefore make the code more straightforward. It will also fix some
minor bugs (like ignoring debug flag changes from the command line).
When uncofiguring the pipe and the peer table, the peer table was
unlocked when pipe protocol state changed to down/flushing and not to
down/hungry. This leads to the removal of the peer table before
the routes from the pipe were flushed.
The fix leads to adding some pipe-specific hacks to the nest,
but this seems inevitable.
If protocol announces a route, route is accepted by import filter to
routing table, and later it announces replacement of that route that is
rejected by import filter, old route remains in routing table.
ea_same() sometimes returns true for different route attributes,
which caused that hash table in BGP does not work correctly and
some routes were sent with different attributes.
Allows to add more interface patterns to one common 'options'
section like:
interface "eth3", "eth4" { options common to eth3 and eth4 };
Also removes undocumented and unnecessary ability to specify
more interface patterns with different 'options' sections:
interface "eth3" { options ... }, "eth4" { options ... };
Old AS path maching supposes thath AS number appears
only once in AS path, but that is not true. It also
contains some bugs related to AS path sets.
New code does not use any assumptions about semantic
structure of AS path. It is asymptotically slower than
the old code, but on real paths it is not significant.
It also allows '?' for matching one arbitrary AS number.
Cryptographic authentication in OSPF is defective by
design - there might be several packets independently
sent to the network (for example HELLO, LSUPD and LSACK)
where they might be reordered and that causes crypt.
sequence number error.
That can be workarounded by not incresing sequence number
too often. Now we update it only when last packet was sent
before at least one second. This can constitute a risk of
replay attacks, but RFC supposes something similar (like time
in seconds used as CSN).
Routes comming through pipe from primary to secondary table were
filtered by both EXPORT and IMPORT filters, but they should be
only filtered by EXPORT filters.
AS4 optional attribute errors were handled by session
drop (according to BGP RFC). This patch implements
error handling according to new BGP AS4 draft (*)
- ignoring invalid AS4 optional attributes.
(*) http://www.ietf.org/internet-drafts/draft-chen-rfc4893bis-02.txt
The core state machine was broken - it didn't free resources
in START -> DOWN transition and might freed resources after
UP -> STOP transition before protocol turned down. It leads
to deadlock on olock acquisition when lock was not freed
during previous stop.
The current behavior is that resources, allocated during
DOWN -> * transition, are freed in * -> DOWN transition,
and flushing (scheduled in UP -> *) just counteract
feeding (scheduled in * -> UP). Protocol fell down
when both flushing is done (if needed) and protocol
reports DOWN.
BTW, is thera a reason why neighbour cache item acquired
by protocol is not tracked by resource mechanism?
When protocol started, feeding was scheduled. If protocol
got down before feeding was executed, then function
responsible for connecting protocol to kernel routing
tables was called after the function responsible for
disconnecting, then resource pool of protocol was freed,
but freed linked list structures remains in the list.
values for MD5 password ID changed during reconfigure, Second
bug is that BIRD chooses password in first-fit manner, but RFC
says that it should use the one with the latest generate-from.
It also modifies the syntax for multiple passwords.
Now it is possible to just add more 'password' statements
to the interface section and it is not needed to use
'passwords' section. Old syntax can be used too.
- Old MED handling was completely different from behavior
specified in RFCs - for example they havn't been propagated
to neighboring areas.
- Update tie-breaking according to RFC 4271.
- Change default value for 'default bgp_med' configuration
option according to RFC 4271.
- metric is 3 byte long now
- summary lsa originating
- more OSPF areas possible
- virtual links
- better E1/E2 routes handling
- some bug fixes..
I have to do:
- md5 auth (last mandatory item from rfc2328)
- !!!!DEBUG!!!!! (mainly virtual link system has probably a lot of bugs)
- 2328 appendig E
neighbor->scope now contains proper address scope which is zero (SCOPE_HOST)
for local addresses, higher (SCOPE_LINK, ..., SCOPE_UNIVERSE) for remote ones.
o Use `expr' instead of `NUM' and `ipa' instead of `IPA',
so that defined symbols work everywhere.
o `define' now accepts both numbers and IP addresses.
o Renamed `ipa' in filters to `fipa'.
Pavel, please update filters to accept define'd symbols as well.
contains all attributes, not just the temporary ones. This avoids having
to merge the lists inside protocols or doing searches on both of them.
Also, do filtering of routes properly. (I'd like to avoid it, but it's
needed at least in the krt protocol.)
show the routing table as exported to the protocol given resp. as returned
from its import control hook.
To get handling of filtered extended attributes right (even in the old
`show route where <filter>' command), the get_route_info hook gets an
attribute list and all protocol specific rte attributes are contained
there as temporary ones. Updated RIP to do that.
Added ea_append() which joins two ea_list's.
Please try compiling your code with --enable-warnings to see them. (The
unused parameter warnings are usually bogus, the unused variable ones
are very useful, but gcc is unable to control them separately.)
Added real comparison of BGP routes (inspired by the Cisco one).
Default local preference and default MED are now settable.
Defined filter keywords for all BGP attributes we know.
of calling the protocols manually.
Implemented printing of dynamic attributes in `show route all'.
Each protocol can now register its own attribute class (protocol->attr_class,
set to EAP_xxx) and also a callback for naming and formatting of attributes.
The callback can return one of the following results:
GA_UNKNOWN Attribute not recognized.
GA_NAME Attribute name recognized and put to the buffer,
generic code should format the value.
GA_FULL Both attribute name and value put to the buffer.
Please update protocols generating dynamic attributes to provide
the attr_class and formatting hook.
in configuration files and commands for manipulating them.
Current debug message policy:
o D_STATES, D_ROUTES and D_FILTERS are handled in generic code.
o Other debug flags should be handled in the protocols and whenever
the flag is set, the corresponding messages should be printed
using calls to log(L_TRACE, ...), each message prefixed with
the name of the protocol instance. These messages should cover
the whole normal operation of the protocol and should be useful
for an administrator trying to understand what does the protocol
behave on his network or who is attempting to diagnose network
problems. If your messages don't fit to the categories I've defined,
feel free to add your own ones (by adding them to protocol.h
and on two places in nest/config.Y), but please try to keep the
categories as general as possible (i.e., not tied to your protocol).
o Internal debug messages not interesting even to an experienced
user should be printed by calling DBG() which is either void or
a call to debug() depending on setting of the LOCAL_DEBUG symbol
at the top of your source.
o Dump functions (proto->dump etc.) should call debug() to print
their messages.
o If you are doing any internal consistency checks, use ASSERT
or bug().
o Nobody shall ever call printf() or any other stdio functions.
Also please try to log any protocol errors you encounter and tag them
with the appropriate message category (usually L_REMOTE or L_AUTH). Always
carefully check contents of any message field you receive and verify all
IP addresses you work with (by calling ipa_classify() or by using the
neighbour cache if you want to check direct connectedness as well).
address, not per interface (hence it's ifa->flags & IA_UNNUMBERED) and
should be set reliably. IF_MULTIACCESS should be fixed now, but it isn't
wise to rely on it on interfaces configured with /30 prefix.
used for automatic generation of instance names.
protocol->name is the official name
protocol->template is the name template (usually "name%d"),
should be all lowercase.
Updated all protocols to define the templates, checked that their configuration
grammar includes proto_name which generates the name and interns it in the
symbol table.
with protocols wanting to use the same port on the same interface
during reconfiguration time.
How to use locks: In the if_notify hook, just order locks for the
interfaces you want to work with and do the real socket opening after the
lock hook function gets called. When you stop using the socket, close
it and rfree() the lock.
Please update your protocols to use the new locking mechanism.
of that EA in the same list and causes ea_find() to fail unless you add
EA_ALLOW_UNDEF to the second argument.
ea_sort (resp. ea_do_prune()) removes all undef'd attributes from the list.
I hope this works :)
To define a new command, just add a new rule to the gramar:
CF_CLI(COMMAND NAME, arguments, help-args, help-text) {
what-should-the-command-do
} ;
where <arguments> are appended to the RHS of the rule, <help-args> is the
argument list as shown in the help and <help-text> is description of the
command for the help.
<what-should-the-command-do> is a C code snippet to be executed. It should
not take too much time to execute. If you want to print out a lot of
information, you can schedule a routine to be called after the current
buffer is flushed by making cli->cont point to the routine (see the
TEST LONG command definition for an example); if the connection is closed
in the meantime, cli->cleanup gets called.
You can access `struct cli' belonging to the connection you're currently
servicing as this_cli, but only during parse time, not from routines scheduled
for deferred execution.
Functions to call inside command handlers:
cli_printf(cli, code, printf-args) -- print text to CLI connection,
<code> is message code as assigned in doc/reply_codes or a negative
one if it's a continuation line.
cli_msg(code, printf-args) -- the same for this_cli.
Use 'sock -x bird.ctl' for connecting to the CLI until a client is written.
we want to allow filter and similar complex constructs to be used in commands
and we should avoid code duplication), only with CLI_MARKER token prepended
before the whole input.
Defined macro CF_CLI(cmd, args, help) for defining CLI commands in .Y files.
The first argument specifies the command itself, the remaining two arguments
are copied to the help file (er, will be copied after the help file starts
to exist). This macro automatically creates a skeleton rule for the command,
you only need to append arguments as in:
CF_CLI(STEAL MONEY, <$>, [[Steal <$> US dollars or equivalent in any other currency]]): NUM {
cli_msg(0, "%d$ stolen", $3);
} ;
Also don't forget to reset lexer state between inputs.
but the core routines are there and seem to be working.
o lib/ipv6.[ch] written
o Lexical analyser recognizes IPv6 addresses and when in IPv6
mode, treats pure IPv4 addresses as router IDs.
o Router ID must be configured manually on IPv6 systems.
o Added SCOPE_ORGANIZATION for org-scoped IPv6 multicasts.
o Fixed few places where ipa_(hton|ntoh) was called as a function
returning converted address.
The changes are just too extensive for lazy me to list them
there, but see the comment at the top of sysdep/unix/krt.c.
The code got a bit more ifdeffy than I'd like, though.
Also fixed a bunch of FIXME's and added a couple of others. :)
o Make proto_config->table always point to the right
table even if it should be the default one.
o When shutting down, kill protocol in reverse order
of their priority.
o When stopping a protocol down, disconnect it from
routing tables immediately instead of waiting
for the delayed protocol flush event.
Also added a protocol instance counter (used by KRT code
in very magic ways).
o Parsing of interface patterns moved to generic code,
introduced this_ipatt which works similarly to this_iface.
o Interface patterns now support selection by both interface
names and primary IP addresses.
o Proto `direct' updated.
o RIP updated as well, it also seems the memory corruption
bug there is gone.
definitely gone. Both rte_update() and rte_discard() have an additional
argument telling which table should they modify.
Also, rte_update() no longer walks the whole protocol list -- each table
has a list of all protocols connected to this table and having the
rt_notify hook set. Each protocol can also freely decide (by calling
proto_add_announce_hook) to connect to any other table, but it will
be probably used only by the table-to-table protocol.
The default debugging dumps now include all routing tables and also
all their connections.
addresses per interface (needed for example for IPv6 support).
Visible changes:
o struct iface now contains a list of all interface addresses (represented
by struct ifa), iface->addr points to the primary address (if any).
o Interface has IF_UP set iff it's up and it has a primary address.
o IF_UP is now independent on IF_IGNORED (i.e., you need to test IF_IGNORED
in the protocols; I've added this, but please check).
o The if_notify_change hook has been simplified (only one interface pointer
etc.).
o Introduced a ifa_notify_change hook. (For now, only the Direct protocol
does use it -- it's wise to just listen to device routes in all other
protocols.)
o Removed IF_CHANGE_FLAGS notifier flag (it was meaningless anyway).
o Updated all the code except netlink (I'll look at it tomorrow) to match
the new semantics (please look at your code to ensure I did it right).
Things to fix:
o Netlink.
o Make krt-iface interpret "eth0:1"-type aliases as secondary addresses.
o Introduced rte_cow() which should be used for copying on write the
rte's in filters. Each rte now carries a flag saying whether it's
a real route (possessing table linkage and other insignia) or a local
copy. This function can be expected to be fast since its fast-path
is inlined.
o Introduced rte_update_pool which is a linear memory pool used for
all temporary data during rte_update. You should not reference it directly
-- instead use a pool pointer passed to all related functions.
o Split rte_update to three functions:
rte_update The front end: handles all checking, inbound
filtering and calls rte_recalculate() for the
final version of the route.
rte_recalculate Update the table according to already filtered route.
rte_announce Announce routing table changes to all protocols,
passing them through export filters and so on.
The interface has _not_ changed -- still call rte_update() and it will
do the rest for you automagically.
o Use new filtering semantics to be explained in a separate mail.
version:
EXPORT <filter-spec> for outbound routes (i.e., those announced
by BIRD to the rest of the world).
IMPORT <filter-spec> for inbound routes (i.e., those imported
by BIRD from the rest of the world).
where <filter-spec> is one of:
ALL pass all routes
NONE drop all routes
FILTER <name> use named filter
FILTER { <filter> } use explicitly defined filter
For all protocols, the default is IMPORT ALL, EXPORT NONE. This includes
the kernel protocol, so that you need to add EXPORT ALL to get the previous
configuration of kernel syncer (as usually, see doc/bird.conf.example for
a bird.conf example :)).
o Now compatible with filtering.
o Learning of kernel routes supported only on CONFIG_SELF_CONSCIOUS
systems (on the others it's impossible to get it semantically correct).
o Learning now stores all of its routes in a separate fib and selects
the ones the kernel really uses for forwarding packets.
o Better treatment of CONFIG_AUTO_ROUTES ports.
o Lots of internal changes.
whitespace/semicolon rules for whole config file:
o All non-zero amounts of whitespace are equivalent to single space
(aka `all the whitespace has been born equal' ;-)).
o Comments count as whitespace.
o Whitespace has no syntactic signifance (it can only separate lexical
elements).
o Consequence: line ends are no longer treated as `;'s.
o Every declaration must be terminated by an explicit `;' unless
or by a group enclosed in `{' and `}'.
i.e. struct proto now contains field 'table' pointing to routing table
the protocol is attached to. Use this instead of &master_table.
Modified all protocols except the kernel syncer to use this field.
o Nothing is configured automatically. You _need_ to specify
the kernel syncer in config file in order to get it started.
o Syncing has been split to route syncer (protocol "Kernel") and
interface syncer (protocol "Device"), device routes are generated
by protocol "Direct" (now can exist in multiple instances, so that
it will be possible to feed different device routes to different
routing tables once multiple tables get supported).
See doc/bird.conf.example for a living example of these shiny features.
(via Netlink). Tweaked kernel synchronization rules a bit. Discovered
locking bug in kernel Netlink :-)
Future plans: Hunt all the bugs and solve all the FIXME's.
o Introduced if_find_by_index()
o Recognizing two types of interface updates: full update (starting with
if_start_update(), ending with if_end_update(), guaranteed to see
all existing interfaces) and a partial update (only if_update(),
usually due to asynchronous interface notifications).
o Introduced IF_LINK_UP flag corresponding to real link state.
o Allowed addressless interfaces.
o IF_UP is now automatically calculated and set iff the interface
is administratively up, has link up and has an IP address assigned.
It may be IF_IGNORED, though (as in case of the loopback).
o Any changes which include up/down transition are considered small
enough to not provoke artificial upping and downing of the interface.
o When an interface disappears (i.e., it wasn't seen in the last scan),
we announce this change only once.
o IF_LOOPBACK implies IF_IGNORE.
nodes having no routes attached. Such cleanup must be done from event handler
since most functions manipulating the routing tables expect network entries
won't disappear from under their hands and it's also probably faster when
done asynchronously.
This is implemented in a way similar to lib/slists.h, but it took some
more effort to make rehashing not disturb the readers. We do it by just
taking _highest_ k bits of ipa_hash as our hash value and sorting each
box by whole ipa_hash().
Consult FIB_ITERATE_* macros in nest/route.h.
Implemented fib_check() debugging function and also rewrote the rehashing
algorithm to use better thresholds and not to waste time by rehashing
forth and back.
o rte can now contain a pointer to both cached and uncached rta. Protocols
which don't need their own attribute caching can now just fill-in a rta,
link it to rte without any calls to attribute cache and call rte_update()
which will replace rte->attrs by a cached copy.
o In order to support this, one of previously pad bytes in struct rta
now holds new attribute flags (RTAF_CACHED). If you call rte_update()
with uncached rta, you _must_ clear these flags. In other cases rta_lookup()
sets it appropriately.
o Added rte_free() which is useful when you construct a rte and then the
circumstances change and you decide not to use it for an update. (Needed
for temporary rte's in kernel syncer...)
- cfg_strcpy() -> cfg_strdup()
- mempool -> linpool, mp_* -> lp_* [to avoid confusion with memblock, mb_*]
Anyway, it might be better to stop ranting about names and do some *real* work.
intended to serve as an example of interface pattern list use. As a side
effect, you can disable generating of device routes by disabling
this protocol.
o iface_patt_match(list, iface) -- match interface against list
o iface_patts_equal(a, b, c) -- compare whether two pattern lists are
equal or not. c(x,y) is called for comparison of protocol-dependent
data.
regular interface addresses" rule).
Protocols should NOT rely on router_id existence -- when router ID is not
available, the router_id variable is set to zero and protocols requiring
valid router ID should just refuse to start, reporting such error to the log.
protocol callbacks for route insertion and deletion from the central table.
RIP should maintain its own per-protocol queue of existing routes, scan it
periodically and call rte_discard() for routes that have timed out.
loop detection. This is needed since both RIP and OSPF handle multiple
neighbors and they need to redistribute routes learned from each neighbor
to the remaining ones.
protocols and don't send route/interface updates to them and when they come up,
we resend the whole route/interface tables privately.
Removed the "scan interface list after protocol start" work-around.
interface since it makes much trouble everywhere. Instead, we understand
secondary addresses as subinterfaces.
- In case interface addresses or basic flags change, we simply convert it
to a down/up sequence.
- Implemented the universal neighbour cache. (Just forget what did previous
includes say of neighbour caching, this one is brand new.)