This is a major change of how the filters are interpreted. If everything
works how it should, it should not affect you unless you are hacking the
filters themselves.
Anyway, this change should make a huge improvement in the filter performance
as previous benchmarks showed that our major problem lies in the
recursion itself.
There are also some changes in nest and protocols, related mostly to
spreading const declarations throughout the whole BIRD and also to
refactored dynamic attribute definitions. The need of these came up
during the whole work and it is too difficult to split out these
not-so-related changes.
One of previous workarounds for phantom route avoidance breaks export
counters by expanding sending of spurious withdraws, which are send when
we are not sure whether we have advertised that routes in the past.
If not, then export counter is decreased, but it was not increased
before, so it overflows under zero.
The patch fixes that by sendung spurious withdraws, but not counting them
on export counter. That may lead to error in the other direction, but that
happens only as a race condition (i.e., in normal operation filters
return proper values about old route export state).
The earlier fix loosen conditions for not running filters on old
route when deciding about route propagation to a protocol to avoid
issues with ghost routes in some race conditions.
Unfortunately, the fix also caused back-propagation of withdraws. For
regular updates, back-propagation is prevented in import_control hooks,
but these are not called on withdraws. For them, import_control hooks
are called on old routes instead, changing (old, NULL) notification
to (NULL, NULL), which is ignored. By not calling export processing
in some cases, the withdraw is not ignored and is back-propagated.
This patch fixes that by contract conditions so the earlier fix is not
applied to back-propagated updates.
This protocol is highly experimental and nobody should use it in
production. Anyway it may help you getting some insight into what eats
so much time in filter processing.
The patch d506263d... blocked adding channel during reconfiguration,
that broke protocols which use the same functiona also during init.
This patch fixes that.
The patch implements optional internal import table to a channel and
hooks it to BGP so it can be used as Adj-RIB-In. When enabled, all
received (pre-filtered) routes are stored there and import filters can
be re-evaluated without explicit route refresh. An import table can be
examined using e.g. 'show route import table bgp1.ipv4'.
When a new channel is found during reconfiguration, do force restart
of the protocol, like with any other un-reconfigurable change.
The old behavior was that the new channel was added but remained in down
state, even if the protocol was up, so a manual protocol restart was
often necessary.
In the future this should be improved such that a reconfigurable
channel addition (e.g. direct) is accepted and channel is started,
while an un-reconfigurable addition forces protocol restart.
For local route marking purposes, local custom route attributes may be
defined. These attributes are seamlessly stripped after export filter to
every real protocol like Kernel, BGP or OSPF, they however pass through
pipes. We currently allow at most 256 custom attributes.
This should be much faster than currently used bgp communities
for marking routes.
Once upon a time, far far away, there were the old Bird developers
discussing what direction of route flow shall be called import and
export. They decided to say "import to protocol" and "export to table"
when speaking about a protocol. When speaking about a table, they
spoke about "importing to table" and "exporting to protocol".
The latter terminology was adopted in configuration, then also the
bird CLI in commit ea2ae6dd0 started to use it (in year 2009). Now
it's 2018 and the terminology is the latter. Import is from protocol to
table, export is from table to protocol. Anyway, there was still an
import_control hook which executed right before route export.
One thing is funny. There are two commits in April 1999 with just two
minutes between them. The older announces the final settlement
on config terminology, the newer uses the other definition. Let's see
their commit messages as the git-log tool shows them (the newer first):
commit 9e0e485e50
Author: Martin Mares <mj@ucw.cz>
Date: Mon Apr 5 20:17:59 1999 +0000
Added some new protocol hooks (look at the comments for better explanation):
make_tmp_attrs Convert inline attributes to ea_list
store_tmp_attrs Convert ea_list to inline attributes
import_control Pre-import decisions
commit 5056c559c4
Author: Martin Mares <mj@ucw.cz>
Date: Mon Apr 5 20:15:31 1999 +0000
Changed syntax of attaching filters to protocols to hopefully the final
version:
EXPORT <filter-spec> for outbound routes (i.e., those announced
by BIRD to the rest of the world).
IMPORT <filter-spec> for inbound routes (i.e., those imported
by BIRD from the rest of the world).
where <filter-spec> is one of:
ALL pass all routes
NONE drop all routes
FILTER <name> use named filter
FILTER { <filter> } use explicitly defined filter
For all protocols, the default is IMPORT ALL, EXPORT NONE. This includes
the kernel protocol, so that you need to add EXPORT ALL to get the previous
configuration of kernel syncer (as usually, see doc/bird.conf.example for
a bird.conf example :)).
Let's say RIP to this almost 19-years-old inconsistency. For now, if you
import a route, it is always from protocol to table. If you export a
route, it is always from table to protocol.
And they lived happily ever after.
Modify protocols to use preferred address change notification instead on
depending on hard-reset of interfaces in that case, and remove hard-reset
in that case. This avoids issue when e.g. IPv6 protocol restarts
interface when IPv4 preferred address changed (as hard-reset is
unavoidable and common for whole iface).
The patch also fixes a bug when removing last address does not send
preferred address change notification.
The new MRT protocol is responsible for periodic RIB table dumps in the
MRT format (RFC 6396). Also the existing code for BGP4MP MRT dumps is
refactored and splitted between BGP to MRT protocols, will be more
integrated into MRT in the future.
Example:
protocol mrt {
table "*";
filename "%N_%F_%T.mrt";
period 60;
}
It is partially based on the old MRT code from Pavel Tvrdik.
no more warnings
No more warnings over me
And while it is being compiled all the log is black and white
Release BIRD now and then let it flee
(use the melody of well-known Oh Freedom!)
If export filter is changed during reconfiguration and a route disappears
between reconfiguration and refeed (e.g., if the route is a static route
also removed during the reconfiguration), the route is not withdrawn.
The issue was fixed for regular channels by an earlier patch. This patch
fixes the issue for channels in RA_ACCEPTED mode (first-pass-the-filter),
used by BGP with 'secondary' option.
If export filter is changed during reconfiguration and a route disappears
between reconfiguration and refeed (e.g., if the route is a static route
also removed during the reconfiguration), the route is not withdrawn.
The patch fixes that by adding tx reconfiguration timestamp.
This is a fundamental change of an original (1999) concept of route
processing inside BIRD. During import/export, there was a temporary
ea_list created which was to be used instead of the another one inside
the route itself.
This led to some confusion, quirks, and strange filter code that handled
extended route attributes. Dropping it now.
The protocol interface has changed in an uniform way -- the
`struct ea_list *attrs` argument has been removed from store_tmp_attrs(),
import_control(), rt_notify() and get_route_info().
The bgpmask literals can include expressions. This is OK but they have
to be interpreted as soon as the code is run, not in the time the code
is used as value.
This led to strange behavior like rewriting bgpmasks when they shan't
be rewritten:
function mask_generator(int as)
{
return [= * as * =];
}
function another()
bgpmask m1;
bgpmask m2;
{
m1 = mask_generator(10);
m2 = mask_generator(20);
if (m1 == m2) {
print("strange"); # this would happen
}
}
Moreover, sending this to CLI would cause stack overflow and knock down the
whole BIRD, as soon as there is at least one route to execute the given
filter on.
show route filter bgpmask mmm; bgppath ppp; { ppp = +empty+; mmm = [= (ppp ~ mmm) =]; print(mmm); accept; }
The magic match operator (~) inside the bgpmask literal would try to
resolve mmm, which points to the same bgpmask so it would resolve
itself, call the magic match operator and vice versa.
After this patch, the bgpmask literal will get resolved as soon as it's
assigned to mmm and it also will return a type error as bool is not
convertible to ASN in BIRD.
This patch adds support for source-specific IPv6 routes to BIRD core.
This is based on Dean Luga's original patch, with the review comments
addressed. SADR support is added to network address parsing in confbase.Y
and to the kernel protocol on Linux.
Currently there is no way to mix source-specific and non-source-specific
routes (i.e., SADR tables cannot be connected to non-SADR tables).
Thanks to Toke Hoiland-Jorgensen for the original patch.
Minor changes by Ondrej Santiago Zajicek.
A filter should log messages only if executed explicitly (e.g., during
route export or route import). When a filter is executed for technical
reasons (e.g., to establish whether a route was exported before), it
should run silently.
Several changes and bugfixes in Babel, namely:
- Exported route parameters stored directly in route table entry
- Exported non-babel routes no longer stored in per-entry route list
- Route update, selection and retraction simplified and fixed
- Route feasibility is evalualated per update and stored with route
- Unreachable route handling fixed, based on hold interval
- Added 'show babel routes' command
Overall, it fixes some issues with proper propagation of triggered
updates, making Babel convergence after topology change almost
instant.
The old timer interface is still kept, but implemented by new timers. The
plan is to switch from the old inteface to the new interface, then clean
it up.
The patch implements Default Router Preferences and More-Specific Routes
(RFC 4191) for RAdv protocol, allowing to announce router preference and
more specific routes in router advertisements. Routes can be exported to
RAdv like to regular routing protocols.
Some cleanups, bugfixes and other changes done by Ondrej Zajicek.
The patch implements BGP Administrative Shutdown Communication (RFC 8203)
allowing BGP operators to pass messages related to BGP session
administrative shutdown/restart. It handles both transmit and receive of
shutdown messages. Messages are logged and may be displayed by show
protocol all command.
Thanks to Job Snijders for the basic patch.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
Add proper support for per-nexthop onlink flag in routes to handle next
hop addresses that are not covered by interface IP ranges. Supported by
kernel and static protocols.
Thanks to Vincent Bernat for the idea.
The old hash table had fixed size, which makes it slow for config files
with large number of symbols and symbol lookups. The new one is growing
according to needs.
Some code cleanup, multiple bugfixes, allows to specify also channel
for 'show route export'. Interesting how such apparenty simple thing
like show route cmd has plenty of ugly corner cases.
Basic support for SAFI 4 and 128 (MPLS labeled IP and VPN) for IPv4 and
IPv6. Should work for route reflector, but does not properly handle
originating routes with next hop self.
Based on patches from Jan Matejka.
Anyway, Bird is now capable to insert both MPLS routes and MPLS encap
routes into kernel.
It was (among others) needed to define platform-specific AF_MPLS to 28
as this constant has been assigned in the linux kernel.
No support for BSD now, it may be added in the future.
Dropped struct mpnh and mpnh_*()
Now struct nexthop exists, nexthop_*(), and also included struct nexthop
into struct rta.
Also converted RTD_DEVICE and RTD_ROUTER to RTD_UNICAST. If it is needed
to distinguish between these two cases, RTD_DEVICE is equivalent to
IPA_ZERO(a->nh.gw), RTD_ROUTER is then IPA_NONZERO(a->nh.gw).
From now on, we also explicitely want C99 compatible compiler. We assume
that this 20-year norm should be known almost everywhere.
Commit 3c09af41... changed behavior of int_set_add() from prepend to
append, which makes more sense for community list, but prepend must be
used for cluster list. Add int_set_prepend() and use it in cluster list
handling code.
- Unit Testing Framework (BirdTest)
- Integration of BirdTest into the BIRD build system
- Tests for several BIRD modules
Based on squashed Pavel Tvrdik's int-test branch, updated for
current int-new branch.
Add support for large communities (draft-ietf-idr-large-community),
96bit alternative to RFC 1997 communities.
Thanks to Matt Griswold for the original patch.
Kernel protocol calls rt_export_merged(), which used @rte_update_pool for
temporary allocations, supposing it is called from other functions from
rt-table.c that handles locking and flushing of the linpool. Therefore,
linpool was not flushed properly and memory leaked.
Add linpool argument to rt_export_merged() and use @krt_filter_lp when
called from kernel protocol.
Thanks to Justin Cattle and Alexander Frolkin for the bugreport.
(Commit squashed and updated by Ondrej Zajicek)
This updates the documentation to correctly mention Babel when protocols
are listed, and adds examples and route attribute documentation to the
Babel section of the docs.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Also removed the lib-dir merging with sysdep. Updated #include's
accordingly.
Fixed make doc on recent Debian together with moving generated doc into
objdir.
Moved Makefile.in into root dir
Retired all.o and birdlib.a
Linking the final binaries directly from all the .o files.
This patch implements the IPv6 subset of the Babel routing protocol.
Based on the patch from Toke Hoiland-Jorgensen, with some heavy
modifications and bugfixes.
Thanks to Toke Hoiland-Jorgensen for the original patch.
Many protocols do almost the same when creating a rte_update request
before calling rte_update2(). This commit should simplify the protocol
side of the route-creation routine.
Counter exp_routes is increased during initial route feed after GR
recovery, so it has to start with zero, otherwise BIRD will end with
double value in exp_routes.
When a kernel route changed, function krt_learn_scan() noticed that and
replaced the route in internal kernel FIB, but after that, function
krt_learn_prune() failed to propagate the new route to the nest, because
it confused the new route with the (removed) old best route and decided
that the best route did not changed.
Wow, the original code (and the bug) is almost 17 years old.
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
Explicit setting of AF_INET(6|) in IP socket creation. BFD set to listen
on v6, without setting the V6ONLY flag to catch both v4 and v6 traffic.
Squashing and minor changes by Ondrej Santiago Zajicek
Symbol lookup by cf_find_symbol() not only did the lookup but also added
new void symbols allocated from cfg_mem linpool, which gets broken when
lookups are done outside of config parsing, which may lead to crashes
during reconfiguration.
The patch separates lookup-only cf_find_symbol() and config-modifying
cf_get_symbol(), while the later is called only during parsing. Also
new_config and cfg_mem global variables are NULLed outside of parsing.
New data types net_addr and variants (in lib/net.h) describing
network addresses (prefix/pxlen). Modifications of FIB structures
to handle these data types and changing everything to use these
data types instead of prefix/pxlen pairs where possible.
The commit is WiP, some protocols are not yet updated (BGP, Kernel),
and the code contains some temporary scaffolding.
Comments are welcome.
The new RIP implementation fixes plenty of old bugs and also adds support
for many new features: ECMP support, link state support, BFD support,
configurable split horizon and more. Most options are now per-interface.
In some circumstances during reconfiguration, routes propagated by pipes
to other tables may hang there even after the primary routes are removed.
There is already a workaround for this issue in the code which removes
these stale routes by flush process when source protocols are shut down.
This patch is a cleaner fix and allows to simplify the flush process
Now the order is:
Up -> iface, addr, neigh
Down -> neigh, addr, iface
It fixes the case when an iface appears, related static routes are
activated and exported to OSPF before the iface notification and
therefore forwarding addresses are not encoded in generated external
LSAs.
Implemented eval command can be used to evaluate expressions.
The patch also documents echo command and allows to use log classes
instead of integer as a mask for echo.
When route was propagated to another rtable through a pipe and then the
pipe was reconfigured softly in such a way that any subsequent route
updates are filtered, then the source protocol shutdown didn't clean up
the route in the second rtable which caused stale routes and potential
crashes.
Implements support for IPv6 traffic class, sets higher priority for OSPF
and RIP outgoing packets by default and allows to configure ToS/DS/TClass
IP header field and the local priority of outgoing packets.
Temporary dummy routes created by a kernel protocol during routing table
scan get mixed with real routes propagated from another kernel protocol
through a pipe.
The RAdv protocol could be configured to change its behavior based on
availability of routes, e.g., do not announce router lifetime when a
default route is not available.
Router ID could be automatically determined based of subset of
ifaces/addresses specified by 'router id from' option. The patch also
does some minor changes related to router ID reconfiguration.
Thanks to Alexander V. Chernikov for most of the work.
Several new configure command variants:
configure undo - undo last reconfiguration
configure timeout - configure with scheduled undo if not confirmed in timeout
configure confirm - confirm last configuration
configure check - just parse and validate config file
When 'import keep rejected' protocol option is activated, routes
rejected by the import filter are kept in the routing table, but they
are hidden and not propagated to other protocols. It is possible to
examine them using 'show route rejected'.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.
When a protocol went down, all its routes were flushed in one step, that
may block BIRD for too much time. The patch fixes that by limiting
maximum number of routes flushed in one step.
- ROA tables, which are used as a basic part for RPKI.
- Commands for examining and modifying ROA tables.
- Filter operators based on ROA tables consistent with RFC 6483.
The nest-protocol interaction is changed to better handle multitable
protocols. Multitable protocols now declare that by 'multitable' field,
which tells nest that a protocol handles things related to proto-rtable
interaction (table locking, announce hook adding, reconfiguration of
filters) itself.
Filters and stats are moved to announce hooks, a protocol could have
different filters and stats to different tables.
The patch is based on one from Alexander V. Chernikov, thanks.
Hostcache is a structure for monitoring changes in a routing table that
is used for routes with dynamic/recursive next hops. This is needed for
proper iBGP next hop handling.
In usual configuration, such export is already restricted
with the aid of the direct protocol but there are some
races that can circumvent it. This makes it harder to
break kernel device routes. Also adds an option to
disable this restriction.
When device protocol goes down, interfaces should be flushed
asynchronously (in the same way like routes from protocols are flushed),
when protocol goes to DOWN/HUNGRY.
This fixes the problem with static routes staying in kernel routing
table after BIRD shutdown.
- BSD kernel syncer is now self-conscious and can learn alien routes
- important bugfix in BSD kernel syncer (crash after protocol restart)
- many minor changes and bugfixes in kernel syncers and neighbor cache
- direct protocol does not generate host and link local routes
- min_scope check is removed, all routes have SCOPE_UNIVERSE by default
- also fixes some remaining compiler warnings
It seems that by adding one pipe-specific exception to route
announcement code and by adding one argument to rt_notify() callback i
could completely eliminate the need for the phantom protocol instance
and therefore make the code more straightforward. It will also fix some
minor bugs (like ignoring debug flag changes from the command line).
When uncofiguring the pipe and the peer table, the peer table was
unlocked when pipe protocol state changed to down/flushing and not to
down/hungry. This leads to the removal of the peer table before
the routes from the pipe were flushed.
The fix leads to adding some pipe-specific hacks to the nest,
but this seems inevitable.
If protocol announces a route, route is accepted by import filter to
routing table, and later it announces replacement of that route that is
rejected by import filter, old route remains in routing table.
ea_same() sometimes returns true for different route attributes,
which caused that hash table in BGP does not work correctly and
some routes were sent with different attributes.
Allows to add more interface patterns to one common 'options'
section like:
interface "eth3", "eth4" { options common to eth3 and eth4 };
Also removes undocumented and unnecessary ability to specify
more interface patterns with different 'options' sections:
interface "eth3" { options ... }, "eth4" { options ... };
Old AS path maching supposes thath AS number appears
only once in AS path, but that is not true. It also
contains some bugs related to AS path sets.
New code does not use any assumptions about semantic
structure of AS path. It is asymptotically slower than
the old code, but on real paths it is not significant.
It also allows '?' for matching one arbitrary AS number.
Cryptographic authentication in OSPF is defective by
design - there might be several packets independently
sent to the network (for example HELLO, LSUPD and LSACK)
where they might be reordered and that causes crypt.
sequence number error.
That can be workarounded by not incresing sequence number
too often. Now we update it only when last packet was sent
before at least one second. This can constitute a risk of
replay attacks, but RFC supposes something similar (like time
in seconds used as CSN).
Routes comming through pipe from primary to secondary table were
filtered by both EXPORT and IMPORT filters, but they should be
only filtered by EXPORT filters.
AS4 optional attribute errors were handled by session
drop (according to BGP RFC). This patch implements
error handling according to new BGP AS4 draft (*)
- ignoring invalid AS4 optional attributes.
(*) http://www.ietf.org/internet-drafts/draft-chen-rfc4893bis-02.txt
The core state machine was broken - it didn't free resources
in START -> DOWN transition and might freed resources after
UP -> STOP transition before protocol turned down. It leads
to deadlock on olock acquisition when lock was not freed
during previous stop.
The current behavior is that resources, allocated during
DOWN -> * transition, are freed in * -> DOWN transition,
and flushing (scheduled in UP -> *) just counteract
feeding (scheduled in * -> UP). Protocol fell down
when both flushing is done (if needed) and protocol
reports DOWN.
BTW, is thera a reason why neighbour cache item acquired
by protocol is not tracked by resource mechanism?
When protocol started, feeding was scheduled. If protocol
got down before feeding was executed, then function
responsible for connecting protocol to kernel routing
tables was called after the function responsible for
disconnecting, then resource pool of protocol was freed,
but freed linked list structures remains in the list.
values for MD5 password ID changed during reconfigure, Second
bug is that BIRD chooses password in first-fit manner, but RFC
says that it should use the one with the latest generate-from.
It also modifies the syntax for multiple passwords.
Now it is possible to just add more 'password' statements
to the interface section and it is not needed to use
'passwords' section. Old syntax can be used too.
- Old MED handling was completely different from behavior
specified in RFCs - for example they havn't been propagated
to neighboring areas.
- Update tie-breaking according to RFC 4271.
- Change default value for 'default bgp_med' configuration
option according to RFC 4271.
- metric is 3 byte long now
- summary lsa originating
- more OSPF areas possible
- virtual links
- better E1/E2 routes handling
- some bug fixes..
I have to do:
- md5 auth (last mandatory item from rfc2328)
- !!!!DEBUG!!!!! (mainly virtual link system has probably a lot of bugs)
- 2328 appendig E