If the next hop of a route is not a reachable address, the route should be
installed as onlink. This enables a configuration common in mesh networks
where the mesh interface is assigned a /32 and babel handles the routing by
installing onlink routes.
Thanks to Toke Hoiland-Jorgensen for the patch.
This issue has a long history. In 2012, we changed data field for
unnumbered PtP links from iface id (specified by RFC) to IP address based
on reports of bugs in Quagga that required it, and we used out-of-band
information to distinquish unnumberred PtPs with the same local IP
address.
Then with OSPF graceful restart implementation, we found that we can no
longer use out-of-band information, and we need to use only LSAdb info
for routing table calculation, but i forgot to finish handling of this
case, so multiple unnumbered PtPs with the same local IP addresses were
broken.
Considering that even recent Mikrotik RouterOS has broken next hop
calculation that depends on IP address in PtP link data field, we
cannot just switch back to the iface id for unnumbered PtP links.
The patch makes two changes: First, it goes back to use out-of-band
(position) info for distinguishing local interfaces in SPF when graceful
restart is not enabled, while still uses LSAdb-only approach for SPF
calculation when graceful restart is enabled.
Second, it adds OSPF interface option 'ptp address', which controls
whether IP address or iface id is used in data field. It is enabled
by default except for unnumbered PtP links with enabled graceful
restart.
Thanks to Kenth Eriksson for the bugreport and Joakim Tjernlund for
suggestions.
There are three common ways how to encode IPv6 link-local-only next hops:
(:: ll), (ll), and (ll ll). We use the first one but we should accept all
three. The patch fixes handling of the last one.
Thanks to Sebastian Hahn for the bugreport.
The RFC 5575 does not explicitly reject flowspec rules without dst part,
it just requires dst part in validation procedure for feasibility, which
we do not implement anyway. Thus flow without dst prefix is syntactically
valid, but unfeasible (if feasibilty testing is done).
Thanks to Alex D. for the bugreport.
When dynamic BGP with remote range is configured, MD5SIG needs to use
newer socket option (TCP_MD5SIG_EXT) to specify remote addres range for
listening socket.
Thanks to Adam Kułagowski for the suggestion.
Recent changes in neighbor code caused RIP to access neighbor field which
is NULL during interface/neighbor removal and caused crash when debug
messages are enabled. Use correct field to get iface from neighbor.
During NLRI parsing of IPv6 Flowspec, dst prefix was not properly
extracted from NLRI, therefore a received flow was stored in a different
position in flowspec routing table, and was not reachable by command
'show route <flow>'.
Add proper prefix part accessors to flowspec code and use them from BGP
NLRI parsing code.
Thanks to Alex D. for the bugreport.
This is optional check described in RFC 4271. Although this can be also
done by filters, it is widely implemented option in BGP implementations.
Thanks to Eugene Bogomazov for the original patch.
Transitive extended communities should be removed on external sessions,
the old code them in all cases.
Thanks to Jean-Daniel Pauget for the original patch.
Change of some options requires route refresh, but when import table is
active, channel reload is done from it instead of doing full route
refresh. So in this case we request it internally.
The bfd_reconfigure_neighbors() returned after first reconfigured
neighbor instead of continuing with the next one.
Thanks to Winston Chen for the bugreport and a patch.
There is an improper check for valid message size, which may lead to
stack overflow and buffer leaks to log when a large message is received.
Thanks to Daniel McCarney for bugreport and analysis.
Instead of having large stack buffer for max amount of AFI/SAFI pairs.
The old code is not correct w.r.t. extendeded option length, as more
AFI/SAFI pairs may fit into the capability option.
The patch implements optional internal export table to a channel and
hooks it to BGP so it can be used as Adj-RIB-Out. When enabled, all
exported (post-filtered) routes are stored there. An export table can be
examined using e.g. 'show route export table bgp1.ipv4'.
Several BGP channel options (including 'next hop self') could be
reconfigured without session reset, with just route refeed/refresh.
The patch improves reconfiguration code to do it that way.
The 'deterministic med' option is implemented by suppressing other than
best-in-group routes (grouped by ASN) from best route selection. This
interferes with 'merge paths' as supressed routes are no longer mergable
with best route. This is fixed by suppressing only those routes that are
not mergable with best-in-group route.
Protocol can have specified VRF, in such case it is restricted to a set
of ifaces associated with the VRF, otherwise it can use all interfaces.
The patch allows to specify VRF as 'default', in which case it is
restricted to a set of iface not associated with any VRF.
Per RFC 3101, N-bit signalling NSSA support should be used only in Hello
packets, not in DBDES packets. BIRD since 2.0.4 verifies N-bit in
neighbor structure, which is learned from DBDES packets, therefore
NSSA-LSAs are not propagated to proper implementations of RFC 3101.
This patch fixes that. Both removing the check and removing N-bit from
DBDES packet. This will fix compatibility issues with proper
implementations, but causes compatibility issues with BIRD 2.0.4.
We need to flush learned external LSAs a bit later than other LSAs (after
first feed after end of the graceful restart) to avoid flap of external
routes.
If BGP has too many data to send and BIRD is slower than the link, TX is
always possible until all data is sent. This patch limits maximum number
of generated BGP messages in one iteration of TX hook.
Implement OSPFv2 (RFC 3623) and OSPFv3 (RFC 5187) graceful restart,
for both restarting and helper sides. Graceful restart is initiated
by 'graceful down' command.
When 'graceful down' command is entered, protocols are shut down
with regard to graceful restart. Namely Kernel protocol does
not remove routes and BGP protocol does not send notification,
just closes the connection.
Useful for implementation of agents implementing the SNMP-BGP MIB, which
requires the local AS of a session to be specified.
Thanks to Jan-Philipp Litza for the patch.
Support for dynamically spawning BGP protocols for incoming connections.
Use 'neighbor range' to specify range of valid neighbor addresses, then
incoming connections from these addresses spawn new BGP instances.
When BGP connection is opened, it may happen that rx hook (with remote
OPEN) is called before tx hook (for local OPEN). Therefore, we need to do
internal changes (like setting local_caps) synchronously with OPENSENT
transition and we need to ensure that OPEN is sent before KEEPALIVE.
Allow to specify just 'internal' or 'external' for remote neighbor
instead of specific ASN. In the second case that means BGP peers with
any non-local ASNs are accepted.
The temporary atttributes are no longer removed by ea_do_prune(), but
they are undefined by store_tmp_attrs() protocol hooks. This fixes
several bugs where temporary attributes were removed when they should
not or not removed when they should be. The flag EAF_TEMP is no longer
needed and was removed.
Update all protocol make_tmp_attrs() / store_tmp_attrs() hooks to use
helper functions and to handle unset attributes properly.
Also fix some related bugs like improper handling of empty eattr list.
Keep track of whether OSPF tmpattrs are actually defined for given route
(using flags in rte->pflags). That makes them behave more like real
eattrs so a protocol can define just a subset of them or they can be
undefined by filters.
Do not set ospf_metric2 for other than type 2 external OSPF routes and do
not set ospf_tag for non-external OSPF routes. That also fixes a bug
where internal/inter-area route propagated from one OSPF instance to
another is initiated with infinity ospf_metric2.
Thanks to Yaroslav Dronskii for the bugreport.
This is a major change of how the filters are interpreted. If everything
works how it should, it should not affect you unless you are hacking the
filters themselves.
Anyway, this change should make a huge improvement in the filter performance
as previous benchmarks showed that our major problem lies in the
recursion itself.
There are also some changes in nest and protocols, related mostly to
spreading const declarations throughout the whole BIRD and also to
refactored dynamic attribute definitions. The need of these came up
during the whole work and it is too difficult to split out these
not-so-related changes.
When area is reconfigured to a different type, we need to flush LSAs as
they may not be valid (e.g. NSSA-LSA for non-NSSA area). Also, when we
have have just one OSPF area and that changes type, we could restart OSPF
as there is no state to keep anyway. That solves issue with different
handling of external routes exported to OSPF based of main area type.
External LSAs originated by OSPF routers with VPN-PE behavior enabled are
marked by DN flag and they are ignored by other OSPF routers with VPN-PE
enabled.
Direct acknowledgements should be send as unicast to a corresponding
neighbor. Only delayed acks should be send as multicast to all/designated
routers.
Master may free last DBDES packet immediately. Slave must wait dead
interval before freeing last DBDES packet and then reject duplicate
DBDES packets with SeqNumberMismatch.
Since v2 we have multiple listening BGP sockets, and each BGP protocol
has associated one of them. Use listening socket that accepted the
incoming connection as a key in the dispatch process so only BGP
protocols assocaited with that listening socket can be selected.
This is necesary for proper dispatch when VRFs are used.
This protocol is highly experimental and nobody should use it in
production. Anyway it may help you getting some insight into what eats
so much time in filter processing.
In some circumstances (old LSA flushed but not acknowledged and not
removed) origination of a new LSA may wrongly triggers LSA collision
code. The patch fixes that.
Thanks to Asbjorn Mikkelsen for the bugreport and @mdelagueronniere
for the original patch.
Extend 'next hop keep' and 'next hop self' options to have boolean values
(enabled / disabled) and also values 'ibgp'/ 'ebgp' to restrict it to
routes received from IBGP / EBGP. This allows to have it enabled by
default in some cases, matches features of other implementations, and
allows to handle some strange cases like EBGP border router with 'next
hop self' also doing IBGP route reflecting.
Change default of 'next hop keep' to enabled for route servers, and
'ibgp' for route reflectors.
Update documentation for these options.
When route is exported to regular EBGP, local ASN should be prepended to
AS_PATH. When route is propagated by route server (between RS-marked
EBGP peers), it should not change AS_PATH. Question is what to do in
other cases (from non-RS EBGP, IBGP, or locally originated to RS EBGP).
In 1.6.x, we did not prepend ASN in non-RS EBGP or IBGP to RS EBGP, but
we prepended in local to RS EBGP.
In 2.0.x, we changed that so only RS-EBGP to RS-EBGP is not prepended.
We received some negative responses (thanks to heisenbug and Alexander
Zubkov), we decided to change it back. One reason is that it is simple
to modify the AS_PATH by filters, but not possible to un-modify
changes done by BGP itself. Also, as 1.6.x behavior was not really
consistent, the final behavior is that ASN is never prepended when
exported to RS EBGP, like to IBGP.
Note that i do not express an opinion about whether such configurations
are even reasonable.
The patch implements optional internal import table to a channel and
hooks it to BGP so it can be used as Adj-RIB-In. When enabled, all
received (pre-filtered) routes are stored there and import filters can
be re-evaluated without explicit route refresh. An import table can be
examined using e.g. 'show route import table bgp1.ipv4'.
When a new channel is found during reconfiguration, do force restart
of the protocol, like with any other un-reconfigurable change.
The old behavior was that the new channel was added but remained in down
state, even if the protocol was up, so a manual protocol restart was
often necessary.
In the future this should be improved such that a reconfigurable
channel addition (e.g. direct) is accepted and channel is started,
while an un-reconfigurable addition forces protocol restart.
Fix crash during reconfiguration of OSPF config with vlinks. When vlink
is reconfigured, a generic iface-reconfiguration code is used, which in
one place supposes that it is running on a regular iface.
Thanks to Cybertinus for a bugreport.
Once upon a time, far far away, there were the old Bird developers
discussing what direction of route flow shall be called import and
export. They decided to say "import to protocol" and "export to table"
when speaking about a protocol. When speaking about a table, they
spoke about "importing to table" and "exporting to protocol".
The latter terminology was adopted in configuration, then also the
bird CLI in commit ea2ae6dd0 started to use it (in year 2009). Now
it's 2018 and the terminology is the latter. Import is from protocol to
table, export is from table to protocol. Anyway, there was still an
import_control hook which executed right before route export.
One thing is funny. There are two commits in April 1999 with just two
minutes between them. The older announces the final settlement
on config terminology, the newer uses the other definition. Let's see
their commit messages as the git-log tool shows them (the newer first):
commit 9e0e485e50
Author: Martin Mares <mj@ucw.cz>
Date: Mon Apr 5 20:17:59 1999 +0000
Added some new protocol hooks (look at the comments for better explanation):
make_tmp_attrs Convert inline attributes to ea_list
store_tmp_attrs Convert ea_list to inline attributes
import_control Pre-import decisions
commit 5056c559c4
Author: Martin Mares <mj@ucw.cz>
Date: Mon Apr 5 20:15:31 1999 +0000
Changed syntax of attaching filters to protocols to hopefully the final
version:
EXPORT <filter-spec> for outbound routes (i.e., those announced
by BIRD to the rest of the world).
IMPORT <filter-spec> for inbound routes (i.e., those imported
by BIRD from the rest of the world).
where <filter-spec> is one of:
ALL pass all routes
NONE drop all routes
FILTER <name> use named filter
FILTER { <filter> } use explicitly defined filter
For all protocols, the default is IMPORT ALL, EXPORT NONE. This includes
the kernel protocol, so that you need to add EXPORT ALL to get the previous
configuration of kernel syncer (as usually, see doc/bird.conf.example for
a bird.conf example :)).
Let's say RIP to this almost 19-years-old inconsistency. For now, if you
import a route, it is always from protocol to table. If you export a
route, it is always from table to protocol.
And they lived happily ever after.
Modify protocols to use preferred address change notification instead on
depending on hard-reset of interfaces in that case, and remove hard-reset
in that case. This avoids issue when e.g. IPv6 protocol restarts
interface when IPv4 preferred address changed (as hard-reset is
unavoidable and common for whole iface).
The patch also fixes a bug when removing last address does not send
preferred address change notification.
The new MRT protocol is responsible for periodic RIB table dumps in the
MRT format (RFC 6396). Also the existing code for BGP4MP MRT dumps is
refactored and splitted between BGP to MRT protocols, will be more
integrated into MRT in the future.
Example:
protocol mrt {
table "*";
filename "%N_%F_%T.mrt";
period 60;
}
It is partially based on the old MRT code from Pavel Tvrdik.
Missing argument in MTU change trace message can crash bird when MTU
change happens and trace messages are active.
Thanks to Alexander Velkov for the bugreport.
no more warnings
No more warnings over me
And while it is being compiled all the log is black and white
Release BIRD now and then let it flee
(use the melody of well-known Oh Freedom!)
We currently cannot assing local labels, but we can still be LSP egress
router. Therefore when we announce labeled route with local next-hop, we
should announce implicit-NULL label instead of rejecting it completely.
RFC 3107 was bit vague with regard to labeled withdrawals, RFC 8277
clarified that. The old code was incompatible with some implementations,
namely with Juniper.
Thanks to Vadim Fedorenko for the original patch.
In case of missing IPv4 next hop, we should skip such routes
on transmit and ignore such routes on receive.
Thanks to Julian Schuh for the bugreport and Toke Hoiland-Jorgensen
for the original patch.
This is a fundamental change of an original (1999) concept of route
processing inside BIRD. During import/export, there was a temporary
ea_list created which was to be used instead of the another one inside
the route itself.
This led to some confusion, quirks, and strange filter code that handled
extended route attributes. Dropping it now.
The protocol interface has changed in an uniform way -- the
`struct ea_list *attrs` argument has been removed from store_tmp_attrs(),
import_control(), rt_notify() and get_route_info().
During route export, the receiving protocol often initialized route
metrics to default value in its import_control hook before export filter
was executed. This is inconsistent with the expectation that an export
filter would process the same route as one in the routing table and it
breaks setting these metrics before (e.g. for static routes directly in
static protocol).
The patch removes the initialization of route metrics in import_control
hook, the default values are already handled in rt_notify hook called
after export filters.
The patch also changed the behavior of OSPF to keep metrics when a route
is reannounced between OSPF instances (to be consistent with other
protocols) and the behavior when both ospf_metric1 and ospf_metric2
are specified (to have more expected behavior).
When a Babel node restarts, it loses its sequence number, which can cause
its routes to be rejected by peers until the state is cleared out by other
nodes in the network (which can take on the order of minutes).
There are two ways to fix this: Having stable storage to keep the sequence
number across restarts, or picking a different router ID each time.
This implements the latter, by introducing a new option that will cause
BIRD to randomize a high 32 bits of router ID every time it starts up.
This avoids the problem at the cost of not having stable router IDs in
the network.
Thanks to Toke Hoiland-Jorgensen for the patch.
The router ID being assigned to routes was a uint, which discards the
upper 32 bits. This also has the nice side effect of echoing the wrong
router ID back to other routers.
Thanks to Toke Hoiland-Jorgensen for the patch.
This patch adds support for source-specific routing to the Babel protocol.
It changes the protocol to support both NET_IP6 and NET_IP6_SADR channels
for IPv6 addresses. If only a NET_IP6 channel is configured,
source-specific updates are ignored. Otherwise, non-source-specific
routes are simply treated as source-specific routes with SADR prefix 0.
Thanks to Toke Hoiland-Jorgensen for the original patch.
Minor changes by Ondrej Santiago Zajicek.
Several changes and bugfixes in Babel, namely:
- Exported route parameters stored directly in route table entry
- Exported non-babel routes no longer stored in per-entry route list
- Route update, selection and retraction simplified and fixed
- Route feasibility is evalualated per update and stored with route
- Unreachable route handling fixed, based on hold interval
- Added 'show babel routes' command
Overall, it fixes some issues with proper propagation of triggered
updates, making Babel convergence after topology change almost
instant.
Also fix several minor bugs and add 'limit' option for k-out-of-j
link sensing strategy. Change default from 8-of-16 to 12-of-16.
Change IHU expiry factor from 1.5 to 3.5 (as in RFC 6126).
The old timer interface is still kept, but implemented by new timers. The
plan is to switch from the old inteface to the new interface, then clean
it up.
RFC6126bis introduces a flags field for the Hello TLV, and adds a unicast flag
that is used to signify that a hello was sent as unicast. This adds parsing of
the flags field and ignores such unicast hellos, which preserves compatibility
until we can add a proper implementation of the unicast hello mechanism.
Thanks to Toke Hoiland-Jorgensen for the patch.
The TTL check must be done after instance ID dispatch to avoid warnings
when a physical iface is shared by multiple instances and some use TTL
security and some not.
In such case, next hop has to be taken from Link-LSA like in broadcast
case, not from neighbor source address like in other PtP cases.
Also add some checks, comments and code cleanup.
OSPFv3-AF can handle multiple topologies of diferent address families
(IPv4, IPv6, both unicast and multicast) using separate instances
distinguished by instance ID ranges.
The patch implements Default Router Preferences and More-Specific Routes
(RFC 4191) for RAdv protocol, allowing to announce router preference and
more specific routes in router advertisements. Routes can be exported to
RAdv like to regular routing protocols.
Some cleanups, bugfixes and other changes done by Ondrej Zajicek.
The patch implements BGP Administrative Shutdown Communication (RFC 8203)
allowing BGP operators to pass messages related to BGP session
administrative shutdown/restart. It handles both transmit and receive of
shutdown messages. Messages are logged and may be displayed by show
protocol all command.
Thanks to Job Snijders for the basic patch.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
Keep a cache of all the relevant prefixes we send out. When a prefix
appears, insert it into the cache. If it dies, keep it there for a
while, marked as dead.
Send out the dead prefixes with zero lifetime.
Adapt the naming conventions to be a bit closer to the other protocols.
proto_radv -> radv_proto
struct radv_proto *ra -> struct radv_proto *p
struct proto *p -> struct proto *P
Adapt the naming conventions to be a bit closer to the other protocols.
proto_radv -> radv_proto
struct radv_proto *ra -> struct radv_proto *p
struct proto *p -> struct proto *P
Add proper support for per-nexthop onlink flag in routes to handle next
hop addresses that are not covered by interface IP ranges. Supported by
kernel and static protocols.
Thanks to Vincent Bernat for the idea.
The subtlv parsing code was doing byte-based arithmetic with non-void pointers,
causing it to read beyond the end of the packet.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
RFC6126bis formally introduces sub-TLVs to the Babel protocol, including
mandatory sub-TLVs. This adds support for parsing sub-TLVs to the Babel
protocol and skips TLVs that contain mandatory sub-TLVs, as per the spec.
For details, see section 4.4 of
https://tools.ietf.org/html/draft-ietf-babel-rfc6126bis-02
Thanks to Toke Høiland-Jørgensen <toke@toke.dk> for the patch.
Previously, the Babel protocol would never use prefix compression on outgoing
updates (but would parse it on incoming ones). This adds compression of IPv6
addresses of outgoing updates.
The compression only works to the extent that the FIB is walked in lexicographic
order; i.e. a prefix is only compressed if it shares bytes with the previous
prefix in the same packet.
Thanks to Toke Høiland-Jørgensen <toke@toke.dk> for the patch.
This adds support for dual-stack v4/v6 operation to the Babel protocol.
Routing messages will be exchanged over IPv6, but IPv4 routes can be
carried in the messages being exchanged. This matches how the reference
Babel implementation (babeld) works.
The nexthop address for v4 can be configured per interface, and will
default to the first available IPv4 address on the given interface. For
symmetry, a configuration option to configure the IPv6 nexthop address
is also added.
Thanks to Toke Høiland-Jørgensen <toke@toke.dk> for the patch.
Covers IPv4/VPNv4 routes with IPv6 next hop (RFC 5549), IPv6 routes with
IPv4 next hop (RFC 4798) and VPNv6 routes with IPv4 next hop (RFC 4659).
Unfortunately it also makes next hop hooks more messy.
Each BGP channel now could have two IGP tables, one for IPv4 next hops,
the other for IPv6 next hops.
Basic support for SAFI 4 and 128 (MPLS labeled IP and VPN) for IPv4 and
IPv6. Should work for route reflector, but does not properly handle
originating routes with next hop self.
Based on patches from Jan Matejka.
When a BGP session with ADD_PATH is restarted and the neighbor do not
announce ADD_PATH capability during reconnect, the accept_ra_types is
still set to RA_ANY.
Thanks to Lennert Buytenhek for the bugreport
The patch fixes several bugs introduced in previous changes, simplifies
the protocol by handing routes uniformly, introduces asynchronous route
processing to avoid issues with separate notifications for each next-hop
in ECMP routes, and makes reconfiguration faster by avoiding quadratic
complexity.
During reconfiguration, old and new filter expressions in static routes
are compared using i_same() function. When filter expressions contain
function calls, it is necessary that old filter expressions are the
second argument in i_same(), as it is internally modified by i_same().
Otherwise pointers to old (and freed) data appear in the config
structure.
Thanks to Lennert Buytenhek for tracking and reporting the bug.