For now, all route attributes are stored as eattrs in ea_list. This
should make route manipulation easier and it also allows for a layered
approach of route attributes where updates from filters will be stored
as an overlay over the previous version.
As there is either a nexthop or another destination specification
(or othing in case of ROAs and Flowspec), it may be merged together.
This code is somehow quirky and should be replaced in future by better
implementation of nexthop.
Also flowspec validation result has its own attribute now as it doesn't
have anything to do with route nexthop.
This doesn't do anything more than to put the whole structure inside
adata. The overall performance is certainly going downhill; we'll
optimize this later.
Anyway, this is one of the latest items inside rta and in several
commits we may drop rta completely and move to eattrs-only routes.
The route scope attribute was used for simple user route marking. As
there is a better tool for this (custom attributes), the old and limited
way can be dropped.
Some tokens are both keywords and symbols. For now, we allow only
specific keywords to be redefined; in future, more of the keywords may
be added to this category.
The redefinable keywords must be specified in any .Y file as follows:
toksym: THE_KEYWORD ;
See proto/bgp/config.Y for an example.
Also dropped a lot of unused terminals.
Changes in internal API:
* Every route attribute must be defined as struct ea_class somewhere.
* Registration of route attributes known at startup must be done by
ea_register_init() from protocol build functions.
* Every attribute has now its symbol registered in a global symbol table
defined as SYM_ATTRIBUTE
* All attribute ID's are dynamically allocated.
* Attribute value custom formatting hook is defined in the ea_class.
* Attribute names are the same for display and filters, always prefixed
by protocol name.
Also added some unit testing code for filters with route attributes.
This commit removes the EAF_TYPE_* namespace completely and also for
route attributes, filter-based types T_* are used. This simplifies
fetching and setting route attributes from filters.
Also, there is now union bval which serves as an universal value holder
instead of private unions held separately by eattr and filter code.
When shutting down a Babel instance we send a wildcard retraction to make
sure all peers can quickly switch to other route origins. Add another small
optimisation borrowed from babeld: sending a Hello message (along with the
retraction) with a very low interval.
This will cause neighbours to modify their expiry timers for the node's
state to quickly time it out, thus conserving resources in the network.
A recent change in Babel causes ifaces to disappear after
reconfiguration. The patch fixes that.
Thanks to Johannes Kimmel for an insightful bugreport.
Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and
net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x
speedup for IPv6 of these calls.
TODO:
- Rebuild the trie during rt_prune_table()
- Better way to avoid trie_add_prefix() in net_get() for existing tables
- Make it configurable (?)
This commit prevents use-after-free of routes belonging to protocols
which have been already destroyed, delaying also all the protocols'
shutdown until all of their routes have been finally propagated through
all the pipes down to the appropriate exports.
The use-after-free was somehow hypothetic yet theoretically possible in
rare conditions, when one BGP protocol authors a lot of routes and the
user deletes that protocol by reconfiguring in the same time as next hop
update is requested, causing rte_better() to be called on a
not-yet-pruned network prefix while the owner protocol has been already
freed.
In parallel execution environments, this would happen an inter-thread
use-after-free, causing possible heisenbugs or other nasty problems.
Routes are now allocated only when they are just to be inserted to the
table. Updating a route needs a locally allocated route structure.
Ownership of the attributes is also now not transfered from protocols to
tables and vice versa but just borrowed which should be easier to handle
in a multithreaded environment.
Some cleanups and bugfixes to the previous patch, including:
- Fix rate limiting in index mismatch check
- Fix missing BABEL_AUTH_INDEX_LEN in auth_tx_overhead computation
- Fix missing auth_tx_overhead recalculation during reconfiguration
- Fix pseudoheader construction in babel_auth_sign() (sport vs fport)
- Fix typecasts for ptrdiffs in log messages
- Make auth log messages similar to corresponding RIP/OSPF ones
- Change auth log messages for events that happen during regular
operation to debug messages
- Switch meaning of babel_auth_check*() functions for consistency
with corresponding RIP/OSPF ones
- Remove requirement for min/max key length, only those required by
given MAC code are enforced
This implements support for MAC authentication in the Babel protocol, as
specified by RFC 8967. The implementation seeks to follow the RFC as close
as possible, with the only deliberate deviation being the addition of
support for all the HMAC algorithms already supported by Bird, as well as
the Blake2b variant of the Blake algorithm.
For description of applicability, assumptions and security properties,
see RFC 8967 sections 1.1 and 1.2.
In preparation for adding authentication checks, refactor the TLV
walking code so it can be reused for a separate pass of the packet
for authentication checks.
When an interface disappears, all the neighbors are freed as well. Seqno
requests were anyway not decoupled from them, leading to strange
segfaults. This fix adds a proper seqno request list inside neighbors to
make sure that no pointer to neighbor is kept after free.
The babel protocol code checks whether iface supports multicast, and
whether it has a link-local address assigned. However, it doesn not give
any feedback if any of those checks fail, it just silently ignores the
interface. Fix this by explicitly logging when multicast check fails.
Based on patch from Toke Høiland-Jørgensen, thanks!
The babel protocol code was initialising objects returned from the slab
allocator by assigning to each of the struct members individually, but
wasn't touching the NODE member while doing so. This leads to warnings on
debug builds since commit:
baac700906 ("List expensive check.")
To fix this, introduce an sl_allocz() variant of the slab allocator which
will zero out the memory before returning it, and switch all the babel call
sites to use this version. The overhead for doing this should be negligible
for small objects, and in the case of babel, the largest object being
allocated was being zeroed anyway, so we can drop the memset in
babel_read_tlv().
Most commands like 'show ospf neighbors' fail when protocol is not
specified and there are multiple instances of given protocol type.
This is annoying in BIRD 2, as many protocols have IPv4 and IPv6
instances. The patch changes that by showing output from all protocol
instances of appropriate type.
Note that the patch also removes terminating cli_msg() call from these
commands and moves it to the common iterating code.
If the next hop of a route is not a reachable address, the route should be
installed as onlink. This enables a configuration common in mesh networks
where the mesh interface is assigned a /32 and babel handles the routing by
installing onlink routes.
Thanks to Toke Hoiland-Jorgensen for the patch.
The temporary atttributes are no longer removed by ea_do_prune(), but
they are undefined by store_tmp_attrs() protocol hooks. This fixes
several bugs where temporary attributes were removed when they should
not or not removed when they should be. The flag EAF_TEMP is no longer
needed and was removed.
Update all protocol make_tmp_attrs() / store_tmp_attrs() hooks to use
helper functions and to handle unset attributes properly.
Also fix some related bugs like improper handling of empty eattr list.
This is a major change of how the filters are interpreted. If everything
works how it should, it should not affect you unless you are hacking the
filters themselves.
Anyway, this change should make a huge improvement in the filter performance
as previous benchmarks showed that our major problem lies in the
recursion itself.
There are also some changes in nest and protocols, related mostly to
spreading const declarations throughout the whole BIRD and also to
refactored dynamic attribute definitions. The need of these came up
during the whole work and it is too difficult to split out these
not-so-related changes.
Once upon a time, far far away, there were the old Bird developers
discussing what direction of route flow shall be called import and
export. They decided to say "import to protocol" and "export to table"
when speaking about a protocol. When speaking about a table, they
spoke about "importing to table" and "exporting to protocol".
The latter terminology was adopted in configuration, then also the
bird CLI in commit ea2ae6dd0 started to use it (in year 2009). Now
it's 2018 and the terminology is the latter. Import is from protocol to
table, export is from table to protocol. Anyway, there was still an
import_control hook which executed right before route export.
One thing is funny. There are two commits in April 1999 with just two
minutes between them. The older announces the final settlement
on config terminology, the newer uses the other definition. Let's see
their commit messages as the git-log tool shows them (the newer first):
commit 9e0e485e50
Author: Martin Mares <mj@ucw.cz>
Date: Mon Apr 5 20:17:59 1999 +0000
Added some new protocol hooks (look at the comments for better explanation):
make_tmp_attrs Convert inline attributes to ea_list
store_tmp_attrs Convert ea_list to inline attributes
import_control Pre-import decisions
commit 5056c559c4
Author: Martin Mares <mj@ucw.cz>
Date: Mon Apr 5 20:15:31 1999 +0000
Changed syntax of attaching filters to protocols to hopefully the final
version:
EXPORT <filter-spec> for outbound routes (i.e., those announced
by BIRD to the rest of the world).
IMPORT <filter-spec> for inbound routes (i.e., those imported
by BIRD from the rest of the world).
where <filter-spec> is one of:
ALL pass all routes
NONE drop all routes
FILTER <name> use named filter
FILTER { <filter> } use explicitly defined filter
For all protocols, the default is IMPORT ALL, EXPORT NONE. This includes
the kernel protocol, so that you need to add EXPORT ALL to get the previous
configuration of kernel syncer (as usually, see doc/bird.conf.example for
a bird.conf example :)).
Let's say RIP to this almost 19-years-old inconsistency. For now, if you
import a route, it is always from protocol to table. If you export a
route, it is always from table to protocol.
And they lived happily ever after.
Modify protocols to use preferred address change notification instead on
depending on hard-reset of interfaces in that case, and remove hard-reset
in that case. This avoids issue when e.g. IPv6 protocol restarts
interface when IPv4 preferred address changed (as hard-reset is
unavoidable and common for whole iface).
The patch also fixes a bug when removing last address does not send
preferred address change notification.
In case of missing IPv4 next hop, we should skip such routes
on transmit and ignore such routes on receive.
Thanks to Julian Schuh for the bugreport and Toke Hoiland-Jorgensen
for the original patch.
This is a fundamental change of an original (1999) concept of route
processing inside BIRD. During import/export, there was a temporary
ea_list created which was to be used instead of the another one inside
the route itself.
This led to some confusion, quirks, and strange filter code that handled
extended route attributes. Dropping it now.
The protocol interface has changed in an uniform way -- the
`struct ea_list *attrs` argument has been removed from store_tmp_attrs(),
import_control(), rt_notify() and get_route_info().
During route export, the receiving protocol often initialized route
metrics to default value in its import_control hook before export filter
was executed. This is inconsistent with the expectation that an export
filter would process the same route as one in the routing table and it
breaks setting these metrics before (e.g. for static routes directly in
static protocol).
The patch removes the initialization of route metrics in import_control
hook, the default values are already handled in rt_notify hook called
after export filters.
The patch also changed the behavior of OSPF to keep metrics when a route
is reannounced between OSPF instances (to be consistent with other
protocols) and the behavior when both ospf_metric1 and ospf_metric2
are specified (to have more expected behavior).
When a Babel node restarts, it loses its sequence number, which can cause
its routes to be rejected by peers until the state is cleared out by other
nodes in the network (which can take on the order of minutes).
There are two ways to fix this: Having stable storage to keep the sequence
number across restarts, or picking a different router ID each time.
This implements the latter, by introducing a new option that will cause
BIRD to randomize a high 32 bits of router ID every time it starts up.
This avoids the problem at the cost of not having stable router IDs in
the network.
Thanks to Toke Hoiland-Jorgensen for the patch.
The router ID being assigned to routes was a uint, which discards the
upper 32 bits. This also has the nice side effect of echoing the wrong
router ID back to other routers.
Thanks to Toke Hoiland-Jorgensen for the patch.
This patch adds support for source-specific routing to the Babel protocol.
It changes the protocol to support both NET_IP6 and NET_IP6_SADR channels
for IPv6 addresses. If only a NET_IP6 channel is configured,
source-specific updates are ignored. Otherwise, non-source-specific
routes are simply treated as source-specific routes with SADR prefix 0.
Thanks to Toke Hoiland-Jorgensen for the original patch.
Minor changes by Ondrej Santiago Zajicek.
Several changes and bugfixes in Babel, namely:
- Exported route parameters stored directly in route table entry
- Exported non-babel routes no longer stored in per-entry route list
- Route update, selection and retraction simplified and fixed
- Route feasibility is evalualated per update and stored with route
- Unreachable route handling fixed, based on hold interval
- Added 'show babel routes' command
Overall, it fixes some issues with proper propagation of triggered
updates, making Babel convergence after topology change almost
instant.
Also fix several minor bugs and add 'limit' option for k-out-of-j
link sensing strategy. Change default from 8-of-16 to 12-of-16.
Change IHU expiry factor from 1.5 to 3.5 (as in RFC 6126).
The old timer interface is still kept, but implemented by new timers. The
plan is to switch from the old inteface to the new interface, then clean
it up.
RFC6126bis introduces a flags field for the Hello TLV, and adds a unicast flag
that is used to signify that a hello was sent as unicast. This adds parsing of
the flags field and ignores such unicast hellos, which preserves compatibility
until we can add a proper implementation of the unicast hello mechanism.
Thanks to Toke Hoiland-Jorgensen for the patch.
The patch implements BGP Administrative Shutdown Communication (RFC 8203)
allowing BGP operators to pass messages related to BGP session
administrative shutdown/restart. It handles both transmit and receive of
shutdown messages. Messages are logged and may be displayed by show
protocol all command.
Thanks to Job Snijders for the basic patch.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
The subtlv parsing code was doing byte-based arithmetic with non-void pointers,
causing it to read beyond the end of the packet.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
RFC6126bis formally introduces sub-TLVs to the Babel protocol, including
mandatory sub-TLVs. This adds support for parsing sub-TLVs to the Babel
protocol and skips TLVs that contain mandatory sub-TLVs, as per the spec.
For details, see section 4.4 of
https://tools.ietf.org/html/draft-ietf-babel-rfc6126bis-02
Thanks to Toke Høiland-Jørgensen <toke@toke.dk> for the patch.
Previously, the Babel protocol would never use prefix compression on outgoing
updates (but would parse it on incoming ones). This adds compression of IPv6
addresses of outgoing updates.
The compression only works to the extent that the FIB is walked in lexicographic
order; i.e. a prefix is only compressed if it shares bytes with the previous
prefix in the same packet.
Thanks to Toke Høiland-Jørgensen <toke@toke.dk> for the patch.
This adds support for dual-stack v4/v6 operation to the Babel protocol.
Routing messages will be exchanged over IPv6, but IPv4 routes can be
carried in the messages being exchanged. This matches how the reference
Babel implementation (babeld) works.
The nexthop address for v4 can be configured per interface, and will
default to the first available IPv4 address on the given interface. For
symmetry, a configuration option to configure the IPv6 nexthop address
is also added.
Thanks to Toke Høiland-Jørgensen <toke@toke.dk> for the patch.
An interface reconfiguration may change both the hello and update
intervals. An update interval change is immediately put into effect,
while a hello interval change is not. This also updates the hello
interval immediately (if the new interval is shorter than the old one),
and sends a hello to notify peers of the change.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
We do not need to maintain feasibility distances for our own router
ID (we ignore the updates anyway). Not doing so makes the routes be
garbage collected sooner when export filters change.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
When a route becomes infeasible it should not be kept as selected; this
is forbidden by section 3.6 of the RFC and prevents subsequent updates
from the same router ID from replacing it.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
This makes BIRD send a wildcard retraction on all interfaces before
shutting down and right after starting up. This helps ensure that
neighbours will discard the announced routes as soon as possible,
rather than only after the normal timeout procedures.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
An update with wildcard AE and infinite metric should be treated as a
global retraction of all prefixes announced by that neighbour, per
section 4.4.9 of the RFC. In addition, router ID and seqno in retraction
updates should be ignored. This reworks the handling of retractions and
adjusts the parser to handle all this correctly.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Intervals are carried as 16-bit centisecond values, but kept internally
in 16-bit second values, which causes a potential for overflow. This adds
some checks to make sure this does not happen.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>