Add a new protocol offering route aggregation.
User can specify list of route attributes in the configuration file and
run route aggregation on the export side of the pipe protocol. Routes are
sorted and for every group of equivalent routes new route is created and
exported to the routing table. It is also possible to specify filter
which will run for every route before aggregation.
Furthermore, it will be possible to set attributes of new routes
according to attributes of the aggregated routes.
This is a work in progress.
Original work by Igor Putovny, subsequent cleanups and finalization by
Maria Matejka.
This is a split-commit of the neighboring aggregator branch
with a bit improved lvalue handling, to have easier merge into v3.
Some [redacted] (yes, myself) had a really bad idea
to rename nest/route.h to nest/rt.h while refactoring
some data structures out of it.
This led to unnecessarily complex problems with
merging updates from v2. Reverting this change
to make my life a bit easier.
At least it needed only one find-sed command:
find -name '*.[chlY]' -type f -exec sed -i 's#nest/rt.h#nest/route.h#' '{}' +
This merge was particularly difficult. I finally resorted to delete the
symbol scope active flag altogether and replace its usage by other
means.
Also I had to update custom route attribute registration to fit
both the scope updates in v2 and the data model in v3.
Conflicts:
conf/cf-lex.l
conf/conf.h
conf/confbase.Y
conf/gen_keywords.m4
conf/gen_parser.m4
filter/config.Y
nest/config.Y
proto/bgp/config.Y
proto/static/config.Y
Keywords and attributes are split to separate namespaces, to avoid
collisions between regular keyword use and attribute overlay.
There are now 3 different pools with specific lifetime. All of these are
available since protocol start, anyway they get freed in different
moments.
First, pool_up gets freed immediately after announcing PS_STOP, to e.g.
stop all timers and events regularly updating the routing table when the
imports are already flushing.
Then, pool_inloop gets freed just before the protocol loop is finally
stopped, after all channels, imports and exports and other hooks are
cleaned up.
And finally, the pool itself is freed the last. Unless you explicitly
need the early free, use this pool.
Old configs do not define MPLS domains and may use a static protocol
to define static MPLS routes.
When MPLS channel is the only channel of static protocol, handle it
as a main channel. Also, define implicit MPLS domain if needed and
none is defined.
When a MPLS channel is reloaded, it should reload all regular MPLS-aware
channels. This causes re-evaluation of routes in FEC map and possibly
reannouncement of MPLS routes.
Instead of just using route attributes, static routes with
static MPLS labels can be defined just by e.g.:
route 10.1.1.0/24 mpls 100 via 10.1.2.1 mpls 200;
The L3VPN protocol implements RFC 4364 BGP/MPLS VPNs using MPLS backbone.
It works similarly to pipe. It connects IP table (one per VRF) with (global)
VPN table. Routes passed from VPN table to IP table are stripped of RD and
filtered by import targets, routes passed in the other direction are extended
with RD, MPLS labels and export targets in extended communities. A separate
MPLS channel is used to announce MPLS routes for the labels.
When MPLS is active, received routes on MPLS-aware SAFIs (ipvX-mpls,
vpnX-mpls) are automatically labeled according to active label policy and
corresponding MPLS routes are automatically generated. Also routes sent
on MPLS-aware SAFIs announce local labels when it should be done.
When MPLS is active, static IP/VPN routes are automatically labeled
according to active label policy and corresponding MPLS routes are
automatically generated.
If the protocol supports route refresh on export, we keep the stop-start
method of route refeed. This applies for BGP with ERR or with export
table on, for OSPF, Babel, RIP or Pipe.
For BGP without ERR or for future selective ROA reloads, we're adding an
auxiliary export request, doing the refeed while the main export request
is running, somehow resembling the original method of BIRD 2 refeed.
There is also a refeed request queue to keep track of different refeed
requests.
In general, private_id is sparse and protocols may want to map some
internal values directly into it. For example, L3VPN needs to
map VPN route discriminators to private_id.
OTOH, u32 is enough for global_id, as these identifiers are dense.
Add a new protocol offering route aggregation.
User can specify list of route attributes in the configuration file and
run route aggregation on the export side of the pipe protocol. Routes are
sorted and for every group of equivalent routes new route is created and
exported to the routing table. It is also possible to specify filter
which will run for every route before aggregation.
Furthermore, it will be possible to set attributes of new routes
according to attributes of the aggregated routes.
This is a work in progress.
Original work by Igor Putovny, subsequent cleanups and finalization by
Maria Matejka.
For now, there are 4 phases: Necessary (device), Connector (kernel, pipe), Generator (static, rpki) and Regular.
Started and reconfigured are from Necessary to Regular, shutdown backwards.
This way, kernel can flush routes before actually being shutdown.
According to RFC 5882, system should not interpret the local or remote
session state transition to AdminDown as failure. We followed that for
the local session state but not for the remote session state (which
just triggered a transition of the local state to Down). The patch
fixes that.
We do not properly generate AdminDown on our side, so the patch is
relevant just for interoperability with other systems.
Thanks to Sunnat Samadov for the bugreport.
Most syntactic constructs in BIRD configuration (e.g. protocol options)
are defined as keywords, which are distinct from symbols (user-defined
names for protocols, variables, ...). That may cause backwards
compatibility issue when a new feature is added, as it may collide with
existing user names.
We can allow keywords to be shadowed by symbols in almost all cases to
avoid this issue.
This replaces the previous mechanism, where shadowable symbols have to be
explictly added to kw_syms.
Nonterminal bytestring allows to provide expressions to be evaluated in
places where BYTETEXT is used now: passwords, radv custom option.
Based on the patch from Alexander Zubkov <green@qrator.net>, thanks!
- Rename BYTESTRING lexem to BYTETEXT, not to collide with 'bytestring' type name
- Add bytestring type with id T_BYTESTRING (0x2c)
- Add from_hex() filter function to create bytestring from hex string
- Add filter test cases for bytestring type
Minor changes by committer.
Despite not having defined 'master interface', VRF interfaces should be
treated as being inside respective VRFs. They behave as a loopback for
respective VRFs. Treating the VRF interface as inside the VRF allows
e.g. OSPF to pick up IP addresses defined on the VRF interface.
For this, we also need to tell apart VRF interfaces and regular interfaces.
Extend Netlink code to parse interface type and mark VRF interfaces with
IF_VRF flag.
Based on the patch from Erin Shepherd, thanks!
Move all bmp_peer_down() calls to one place and make it synchronous with
BGP session down, ensuring that BMP receives peer_down before route
withdraws from flushing.
Also refactor bmp_peer_down_() message generating code.
Now we use rt_notify() and channels for both feed and notifications,
in both import tables (pre-policy) and regular tables (post-policy).
Remove direct walk in bmp_route_monitor_snapshot().
- Manage BMP state through bmp_peer, bmp_stream, bmp_table structures
- Use channels and rt_notify() hook for route announcements
- Add support for post-policy monitoring
- Send End-of-RIB even when there is no routes
- Remove rte_update_in_notify() hook from import tables
- Update import tables to support channels
- Add bmp_hack (no feed / no flush) flag to channels
Currently one can use only a predefined set of advertised options in RAdv
protocol, which are supported by BIRD configuration. It would be convenient
to be able to specify other possible options at least manually as a blob
so one should not wait until it is supported in the code, released, etc.
This idea is inspired by presentation by Ondřej Caletka at CSNOG, in which
he noticed the lack of either PREF64 option or possibility to add custom
options in various software.
The patch makes it possible to define such options with the syntax:
other type <num> <bytestring>
Add internal BMP functions with plicit bmp_proto *p as first argument,
which allows using TRACE() macro. Keep list of BMP instances and call
internal functions. Old BMP functions are wrappers that call internal
functions for all enabled BMP instances.
Extract End-of-RIB mark into separate function.
Based on patch from Michal Zagorski <mzagorsk@akamai.com>. Thanks!
This adds support to the Babel protocol for the RTT extension specified
in draft-ietf-babel-rtt-extension. While this extension is not yet at the
RFC stage, it is one of the more useful extensions to Babel[0], so it
seems worth having in Bird as well.
The extension adds timestamps to Hello and IHU TLVs and uses these to
compute an RTT to each neighbour. An extra per-neighbour cost is then
computed from the RTT based on a minimum and maximum interval and cost
value specified in the configuration. The primary use case for this is
improving routing in a geographically distributed tunnel-based overlay
network.
The implementation follows the babeld implementation when picking
constants and default configuration values. It also uses the same RTT
smoothing algorithm as babeld, and follows it in adding a new 'tunnel'
interface type which enables RTT by default.
[0] https://alioth-lists.debian.net/pipermail/babel-users/2022-April/003932.html
Fix issue with missing AF cap (e.g. IPv4 unicast when no capabilities
are announced).
Add Linpool save/restore action similar to bgp_create_update().
Based on patch from Michal Zagorski <mzagorsk@akamai.com> co-authored
with Pawel Maslanka <pmaslank@akamai.com>. Thanks!
After converting BFD to the new IO loop system, reconfiguration never
really worked. Sadly, we missed this case in our testing suite so it
passed under the radar for quite a while.
Thanks to Andrei Dinu <andrei.dinu@digitalit.ro> for reporting and
isolating this issue.
When an OPEN message without capability options was parsed, the remote
role field was not initialized with the proper (non-zero) default value,
so it was interpreted as if 'provider' was announced.
Thanks to Mikhail Grishin for the bugreport.
The BMP protocol needs OPEN messages of established BGP sessions to
construct appropriate Peer Up messages. Instead of saving them internally
we use OPEN messages stored in BGP instances. This allows BMP instances
to be restarted or enabled later.
Because of this change, we can simplify BMP data structures. No need to
keep track of BGP sessions when we are not started. We have to iterate
over all (established) BGP sessions when the BMP session is established.
This is just a scaffolding now, but some kind of iteration would be
necessary anyway.
Also, the commit cleans up handling of msg/msg_length arguments to be
body/body_length consistently in both rx/tx and peer_up/peer_down calls.
For whatever reason, parser allocated a symbol for every parsed keyword
in each scope. That wasted time and memory. The effect is worsened with
recent changes allowing local scopes, so keywords often promote soft
scopes (with no symbols) to real scopes.
Do not allocate a symbol for a keyword. Take care of keywords that could
be promoted to symbols (kw_sym) and do it explicitly.
Memory allocation is a fragile part of BIRD and we need checking that
everybody is using the resource pools in an appropriate way. To assure
this, all the resource pools are associated with locking domains and
every resource manipulation is thoroughly checked whether the
appropriate locking domain is locked.
With transitive resource manipulation like resource dumping or mass free
operations, domains are locked and unlocked on the go, thus we require
pool domains to have higher order than their parent to allow for this
transitive operations.
Adding pool locking revealed some cases of insecure memory manipulation
and this commit fixes that as well.
Hooks called from BGP to BMP should not log warning when BMP is not
connected, that is not an error (and we do not want to flood logs with
a ton of messages).
Blocked sk_send() should not log warning, that is expected situation.
Error during sk_send() is handled in error hook anyway.
Replace broken TCP connection management with a simple state machine.
Handle failed attempts properly with a timeout, detect and handle TCP
connection close and try to reconnect after that. Remove useless
'station_connected' flag.
Keep open messages saved even after the BMP session establishment,
so they can be used after BMP session flaps.
Use proper log messages for session events.
Use local variable to refence relevant instance instead of using global
instance ptr. Also, use 'p' variable instead of 'bmp' so we can use
common macros like TRACE().
Most error handling code was was for cases that cannot happen,
or they would be code bugs (and should use ASSERT()). Keep error
handling for just for I/O errors, like in rest of BIRD.
Initial implementation of a basic subset of the BMP (BGP Monitoring
Protocol, RFC 7854) from Akamai team. Submitted for further review
and improvement.
When several BGPs requested a BFD session in short time, chances were
that the second BGP would file a request while the pickup routine was
still running and it would get enqueued into the waiting list instead of
being picked up.
Fixed this by enforcing pickup loop restart when new requests got added,
and also by atomically moving the unpicked requests to a temporary list
to announce admin down before actually being added into the wait list.
Now sk_open() requires an explicit IO loop to open the socket in. Also
specific functions for socket RX pause / resume are added to allow for
BGP corking.
And last but not least, socket reloop is now synchronous to resolve
weird cases of the target loop stopping before actually picking up the
relooped socket. Now the caller must ensure that both loops are locked
while relooping, and this way all sockets always have their respective
loop.
If there are lots of loops in a single thread and only some of the loops
are actually active, the other loops are now kept aside and not checked
until they actually get some timers, events or active sockets.
This should help with extreme loads like 100k tables and protocols.
Also ping and loop pickup mechanism was allowing subtle race
conditions. Now properly handling collisions between loop ping and pickup.