Implement BGP Send hold timer according to draft-ietf-idr-bgp-sendholdtimer.
The Send hold timer drops the session if the neighbor is sending keepalives,
but does not receive our messages, causing the TCP connection to stall.
Some BGP capabilities change the BGP behavior in a significant way, so if
the configuration depends on it, it is better to not establish BGP
session when the capability is not available.
Add several BGP option to require individual BGP capabilities during
session negotiation.
When a MPLS channel is reloaded, it should reload all regular MPLS-aware
channels. This causes re-evaluation of routes in FEC map and possibly
reannouncement of MPLS routes.
When MPLS is active, received routes on MPLS-aware SAFIs (ipvX-mpls,
vpnX-mpls) are automatically labeled according to active label policy and
corresponding MPLS routes are automatically generated. Also routes sent
on MPLS-aware SAFIs announce local labels when it should be done.
In general, private_id is sparse and protocols may want to map some
internal values directly into it. For example, L3VPN needs to
map VPN route discriminators to private_id.
OTOH, u32 is enough for global_id, as these identifiers are dense.
Most syntactic constructs in BIRD configuration (e.g. protocol options)
are defined as keywords, which are distinct from symbols (user-defined
names for protocols, variables, ...). That may cause backwards
compatibility issue when a new feature is added, as it may collide with
existing user names.
We can allow keywords to be shadowed by symbols in almost all cases to
avoid this issue.
This replaces the previous mechanism, where shadowable symbols have to be
explictly added to kw_syms.
Despite not having defined 'master interface', VRF interfaces should be
treated as being inside respective VRFs. They behave as a loopback for
respective VRFs. Treating the VRF interface as inside the VRF allows
e.g. OSPF to pick up IP addresses defined on the VRF interface.
For this, we also need to tell apart VRF interfaces and regular interfaces.
Extend Netlink code to parse interface type and mark VRF interfaces with
IF_VRF flag.
Based on the patch from Erin Shepherd, thanks!
Move all bmp_peer_down() calls to one place and make it synchronous with
BGP session down, ensuring that BMP receives peer_down before route
withdraws from flushing.
Also refactor bmp_peer_down_() message generating code.
- Manage BMP state through bmp_peer, bmp_stream, bmp_table structures
- Use channels and rt_notify() hook for route announcements
- Add support for post-policy monitoring
- Send End-of-RIB even when there is no routes
- Remove rte_update_in_notify() hook from import tables
- Update import tables to support channels
- Add bmp_hack (no feed / no flush) flag to channels
Add internal BMP functions with plicit bmp_proto *p as first argument,
which allows using TRACE() macro. Keep list of BMP instances and call
internal functions. Old BMP functions are wrappers that call internal
functions for all enabled BMP instances.
Extract End-of-RIB mark into separate function.
Based on patch from Michal Zagorski <mzagorsk@akamai.com>. Thanks!
Fix issue with missing AF cap (e.g. IPv4 unicast when no capabilities
are announced).
Add Linpool save/restore action similar to bgp_create_update().
Based on patch from Michal Zagorski <mzagorsk@akamai.com> co-authored
with Pawel Maslanka <pmaslank@akamai.com>. Thanks!
When an OPEN message without capability options was parsed, the remote
role field was not initialized with the proper (non-zero) default value,
so it was interpreted as if 'provider' was announced.
Thanks to Mikhail Grishin for the bugreport.
The BMP protocol needs OPEN messages of established BGP sessions to
construct appropriate Peer Up messages. Instead of saving them internally
we use OPEN messages stored in BGP instances. This allows BMP instances
to be restarted or enabled later.
Because of this change, we can simplify BMP data structures. No need to
keep track of BGP sessions when we are not started. We have to iterate
over all (established) BGP sessions when the BMP session is established.
This is just a scaffolding now, but some kind of iteration would be
necessary anyway.
Also, the commit cleans up handling of msg/msg_length arguments to be
body/body_length consistently in both rx/tx and peer_up/peer_down calls.
For whatever reason, parser allocated a symbol for every parsed keyword
in each scope. That wasted time and memory. The effect is worsened with
recent changes allowing local scopes, so keywords often promote soft
scopes (with no symbols) to real scopes.
Do not allocate a symbol for a keyword. Take care of keywords that could
be promoted to symbols (kw_sym) and do it explicitly.
Initial implementation of a basic subset of the BMP (BGP Monitoring
Protocol, RFC 7854) from Akamai team. Submitted for further review
and improvement.
Missing translation from BGP attribute ID to eattr ID in bgp_unset_attr()
broke automatic removal of bgp_med during export to EBGP peers.
Thanks to Edward Sun for the bugreport.
Even though the free bind option is primarily meant to alleviate problems
with addresses assigned too late, it's also possible to use BIRD with AnyIP
configuration, assigning whole ranges to the machine. Therefore free bind
allows also to create an outbound connection from specific address even though
such address is not assigned.
Some of these new BGP role keywords use generic names that collides with
user-defined symbols. Allow them to be redefined. Also remove duplicit
keyword definition for 'prefer'.
There were some confusion about validity and usage of pflags, which
caused incorrect usage after some flags from (now removed) protocol-
specific area were moved to pflags.
We state that pflags:
- Are secondary data used by protocol-specific hooks
- Can be changed on an existing route (in contrast to copy-on-write
for primary data)
- Are irrelevant for propagation (not propagated when changed)
- Are specific to a routing table (not propagated by pipe)
The patch did these fixes:
- Do not compare pflags in rte_same(), as they may keep cached values
like BGP_REF_STALE, causing spurious propagation.
- Initialize pflags to zero in rte_get_temp(), avoid initialization in
protocol code, fixing at least two forgotten initializations (krt
and one case in babel).
- Improve documentation about pflags
The effective keepalive time now scales relative to the negotiated
hold time, to maintain proportion between the keepalive time and the
hold time. This avoids issues when both keepalive and hold times
were configured, the hold time was negotiated to a smaller value,
but the keepalive time stayed the same.
Add new options 'min hold time' and 'min keepalive time', which reject
session attempts with too small hold time.
Improve validation of config options an their documentation.
Thanks to Alexander Zubkov and Sergei Goriunov for suggestions.
Add BGP channel option 'next hop prefer global' that modifies BGP
recursive next hop resolution to use global next hop IPv6 address instead
of link-local next hop IPv6 address for immediate next hop of received
routes.
- When next hop is reset to local IP, we should remove BGP label stack,
as it is related to original next hop
- BGP next hop or immediate next hop from one VRF should not be passed
to another VRF, as they are different IP namespaces
In principle, the channel list is a list of parent struct proto and can
contain general structures of type struct channel, That is useful e.g.
for adding MPLS channels to BGP.
Implement BGP roles as described in RFC 9234. It is a mechanism for
route leak prevention and automatic route filtering based on common BGP
topology relationships. It defines role capability (controlled by 'local
role' option) and OTC route attribute, which is used for automatic route
filtering and leak detection.
Minor changes done by commiter.
Passing protocol to preexport was in fact a historical relic from the
old times when channels weren't a thing. Refactoring that to match
current extensibility needs.
The prefix hash table in BGP used the same hash function as the rtable.
When a batch of routes are exported during feed/flush to the BGP, they
all have similar hash values, so they are all crowded in a few slots in
the BGP prefix table (which is much smaller - around the size of the
batch - and uses higher bits from hash values), making it much slower due
to excessive collisions. Use a different hash function to avoid this.
Also, increase the batch size to fill 4k BGP packets and increase minimum
BGP bucket and prefix hash sizes to avoid back and forth resizing during
flushes.
This leads to order of magnitude faster flushes (on my test data).