Despite not having defined 'master interface', VRF interfaces should be
treated as being inside respective VRFs. They behave as a loopback for
respective VRFs. Treating the VRF interface as inside the VRF allows
e.g. OSPF to pick up IP addresses defined on the VRF interface.
For this, we also need to tell apart VRF interfaces and regular interfaces.
Extend Netlink code to parse interface type and mark VRF interfaces with
IF_VRF flag.
Based on the patch from Erin Shepherd, thanks!
Move all bmp_peer_down() calls to one place and make it synchronous with
BGP session down, ensuring that BMP receives peer_down before route
withdraws from flushing.
Also refactor bmp_peer_down_() message generating code.
Add internal BMP functions with explicit bmp_proto *p as first argument,
which allows using TRACE() macro. Keep list of BMP instances and call
internal functions. Old BMP functions are wrappers that call internal
functions for all enabled BMP instances.
Extract End-of-RIB mark into separate function.
Based on patch from Michal Zagorski <mzagorsk@akamai.com>. Thanks!
Fix issue with missing AF cap (e.g. IPv4 unicast when no capabilities
are announced).
Add Linpool save/restore action similar to bgp_create_update().
Based on patch from Michal Zagorski <mzagorsk@akamai.com> co-authored
with Pawel Maslanka <pmaslank@akamai.com>. Thanks!
When an OPEN message without capability options was parsed, the remote
role field was not initialized with the proper (non-zero) default value,
so it was interpreted as if 'provider' was announced.
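The fix can be illustrated with a minimal sketch. All names here (peer_caps, caps_init, parse_caps, ROLE_UNDEFINED) are hypothetical and not taken from the BIRD sources; the sentinel value 255 is an assumption. Only the capability code 9 and the role value 0 for Provider come from RFC 9234. The point is that the default is set before the option list is walked, so an empty list leaves the role undefined instead of zero.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_ROLE        9     /* BGP Role capability code (RFC 9234) */
#define ROLE_PROVIDER   0     /* RFC 9234 role value for Provider */
#define ROLE_UNDEFINED  255   /* assumed sentinel meaning "no role announced" */

/* Hypothetical container for parsed capabilities. */
struct peer_caps {
  uint8_t role;
};

/* Set non-zero defaults; a zeroed struct would read as ROLE_PROVIDER. */
static void caps_init(struct peer_caps *caps)
{
  caps->role = ROLE_UNDEFINED;
}

/*
 * Walk a list of (code, length, data) capabilities.
 * Returns 0 on success, -1 on a malformed list.
 */
static int parse_caps(struct peer_caps *caps, const uint8_t *buf, size_t len)
{
  size_t pos = 0;

  /* Defaults are applied even when the list is empty. */
  caps_init(caps);

  while (pos + 2 <= len)
  {
    uint8_t code = buf[pos];
    uint8_t clen = buf[pos + 1];

    if (pos + 2 + clen > len)
      return -1;

    if (code == CAP_ROLE && clen == 1)
      caps->role = buf[pos + 2];

    pos += 2 + (size_t) clen;
  }

  return (pos == len) ? 0 : -1;
}
```

With an empty list, parse_caps() leaves role at ROLE_UNDEFINED, which cannot be confused with an announced 'provider' role.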
Thanks to Mikhail Grishin for the bugreport.
The BMP protocol needs OPEN messages of established BGP sessions to
construct appropriate Peer Up messages. Instead of saving them internally
we use OPEN messages stored in BGP instances. This allows BMP instances
to be restarted or enabled later.
Because of this change, we can simplify BMP data structures. No need to
keep track of BGP sessions when we are not started. We have to iterate
over all (established) BGP sessions when the BMP session is established.
This is just a scaffolding now, but some kind of iteration would be
necessary anyway.
Also, the commit cleans up handling of msg/msg_length arguments to be
body/body_length consistently in both rx/tx and peer_up/peer_down calls.
Initial implementation of a basic subset of the BMP (BGP Monitoring
Protocol, RFC 7854) from Akamai team. Submitted for further review
and improvement.
There was some confusion about the validity and usage of pflags, which
caused incorrect usage after some flags from (now removed) protocol-
specific area were moved to pflags.
We state that pflags:
- Are secondary data used by protocol-specific hooks
- Can be changed on an existing route (in contrast to copy-on-write
for primary data)
- Are irrelevant for propagation (not propagated when changed)
- Are specific to a routing table (not propagated by pipe)
The patch did these fixes:
- Do not compare pflags in rte_same(), as they may keep cached values
like BGP_REF_STALE, causing spurious propagation.
- Initialize pflags to zero in rte_get_temp(), avoid initialization in
protocol code, fixing at least two forgotten initializations (krt
and one case in babel).
- Improve documentation about pflags
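A minimal sketch of the first two fixes, using a simplified stand-in for the route structure. The names rte_sketch, rte_same_sketch, rte_init_temp_sketch and PFLAG_STALE_SKETCH are hypothetical; the real struct rte has more fields and the real comparison checks more than this.

```c
#include <stdint.h>
#include <string.h>

#define PFLAG_STALE_SKETCH 0x1  /* hypothetical cached protocol flag */

/* Simplified stand-in for a route entry. */
struct rte_sketch {
  const void *attrs;   /* primary data: shared route attributes */
  const void *src;     /* primary data: route source */
  uint32_t flags;      /* primary data: table-level flags */
  uint32_t pflags;     /* secondary data: protocol-specific, may be cached */
};

/* Temporary routes start with all flags cleared, so protocols
 * do not have to remember to initialize pflags themselves. */
static void rte_init_temp_sketch(struct rte_sketch *e)
{
  memset(e, 0, sizeof(*e));
}

/* Compare primary data only; pflags are deliberately left out,
 * so a changed cached flag does not trigger propagation. */
static int rte_same_sketch(const struct rte_sketch *a, const struct rte_sketch *b)
{
  return a->attrs == b->attrs &&
         a->src == b->src &&
         a->flags == b->flags;
}
```

Two routes that differ only in pflags compare as the same, while a difference in any primary field still makes them distinct.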
The effective keepalive time now scales relative to the negotiated
hold time, to maintain proportion between the keepalive time and the
hold time. This avoids issues when both keepalive and hold times
were configured, the hold time was negotiated to a smaller value,
but the keepalive time stayed the same.
Add new options 'min hold time' and 'min keepalive time', which reject
session attempts with too small hold time.
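The scaling and the minimum checks can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the structure and function names are hypothetical, times are in seconds, the fallback of one third of the hold time for an unset keepalive is assumed, and the special case of hold time 0 (timers disabled) is not modelled.

```c
#include <stdint.h>

/* Hypothetical configuration; 0 means "not set" for the optional fields. */
struct timer_conf {
  uint32_t hold_time;           /* configured hold time */
  uint32_t keepalive_time;      /* configured keepalive time */
  uint32_t min_hold_time;       /* 'min hold time' */
  uint32_t min_keepalive_time;  /* 'min keepalive time' */
};

/* The negotiated hold time is the smaller of both proposals (RFC 4271). */
static uint32_t negotiated_hold(uint32_t local, uint32_t remote)
{
  return (local < remote) ? local : remote;
}

/* Keep the configured keepalive/hold ratio for the negotiated hold time. */
static uint32_t effective_keepalive(const struct timer_conf *cf, uint32_t hold)
{
  if (!cf->keepalive_time || !cf->hold_time)
    return hold / 3;  /* assumed default ratio */

  /* 64-bit intermediate avoids overflow of the product. */
  return (uint32_t) ((uint64_t) cf->keepalive_time * hold / cf->hold_time);
}

/* Returns 1 when the session may proceed, 0 when it should be rejected. */
static int timers_acceptable(const struct timer_conf *cf, uint32_t hold)
{
  if (cf->min_hold_time && hold < cf->min_hold_time)
    return 0;

  if (cf->min_keepalive_time &&
      effective_keepalive(cf, hold) < cf->min_keepalive_time)
    return 0;

  return 1;
}
```

For example, with hold time 240 and keepalive time 80 configured, a session negotiated down to hold time 90 gets an effective keepalive time of 30 instead of staying at 80.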
Improve validation of config options and their documentation.
Thanks to Alexander Zubkov and Sergei Goriunov for suggestions.
Add BGP channel option 'next hop prefer global' that modifies BGP
recursive next hop resolution to use global next hop IPv6 address instead
of link-local next hop IPv6 address for immediate next hop of received
routes.
- When next hop is reset to local IP, we should remove BGP label stack,
as it is related to original next hop
- BGP next hop or immediate next hop from one VRF should not be passed
to another VRF, as they are different IP namespaces
In principle, the channel list is a list of parent struct proto and can
contain general structures of type struct channel. That is useful e.g.
for adding MPLS channels to BGP.
Implement BGP roles as described in RFC 9234. It is a mechanism for
route leak prevention and automatic route filtering based on common BGP
topology relationships. It defines role capability (controlled by 'local
role' option) and OTC route attribute, which is used for automatic route
filtering and leak detection.
Minor changes done by committer.
It is too cryptic to flush tmp_linpool in these cases and we don't want
anybody in the future to break this code by adding an allocation
somewhere which should persist over that flush.
Saving and restoring linpool state is safer.
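The difference between flushing and save/restore can be shown on a toy linear allocator. This is only a sketch of the idea: the types and functions below (lpool_sketch, lp_mark_sketch and friends) are hypothetical, ignore alignment, and do not reproduce BIRD's linpool API.

```c
#include <stddef.h>

/* Toy linear pool: allocations only bump a counter. */
struct lpool_sketch {
  unsigned char buf[1024];
  size_t used;
};

/* Saved position in the pool. */
struct lp_mark_sketch {
  size_t used;
};

static void *lp_alloc_sketch(struct lpool_sketch *p, size_t n)
{
  if (n > sizeof(p->buf) - p->used)
    return NULL;

  void *res = p->buf + p->used;
  p->used += n;
  return res;
}

/* Flush drops everything, including allocations made by callers. */
static void lp_flush_sketch(struct lpool_sketch *p)
{
  p->used = 0;
}

static struct lp_mark_sketch lp_save_sketch(const struct lpool_sketch *p)
{
  struct lp_mark_sketch m = { p->used };
  return m;
}

/* Restore drops only what was allocated after the matching save. */
static void lp_restore_sketch(struct lpool_sketch *p, struct lp_mark_sketch m)
{
  p->used = m.used;
}
```

An allocation made before lp_save_sketch() survives lp_restore_sketch(), whereas lp_flush_sketch() would silently invalidate it.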
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.
The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.
The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
One of the previous commits added error logging of invalid routes. This
also inadvertently caused error logging of route loops, which should
be ignored silently. Fix that.
Most error messages in attribute processing are in rx/decode step and
these use L_REMOTE log class. But there are a few that are in tx/export
step and these should use L_ERR log class.
Use tx-specific macro (REJECT()) in tx/export code and rename field
err_withdraw to err_reject in struct bgp_export_state to ensure that
appropriate error reporting macros are called in proper contexts.
Routes from downed protocols stay in rtable (until next rtable prune
cycle ends) and may be even exported to another protocol. In BGP case,
source BGP protocol is examined, although dynamic parts (including
neighbor entries) are already freed. That may lead to crash under some
race conditions. Ensure that freed neighbor entry is not accessed to
avoid this issue.
This is an implementation of draft-walton-bgp-hostname-capability-02.
It has been implemented in FRR for quite some time, and in datacenters it
gives nicer output that avoids using bare IP addresses.
It is disabled by default. The hostname is retrieved from uname(2) and
can be overridden with "hostname" option. The domain name is never set
nor displayed.
Minor changes by committer.
There are three common ways to encode IPv6 link-local-only next hops:
(:: ll), (ll), and (ll ll). We use the first one but we should accept all
three. The patch fixes handling of the last one.
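Accepting all three encodings amounts to normalizing them to one internal form. The sketch below is an assumption about how that could look, not the code from the patch: nh_pair and decode_next_hop are hypothetical names, and the next hop is taken as a raw 16- or 32-byte field.

```c
#include <stdint.h>
#include <string.h>

/* Normalized next hop: global address (or ::) plus link-local address (or ::). */
struct nh_pair {
  uint8_t global[16];
  uint8_t ll[16];
};

/* fe80::/10 test */
static int is_link_local(const uint8_t a[16])
{
  return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;
}

/* Returns 0 on success, -1 on unsupported next hop length. */
static int decode_next_hop(const uint8_t *buf, size_t len, struct nh_pair *out)
{
  memset(out, 0, sizeof(*out));

  if (len == 16)
  {
    /* (ll) or a plain global address */
    if (is_link_local(buf))
      memcpy(out->ll, buf, 16);
    else
      memcpy(out->global, buf, 16);
    return 0;
  }

  if (len == 32)
  {
    memcpy(out->global, buf, 16);
    memcpy(out->ll, buf + 16, 16);

    /* (ll ll): the first slot is link-local, so there is no global address */
    if (is_link_local(out->global))
    {
      memcpy(out->ll, out->global, 16);
      memset(out->global, 0, 16);
    }
    return 0;
  }

  return -1;
}
```

All three link-local-only encodings then decode to the same pair, with the global part set to :: and the link-local part filled in.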
Thanks to Sebastian Hahn for the bugreport.
RFC 5575 does not explicitly reject flowspec rules without dst part,
it just requires dst part in validation procedure for feasibility, which
we do not implement anyway. Thus flow without dst prefix is syntactically
valid, but unfeasible (if feasibility testing is done).
Thanks to Alex D. for the bugreport.
During NLRI parsing of IPv6 Flowspec, dst prefix was not properly
extracted from NLRI, therefore a received flow was stored in a different
position in flowspec routing table, and was not reachable by command
'show route <flow>'.
Add proper prefix part accessors to flowspec code and use them from BGP
NLRI parsing code.
Thanks to Alex D. for the bugreport.
There is an improper check for valid message size, which may lead to
stack overflow and buffer leaks to log when a large message is received.
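The class of bug can be shown on a generic length-prefixed message copied into a fixed stack buffer. This sketch does not reproduce the affected BIRD code; the message layout, MSG_MAX_SKETCH and copy_message_sketch are assumptions made for illustration. Both checks are needed: one against the destination size and one against the amount of data actually received.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAX_SKETCH 128  /* assumed maximum accepted message length */

/*
 * data = one length byte followed by that many message bytes.
 * Copies the message to out as a NUL-terminated string.
 * Returns 0 on success, -1 when the length field is not valid.
 */
static int copy_message_sketch(const uint8_t *data, size_t data_len,
                               char out[MSG_MAX_SKETCH + 1])
{
  if (data_len < 1)
    return -1;

  size_t msg_len = data[0];

  /* Too long for the destination: copying would overflow the stack buffer. */
  if (msg_len > MSG_MAX_SKETCH)
    return -1;

  /* Longer than what was received: copying would leak adjacent memory. */
  if (msg_len + 1 > data_len)
    return -1;

  memcpy(out, data + 1, msg_len);
  out[msg_len] = 0;
  return 0;
}
```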
Thanks to Daniel McCarney for bugreport and analysis.
Instead of having large stack buffer for max amount of AFI/SAFI pairs.
The old code is not correct w.r.t. extended option length, as more
AFI/SAFI pairs may fit into the capability option.
If BGP has too much data to send and BIRD is slower than the link, TX is
always possible until all data is sent. This patch limits maximum number
of generated BGP messages in one iteration of TX hook.
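The limit can be pictured as a per-call budget. The following is a sketch only: tx_queue_sketch, tx_hook_sketch and the budget of 16 messages are hypothetical and chosen for illustration, and sending is reduced to decrementing a counter.

```c
/* Assumed per-iteration budget; the value used by the patch is not stated here. */
#define MAX_TX_PER_HOOK_SKETCH 16

/* Hypothetical queue: only counts messages waiting to be sent. */
struct tx_queue_sketch {
  unsigned pending;
};

/*
 * One iteration of the TX hook. Sends at most MAX_TX_PER_HOOK_SKETCH
 * messages and reports through *more whether the hook should be
 * scheduled again, which lets other events run in between.
 */
static unsigned tx_hook_sketch(struct tx_queue_sketch *q, int *more)
{
  unsigned sent = 0;

  while (q->pending && sent < MAX_TX_PER_HOOK_SKETCH)
  {
    q->pending--;   /* stands in for generating and sending one message */
    sent++;
  }

  *more = (q->pending > 0);
  return sent;
}
```

With 40 queued messages, three iterations send 16, 16 and 8 messages, and only the last one reports that no more work remains.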