The patch implements BGP Administrative Shutdown Communication (RFC 8203)
allowing BGP operators to pass messages related to BGP session
administrative shutdown/restart. It handles both transmit and receive of
shutdown messages. Messages are logged and may be displayed by show
protocol all command.
Thanks to Job Snijders for the basic patch.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
Covers IPv4/VPNv4 routes with IPv6 next hop (RFC 5549), IPv6 routes with
IPv4 next hop (RFC 4798) and VPNv6 routes with IPv4 next hop (RFC 4659).
Unfortunately it also makes next hop hooks more messy.
Each BGP channel now could have two IGP tables, one for IPv4 next hops,
the other for IPv6 next hops.
Basic support for SAFI 4 and 128 (MPLS labeled IP and VPN) for IPv4 and
IPv6. Should work for route reflector, but does not properly handle
originating routes with next hop self.
Based on patches from Jan Matejka.
When a BGP session with ADD_PATH is restarted and the neighbor do not
announce ADD_PATH capability during reconnect, the accept_ra_types is
still set to RA_ANY.
Thanks to Lennert Buytenhek for the bugreport
Dropped struct mpnh and mpnh_*()
Now struct nexthop exists, nexthop_*(), and also included struct nexthop
into struct rta.
Also converted RTD_DEVICE and RTD_ROUTER to RTD_UNICAST. If it is needed
to distinguish between these two cases, RTD_DEVICE is equivalent to
IPA_ZERO(a->nh.gw), RTD_ROUTER is then IPA_NONZERO(a->nh.gw).
From now on, we also explicitely want C99 compatible compiler. We assume
that this 20-year norm should be known almost everywhere.
Prefix and bucket tables are initialized when entering established state
but not explicitly freed when leaving it (that is handled by protocol
restart). With graceful restart, BGP may enter and leave established
state multiple times without hard protocol restart causing memory leak.
Commit 3c09af41... changed behavior of int_set_add() from prepend to
append, which makes more sense for community list, but prepend must be
used for cluster list. Add int_set_prepend() and use it in cluster list
handling code.
- Unit Testing Framework (BirdTest)
- Integration of BirdTest into the BIRD build system
- Tests for several BIRD modules
Based on squashed Pavel Tvrdik's int-test branch, updated for
current int-new branch.
Add support for large communities (draft-ietf-idr-large-community),
96bit alternative to RFC 1997 communities.
Thanks to Matt Griswold for the original patch.
Although RFC 4271 does not forbid empty path segments, they are useless
and some implementations consider them invalid. It is clarified in RFC 7606,
specifying that AS_PATH with empty segment is considered malformed.
Also removed the lib-dir merging with sysdep. Updated #include's
accordingly.
Fixed make doc on recent Debian together with moving generated doc into
objdir.
Moved Makefile.in into root dir
Retired all.o and birdlib.a
Linking the final binaries directly from all the .o files.
Add code for manipulation with TCP-MD5 keys in the IPsec SA/SP database
at FreeBSD systems. Now, BGP MD5 authentication (RFC 2385) keys are
handled automatically on both Linux and FreeBSD.
Based on patches from Pavel Tvrdik.
In BIRD, RX has lower priority than TX with the exception of RX from
control socket. The patch replaces heuristic based on socket type with
explicit mark and uses it for both control socket and BGP session waiting
to be established.
This should avoid an issue when during heavy load, outgoing connection
could connect (TX event), send open, but then failed to receive OPEN /
establish in time, not sending notifications between and therefore
got hold timer expired error from the neighbor immediately after it
finally established the connection.
When a BGP session was established by an outgoing connection with
Graceful Restart behavior negotiated, a pending incoming connection in
OpenSent state, and another incoming connection was received, then the
outgoing connection (and whole BGP session) was closed, but the old
incoming connection was just overwritten by the new one. That later
caused a crash when the hold timer from the old connection fired.
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
Under some circumstances and heavy load, TX could be postponed
until the session fails with hold timer expired.
Thanks to Javor Kliachev for making the bug reproductible.
Permit specifying neighbor address, AS number and port independently.
Add 'interface' parameter for specifying interface for link-local
sessions independently.
Thanks to Alexander V. Chernikov for the original patch.
Stack variable may be used unitialized and that would lead to spurious
rta_free(), which may cause crash. The bug was introduced in 1.4.1 from
merging add-path branch.
Thanks to Peter Andreev for reporting it and Alexander V. Chernikov for
resolving it.
This is more consistent with common usage and also with the behavior of
other implementations (Cisco, Juniper).
Also changes the default for gw mode to be based solely on
direct/multihop.
Neighbor events related to received route next hops got mixed up with
sticky neighbor node for an IP of the BGP peer. If a neighbor for a next
hop disappears, BGP session is shut down.
Router ID could be automatically determined based of subset of
ifaces/addresses specified by 'router id from' option. The patch also
does some minor changes related to router ID reconfiguration.
Thanks to Alexander V. Chernikov for most of the work.
When 'import keep rejected' protocol option is activated, routes
rejected by the import filter are kept in the routing table, but they
are hidden and not propagated to other protocols. It is possible to
examine them using 'show route rejected'.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.
The nest-protocol interaction is changed to better handle multitable
protocols. Multitable protocols now declare that by 'multitable' field,
which tells nest that a protocol handles things related to proto-rtable
interaction (table locking, announce hook adding, reconfiguration of
filters) itself.
Filters and stats are moved to announce hooks, a protocol could have
different filters and stats to different tables.
The patch is based on one from Alexander V. Chernikov, thanks.
Hostcache is a structure for monitoring changes in a routing table that
is used for routes with dynamic/recursive next hops. This is needed for
proper iBGP next hop handling.
- BSD kernel syncer is now self-conscious and can learn alien routes
- important bugfix in BSD kernel syncer (crash after protocol restart)
- many minor changes and bugfixes in kernel syncers and neighbor cache
- direct protocol does not generate host and link local routes
- min_scope check is removed, all routes have SCOPE_UNIVERSE by default
- also fixes some remaining compiler warnings
It seems that by adding one pipe-specific exception to route
announcement code and by adding one argument to rt_notify() callback i
could completely eliminate the need for the phantom protocol instance
and therefore make the code more straightforward. It will also fix some
minor bugs (like ignoring debug flag changes from the command line).
Process well-known communities before the export filter (old behavior is
to process these attributes after, which does not allow to send route
with such community) and just for routes received from other BGP
protocols. Also fixes a bug in next_hop check.
There is no reak callback scheduler and previous behavior causes
bad things during hard congestion (like BGP hold timeouts).
Smart callback scheduler is still missing, but main loop was
changed such that it first processes all tx callbacks (which
are fast enough) (but max 4* per socket) + rx callbacks for CLI,
and in the second phase it processes one rx callback per
socket up to four sockets (as rx callback can be slow when
there are too many protocols, because route redistribution
is done synchronously inside rx callback). If there is event
callback ready, second phase is skipped in 90% of iterations
(to speed up CLI during congestion).
Although standard says that if we receive AS_PATH_CONFED_*
(and we are not a part of a confederation) segment, we should
drop session, nobody does that and it is unwise to do that.
Now we drop session just in case that peer ASN is in
AS_PATH_CONFED_* segment (to detect peer that considers BIRD
as a part of its confederation).
AS4 optional attribute errors were handled by session
drop (according to BGP RFC). This patch implements
error handling according to new BGP AS4 draft (*)
- ignoring invalid AS4 optional attributes.
(*) http://www.ietf.org/internet-drafts/draft-chen-rfc4893bis-02.txt
This patch extends the length for attributes from 1024 to 2048
(because both AS_PATH and AS4_PATH attributes take 2+4 B per AS).
If there is not enough space for attributes, Bird skips that
route group. Old behavior (skipping remaining attributes)
leads to skipping required attributes and session drop.
When capability related error is received, next connect will be
without capabilities. Also cease error subcodes descriptions
(according to [RFC4486]) are added.
BGP keeps its copy of configuration ptr and didn't update it during
reconfiguration. But old configuration is freed during reconfiguration.
That leads to unnecessary reset of BGP connection during reconfiguration
(old conf is corrupted and therefore different) and possibly other strange
behavior.
Fixes two race conditions causing crash of Bird, several unhandled
cases during BGP initialization, and some other bugs. Also changes
handling of startup delay to be more useful and implement
reporting of last error in 'show protocols' command.
RFC says that only connections in OpenConfirm and Established state
should participate in connection collision detection.
The current implementation leads to race condition when both sides
are trying to connect at the almost same time, then both sides
receive OPEN message by different connections at the almost same
time and close the other connection. Both connections are
closed and the both sides end in start/idle or start/active
state.
- Old MED handling was completely different from behavior
specified in RFCs - for example they havn't been propagated
to neighboring areas.
- Update tie-breaking according to RFC 4271.
- Change default value for 'default bgp_med' configuration
option according to RFC 4271.
- metric is 3 byte long now
- summary lsa originating
- more OSPF areas possible
- virtual links
- better E1/E2 routes handling
- some bug fixes..
I have to do:
- md5 auth (last mandatory item from rfc2328)
- !!!!DEBUG!!!!! (mainly virtual link system has probably a lot of bugs)
- 2328 appendig E
you can delete the socket from anywhere in the hooks and nothing should break.
Also, the receive/transmit buffers are now regular xmalloc()'ed buffers,
not separate resources which would need shuffling around between pools.
sk_close() is gone, use rfree() instead.
address. Need to do it better for the other neighbors -- the current
solution works only if they use the standard 64+64 global addresses
and the interface identifier in lower 64 bits is the same as for the
link-scope addresses.
o Use `expr' instead of `NUM' and `ipa' instead of `IPA',
so that defined symbols work everywhere.
o `define' now accepts both numbers and IP addresses.
o Renamed `ipa' in filters to `fipa'.
Pavel, please update filters to accept define'd symbols as well.
Please try compiling your code with --enable-warnings to see them. (The
unused parameter warnings are usually bogus, the unused variable ones
are very useful, but gcc is unable to control them separately.)
Added real comparison of BGP routes (inspired by the Cisco one).
Default local preference and default MED are now settable.
Defined filter keywords for all BGP attributes we know.